I was wondering whether a script to update all the sentences that were modified before the crash is coming soon.
I see some people starting to adopt sentences that I had adopted in the past, after I had corrected them and correctly linked/unlinked them. The problem is that adopting sentences without proofreading them or checking their links decreases the actual quality of the corpus.
I don't have the time (nor the will) to re-check hundreds of sentences that I corrected in the past, so it would be very nice if a suitable script were on its way. Once the sentences are corrected, they can be re-adopted and untagged without having to check the correctness of each sentence and its translations.
It depends on your definition of soon, but I'm starting to work on it. I don't have a clear idea yet of how this will be handled, so I can't say for sure how soon it can be done.
In any case, the related GitHub issue is: https://github.com/Tatoeba/tatoeba2/issues/1484
CK made a script to list the sentences that need to be reverted. Could anyone take a quick look and see if there's anything wrong in this file:
https://github.com/Tatoeba/tato...2017-09-11.zip
What I would do is basically replace the text/owner of the sentences with the ones in this file.
The file looks okay to me, especially for the change of owner :) I had started doing it again and it was painful ^^
Also, I guess we may not want to replace the text if the sentence has been modified since the recovery.
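A minimal Python sketch of that approach, assuming hypothetical in-memory records (the real Tatoeba database schema and the layout of CK's file may differ): each recovered text/owner is applied only if the live sentence has not been modified after a cutoff, and the skipped IDs are returned so they can be reviewed by hand. The cutoff date and the data in the example are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sentence:
    text: str
    owner: str
    modified: datetime  # time of the last modification in the live database


@dataclass
class Recovered:
    text: str
    owner: str


def restore(db, recovered, cutoff):
    """Apply recovered text/owner unless the live sentence changed after `cutoff`.

    `db` and `recovered` map sentence IDs to records (hypothetical structures).
    Returns (restored_ids, skipped_ids); skipped IDs need a manual look.
    """
    restored, skipped = [], []
    for sid in sorted(recovered):
        rec = recovered[sid]
        current = db.get(sid)
        if current is None or current.modified > cutoff:
            # Deleted, or edited since the recovery: don't overwrite it.
            skipped.append(sid)
            continue
        current.text = rec.text
        current.owner = rec.owner
        restored.append(sid)
    return restored, skipped


# Illustrative data; the cutoff is an example date, not the real one.
cutoff = datetime(2017, 6, 10)
db = {
    1: Sentence("Helo world.", "alice", datetime(2017, 5, 1)),
    2: Sentence("Edited after the recovery.", "bob", datetime(2017, 8, 1)),
}
recovered = {1: Recovered("Hello world.", "carol"), 2: Recovered("Old text.", "carol")}
print(restore(db, recovered, cutoff))  # ([1], [2])
```

Returning the skipped IDs instead of silently dropping them keeps the sentences that were edited after the recovery visible, so someone can decide case by case which version to keep.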
Would it be possible to restore the linking/unlinking to other sentences too?
The file you have gives the IDs of the sentences that need to be checked, so we could just check what modifications they went through between the critical dates.
In any case, thank you for the work.
I checked the Ukrainian sentences on the list, and the file looks good to me. Thanks, CK, for creating it.
Sentence modifications and ownership should now be restored, based on CK's file. Could you take a quick look to check that things are okay on your side?
Hmm... Actually never mind, I'm already seeing some inconsistencies.
Edit: So I reverted the replacement of sentences/ownership. We'll have to review the script that generates this file; it probably shouldn't take into account sentences that were modified after June 10th.
Alright, CK fixed the files. I've made the changes in the database.
Can you have a look now? Hopefully it's alright.
Thank you.
Ownership has been restored.
Linking/unlinking has not been restored, which is a problem for hundreds of sentences whose translations had been corrected when they were adopted. For instance: https://tatoeba.org/eng/sentences/show/127776
Any suggestions on how to handle that? Isn't there a way to simply reapply the linking/unlinking done between March and June on the sentences contained in the file? Or, to avoid conflicts, reapply what occurred from March until now (some of it may already have been reapplied manually).
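A minimal sketch of the replay idea, assuming a hypothetical log of link/unlink events (the real history tables may be structured differently) and treating a link as an undirected pair of sentence IDs: events inside a chosen date window that touch a sentence from the file are re-applied in chronological order. The IDs and dates in the example are made up.

```python
from datetime import datetime
from typing import NamedTuple


class LinkEvent(NamedTuple):
    time: datetime
    action: str  # hypothetical log format: "link" or "unlink"
    sentence_id: int
    translation_id: int


def pair(a, b):
    # Treat a link as undirected: (a, b) and (b, a) are the same link.
    return (a, b) if a <= b else (b, a)


def replay(links, events, start, end, affected):
    """Re-apply link/unlink events in [start, end) that touch an affected sentence.

    Returns a new set of links; the input set is left unchanged.
    """
    links = set(links)
    window = [e for e in events
              if start <= e.time < end
              and (e.sentence_id in affected or e.translation_id in affected)]
    # Chronological order: if a pair was linked and later unlinked, the last one wins.
    for e in sorted(window, key=lambda e: e.time):
        p = pair(e.sentence_id, e.translation_id)
        if e.action == "link":
            links.add(p)
        elif e.action == "unlink":
            links.discard(p)
        else:
            raise ValueError(f"unknown action: {e.action!r}")
    return links


start, end = datetime(2017, 3, 1), datetime(2017, 6, 11)
events = [
    LinkEvent(datetime(2017, 3, 5), "unlink", 1, 2),
    LinkEvent(datetime(2017, 4, 1), "link", 1, 3),
    LinkEvent(datetime(2017, 7, 1), "link", 4, 5),  # outside the window
]
links = {(1, 2)}
result = replay(links, events, start, end, affected={1})
print(sorted(result))  # [(1, 3)]
```

Because adding a link that already exists, or removing one that is already gone, does nothing on a set, links that were reapplied manually in the meantime don't cause conflicts.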
Good news: more than a year later, we have made progress on restoring links:
https://github.com/Tatoeba/tatoeba2/issues/1724
One of our members, Yorwba, has found that the following 2949 links should be deleted:
https://raw.githubusercontent.c..._to_delete.csv
And the following 131153 links should be re-added:
https://raw.githubusercontent.c...to_restore.csv
I would greatly appreciate it if a few people could look at the lists and check whether they see anything wrong in them.
Good news, good news.
What would you like us to do exactly? Without writing some scripts, looking at those raw files seems a bit difficult.
That being said, do you want us to pick some links at random and check their validity?
Yes, pick links at random and check whether they really have to be deleted/re-added.
If someone wants to go further and write a script to check a larger number of links, that's of course better.
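A minimal sketch of such a spot-check helper, assuming each row of the lists holds two sentence IDs in its first two columns (the actual column layout and delimiter of the files aren't stated here): it samples a few links and prints the two sentence pages to open for each, using the sentence URL pattern that appears in this thread.

```python
import csv
import io
import random

# Sentence page URL pattern, as in the sentence link quoted in this thread.
SHOW_URL = "https://tatoeba.org/eng/sentences/show/{}"


def read_links(f, delimiter=","):
    """Read (sentence_id, translation_id) pairs from a file object.

    Assumes the first two columns are sentence IDs; header or malformed rows
    are skipped. The real files' layout and delimiter may differ.
    """
    links = []
    for row in csv.reader(f, delimiter=delimiter):
        try:
            links.append((int(row[0]), int(row[1])))
        except (IndexError, ValueError):
            continue
    return links


def sample_links(links, k, seed=None):
    """Pick up to k distinct rows at random (a seed makes the pick repeatable)."""
    rng = random.Random(seed)
    return rng.sample(links, min(k, len(links)))


def review_urls(sample):
    """Turn each sampled link into the two sentence pages to open side by side."""
    return [(SHOW_URL.format(a), SHOW_URL.format(b)) for a, b in sample]


# Example with made-up IDs; for the real lists, pass an open file object instead.
data = io.StringIO("sentence_id,translation_id\n1,2\n3,4\n5,6\n")
links = read_links(data)
for left, right in review_urls(sample_links(links, 2, seed=0)):
    print(left, right)
```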
I did what you suggested: I picked a dozen random Ukrainian sentences and checked their links, and they all seem fine.
I also checked in Excel that no pair of sentences is both linked and unlinked across the two files, so I guess they should be fine to process.
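The same overlap check can also be scripted. A minimal sketch, assuming two sentence-ID columns per row (the files' real layout may differ): it loads both lists as undirected pairs and reports any link that would be both deleted and re-added. The example data is made up.

```python
import csv
import io


def load_pairs(f, delimiter=","):
    """Return the set of links in a file as unordered (min_id, max_id) pairs."""
    pairs = set()
    for row in csv.reader(f, delimiter=delimiter):
        try:
            a, b = int(row[0]), int(row[1])
        except (IndexError, ValueError):
            continue  # skip a header row or blank line
        pairs.add((min(a, b), max(a, b)))
    return pairs


def conflicting(to_delete, to_restore):
    """Links present in both lists, i.e. ones that would be deleted and re-added."""
    return sorted(to_delete & to_restore)


to_delete = load_pairs(io.StringIO("1,2\n3,4\n"))
to_restore = load_pairs(io.StringIO("5,6\n4,3\n"))
print(conflicting(to_delete, to_restore))  # [(3, 4)]
```

Normalizing each pair to (smaller ID, larger ID) means a link listed as 3,4 in one file and 4,3 in the other is still caught, which a plain row-by-row comparison could miss.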