Aiji
2017-09-14 02:39 - 2017-09-14 02:40
I was wondering if a script updating all the sentences that were modified before the crash was coming soon.
I see some people starting to adopt sentences that I had adopted in the past, after I had corrected them and correctly linked/unlinked them. The problem is that adopting without proofreading or prooflinking decreases the actual quality of the corpus.
I don't have the time (nor the will) to re-monitor hundreds of sentences that I corrected in the past, so it would be very nice if an adequate script was on its way. Once the sentences are corrected, adoption and untagging can be redone without paying attention to the correctness of the sentence and its translations.
TRANG
2017-09-17 16:53
It depends on your definition of soon, but I'm starting to work on it. I don't have a clear solution yet for how this will be handled, so I can't say for sure how soon this can be done.

In any case, the related GitHub issue is: https://github.com/Tatoeba/tatoeba2/issues/1484
TRANG
2017-10-08 18:21
CK made a script to list sentences that need to be reverted. Could anyone have a quick look and see if there's anything wrong in this file:
https://github.com/Tatoeba/tato...2017-09-11.zip

What I would do is basically replace the text/owner of the sentences with the ones in this file.
Aiji
2017-10-09 01:55
The file looks okay to me. Especially for the change of owner :) I've started to do it again and it was painful ^^
Also, we may not want to replace the text if a modification has occurred since the recovery I guess.

Would it be possible to restore the linking / unlinking to other sentences too?
With the file you got, the indexes of the sentences that need to be checked are given, so we could just check what modifications they went through between the critical dates.

In any case, thank you for the work.
deniko
2017-10-09 08:41
I checked Ukrainian sentences on the list - the file looks good to me. Thanks CK for creating that.
TRANG
2017-10-15 18:11
Sentence modifications/ownership should be restored now, based on CK's file. Can you quickly check whether things are okay on your side?
TRANG
2017-10-15 18:19 - 2017-10-15 18:37
Hmm... Actually never mind, I'm already seeing some inconsistencies.

Edit: So I reverted the replacement of sentences/ownership. We'll have to review the script that generates this file, it probably shouldn't take into account sentences that were modified after June 10th.
TRANG
2017-10-22 17:43
Alright, CK fixed the files. I've made the changes in the database.

Can you have a look now? Hopefully it's alright.
Aiji
2017-10-23 02:29
Thank you.
Ownership has been restored.
Linking / unlinking is not restored, which is a problem for hundreds of sentences whose translations had been corrected when adopted. For instance: https://tatoeba.org/eng/sentences/show/127776
Any suggestion on how to handle that? Isn't there a way to simply reapply the linking / unlinking done between March and June on the sentences contained in the file? Or, to avoid conflicts, reapply what occurred from March to now (some may have been reapplied manually already).
TRANG
9 days ago
Good news: more than one year later, we have made progress on restoring links:
https://github.com/Tatoeba/tatoeba2/issues/1724

One of our members, Yorwba, has found that the following 2949 links should be deleted:
https://raw.githubusercontent.c..._to_delete.csv

And the following 131153 links should be re-added:
https://raw.githubusercontent.c...to_restore.csv

I would greatly appreciate it if a few people could look at the lists and check whether they see anything wrong in there.
Aiji
8 days ago
Good news, good news.

What would you like us to do exactly? Without writing some scripts, looking at those raw files seems a bit difficult.

That being said, do you want us to pick some links at random and check their validity?
TRANG
8 days ago
Yes, pick links at random and check if they indeed have to be deleted/re-added.

If someone wants to go further and write a script to check a larger amount of links, that's of course better.
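
For anyone who wants to spot-check at random without opening the raw files by hand, here is a minimal sketch in Python. It assumes each CSV line holds a pair of sentence IDs in its first two columns, which I have not verified against the actual files. The function names are made up for illustration, and the URL pattern is the one used for sentence pages elsewhere in this thread.

```python
import csv
import io
import random


def sample_links(csv_text, k, seed=None):
    """Return up to k randomly chosen (sentence_id, translation_id) pairs.

    Assumption: the first two columns of each row are numeric sentence IDs.
    Rows that do not fit (e.g. a header line) are skipped.
    """
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip().isdigit() and row[1].strip().isdigit():
            rows.append((int(row[0]), int(row[1])))
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(rows, min(k, len(rows)))


def sentence_url(sentence_id):
    """Build the page URL for a sentence so it can be opened in a browser."""
    return f"https://tatoeba.org/eng/sentences/show/{sentence_id}"


if __name__ == "__main__":
    demo = "1,2\n3,4\n5,6\n"  # stand-in data, not taken from the real lists
    for a, b in sample_links(demo, 2, seed=0):
        print(sentence_url(a), sentence_url(b))
```

Each pair printed this way can then be opened and checked by eye, as described above.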
CK
8 days ago
Here are the same links, showing the sentence text and clickable links.

http://study.aitech.ac.jp/replayed/

I split the HTML files into 400 lines each.

To quickly check a lot of these, on a Mac, hold down shift+command as you click the links to open in a new tab, and then command-w to close that tab and return to the page with the links. There are similar keyboard shortcuts for other operating systems, which you probably already know for the system you are using.
deniko
7 days ago
I did what you suggested: I picked a dozen random Ukrainian sentences and checked the links, and they all seem to be fine.

I also checked in Excel that there are no pairs of sentences that are both linked and unlinked in the two files, so I guess they should be fine to process.
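
The same cross-check can be done without Excel. Below is a rough Python sketch, assuming both files list one link per line as two sentence IDs in the first two columns (an assumption about the format, not something confirmed here). It also treats a link as unordered, so "1,2" and "2,1" count as the same pair; drop the min/max step if direction matters. The function names are hypothetical.

```python
import csv
import io


def read_pairs(csv_text):
    """Parse CSV text into a set of unordered sentence-ID pairs.

    Assumption: the first two columns are numeric sentence IDs;
    anything else (such as a header line) is ignored.
    """
    pairs = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip().isdigit() and row[1].strip().isdigit():
            a, b = int(row[0]), int(row[1])
            pairs.add((min(a, b), max(a, b)))  # normalise the order
    return pairs


def conflicting_links(to_delete_csv, to_restore_csv):
    """Return the pairs that appear in both the delete list and the restore list."""
    return read_pairs(to_delete_csv) & read_pairs(to_restore_csv)


if __name__ == "__main__":
    # Stand-in data for illustration only.
    print(conflicting_links("1,2\n3,4\n", "2,1\n5,6\n"))
```

An empty result means no pair is marked for both deletion and restoration.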