Aiji, September 14, 2017 at 2:39:37 AM UTC (edited September 14, 2017 at 2:40:29 AM UTC)

I was wondering whether a script to update all the sentences that were modified before the crash is coming soon.
I see some people starting to adopt sentences that I had adopted in the past, after I had corrected them and correctly linked/unlinked them. The problem is that adopting without proofreading or prooflinking decreases the actual quality of the corpus.
I don't have the time (nor the will) to re-monitor hundreds of sentences that I corrected in the past, so it would be very nice if an adequate script were on its way. Once the sentences are corrected, adoption and untagging can be redone without paying attention to the correctness of the sentence and its translations.

TRANG, September 17, 2017 at 4:53:18 PM UTC

It depends on your definition of soon, but I'm starting to work on it. I don't yet have a clear solution for how this will be handled, so I can't say for sure how soon it can be done.

In any case, the related GitHub issue is: https://github.com/Tatoeba/tatoeba2/issues/1484

TRANG, October 8, 2017 at 6:21:07 PM UTC

CK made a script to list sentences that need to be reverted. Could anyone have a quick look and see if there's anything wrong in this file:
https://github.com/Tatoeba/tato...2017-09-11.zip

What I would do is basically replace the text/owner of the sentences with the ones in this file.

Aiji, October 9, 2017 at 1:55:12 AM UTC

The file looks okay to me. Especially for the change of owner :) I've started to do it again and it was painful ^^
Also, we may not want to replace the text if a modification has occurred since the recovery I guess.

Would it be possible to restore the linking/unlinking to other sentences too?
With the file you got, the indexes of the sentences that need to be checked are given, so we could just check what modifications they went through between the critical dates.

In any case, thank you for the work.
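
A minimal sketch of the restore step discussed above, with the caveat about later edits applied: ownership is always restored, but text is restored only when the sentence has not been edited since the recovery. The dictionary layout, the function name `apply_restore`, and the `edited_since_recovery` set are assumptions for illustration; the actual format of CK's file is not described in this thread.

```python
def apply_restore(current, restore, edited_since_recovery):
    """Return a new mapping with owner/text restored from `restore`.

    current, restore: {sentence_id: {"text": str, "owner": str}} (hypothetical layout)
    edited_since_recovery: set of sentence ids changed after the recovery
    """
    # Copy so the caller's data is left untouched.
    result = {sid: dict(row) for sid, row in current.items()}
    for sid, wanted in restore.items():
        if sid not in result:
            continue  # sentence no longer exists; nothing to restore
        result[sid]["owner"] = wanted["owner"]
        # Keep newer text if someone edited the sentence after the recovery.
        if sid not in edited_since_recovery:
            result[sid]["text"] = wanted["text"]
    return result
```

Returning a new mapping rather than modifying in place makes it easy to compare before and after, which matters if the change has to be reverted.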

deniko, October 9, 2017 at 8:41:50 AM UTC

I checked the Ukrainian sentences on the list, and the file looks good to me. Thanks to CK for creating it.

TRANG, October 15, 2017 at 6:11:34 PM UTC

Sentence modifications/ownership should be restored now, based on CK's file. Can you quickly check whether things are okay on your side?

TRANG, October 15, 2017 at 6:19:28 PM UTC (edited October 15, 2017 at 6:37:24 PM UTC)

Hmm... Actually never mind, I'm already seeing some inconsistencies.

Edit: So I reverted the replacement of sentences/ownership. We'll have to review the script that generates this file; it probably shouldn't take into account sentences that were modified after June 10th.
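
The cutoff idea could look roughly like the sketch below: rows whose last modification falls after the cutoff are dropped before the restore file is built. The row layout (`id`, `modified`) and the function name are hypothetical, and the script that actually generates the file is not shown in this thread.

```python
from datetime import date


def drop_modified_after(rows, cutoff):
    """Keep only rows whose last modification is on or before `cutoff`.

    rows: list of {"id": int, "modified": datetime.date} (hypothetical layout)
    """
    return [row for row in rows if row["modified"] <= cutoff]


# Example call; the year is an assumption based on the dates of this thread.
# kept = drop_modified_after(rows, date(2017, 6, 10))
```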

TRANG, October 22, 2017 at 5:43:20 PM UTC

Alright, CK fixed the files. I've made the changes in the database.

Can you have a look now? Hopefully it's alright.

Aiji, October 23, 2017 at 2:29:10 AM UTC

Thank you.
Ownership has been restored.
Linking / unlinking is not restored, which is a problem for hundreds of sentences whose translations had been corrected when adopted. For instance: https://tatoeba.org/eng/sentences/show/127776
Any suggestion on how to handle that? Isn't there a way to simply reapply the linking/unlinking performed between March and June on the sentences contained in the file? Or, to avoid conflicts, reapply what occurred from March to now (some may have been reapplied manually already).
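
One way to picture this suggestion, as a sketch only: replay the logged link/unlink operations in chronological order, restricted to a date window and to the sentences listed in the file. The log tuple layout and the function name `replay_link_log` are assumptions; the real contribution log format is not described here.

```python
from datetime import date


def replay_link_log(links, log, sentence_ids, start, end):
    """Reapply link/unlink entries dated within [start, end] that touch sentence_ids.

    links: set of (source_id, target_id) pairs
    log: iterable of (day, action, source_id, target_id); action is "link" or "unlink"
    """
    result = set(links)
    # sorted() is stable, so same-day entries keep their original log order.
    for day, action, src, dst in sorted(log, key=lambda entry: entry[0]):
        if not (start <= day <= end):
            continue
        if src not in sentence_ids and dst not in sentence_ids:
            continue
        if action == "link":
            result.add((src, dst))
        elif action == "unlink":
            result.discard((src, dst))
    return result
```

Because links are held in a set, replaying an operation that someone has already redone manually is a no-op, which is the conflict case mentioned above.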

TRANG, December 1, 2018 at 3:55:00 PM UTC

Good news: more than a year later, we have made progress on restoring links:
https://github.com/Tatoeba/tatoeba2/issues/1724

One of our members, Yorwba, has found that the following 2949 links should be deleted:
https://raw.githubusercontent.c..._to_delete.csv

And the following 131153 links should be re-added:
https://raw.githubusercontent.c...to_restore.csv

I would greatly appreciate it if a few people could look at the lists and check whether they see anything wrong in there.

Aiji, December 2, 2018 at 2:17:32 PM UTC

Good news, good news.

What would you like us to do exactly? Without writing some scripts, looking at those raw files seems a bit difficult.

That being said, do you want us to pick some links at random and check their validity?

TRANG, December 2, 2018 at 3:24:58 PM UTC

Yes, pick links at random and check whether they indeed have to be deleted/re-added.

If someone wants to go further and write a script to check a larger amount of links, that's of course better.
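
For the random spot check, a small helper along these lines might be enough. It assumes each CSV row describes one link and that the file has no header row; neither is confirmed in this thread, and `sample_links` is a hypothetical name.

```python
import csv
import io
import random


def sample_links(csv_text, k, seed=None):
    """Return up to k randomly chosen rows from CSV text, as tuples of strings."""
    rows = [tuple(row) for row in csv.reader(io.StringIO(csv_text)) if row]
    rng = random.Random(seed)  # pass a seed to get a repeatable sample
    return rng.sample(rows, min(k, len(rows)))
```

The text of a local copy of either list can be read with `open(path, encoding="utf-8").read()` and passed in; each sampled pair can then be opened on the site and checked by hand.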

deniko, December 3, 2018 at 11:46:08 AM UTC

I did what you suggested: I picked a dozen random Ukrainian sentences and checked the links, and they all seem to be fine.

I also checked in Excel that there are no pairs of sentences that are both linked and unlinked in the two files, so I guess they should be fine to process.
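
The same consistency check could be scripted instead of done in Excel. The sketch below assumes both files are two-column CSVs of sentence ids with no header row, and it treats a pair as conflicting regardless of direction; both points are assumptions, and the function names are hypothetical.

```python
import csv
import io


def read_pairs(csv_text):
    """Parse two-column CSV text into a set of (id, id) string pairs."""
    reader = csv.reader(io.StringIO(csv_text))
    return {(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2}


def conflicting_pairs(to_delete, to_restore):
    """Return pairs that appear in both sets, ignoring direction."""
    def undirected(pairs):
        return {frozenset(pair) for pair in pairs}

    return undirected(to_delete) & undirected(to_restore)
```

An empty result means no pair is scheduled to be both deleted and re-added. If direction matters for the data, comparing the two sets directly with `&` would be the stricter variant.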