In order to develop a proofreading feature that will be relevant and useful to all, I would like to hear corpus maintainers' feedback on how you proofread the corpus. I don't want to bias your answers, so I won't ask anything in particular, but could you describe your usual process? (Do you check only "@change", do you go through the latest contributions, do you only proofread while translating, etc.) Don't be afraid to give details; any piece of information is useful. "Proofreading" here refers to both finding problematic sentences and correcting them.
Thank you very much.
For the Italian corpus, I have several ways to find and correct sentences with mistakes: sometimes I go through the latest contributions when I see another user contributing Italian sentences; sometimes I browse this page: https://tatoeba.org/ita/users/for_language/ita ; sometimes I use this query while I'm tagging: https://tatoeba.org/ita/sentenc...&sort_reverse= ; sometimes I take typos that I made and corrected in new sentences and search for them to see whether they also appear in other sentences; sometimes I check the "@change" tag. I also go through orphan sentences.
I proofread the Turkish one by doing the fast track on Clozemaster (since I am learning Turkish).
I don't proofread but I set up a proofreading process for our Turkish members, which didn't really take off, but it's still interesting to share it.
The Turkish corpus is unfortunately full of sentences that may just be the result of machine translation. The problematic sentences belong to a corpus maintainer who is no longer active. There were some discussions on how to deal with these sentences.
We eventually agreed that we would deactivate the corpus maintainer's account and that I would create a list of this member's sentences. The list would be collaborative so that other Turkish members could remove sentences from it. The list would also exclude any sentence that has an "OK" review.
The Turkish members would then check the sentences from this list.
- If the sentence is fine, they would mark it as OK and remove it from the list.
- If the sentence is not fine, they would adopt the sentence and correct the mistake, then remove the sentence from the list.
Whenever the list was nearly empty, I would refresh it by adding another set of sentences not marked OK. And the cycle goes on.
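The cycle described above can be sketched roughly as follows. This is only an illustration under assumptions, not how Tatoeba implements lists: the `Sentence` and `ReviewList` classes, the `owner` field, and the batch size are all hypothetical names made up for the example. It assumes that adopting a sentence changes its owner, so a corrected sentence is not picked up again on the next refresh.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    id: int
    text: str
    owner: str                 # hypothetical: username that currently owns the sentence
    reviewed_ok: bool = False  # hypothetical: True once someone has marked it OK


@dataclass
class ReviewList:
    member: str                                   # the deactivated member whose sentences are checked
    pending: dict = field(default_factory=dict)   # sentence id -> Sentence still to be checked

    def refresh(self, corpus, batch_size):
        """Add up to batch_size sentences of `member` that are not marked OK."""
        added = 0
        for s in corpus:
            if added >= batch_size:
                break
            if s.owner != self.member or s.reviewed_ok or s.id in self.pending:
                continue
            self.pending[s.id] = s
            added += 1
        return added

    def mark_ok(self, sentence_id):
        """The sentence is fine: mark it OK and remove it from the list."""
        s = self.pending.pop(sentence_id)
        s.reviewed_ok = True

    def adopt_and_correct(self, sentence_id, new_text, reviewer):
        """The sentence is not fine: adopt it, fix it, and remove it from the list."""
        s = self.pending.pop(sentence_id)
        s.owner = reviewer
        s.text = new_text


corpus = [
    Sentence(1, "Already reviewed sentence.", "old_member", reviewed_ok=True),
    Sentence(2, "A fine sentence.", "old_member"),
    Sentence(3, "A sentense with a mistake.", "old_member"),
]
review = ReviewList(member="old_member")
first_batch = review.refresh(corpus, batch_size=2)
review.mark_ok(2)
review.adopt_and_correct(3, "A sentence with a mistake.", reviewer="reviewer_a")
second_batch = review.refresh(corpus, batch_size=2)
```

In this sketch the second refresh adds nothing, because every sentence is either marked OK or has been adopted by someone else.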
I have only refreshed the list once, and it seems there is not much activity on that front anymore. I wouldn't be able to say why it stopped. Was it because the process was not efficient enough, because the Turkish members found another way to improve the quality of the Turkish sentences, or because it was too depressing to deal with so many poor-quality sentences?
I don't have the right to edit sentences, but I do go through them:
1. Every now and then I go through the tags found in the menu: @change, @needs native check, @check. Likewise, I go through the orphan sentences every now and then.
2. I browse random sentences.
Because my answer is long, I posted it in a separate thread:
Thanks for asking for our feedback.
I try to proofread all new sentences. If they are OK, I mark them as OK. If I find a mistake, I leave a note and add the tag @change. After two weeks I check the sentences with this tag again.
I actually use several ways to do so:
Method 1 - I follow the link provided by the tag @change pasted on my profile - https://tatoeba.org/eng/tags/sh..._with_tag/561. For each sentence, I check whether there's an ongoing discussion and whether there's a typo (in Portuguese, "no" is "não", not "nao", for example).
Method 2 - When I study sentences and find a mistake, I use the "my reviews" feature. After a couple of days or weeks, I go to My profile, my reviews, and I first review the Outdated reviews (maybe the typo/problem has been fixed; in this case, I remove the mark and the tag). If everything is OK there, I go to "sentences marked as not OK" and check the user's logs (has that user gone? has that user left Tatoeba? is the sentence too troublesome for me to fix myself? etc.).
Method 3 - I also type some words incorrectly on purpose in the advanced search (like "haves", "kompputer", "Austraia", etc.) to check whether there are typos in the database.
Method 4 - I check a user's sentences, leave comments on problematic sentences, tag them and mark them as "not OK" (not available in the new design yet; I'm using a private list for now).
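For anyone who would rather run the Method 3 idea on a local copy of the sentences than type each misspelling into the advanced search, here is a minimal sketch. It is not a Tatoeba feature: the function name `find_typos` and the input format (a dict of sentence id to text) are assumptions made for the example, and the misspellings are the ones mentioned above.

```python
import re


def find_typos(sentences, misspellings):
    """Return {sentence_id: [misspellings found]} using whole-word, case-insensitive matching.

    `sentences` is assumed to be a dict mapping a sentence id to its text.
    """
    if not misspellings:
        return {}
    # One alternation pattern; \b keeps "Austraia" from matching inside longer words.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(m) for m in misspellings) + r")\b",
        re.IGNORECASE,
    )
    hits = {}
    for sentence_id, text in sentences.items():
        found = sorted({match.lower() for match in pattern.findall(text)})
        if found:
            hits[sentence_id] = found
    return hits


sample = {
    1: "He haves a kompputer.",
    2: "She lives in Australia.",
    3: "I visited Austraia last year.",
}
typo_hits = find_typos(sample, ["haves", "kompputer", "Austraia"])
```

The matches are lowercased so that the same misspelling is reported once per sentence regardless of capitalization.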
I previously shared my opinion here.
We're also using the vocabulary feature and the advanced search to detect spelling errors in the Turkish corpus.