Thanuir, June 28, 2019 at 8:17:37 AM UTC

Feature suggestion: Sort sentences by what should be translated first.

Rationale: When translating sentences from a given language, it would be nice to get them in an order such that translating the sentences highest up on the list would presumably be more useful than translating those that are farther down.

Ideally, this should be a one-button option, like "Suggest sentences to translate". Conceivably this might also make life easier for newcomers, as they would have a clear thing to start with.

Details:

The suggestion algorithm should be language-neutral.

I think the following priorities would be useful. They are not in any particular order.

1. The rated language competence of the sentence owner.
2. The rated language competence of the sentence author.
3. Original sentence is better than a translation. (Reasoning: Original sentences are more likely to be something used in the language, rather than an attempt to express a foreign concept or structure in the language.)
4. Sentence with audio is better than a sentence without audio.
5. Owned sentence is better than an orphan.
6. Normal sentence is better than one redded out.
7. A highly rated sentence is better than a poorly rated one. Order: the most OK ratings and no others; the most OK ratings relative to the number of unsure ratings, with no "not OK" ratings; no ratings; the best ratio of OK to "not OK" ratings; the fewest unsure ratings; the best ratio of "unsure" to "not OK" ratings; the fewest "not OK" ratings.
8. Sentence tagged OK is better than sentences with no quality tags is better than sentences with "check" or "needs native check" is better than "change" is better than "delete".
(Maybe I am forgetting some quality indicators.)
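
As a rough illustration of how a few of these priorities could be combined, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Sentence` fields, the `TAG_RANK` table, and above all the order of the criteria in the sort key, since the list above is explicitly not in any particular order. It is not how Tatoeba stores or ranks sentences.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ordering of quality tags, following point 8 (lower is better).
TAG_RANK = {"OK": 0, None: 1, "check": 2, "needs native check": 2,
            "change": 3, "delete": 4}


@dataclass
class Sentence:
    # All fields are illustrative, not Tatoeba's actual data model.
    text: str
    is_original: bool = True   # point 3: not a translation of another sentence
    has_audio: bool = False    # point 4
    is_owned: bool = True      # point 5: not an orphan
    is_red: bool = False       # point 6: shown in red
    quality_tag: Optional[str] = None  # point 8


def quality_key(s: Sentence) -> tuple:
    """Sort key: tuples compare element by element, and False sorts before True.

    The order of the elements is an arbitrary choice for this sketch.
    """
    return (
        s.is_red,
        not s.is_owned,
        TAG_RANK.get(s.quality_tag, 1),  # unknown tags count as "no tag"
        not s.is_original,
        not s.has_audio,
    )


def suggest(sentences):
    """Return the sentences with the best candidates first."""
    return sorted(sentences, key=quality_key)
```

Because the key is a plain tuple, adding, removing or reordering a criterion is a one-line change, which is the part the community would have to argue about.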

Sentences with many translations are likely to be higher quality or more useful, but translating sentences with no translations is likely to contribute more to the diversity of the corpus. One of these conditions might be added, too. (I would prefer "no translations".)

Or maybe the following would be a relevant hierarchy:

a. Sentences with no translations to any language.
b. Sentences without indirect or direct translations to the target language.
c. Sentences without direct translations to the target language.
d. Sentences with as few as possible direct translations to the target language. Maybe break ties with the number of indirect translations.
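
The hierarchy a to d could be expressed as a sort key, roughly as in the Python sketch below. The `Candidate` class and its `direct` and `indirect` dictionaries (language code mapped to a translation count) are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical structure: language code -> number of translations.
    text: str
    direct: dict = field(default_factory=dict)
    indirect: dict = field(default_factory=dict)


def coverage_key(s: Candidate, target: str) -> tuple:
    """Lower keys should be translated first, following tiers a to d."""
    n_direct = s.direct.get(target, 0)
    n_indirect = s.indirect.get(target, 0)
    if not any(s.direct.values()):
        tier = 0  # a. no translations to any language
    elif n_direct == 0 and n_indirect == 0:
        tier = 1  # b. no direct or indirect translation to the target
    elif n_direct == 0:
        tier = 2  # c. only indirect translations to the target
    else:
        tier = 3  # d. already has direct translations to the target
    # Within a tier: fewer direct translations first, ties broken by indirect.
    return (tier, n_direct, n_indirect)
```

Sorting with `sorted(candidates, key=lambda s: coverage_key(s, "fin"))` would then put completely untranslated sentences first when translating into Finnish.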

...

With English, I can fairly easily find sentences that satisfy many of the quality criteria. CK also has several tools for this. With smaller languages it is trickier, as there might, for example, not be any sentences with audio, there might be many non-native contributions, and maybe most sentences are translated or maybe almost none are. Hence, if one wants to find high quality or useful sentences to translate, it takes lots of experimentation to find them. If there are only a few and one translates them, one needs more experimentation to find the next patch.

...

So, the idea would be to have a single button that gives, say, a hundred or a thousand high quality sentences in order of quality. The order does not have to include all the points above, but it should include a sufficient amount to prioritize useful sentences.

More importantly, the list should never be empty, if the language has even a single sentence, or maybe even a single untranslated sentence.
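
One way to get the "never empty" property is to rank candidates instead of filtering them: nothing is ever excluded, so a language with at least one sentence always yields at least one suggestion. A minimal Python sketch, where `top_suggestions` and its parameters are hypothetical names for this example:

```python
import heapq


def top_suggestions(sentences, key, limit=100):
    """Return up to `limit` sentences with the smallest keys, best first.

    Nothing is filtered out, so the result is non-empty whenever
    `sentences` is non-empty.
    """
    return heapq.nsmallest(limit, sentences, key=key)
```

Here `key` can be any function that maps a sentence to something sortable, with lower values meaning "translate this first".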

...

Making this would require value judgements from the community or whoever makes the thing. This might cause arguments about what kind of sentences are the most valuable. Maybe it would not be worth it.

This is just a suggestion; feel free to ignore.

AlanF_US, June 29, 2019 at 1:59:10 PM UTC

> Making this would require value judgements from the community or whoever makes the thing. This might cause arguments about what kind of sentences are the most valuable. Maybe it would not be worth it.

I think it's worthwhile for people to think about which sentences they find most valuable for their own purposes. But coming to a consensus or even a majority decision would be counterproductive. Not only would you not be able to get different people to agree, you wouldn't be able to get one person to agree with this ranking all the time. People use different criteria for different purposes. Is a sentence that has been translated fifty times better than one that has not been translated at all? Yes, if your desire is to see how a phrase (typically a very common one) differs across languages. No, if your desire is to see a more complicated sentence (which is likely to be translated less often) that provides more context for a particular word. Is a sentence with audio better than one without? Yes, if you want to listen to audio. But when I don't want to listen to audio (as is generally the case for me), a search that favors audio gives worse results because they're more homogeneous (since the sentences that received audio were generally chosen by one person) and often shorter than I would like.

As I have said before, I favor making default searches as free from assumptions as possible. Adding assumptions decreases the diversity of the sentences that people see (and translate), and increases processing time.

By the way, I have never found a problem with the overall quality of English sentences in the Tatoeba corpus as a whole, so the concern over low-quality sentences has never resonated with me. I certainly come across poor individual sentences, which I flag or correct, but I never get the sense that they predominate in our collection of sentences. Occasionally I've done informal experiments where I've measured the fraction of sentences with mistakes that come up in a default or random search, and I've generally been impressed by the low proportion.

Thanuir, June 29, 2019 at 2:11:03 PM UTC

Note that I am not suggesting doing anything to the search function, but rather giving a simple way for someone to just get good and useful sentences to translate. This should not affect search results in any way.