I'd like to suggest something. Tatoeba is now very comprehensive and boasts thousands of examples in many entries, but I've noticed that many "orphan" sentences remain untranslated because they're either very complex or very bookish. I think Tatoeba should strive for perfection; that is, it should favor quality over quantity.
If this is a resource intended for language learners at all stages, it should get rid of those "orphan" sentences altogether. We should also review sentences that duplicate others (that convey exactly the same idea), and remove sentences that contain inflammatory comments as well as sentences that are very long.
It makes no sense to me to keep adding hundreds of sentences every day while thousands of other sentences are still untranslated. We should focus on those untranslated sentences before adding more. And any sentences we do add should fill a linguistic gap not yet covered by this corpus. After all, this is a multilingual corpus, and we should see it and treat it that way.
Anyway, let me know what you think; that's just my opinion on this beautiful site.
Best regards, Eremon.
It's always worth remembering that Tatoeba is a volunteer project. Trying to forbid people from doing something is usually counterproductive. It's better to inspire them to produce something useful instead.
In particular, the corpus is already quite huge, and I at least don't have a good idea of what it already contains and what it doesn't. So I add and translate whatever sentences happen to interest me at the moment.
So my suggestion is that you add the kind of content you want yourself and let others add the kind of content they want. If you want to encourage others toward a particular activity, give them tools and guidance in that direction. Inspire, don't restrict.
Here are the longest English sentences without a translation:
Here are the longest Spanish sentences without a translation:
Translating those would be a great service to Tatoeba; it is often challenging and time-consuming.
I agree with you on the need to help contributors participate more effectively in the development of Tatoeba. On the other hand, it seems important to me that everyone remains totally free to choose the sentences he translates.
> "We should focus on those untranslated sentences before moving on to keep adding sentences."
In my opinion, translators who would like to optimize their impact on the quality of the corpus should translate as a priority:
- sentences that have not yet been translated into their language but are already translated into many other languages. Besides the direct translation, the translator thereby adds numerous indirect translations to the corpus.
- sentences that contain words that are frequently searched for on Tatoeba but have not yet been translated.
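To make the two criteria above concrete, here is a minimal Python sketch of how a translator's to-do list could be ranked. It assumes a hypothetical in-memory `Sentence` record that already knows which languages each sentence is translated into, plus a hypothetical set of frequently searched words still missing in the target language; neither is an actual Tatoeba API or export format. Sentences are ordered first by how many indirect translations they would bring in, then by how many missing search words they contain.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """Hypothetical sentence record; not Tatoeba's actual data model."""
    id: int
    lang: str
    text: str
    translated_into: set = field(default_factory=set)  # languages with a direct translation


def priority(sentence, target_lang, popular_missing):
    """Return (indirect translations gained, popular missing words contained)."""
    # Each existing translation becomes an indirect translation of the new one.
    indirect = len(sentence.translated_into - {target_lang})
    # Naive word split; punctuation stripped, case folded.
    words = {w.strip('.,!?;:"').lower() for w in sentence.text.split()}
    return (indirect, len(words & popular_missing))


def rank_for_translation(sentences, target_lang, popular_missing):
    """Sentences not yet translated into target_lang, best candidates first."""
    candidates = [
        s for s in sentences
        if s.lang != target_lang and target_lang not in s.translated_into
    ]
    return sorted(candidates,
                  key=lambda s: priority(s, target_lang, popular_missing),
                  reverse=True)


sentences = [
    Sentence(1, "eng", "The orphan was adopted.", {"fra", "deu"}),
    Sentence(2, "eng", "I like tea.", {"fra", "deu", "spa", "jpn"}),
    Sentence(3, "eng", "Hello.", {"fin"}),
]
ranked = rank_for_translation(sentences, "fin", {"orphan", "adopted"})
print([s.id for s in ranked])  # [2, 1]
```

Sorting by a tuple rather than a weighted sum is an arbitrary choice here; a weighted score would let a sentence with many missing words outrank one that is merely well translated.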
> "And the sentences that we should add, must fill a linguistic gap not yet found in this corpus."
I'm currently trying to develop a tool that identifies words that are often searched for on Tatoeba but not yet represented in the corpus: https://tatominer.imfast.io. Feel free to use it and share your feedback.
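As a rough illustration of the idea (this is not TatoMiner's actual implementation), finding popular search terms with no match in the corpus can be sketched in Python as below. The search log and corpus are hypothetical plain lists of strings, and the tokenizer is naive: it only suits space-separated languages and only matches single-word queries.

```python
import re
from collections import Counter


def tokens(text):
    """Naive tokenizer: lowercase runs of word characters and apostrophes."""
    return set(re.findall(r"[\w']+", text.lower()))


def missing_popular_terms(search_log, corpus_texts, top_n=10):
    """Most frequently searched terms that appear in no corpus sentence.

    search_log:   hypothetical iterable of raw query strings
    corpus_texts: hypothetical iterable of sentence texts in the searched language
    """
    vocabulary = set()
    for text in corpus_texts:
        vocabulary |= tokens(text)
    # Count case-insensitively and ignore empty queries.
    counts = Counter(q.strip().lower() for q in search_log if q.strip())
    missing = [(term, n) for term, n in counts.most_common() if term not in vocabulary]
    return missing[:top_n]


log = ["orphan", "tea", "Orphan", "bookish", "orphan", "bookish", "tea"]
corpus = ["I like tea.", "She drinks tea every morning."]
print(missing_popular_terms(log, corpus))  # [('orphan', 3), ('bookish', 2)]
```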
Ideally, it would be possible to identify the sentences that most need to be translated and the words that most need to be added directly from the site's interface. For example, on the https://tatoeba.org/eng/activit...late_sentences page, one could add "containing untranslated words first" and "popular sentences first" to the sorting options.
I hope that among the developers who plan to take part in the code event, some, like me, will be interested in creating new features of this kind.
> If this is a resource intended for language learners of all stages
It is not. However, some people like to use it as such.
Other than that, what you present is a personal opinion, and I wholeheartedly encourage you to pursue your vision. However, I encourage everybody else to pursue their own vision as well, because I think we will achieve more diversity that way than if everybody does the same thing.
> I've seen that many "orphan" sentences are not translated because they're
> either very complex, or very bookish
Do you have any concrete examples?
As far as I can tell, there are only 650 orphan sentences that are not translated at all.
There are over 190,000 orphan sentences in total, so the proportion of untranslated orphan sentences is quite low (650 out of 190,000 is about 0.3%).
Deleting orphan sentences that have a translation is not an easy choice. You'd have to make sure that there are other sentences that are translated to the same language(s) and cover *all* the vocabulary and the linguistic properties of the sentences that you choose to delete.
For instance, you may have a complex or bookish sentence containing the word "orphan", but if it's the *only* such sentence with a translation into, say, Indonesian, then deleting it will completely remove the chance for someone searching for "orphan" from English to Indonesian to find any results.
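The check described above can be sketched in Python as follows, assuming a hypothetical layout where `sentences` maps IDs to texts in one source language and `translations` maps IDs to the set of languages each sentence is translated into. It only compares vocabulary with a naive tokenizer; the other linguistic properties mentioned above are not checked.

```python
import re


def words_of(text):
    """Naive tokenizer: lowercase runs of word characters and apostrophes."""
    return set(re.findall(r"[\w']+", text.lower()))


def coverage_lost(candidate_id, sentences, translations):
    """(word, language) pairs that no other sentence would cover after deletion.

    sentences:    hypothetical dict id -> text, all in the same source language
    translations: hypothetical dict id -> set of languages it is translated into
    """
    lost = set()
    for word in words_of(sentences[candidate_id]):
        for lang in translations.get(candidate_id, set()):
            # Is there another sentence with this word and a translation into lang?
            covered = any(
                other_id != candidate_id
                and lang in translations.get(other_id, set())
                and word in words_of(text)
                for other_id, text in sentences.items()
            )
            if not covered:
                lost.add((word, lang))
    return lost


sentences = {
    10: "The orphan inherited a vast estate.",
    11: "The estate was sold.",
}
translations = {10: {"ind"}, 11: {"ind", "fra"}}
lost = coverage_lost(10, sentences, translations)
print(sorted(lost))
# [('a', 'ind'), ('inherited', 'ind'), ('orphan', 'ind'), ('vast', 'ind')]
```

An empty result would mean the deletion is safe at the vocabulary level only; grammar and other properties would still need a human look.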
Generally speaking, it is good to remember that everyone has different needs. A sentence may be useless to you but very useful to someone else.