* Tatoeba Most Translated Sentences Charts *
The charts have been updated:
This (new?) one is an interesting idea.
Sentences translated by the largest number of members
Love this one!
I also like the third table with the most popular sentences. It's funny to note that although 29% of English sentences contain the word "Tom", only 1% of the most popular sentences contain this word.
Tom is popular, but interjections (hello; bonjour), short single-word sentences (Go!, What?), and well-known phrases (How are you?; Je ne sais pas.) are the most popular among translators. That's not a big surprise.
Ask Tom. (short sentence); Hello, Tom. (interjection); My name is Tom. (well-known phrase)
That's right. But there are also mixed sentences, e.g.: Hello, Tom. What's your name, Tom? This is sausage, Tom. Or something similar.
Oh, so this was written about the most popular sentences!
@Cabo You have a point. It would be interesting to compile the rankings of the most popular sentences in the last month alone. This would allow us to see how the situation evolves now that the most obvious sentences you mention are already widely covered.
Update of my previous post:
Original sentences within a given calendar week that attracted the most (direct and indirect) translations in the following 3 weeks.
The main point is that translations are to be in different languages and by different users. So, when a user translates their own sentence into different languages, these translations won’t count. Nonetheless, they can serve as a base for new (indirect) translations by other users.
I thought this little “competition” could give an idea of which kinds of recently added sentences members find interesting to translate. (hm…)
Week 31: 3 sentences with 5 translations
Sentence #8950897 [eng] (CK)
#8951011 [asm] Mohsin_Ali, #8951125 [rus] marafon, #8954378 [epo] Verdastelo, #8954395 [nld] martinod, #8958525 [lit] glavsaltulo
Sentence #8952548 [fra] (Julien_PDC)
#8969377 [rus] marafon, #8969471 [epo] soweli_Elepanto, #8969947 [deu] Esperantostern, #8971291 [ido] Idopauline, #8984793 [lit] glavsaltulo
Sentence #8952549 [fra] (Julien_PDC)
#8969376 [rus] marafon, #8969472 [epo] soweli_Elepanto, #8969946 [deu] Esperantostern, #8971294 [ile] Idopauline, #8984795 [lit] glavsaltulo
Week 32: 2 sentences with 8 translations
Sentence #8961864 [eng] (CK)
#8961903 [ces] Ergulis, #8962236 [ron] elenacristina260, #8967653 [rus] marafon, #8967884 [epo] Verdastelo, #8968842 [deu] Pfirsichbaeumchen, #8968854 [lit] glavsaltulo, #8969452 [jpn] small_snow, #8971264 [ido] Idopauline
Sentence #8962885 [eng] (CK)
#8963583 [epo] Verdastelo, #8963758 [rus] marafon, #8974936 [lit] glavsaltulo, #8979760 [por] JGEN, #9001683 [tur] soliloquist, #9001840 [deu] Pfirsichbaeumchen, #9003914 [spa] Shishir, #9003916 [ces] Ergulis
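The counting rule described above (translations in different languages, by different users, with the author's own translations excluded) could be sketched roughly like this in Python. This is only one possible reading of the rule, not the script actually used: the names `Translation` and `count_qualifying` are made up for illustration, the pass in ID order is a simplification, and the 3-week window is not modeled. The example data is the first Week 31 sentence listed above.

```python
from typing import Iterable, NamedTuple


class Translation(NamedTuple):
    sentence_id: int
    lang: str
    user: str


def count_qualifying(author: str, translations: Iterable[Translation]) -> int:
    """Count translations in distinct languages by distinct users.

    Assumption: translations by the original author are skipped, and each
    language and each user is counted at most once, in sentence-ID order.
    """
    seen_langs = set()
    seen_users = set()
    count = 0
    for t in sorted(translations, key=lambda t: t.sentence_id):
        if t.user == author or t.lang in seen_langs or t.user in seen_users:
            continue
        seen_langs.add(t.lang)
        seen_users.add(t.user)
        count += 1
    return count


# Week 31, sentence #8950897 [eng] (CK), as listed in the post.
week31 = [
    Translation(8951011, "asm", "Mohsin_Ali"),
    Translation(8951125, "rus", "marafon"),
    Translation(8954378, "epo", "Verdastelo"),
    Translation(8954395, "nld", "martinod"),
    Translation(8958525, "lit", "glavsaltulo"),
]

print(count_qualifying("CK", week31))  # prints 5
```

A translation added by CK themselves, or a second Russian translation, would leave the count at 5 under this reading.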
I like the idea of measuring popularity over a time frame of a few weeks to spot the "weekly trending sentences".
On the other hand, it seems to me that it would be fairer to compare only the sentences of the same language and therefore not to take into account the indirect translations.
One more update on “Most successful sentences per calendar week”:
Recently added original sentences that attracted the most translations in different languages by different members.
Sentence #8967901 [eng] (CK) We got lost in the cave.
Sentence #8973283 [por] (Ricardo14) Sou do Brasil, e você?
Sentence #8977198 [por] (Ricardo14) É um prazer imenso.
Sentence #8978634 [eng] (CK) Tom's house is over there.
Sentence #8984009 [eng] (CK) Tom isn't a socialist.
Sentence #8986800 [por] (Ricardo14) Eu falo grego, e você?
Sentence #8983956 [eng] (CK) Tom isn't colorblind.
Sentence #8985089 [por] (Ricardo14) Eu quero morar em Lisboa.
Sentence #8986338 [por] (Ricardo14) Carlos é espanhol e a esposa dele é alemã.
Sentence #8986803 [por] (Ricardo14) Eu falo um pouco de alemão.
What would this look like if scaled by the number of translations the sentences in a given language usually get?
That is, divide the number of translations of a given English sentence by the average number of translations English sentences have, and likewise for other languages.
This would tell how popular a particular sentence is, but would control for the fact that there are many more people translating from English than many other languages, for example.
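A minimal sketch of that scaling, assuming we already have a translation count for each sentence grouped by source language. The function name `scaled_popularity`, the sentence labels, and all the numbers are hypothetical and only there to show the arithmetic.

```python
from statistics import mean


def scaled_popularity(counts_by_lang):
    """Divide each sentence's translation count by its language's average count."""
    scaled = {}
    for lang, counts in counts_by_lang.items():
        avg = mean(counts.values())
        for sentence_id, n in counts.items():
            # A language whose sentences have no translations at all gets 0.0.
            scaled[sentence_id] = n / avg if avg else 0.0
    return scaled


# Hypothetical translation counts, keyed by source language and a made-up label.
example = {
    "eng": {"eng-1": 8, "eng-2": 4, "eng-3": 0},
    "por": {"por-1": 3, "por-2": 1, "por-3": 2},
}

scores = scaled_popularity(example)
for sentence_id in sorted(scores, key=scores.get, reverse=True):
    print(sentence_id, scores[sentence_id])
```

With these made-up numbers, the English sentence with 8 translations scores 2.0 and the Portuguese sentence with 3 translations scores 1.5, so the gap between them shrinks once the language average is taken into account.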
Hi, my name is Tango. I just joined Tatoeba to contribute translations for our native language. I read how I can add it in this article https://en.wiki.tatoeba.org/art...nguage-request but on my profile I can't seem to find a way to add it. Any help would be very much appreciated.
Hi Tango and welcome! :)
You will have to wait until your language request is implemented and deployed before you can add the language in your profile. This can take a week or two.
Thank you Trang for the quick update, I look forward to it being integrated.
Most often, when I perform an exact search for a Japanese sentence I have found at Clozemaster, the search doesn't find anything. However, if I set "Is orphan" and "Is unapproved" to Any, the sentence will be found. I find it very irritating that I have to go through that extra step, especially if the first search is lengthy.
Try this one: https://tatoeba.org/eng/sentenc...rom=und&to=und
I can possibly understand that Tatoeba does not want to pollute the search results with dodgy sentences, but when there is an exact match I think an exception should be made.
Another issue is why there are so many Japanese sentences that seem to be mostly forgotten, having no owner and sometimes with very "imaginative" English translations. Might this partly be because they are hidden by the search interface?
Bookmark this URL.
It's a pre-filled advanced search form with what you want.
The pre-filled search form option is something that was added recently.
Thanks, that helps a bit, but not when I issue the search directly from Clozemaster.
I suggest posting a request at Clozemaster for them to change the parameters of the query issued to include at least orphans (unapproved might be more problematic) when searching for a full sentence. Searching for a single word is probably a different story, since you would probably get enough good matches without relaxing the criteria.
Many of our Japanese sentences come from the Tanaka Corpus, and we've had too few Japanese-speaking members to fix all the problems with it.
Are you sure that all of the sentences originate from here?
Or maybe the sentence was rewritten but the information on Clozemaster hasn't changed.
I also haven't found one of its English sentence pairs so far (in the list of the first 100 most used words).
Judging from my experience, I am quite sure. I don't think Clozemaster uses any other corpus than Tatoeba, and Clozemaster does not rewrite sentences, to my knowledge.
It seems that Clozemaster fetched all its Japanese sentences from Tatoeba some five to eight years ago, so any changes made after that are not reflected in its corpus. On the other hand, not a lot seems to have happened with these sentences here at Tatoeba in the meantime, anyway. But if you don't find a particular sentence, it might have been deleted.
** Stats & Graphs **
Tatoeba Stats, Graphs & Charts have been updated:
51 usernames from sharptoothed's Tatoeba User Activity Chart from last week had both a tcnt/scnt over 1 and over 1,000 native language sentences.
aldar, alexmarcelo, arh, Balamax, bandeirante, bill, brauchinet, bunbuku, CH, CK, CM, CN, danepo, deniko, diegohn, dotheduyet1999, elenacristina260, Esperantostern, felix63, felvideki, gillux, GrizaLeono, Hybrid, lbdx, LeeSooHa, Luiaard, manese, marafon, MarijnKp, martinod, Micsmithel, morbrorper, mraz, Ninja, Nylez, Objectivesea, odexed, ondo, PaulP, Pfirsichbaeumchen, po_slovensky, Ricardo14, sacredceltic, Selena777, sharptoothed, shekitten, Shishir, Silja, small_snow, Tepan, Yorwba
Source URL: https://tatoeba.j-langtools.com...=1&chtype=alla
tcnt/scnt is this guy:
If it's 1.5 this means 1000 of your sentences have 1500 translations directly linked to them (on average).
If it's <1 this means you add a lot of sentences without translations.
This number only tells how many sentences are linked to your sentences.
"If it's <1 this means you add a lot of sentences without translations."
If I write only one sentence (and don't translate it), but 8 other contributors translate it, then my tcnt/scnt number is 8.
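To make the two examples above concrete, here is a tiny sketch of the ratio. I'm assuming tcnt means the number of translations directly linked to a member's sentences and scnt means the number of that member's sentences; the function name `tcnt_scnt` is made up, and this is not the chart's actual code.

```python
def tcnt_scnt(sentence_count, translation_link_count):
    """Average number of directly linked translations per sentence."""
    if sentence_count == 0:
        raise ValueError("a member with no sentences has no ratio")
    return translation_link_count / sentence_count


print(tcnt_scnt(1000, 1500))  # prints 1.5
print(tcnt_scnt(1, 8))        # prints 8.0
```

So a value below 1 does not require many untranslated sentences in every case, and a single widely translated sentence is enough to push the value well above 1.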
So this is a measure of
1. how popular your language is (as a foreign language)
2. how convenient (easy, short, simple) your sentences are to translate
3. how many sentences you write as translations of others, rather than as original sentences.
I would say that it is not particularly virtuous to aim for a high number here.
I agree with you.
I don't quite understand how tcnt/scnt is determined.
The user below has 3 sentences all of which are linked to a translation, so I would expect their tcnt/scnt to be 1, but it is 0.3 on the chart.
And this user's ratio is 0.8 on the chart although all their sentences are linked to a translation.
What am I missing?
It looks like "Count only native contributions" checkbox is checked. Uncheck it and you'll see what you expected. I can't explain it right now, sorry. Maybe there's a bug in my scripts. I'll try to find it out.
Thanks. I'm sorry to have bothered you.
The user glavsaltulo has 31,220 sentences (31,193 of them are in their native language, Lithuanian).
And only 4 of their sentences are untranslated.
But their tcnt/scnt is 0.50 on the chart when 'Count only native contributions' is checked.
So there must be something that decreases this number other than untranslated and non-native sentences.
Unchecking that checkbox gives a more accurate number as you mentioned.
When that checkbox is checked, only sentences created by natives are counted. That is, if a member has 3 sentences and all of them are translated once, but only one sentence belongs to a native, then the ratio will be 1/3.
"Count only native contributions" is unchecked by default and saved in cookies, as far as I remember. This checkbox is for those who need statistics for native contributors.
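One way to read the 1/3 example above, written as a sketch. This is a guess at the behaviour and not the actual script: I'm assuming the checkbox drops translation links from non-native sentences in the numerator while the denominator stays at the member's total sentence count. The name `ratio` and the tuple layout are hypothetical.

```python
def ratio(sentences, native_only=False):
    """sentences: list of (is_native, translation_count) pairs for one member."""
    total = len(sentences)
    if total == 0:
        return 0.0
    # Assumption: with the checkbox on, only native sentences contribute links.
    linked = sum(n for is_native, n in sentences if is_native or not native_only)
    return linked / total


# Three sentences, each translated once, only one of them written by a native.
member = [(True, 1), (False, 1), (False, 1)]

print(ratio(member))                    # checkbox unchecked
print(ratio(member, native_only=True))  # checkbox checked
```

Under this reading the unchecked ratio is 1.0 and the checked ratio is 1/3, which matches the example. It may well not match the other numbers reported in this thread.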
They are displayed right-to-left, but I think your question is whether they should be right-justified rather than left-justified, correct?
Yes, indeed. Thank you!
Check out the new "Browse by language" page: https://dev.tatoeba.org/sentences/index (note that it displays differently for guests and logged-in users).
Related issue: https://github.com/Tatoeba/tatoeba2/issues/2157
Feedback is welcome. Also, a lot of code changed under the hood, so I’d be glad if you could check if the rest of the website is working normally.
I really like it. I see a lot of influence from Wikipedia, but there's nothing wrong with that. I particularly like the display of languages by number of sentences (100,000+, 10,000+, etc.). That might be especially helpful in motivating people to add sentences in order to move their language from one group to another (for instance, another 3,000+ sentences will get Polish into the 100,000+ group).
Regarding the text "0+ sentences": Since you have a "1+ sentence" category, it should suffice to show "0 sentences". But I'm not sure why we have so many languages with no sentences. Don't we require sentences to be added for a language before we support the language? I suppose that we could have languages that have a small number of sentences that are then all deleted, for one reason or another. But displaying languages with zero sentences could just lead people to wonder why we don't support every known language, since we apparently don't even require sentences for the language before we display it.
Thanks! I’m glad you like it too. :-)
Right, the "0+ sentences" is weird. It won’t appear on tatoeba.org because all languages have at least one sentence. It does appear on dev.tatoeba.org though because we keep the list of supported languages updated without adding new sentences. I’m going to change it into "0 sentences" nonetheless.
I see. That makes sense.
Thanks again for this page gillux, I'm looking forward to having this deployed on prod :)
The issues I've noticed:
1) The "unknown" language has a blank icon and clicking on it leads to an error.
2) The South Levantine Arabic icon is wrongly sized (but you have already noted that).
3) I also thought for a moment that some languages were missing an icon. With "Tahaggart Tamahaq", for instance, if you don't know that it's one language, you could think that Tahaggart is one language and Tamahaq is another, and wonder why the second one doesn't have an icon.
If you could make the space between languages larger, or reduce the line-height of the language names, it would make it clear which string corresponds to the same language.
I fixed 1) and 2) already.
As for 3), I borrowed this way of displaying text from Wikipedia. It’s less of a problem for them because they localize each language name. I could change the display like you suggested, but this will make the text unaligned with other columns. I like how straight and organized it looks right now.
I’m not sure how to go about this but I’d rather try to approach the problem differently.
The new "Browse by language" page is now deployed on tatoeba.org.
This project is really great! <3
Thank you! :)
There are now 777,777 sentences on List 907. 513,553 (66%) of these have audio.
This is the list of good proofread English sentences that I use on my projects. http://www.manythings.org/corpus/tatoeba.html
Bilingual sentence pairs made up of these sentences and sentences by native speakers contributing to the Tatoeba Project can be downloaded from http://www.manythings.org/anki/ .
Screenshot showing the 777,777 number.
For comparison, here are the numbers of sentences for the 2nd- and 3rd-ranked languages on tatoeba.org.
Russian = 798,683
Italian = 767,143
Link to List 907.
Codidact languages site: https://languages.codidact.com/
This is a non-commercial, open-source Q&A community in the style of Stack Exchange. The instance I linked to is about languages; maybe people here will find it of interest.
Cool! Thank you!