Tatoeba
gillux, September 27, 2018 at 12:02:49 AM UTC, edited September 27, 2018 at 12:57:24 AM UTC

Note to contributors: I’ve improved the language autodetection feature, so it should work better now. It should also become more accurate over time.

Long story:

For those who don’t know, when you add a new sentence and select "autodetect" for the language, a tool called Tatodetect guesses the language of your sentence. Tatodetect works by running a statistical analysis of the Tatoeba corpus to learn which words are used in which languages. So basically, the more sentences there are in a given language, the more accurately Tatodetect can detect it.
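To make the idea concrete, here is a minimal sketch of word-frequency language detection. It is not Tatodetect's actual code: the function names `train` and `detect`, the scoring rule, and the tiny sample corpus are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count how often each word appears in each language.

    corpus: iterable of (language_code, sentence) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for lang, sentence in corpus:
        counts[lang].update(sentence.lower().split())
    return counts


def detect(sentence, counts):
    """Return the language whose word statistics best match the sentence."""
    words = sentence.lower().split()
    scores = {}
    for lang, word_counts in counts.items():
        total = sum(word_counts.values())
        if total == 0:
            continue  # no words learned for this language
        # Sum the relative frequency of each word of the sentence in this language.
        scores[lang] = sum(word_counts[w] / total for w in words)
    return max(scores, key=scores.get)


# Toy corpus, invented for this example.
corpus = [
    ("eng", "the cat is on the table"),
    ("eng", "I like the book"),
    ("fra", "le chat est sur la table"),
    ("fra", "j'aime le livre"),
]
model = train(corpus)
print(detect("the book is on the table", model))  # prints "eng"
```

With so few sentences the guess is fragile, which is the point made above: the counts only become reliable as a language accumulates more sentences.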

However, there was a limitation: Tatodetect could not learn from new sentences unless it performed a new (costly) analysis of the corpus. As a result, we had to manually start a new analysis every now and then so that Tatodetect could learn from newly added sentences. The last analysis was from June 2017. I ran a new one today and automated the process, so the corpus will now be re-analysed on a weekly basis.

Ricardo14, September 27, 2018 at 12:14:05 AM UTC

That's such great news! Thank you **really** so much, gillux!

alexmarcelo, September 27, 2018 at 12:57:49 AM UTC

Great. This will be very useful, especially for Latin.