Tatoeba
Silja, January 24, 2015 at 10:57:58 PM UTC

Lately the language detection has initially marked quite a few of my Finnish sentences as English. Here are some of them:

Tom justiinsa teki niin. http://tatoeba.org/fin/sentences/show/3785040
Nouse ylös ja taistele. http://tatoeba.org/fin/sentences/show/3785043
Tom hikoili. http://tatoeba.org/fin/sentences/show/3785774
No onko edes siistiä? http://tatoeba.org/fin/sentences/show/3786584
Onko Tom vielä hereillä? http://tatoeba.org/fin/sentences/show/3789015
Onko Tom edelleen hereillä? http://tatoeba.org/fin/sentences/show/3789014
Puhuvatko he ranskaa? http://tatoeba.org/fin/sentences/show/3790314
Emme me puhu ranskaa. http://tatoeba.org/eng/sentences/show/3793029
Ole varovainen. Älä heitä pois noita papereita. http://tatoeba.org/eng/sentences/show/3794756

Yes, some words in those sentences could also be English (no, me), but otherwise I really can't understand why these are detected as English. If I remember correctly, the language detector needs to be updated from time to time so that it "learns" better which combinations of letters belong to which language. Has this update been made recently?
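
For anyone wondering what "learning combinations of letters" can look like, here is a minimal sketch of a character-trigram detector. It is not Tatoeba's actual tool, which nobody in this thread has described; the function names, the add-one smoothing, and the tiny training set (two Finnish sentences from the list above plus two made-up English ones) are all assumptions chosen for illustration.

```python
from collections import Counter
import math

def trigrams(text):
    """Count character trigrams, with one space of padding at each end."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def train(samples):
    """Build one trigram profile per language from {lang: [sentences]}."""
    profiles = {}
    for lang, sentences in samples.items():
        counts = Counter()
        for sentence in sentences:
            counts.update(trigrams(sentence))
        profiles[lang] = counts
    return profiles

def score(text, profile, vocab_size):
    """Log-probability of the text's trigrams under a profile (add-one smoothing)."""
    total = sum(profile.values())
    return sum(
        n * math.log((profile[gram] + 1) / (total + vocab_size))
        for gram, n in trigrams(text).items()
    )

def detect(text, profiles):
    """Return the language whose profile gives the text the highest score."""
    vocab = set()
    for profile in profiles.values():
        vocab.update(profile)
    return max(profiles, key=lambda lang: score(text, profiles[lang], len(vocab)))

# Hypothetical toy training data; a real detector would use far more text.
SAMPLES = {
    "fin": ["Onko Tom vielä hereillä?", "Puhuvatko he ranskaa?"],
    "eng": ["Is Tom still awake?", "Do they speak French?"],
}
PROFILES = train(SAMPLES)
```

Calling `detect("Emme me puhu ranskaa.", PROFILES)` returns whichever language's trigram counts fit the sentence best. With so little training text the guess is fragile, and short sentences containing strings like "Tom", "no" or "me" can easily tip towards the wrong language, which is why such profiles need refreshing with more sentences over time.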

I'm not complaining, because only something like 1 out of 100 sentences is detected wrongly and it's no big deal to correct them manually, but I'm just curious. :)

gillux, January 24, 2015 at 11:46:36 PM UTC

> Has this update been made recently?
Yes, on the 17th of November, 2014.

I’m not familiar with the language detection tool so I can’t tell you much about its weaknesses.