Tatoeba
Lepotdeterre, 1 July 2015 at 05:59:42 UTC, edited 1 July 2015 at 06:00:16 UTC

Why doesn't the auto-detect recognise Macedonian? Whenever I try to use it for a Macedonian sentence, it classifies it as Bulgarian. Now, how, pray tell, could a sentence containing the letters љ, њ, ј, ќ, ѓ and/or џ be considered Bulgarian, when those letters are entirely absent from Bulgarian? Isn't the basic step of language recognition looking at its alphabet, before going into the words themselves?

This doesn't really matter to me, as I don't use auto-detect - I just select Macedonian manually after logging in, and subsequently, it stays as my default language until I log out, which I generally don't do for entire weeks. However, the fact that the auto-detect feature is endowed with such an absurd imperfection makes me uneasy.

User55521, 1 July 2015 at 06:26:50 UTC

I believe auto-detect learns from the existing sentences, but it doesn't do that in real time: it needs to be retrained to take new sentences into account. Probably, when the auto-detect data was last generated, we had many more Bulgarian sentences than Macedonian ones.

> Isn't the basic step of language
> recognition looking at its alphabet,
> before going into the words themselves?

Not necessarily. There are many algorithms available.
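For illustration only, the alphabet-first check raised in the question could be sketched as below. This is not what Tatoeba's auto-detect does (the thread does not describe its internals); the function name `script_hint` and the use of the ISO 639-3 codes `mkd` and `bul` as return values are assumptions made for this example. The Macedonian set is the six letters named in the question; the Bulgarian set is the letters of the Bulgarian alphabet that Macedonian does not use.

```python
import unicodedata
from typing import Optional

# Letters named in the question: used in Macedonian, absent from Bulgarian.
MACEDONIAN_ONLY = set(unicodedata.normalize("NFC", "љњјќѓџ"))
# Letters of the Bulgarian alphabet that Macedonian does not use.
BULGARIAN_ONLY = set(unicodedata.normalize("NFC", "щъьюяй"))


def script_hint(sentence: str) -> Optional[str]:
    """Return 'mkd' or 'bul' when the letters alone settle it, else None.

    Hypothetical helper: a pre-filter that only looks at the alphabet,
    before any word-level or statistical model is consulted.
    """
    # Normalize so that composed letters (ќ, ѓ, й) compare as single characters.
    letters = set(unicodedata.normalize("NFC", sentence.lower()))
    has_mk = bool(letters & MACEDONIAN_ONLY)
    has_bg = bool(letters & BULGARIAN_ONLY)
    if has_mk and not has_bg:
        return "mkd"
    if has_bg and not has_mk:
        return "bul"
    return None  # no distinctive letter, or conflicting evidence


print(script_hint("Јас сум љубопитен."))  # mkd
print(script_hint("Аз съм любопитен."))   # bul
print(script_hint("Това е добре"))        # None
```

Such a check only separates these two languages from each other: several of the same letters also appear in other Cyrillic alphabets, such as Serbian, so on its own it cannot replace a trained model.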

gillux, 1 July 2015 at 06:36:12 UTC, edited 1 July 2015 at 06:36:39 UTC

> Why doesn't the auto-detect recognise Macedonian?

Because the autodetection algorithm is based on the Tatoeba corpus itself, and we still need to update it manually once in a while so that it takes new sentences into account. Three months ago, there were fewer than 200 sentences in Macedonian, which was not enough for the algorithm to work. Now that you have added about 50,000 Macedonian sentences, it will certainly work better once we update it. I'll let you know when it's done.
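To make the dependence on corpus size concrete, here is a minimal sketch of one common corpus-trained approach: a character-trigram model with add-one smoothing. It is an assumption that this resembles Tatoeba's detector at all; the thread only says the algorithm is built from the corpus. The function names (`trigrams`, `train`, `detect`) and the tiny sample corpus are made up for this example.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigrams of a lower-cased, space-padded sentence."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(corpus):
    """corpus maps a language code to a list of sentences in that language."""
    return {lang: Counter(g for s in sentences for g in trigrams(s))
            for lang, sentences in corpus.items()}


def detect(models, sentence):
    """Pick the language whose trigram counts best explain the sentence."""
    vocab_size = len(set().union(*models.values()))

    def log_likelihood(counts):
        total = sum(counts.values())
        # Add-one smoothing: unseen trigrams get a small, non-zero probability.
        return sum(math.log((counts[g] + 1) / (total + vocab_size))
                   for g in trigrams(sentence))

    return max(models, key=lambda lang: log_likelihood(models[lang]))


# Illustrative sample data, not taken from the Tatoeba corpus.
bulgarian = ["Аз съм ученик.", "Той е мой приятел.", "Къде е моята книга?"]
macedonian = ["Јас сум ученик.", "Тој е мој пријател.", "Каде е мојата книга?"]

stale = train({"bul": bulgarian})                      # model built before Macedonian data existed
fresh = train({"bul": bulgarian, "mkd": macedonian})   # model rebuilt after the update

print(detect(stale, "Мојата куќа е тука."))  # bul: the only language the model knows
print(detect(fresh, "Мојата куќа е тука."))  # mkd
```

A model built before the Macedonian sentences were available has nothing to compare against, so the closest language it knows wins; rebuilding it from the updated corpus is what changes the answer.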

gillux, 1 July 2015 at 07:11:38 UTC

It should work better now.

Guybrush88, 1 July 2015 at 07:47:41 UTC

Actually, it seems that it uses the language that is used the most by users. I just added #4322143 and #4322166 to test it, and it first recognized those sentences as Italian.

Lepotdeterre, 1 July 2015 at 09:43:51 UTC

But I've never posted anything in Bulgarian, so that's clearly not the only factor.

Lepotdeterre, 1 July 2015 at 09:44:01 UTC

Thank you.