Wall (5311 threads)

DostKaplan
5 days ago - 5 days ago
English --> French

https://tatoeba.org/eng/sentenc...rom=eng&to=fra

I see French and Turkish results.

But when I choose English --> Any Language

https://tatoeba.org/eng/sentenc...rom=eng&to=und

The French result is missing. In fact, there should at least be a result in German, too.
AlanF_US
5 days ago
You have the following items in your "Languages" list in your settings:

tur, ara, eng, zsm

When you do a search with French as your "To" language, you are shown results with French as well. But when you do a search with "Any language" as your "To" language, you will only see results in the languages in your list. If you want to see results in other languages as well, you should add them to your "Languages" list, or you should clear out that list. But if you do not clear out the list, you should remember that it will affect your results.
DostKaplan
yesterday
I find the behavior with respect to the "To" language and the languages in my "Languages" list somewhat inconsistent and confusing. The behavior is not apparent to the user. Perhaps have a checkbox below the "To" language:

[ ] Include languages in my Languages list

Otherwise, if the languages in my "Languages" list are to be used for the results, then at least be consistent about it. If I search for a very common word like "love", I see results using the "To" language as well as the languages in my "Languages" list, as you said. But if I search for a less common word like "profound", and my "To" language is "Malay" (zsm), which _also_ happens to be one of the languages in my "Languages" list (tur, ara, eng, zsm), I get zero results even though there are translations into Turkish. If the standard behavior is that the languages in my "Languages" list are to be used for the results, then I would expect to see results if any one of the languages in my "Languages" list has results, regardless of whether the "To" language has results (but _especially_ if my "To" language happens to be one of the languages in my "Languages" list). As it is, that's not what's happening.
AlanF_US
1 hour ago
I see your point. However, even though there is an apparent inconsistency from one standpoint, the behavior is consistent from another.

The purpose of the "To" language is to limit hits to sentences that have translations in that language. By contrast, the purpose of the language list in your settings is to control the display of those sentences. Thus, if you search for sentences with "love" with Malay as your "To" language, you will get sentences that have (direct or indirect) translations of "love" (as well as "lovely", "loves", etc.) into Malay. As a bonus, you will see translations into Turkish and Arabic, BUT ONLY FOR THOSE SENTENCES THAT HAVE TRANSLATIONS INTO MALAY. By contrast, if you search for sentences with "profound" with Malay as your "To" language, and there are no matches in Malay, you won't get any hits. The fact that there are matches in Turkish doesn't matter.
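
To make the distinction concrete, here is a minimal sketch of the behavior described above. It is not Tatoeba's actual search code: the `Sentence` class, the `search` function, and the placeholder translations are all hypothetical, and the handling of "Any language" (modeled as `to_lang=None`) is an assumption based on the earlier reply in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    lang: str
    # Hypothetical structure: translations keyed by language code.
    translations: dict = field(default_factory=dict)


def search(sentences, query, to_lang=None, display_langs=None):
    """Return (sentence, shown_translations) pairs.

    to_lang decides which sentences count as hits at all;
    display_langs only decides which translations are shown for those hits.
    """
    results = []
    for s in sentences:
        if query.lower() not in s.text.lower():
            continue
        # The "To" language limits hits to sentences translated into it.
        if to_lang is not None and to_lang not in s.translations:
            continue
        # The settings list filters the display; an empty list shows everything.
        shown = {
            lang: texts
            for lang, texts in s.translations.items()
            if lang == to_lang or not display_langs or lang in display_langs
        }
        results.append((s, shown))
    return results


# Made-up corpus with placeholder translations, for illustration only.
corpus = [
    Sentence("I love you.", "eng", {"zsm": ["<zsm>"], "tur": ["<tur>"], "fra": ["<fra>"]}),
    Sentence("A profound thought.", "eng", {"tur": ["<tur>"]}),
]
my_languages = {"tur", "ara", "eng", "zsm"}

love_hits = search(corpus, "love", to_lang="zsm", display_langs=my_languages)
profound_hits = search(corpus, "profound", to_lang="zsm", display_langs=my_languages)
```

In this model, the "love" search returns one hit that shows both the Malay and Turkish translations, while the "profound" search returns nothing, because the only matching sentence has no Malay translation to qualify it as a hit.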
corvard
3 days ago
Hey kiddies, I have an idea!
The competition among female volunteer translators!
They are supposed to post their best bikini selfies!
Thanuir
3 days ago
Yeah, let us not go there.
Orava
2 hours ago
Why not men in bikinis?
sacredceltic
17 days ago
Hello.

I already pointed out here some time ago that automatic language identification seemed to work less well than it used to.

Just now I entered the sentence « La stupidité n'est pas une excuse » ("Stupidity is no excuse") and it was identified as Italian.
That seems absurd.
As I recall, the identification system that sysko put in place relied on a statistical analysis of 3- or 4-letter sections of the sentences in each language, which made it possible to compute a probability score.
I took part in evaluating that system, and it was very effective, at least for the languages for which Tatoeba had enough sentences to serve as a statistical base.
Yet if you break my sentence into 3- or 4-letter sections, nearly all of them are highly improbable in Italian, where the sentence would read:
"La stupidità non è una scusa"
So I am afraid the system has been replaced by one that performs much worse than before...
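
For readers unfamiliar with the approach, here is a minimal sketch of character-trigram language identification of the kind described above. It is an illustration only, not sysko's actual implementation: the class name, the padding scheme, and the add-one smoothing are all assumptions.

```python
import math
from collections import Counter, defaultdict


def trigrams(text):
    """Split a sentence into overlapping 3-character sections."""
    padded = f"  {text.lower()} "  # pad so word boundaries produce trigrams too
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramGuesser:
    """Hypothetical guesser: one trigram frequency table per language."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # language -> trigram counts
        self.totals = Counter()             # language -> total trigrams seen
        self.vocab = set()                  # every trigram seen in any language

    def train(self, lang, sentence):
        for gram in trigrams(sentence):
            self.counts[lang][gram] += 1
            self.totals[lang] += 1
            self.vocab.add(gram)

    def score(self, lang, sentence):
        # Sum of log-probabilities with add-one smoothing, so an unseen
        # trigram lowers the score instead of zeroing it out.
        size = len(self.vocab) + 1
        return sum(
            math.log((self.counts[lang][gram] + 1) / (self.totals[lang] + size))
            for gram in trigrams(sentence)
        )

    def guess(self, sentence):
        return max(self.counts, key=lambda lang: self.score(lang, sentence))


guesser = TrigramGuesser()
guesser.train("fra", "La stupidité n'est pas une excuse")
guesser.train("ita", "La stupidità non è una scusa")
```

With only one training sentence per language this is a toy; a usable model needs many sentences per language, which is why the quality of the underlying statistics matters so much.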
gillux
17 days ago
I know it's frustrating, but rest assured that we haven't forgotten this problem; it is tracked on GitHub [1]. Thanks for reminding us, though!

It is still sysko's same algorithm that detects languages, so the problem must come from the database it relies on. I tried updating it, but that did not solve the problem. I will investigate soon and will let you know if I need your help with testing.

[1] https://github.com/Tatoeba/tatoeba2/issues/1731
sacredceltic
17 days ago
I would be delighted to take part in any testing on this.
I find this topic fascinating, and I wonder whether other services on the Internet have the same need. I think it is an emerging need. We should also consider using AI, "deep learning", in the same way that systems are taught to recognize faces.
I think it would be worth asking Google for assistance with this. I am convinced they would lend their help, because they must regard the Tatoeba project as interesting and promising, and therefore look on it kindly.
In the meantime, though, I found the approach sysko took very interesting. Besides, probabilities are the very basis on which "deep learning" relies to decide whether or not a photo shows a face.
Knowledge is rarely algorithmic. Rather, it is cumulative.
gillux
4 hours ago
As promised, I looked into the problem.

I built a data set by extracting a random 10% of the current corpus and used it as my working base. I split that base into two parts: 90% to train the model and 10% to test the trained model. With the current detection algorithm, I measured a success rate of 94%, which is not too bad. I rewrote the algorithm anyway, because it seemed a bit clumsy to me. After a fair amount of fine-tuning, I reached a success rate of 97%. I have installed it on https://dev.tatoeba.org/ and invite you to test it.

Keep in mind that, from our human point of view, the algorithm can look rather stupid when it gets something wrong, but that does not mean it is worthless.

If we see that it fails to detect « La stupidité n'est pas une excuse » correctly, we might logically conclude that it is rather bad. If it trips up on such an "easy" sentence (in our eyes), how will it fare with a more ambiguous one?

But failing on an easy sentence does not mean it will also fail on a hard one. The algorithm does not look at words; everything rests on simple statistics of character co-occurrences (no "deep learning" yet, sorry ;-)). For the algorithm, the hardest sentences are not the most ambiguous ones, but those containing character sequences for which its statistics are poor.

In short, to get an idea of the algorithm's quality, you have to look at how it performs overall, across all languages.

Looking at where the algorithm struggles, I noticed that Berber is often confused with Kabyle (and vice versa), which makes these languages relatively poorly recognized by the algorithm despite the amount of data we have for them. I wonder how close they are. The same goes for Russian, which is sometimes confused with Ukrainian (and vice versa); there too I wonder how close these languages are. There are also the Romance languages, which are sometimes recognized as Interlingua, which is not so surprising given that Interlingua is directly based on the Romance languages.
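
The evaluation procedure described in this post (hold out 10% for testing, measure the success rate, then look at which languages get confused with each other) can be sketched roughly as follows. This is not gillux's actual code: `split_corpus`, `evaluate`, and the `guess` callable are hypothetical names, and the corpus is assumed to be a list of `(language, sentence)` pairs.

```python
import random
from collections import Counter


def split_corpus(pairs, test_fraction=0.1, seed=0):
    """Shuffle (language, sentence) pairs and split them into train and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(guess, test_pairs):
    """Return (success rate, Counter of (expected, predicted) confusions)."""
    confusions = Counter()
    correct = 0
    for lang, sentence in test_pairs:
        predicted = guess(sentence)
        if predicted == lang:
            correct += 1
        else:
            confusions[(lang, predicted)] += 1
    return correct / len(test_pairs), confusions
```

Here `guess` is any function that maps a sentence to a language code. Calling `confusions.most_common()` on the result would surface frequently confused pairs such as the Berber/Kabyle and Russian/Ukrainian cases mentioned above.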
sacredceltic
3 hours ago
Thank you for taking care of this.
I understand that the 10% of the corpus is only a trial.
The more, the better. Ideally the whole corpus should be used, or at least the sentences written by speakers for whom it is their native language (well, the one they claim to have...).
There should also be a mechanism for refreshing it continuously (sysko had planned something like that, as I recall, but perhaps it was never implemented...).
It doesn't need to be real time. A solid weekly batch job would do.
So without being "deep", it would still be learning...
CK
3 days ago
** Stats - 2019-04-20 - Native Speaker Sentence Counts **

http://tatoeba.byethost3.com/stats-190420.html

This week I also added a column showing constructed and dead language contributions, and the default sort is different.
corvard
3 days ago
How should this data be interpreted?
deniko
yesterday
What's the meaning of the figure "Native Minus Other Non-Native"?

Did you intend it to be All Sentences Minus Other Non-Native? Or, which is the same, Native PLUS Dead & Constructed? I would definitely understood the figure and why it makes sense to sort by it.
CK
yesterday
> What's the meaning of the figure "Native Minus Other Non-Native"?

It's the number in the column labeled "Native" minus the number in the column labeled "Other Non-Native."
deniko
yesterday
Sherlock Holmes and Doctor Watson are flying in a hot air balloon. They get lost, so they have to land. They have no idea where they are. Sherlock Holmes asks a passerby: "Excuse me, sir, but where are we?" The man says, "You are in a hot air balloon," and walks away.

After some thinking, Sherlock says: "That person was a mathematician." "How did you know that?" asks Doctor Watson. Sherlock Holmes replies: "Well, first, he gave a very precise answer, and second, the answer was completely useless."

CK
yesterday
>Did you intend it to be All Sentences Minus Other Non-Native?

No.

>Or, which is the same, Native PLUS Dead & Constructed?

No.

>I would definitely understood the figure and why it makes sense to sort by it.

I don't understand what you mean by this.

mraz
yesterday
deniko, in Hungarian:

Sherlock Holmes és doktor Watson hőlégballonnal repülnek.
Eltévednek, így le kell ereszkedniük. Fogalmuk sincs, hogy hol vannak.
Sherlock Holmes megkérdez egy arra haladó férfit:
- Elnézést uram, de hol vagyunk?
A férfi azt mondja: Ti egy hőlégballonon vagytok. - és elmegy.
Kis gondolkodás után Sherlock azt mondja:
- Ez a személy matematikus volt.
- Honnan tudod? - kérdezi doktor Watson.
Sherlock Holmes válaszol:
- Először nagyon pontos választ adott, másodszor a válasz teljesen haszontalan volt.
Thanuir
yesterday - 14 hours ago
If one considers contributions in a native language as positive, and contributions in a non-native language as negative, and does not care about contributions in dead or constructed languages, then the the score reflects how good a contributor one is: It increases as one adds native sentences and decreases as one adds non-native sentences, but does not change with contributions to dead or constructed languages.
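
Under that reading, the figure is simple arithmetic. The sketch below spells it out; the function name and the `penalty` parameter are hypothetical, added only to show how a heavier weighting of non-native sentences would change the number.

```python
def contribution_score(native, non_native, dead_or_constructed=0, penalty=1):
    """Native sentence count minus (penalty x non-native count).

    dead_or_constructed is accepted but deliberately ignored, mirroring the
    interpretation in which those contributions do not affect the score.
    """
    return native - penalty * non_native


# Made-up counts, for illustration only.
plain = contribution_score(native=100, non_native=30, dead_or_constructed=50)
weighted = contribution_score(native=100, non_native=30, penalty=2)
```

With these made-up counts, `plain` is 70 and `weighted` is 40; the 50 dead or constructed sentences change nothing.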
shekitten
21 hours ago
"than the the score reflects how good a contributor one is"

This is a value judgment, and one that not everyone shares. I personally prefer to just know exactly what is factored into the score and come to my own conclusion.
CK
19 hours ago
> I personally prefer to just know exactly what is factored ...

Exactly. That's why I included all the data.

Personally, though, based on some of the non-native sentences I've seen in the languages I know, perhaps the calculation should be (native - (2 x non-native)).

It is worrying that we have had some members massively contribute in a non-native language, not even bothering to translate those sentences into their own native language. This doesn't seem to meet the aims of the project to me.
Thanuir
14 hours ago
Someone asked about interpretation of the data. That is the most straightforward one. I do not necessarily agree with it.
CK
12 hours ago - 12 hours ago
I, too, don't necessarily agree with the "minus" idea for each member, but as a general rule, it sort of applies. It definitely applies to the perceived trustworthiness of sentences.

For example, many European speakers of languages with not so many speakers are very good at English. Many of Sharptoothed's English sentences are ones that he and I did together when we proofread and edited an old public domain English-Russian dictionary. At that time, the only way to import sentences was with one member's username.

A number of our long-term members with a very high level of English still primarily, or only, contribute in their own languages to help encourage other members to also contribute in their strongest language, which helps our project more. I think this is the culture that we should be hoping to develop here.
Impersonator
12 hours ago - 12 hours ago
> contribute in their strongest language, which helps our
> project more. I think this is the culture that we should
> be hoping to develop here.

I think this is deeply harmful for the project, because for smaller languages, there might be very few speakers who speak the language as their strongest language. It discourages contributions in these languages.

E.g. there might be some speakers who speak Lower Sorbian better than German, but they are from the older generation and very unlikely to contribute to Tatoeba. Speakers who *could* contribute to Tatoeba would almost certainly speak German as their strongest language, so this policy discourages them from contributing in Lower Sorbian and encourages them to contribute in German.



I understand that the 'use your strongest language' policy kinda-works for English, but please consider the broader picture. *Most* languages in the world are smaller than English, and by designing policies around English and a handful of 'bigger' languages, you're harming *most* languages in the world. Please reconsider.
CK
11 hours ago - 11 hours ago
So, maybe instead of just "languages without native speakers" (dead and constructed), I should also include a column for "languages with few native speakers" (dying languages, or languages that for some other reason have only a few native speakers). That seems like it might be a sensible thing to do.

Can anybody send me a list of such languages? Just the 3-letter code used here would be enough. https://tatoeba.org/eng/private_messages/write/CK

Rockaround
11 hours ago
Even so, choosing the languages is still largely subjective. If we compare Swedish and Quechua, for instance, they have about the same number of native speakers (more or less 10 million people). Swedish is very much alive and present everywhere online (although there aren't any active contributors currently on Tatoeba), whereas Quechua is slowly getting crushed by Spanish, for different reasons, and its native speakers are unlikely to have access to a computer, and even less likely to understand how it works.

I tend to agree with your arguments, especially after some truly terrible sentences were added to French (from a bot, and from a well-known "team"), but I don't think there is a clear line between a positive and a negative contribution.
Impersonator
7 hours ago
Thanks for your answer!

> Can anybody send me a list of such languages?

Wikipedia has a list of endangered languages, which could be a useful starting point, but each subpage has a different format and it's hard to parse automatically:
https://en.wikipedia.org/wiki/L...ered_languages :((
CK
yesterday - yesterday
** All English Audio by CK - Randomly-selected **

http://www.manythings.org/rndaudio/eng/all.html

This loads an external file that contains all 401,794 English sentences with audio by CK in the 2019-04-21 exported data.

This may take a long time on the first load, but after that, each randomly-selected list should be fairly fast.

This works on my computer, but if your computer is not so powerful, it may not work.

You will be able to listen to each audio at 3 different speeds.

If this doesn't work for you, you can get a similar feeling with smaller chunks of data at http://www.manythings.org/rndaudio/ .
CK
18 days ago
** Audio Milestone **

https://tatoeba.org/eng/audio/index
Sentences with audio (total 555,555)
Seael
3 days ago
Great!
:)
corvard
3 days ago
Bamboocha!
mraz
4 days ago - 4 days ago
Ĝojan Paskon! mraz

#1444042
Ricardo14
3 days ago
Feliz Páscoa!


Happy Easter!
shekitten
3 days ago
Feliĉan Pesaĥon! חג שמח!
CK
4 days ago
Get a random selection of untranslated English sentences with audio.

https://tatoeba.org/eng/sentenc...filter=exclude

There are currently 13,656 English sentences with audio that have not been translated into any language.

If you would prefer to see the most-recently created sentences first, see the link at the top of my profile. https://tatoeba.org/user/profile/CK