Thread #4293 - Tatoeba

Let's look at the facts first: in Hungary, Esperanto ranks eighteenth among the languages people speak, http://www.nepszamlalas.hu/eng/...ad01_13_0.html . There are currently more than 135,000 articles in the Esperanto Wikipedia, http://eo.wikipedia.org , which puts it in twenty-third place among the language versions, http://stats.wikimedia.org/EO/Sitemap.htm . The Chinese provide information to the world in about ten languages, Esperanto among them, http://esperanto.china.org.cn . So quite a few national languages rank behind Esperanto...

Since people who speak Esperanto generally also speak many other languages (they are probably more polyglot than members of other language communities), it is natural that they take an interest in a project like Tatoeba. I see no disadvantage for the project.

U2FS December 8, 2010 at 10:55:15 PM UTC

What bothers me about Esperanto is that it is representative of only a small number of languages (Romance, Germanic and Slavic languages, Greek and isolates, plus agglutinative languages for its structure), in both its substrate and its morphology. Yet every language is tied to particular cognitive mechanisms (ways of orienting oneself in space, ...).

Of the 3,000 to 7,000 languages currently recorded, Esperanto would represent, let's say, a hundred? And on that basis people want to promote it as a universal language? That really shows very little regard for the 95% of existing languages.

ludoviko December 9, 2010 at 9:52:49 AM UTC

I too would like to have a language based on more languages. But it seems that with each language added, it becomes harder to learn, for everyone.

Esperanto is far from an ideal solution; it is only the best one known (or one of the best among constructed languages). Esperanto is much closer to quite a few languages than, for example, English, French or German are. From that point of view, it is a better candidate for a universal language than the national languages.

And it is quite clear that Esperanto can be learned in a third of the time needed to reach the same level in a national language. So in the same amount of time one speaks Esperanto much better. That is why quite a few Chinese, Japanese, Vietnamese and other people learn Esperanto.

salikh December 10, 2010 at 6:22:11 AM UTC

If you'd like to remove duplicates in Esperanto, I have prepared a quick list of the phrases which are exact duplicates.
Please see http://www.is.titech.ac.jp/~zakirov8/epo-dup.html .
There is also a link to a list in tsv format.

The only issue I see is that many of the duplicate phrases already have lots of translations, so what we need is not only deleting the duplicates, but also relinking the translations.
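A minimal sketch of the two steps mentioned above (finding exact duplicates, then relinking their translations), not the script that produced the list. The tuple layout `(sentence_id, lang, text)`, the function names and the sample data are all assumptions made for illustration:

```python
from collections import defaultdict


def find_exact_duplicates(sentences):
    """Group sentence ids by (language, text); keep groups with more than one id.

    `sentences` is an iterable of (sentence_id, lang, text) tuples
    (a hypothetical layout, not Tatoeba's real schema).
    """
    groups = defaultdict(list)
    for sentence_id, lang, text in sentences:
        groups[(lang, text)].append(sentence_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


def relink(links, duplicate_groups):
    """Point every translation link at the lowest id of its duplicate group.

    `links` is a set of (id_a, id_b) pairs; self-links created by the
    merge are dropped.
    """
    canonical = {}
    for ids in duplicate_groups.values():
        for sentence_id in ids:
            canonical[sentence_id] = ids[0]
    merged = set()
    for a, b in links:
        a, b = canonical.get(a, a), canonical.get(b, b)
        if a != b:
            merged.add((min(a, b), max(a, b)))
    return merged


# Invented sample data.
sentences = [
    (1, "eng", "Hello."),
    (2, "epo", "Saluton."),
    (3, "epo", "Saluton."),  # exact duplicate of sentence 2
    (4, "fra", "Bonjour."),
]
links = {(1, 2), (3, 4)}

dups = find_exact_duplicates(sentences)
print(dups)                 # duplicate groups keyed by (lang, text)
print(relink(links, dups))  # links rewritten to the kept sentence
```

Keeping the lowest id is an arbitrary choice here; any stable rule for picking the surviving sentence would do.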

ludoviko December 10, 2010 at 11:03:29 AM UTC

It is nice to remove duplicates - but wouldn't it be even nicer to think about why they are created?

Usually, I think, people just don't see that there is already an Esperanto translation, because it is an indirect translation of the second degree or more - so they go ahead and translate.

If it is possible to create a script for duplicate sentences, wouldn't it be possible to create something that shows every translation already in the translation chain? This would reduce the work of eliminating duplicates to nearly nothing...

sysko December 10, 2010 at 2:16:15 PM UTC

Unfortunately, as discussed before, we can't show the whole translation graph because ordinary database systems are really bad at this kind of operation. So the best we can do with the current system, even with every possible optimization, is a chain two degrees deep.
In theory it would be possible, but it would be slow as hell.

That's why we've started to build our own database server for our specific needs, to make this possible.
So in the future it will be possible:
http://static.tatoeba.org/425123.html (it's a page shot of the version I have on my computer; don't pay attention to how ugly it is). As you can see, it shows every translation, whatever the depth.
And in any case our database will be able to detect duplicates on the fly. :)
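For readers unfamiliar with what "the whole translation graph" involves, here is a small in-memory sketch of a breadth-first traversal that reports every translation and its depth. It is not the new database server's code; the function name, the link format and the sample ids are assumptions:

```python
from collections import deque


def translations_by_depth(links, start):
    """Breadth-first search over undirected translation links.

    Returns {sentence_id: depth} for every sentence reachable from
    `start`, excluding `start` itself. Depth 1 is a direct translation.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    del depth[start]
    return depth


# Invented sample: 1 - 2 - 3 - 4, plus 2 - 5.
links = [(1, 2), (2, 3), (3, 4), (2, 5)]
print(translations_by_depth(links, 1))
```

In memory this is cheap. The slowness described above presumably comes from a relational database needing roughly one more self-join per extra level of depth, which is an assumption about the current setup rather than something stated in the thread.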

sysko December 10, 2010 at 2:17:31 PM UTC

So yes, it is possible; the script was easier to do and was written as a temporary solution while we finish this new version.

ludoviko December 10, 2010 at 7:13:31 PM UTC

This looks nice. Maybe it could be used only by those who want to translate, if it is slow.

sysko December 10, 2010 at 8:07:40 PM UTC

In fact, what I've shown was made with the new version.
It's hell to code with a normal database, and unfortunately we only have one server, so if the server takes 10 seconds to generate my page, people who don't care will also have to wait during those 10 seconds.

sysko December 10, 2010 at 8:08:46 PM UTC

So, as a collateral effect, it would hurt performance for everyone, not only for those who want the feature.

ludoviko December 10, 2010 at 8:58:23 PM UTC

The famous collateral effect :-|
OK, I see. So we shall wait for the new database.
And in the meantime, maybe we could spread the enthusiasm about putting more translation links. They help a bit.

ludoviko December 11, 2010 at 1:43:48 PM UTC

* Total number of sentences linked *

How about indicating the (approximate) total number of sentences linked? This could be calculated once a day/week/month and, maybe, would be of some help. So on http://tatoeba.org/eng/sentences/show/93453 we would see "+2 hidden translations" (below) or "(There is a) total of 4 translations" (above), as http://tatoeba.org/eng/sentences/show/333724 shows.

Everyone who wants to translate would know there are already some hidden translations - so, be careful, look them up before risking adding a duplicate which will be deleted later anyway.
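A rough sketch of how such a counter could be computed in a periodic batch job. It assumes that "hidden" means more than two links away (the depth the site is said to show today); the function name, the `visible_depth` parameter and the sample ids are invented for illustration:

```python
from collections import deque


def count_translations(links, sentence_id, visible_depth=2):
    """Return (total, hidden) translation counts for one sentence.

    `links` holds undirected (id_a, id_b) pairs. A translation counts as
    hidden when it is more than `visible_depth` links away.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    depth = {sentence_id: 0}
    queue = deque([sentence_id])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)

    total = len(depth) - 1  # do not count the sentence itself
    hidden = sum(1 for d in depth.values() if d > visible_depth)
    return total, hidden


# Invented chain of five sentences: 10 - 11 - 12 - 13 - 14.
links = [(10, 11), (11, 12), (12, 13), (13, 14)]
total, hidden = count_translations(links, 10)
print(f"total of {total} translations, +{hidden} hidden")
```

Because the counts are computed offline and stored, the page itself would only read two numbers, at the cost of being slightly out of date between runs.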

ludoviko December 12, 2010 at 2:11:46 AM UTC

* Identification of translation chain and language *

How about assigning a second identification to every sentence which denotes the translation chain (graph) and the language? So in the example http://tatoeba.org/eng/sentences/show/93453 the first sentence, the Japanese one, would get the identification 93453-jpn, the second, the English one, would get 93453-eng, the French one 93453-fra, the Chinese one 93453-cmn and the last, the Esperanto one 93453-epo. A second Esperanto translation would get 93453-epo2.

If, before translating, everyone first had to tell the system the planned target language, the system could check whether a translation already exists (or two translations...) and show it or them. If the second identification were already assigned, this database lookup would not take long.

Perhaps it would take a bit of work to assign these second identifications - but it would more or less eliminate the problem with duplicates.

In a way, this procedure would mean doing the time-consuming search for the complete translation graph in the database once and later just reading the stored result.

sysko December 12, 2010 at 4:38:22 AM UTC

This system would be hell to maintain.

1 - computers are fast at dealing with numbers, but become slow when dealing with characters
2 - it would be easy to do if it were all about trees, but unfortunately we're dealing with graphs, so your proposal brings the following problems:
* we will need to update it when we delete a sentence
* the same when we merge two graphs by adding a link
And moreover it still doesn't solve the real problem, which is traversing the graph, as you will still need to traverse it to discover that there is already an epo2 and so on.
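To illustrate the maintenance problem with stored graph identifiers, here is a small union-find (disjoint-set) sketch using numeric group ids. It is a swapped-in textbook technique, not something proposed in this thread or used by Tatoeba, and the class and method names are hypothetical:

```python
class TranslationGroups:
    """Tracks which sentences belong to the same translation graph."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative (group id) of the graph containing x."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_link(self, a, b):
        """Adding a link merges two graphs; the lower root id survives."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[max(root_a, root_b)] = min(root_a, root_b)

    def same_group(self, a, b):
        return self.find(a) == self.find(b)


groups = TranslationGroups()
groups.add_link(1, 2)
groups.add_link(3, 4)
before = groups.same_group(1, 4)  # two separate graphs so far
groups.add_link(2, 3)             # one new link merges them
after = groups.same_group(1, 4)
print(before, after, groups.find(4))
```

Merging on a new link is cheap with this structure, but union-find has no operation for removing a link or a sentence: a deletion can split a graph in two, and the only general way to find out is to traverse it again, which is the first objection in the list above.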

sysko December 12, 2010 at 4:40:43 AM UTC

To be honest, before you propose another solution:
we've been thinking about this for a year, and there's no simple solution to this problem with the current architecture. As we are only a few developers, I prefer to focus my free time on the new version rather than trying to find and develop a new workaround, which would only delay the new version that will solve these problems in a smart way.

ludoviko December 12, 2010 at 8:09:53 AM UTC

OK, let's wait for the new version. Thank you for your explanations.

ludoviko January 17, 2011 at 12:37:53 AM UTC

- How is the programming progressing?
- Is there a solution in sight for the problem of the hidden translations?
- If there are not enough programmers, should we try to find programmers for Tatoeba?

ludoviko December 5, 2010 at 11:17:58 PM UTC

Besides, the "toy language" has a habit of overtaking other languages. When Esperanto was published in 1887, about five people spoke it; Esperanto was therefore one of the last of some 7,000 languages at the time. Today Esperanto is generally found somewhere among the top 15 to 35 languages, sometimes among the top 50.

So Esperanto has already overtaken more than 6,900 languages in only 123 years. I have the impression that no other language in the whole history of humanity has made such progress in only one century.

aandrusiak December 5, 2010 at 11:26:16 PM UTC

Fortunately, this artificial language will never become the national language of a country, at least unless all the Esperanto enthusiasts get together and buy an island to live on, speak their language there and declare it the national language of their Esperanto Paradise.

Pharamp December 5, 2010 at 11:30:40 PM UTC

http://en.wikipedia.org/wiki/Re...of_Rose_Island

ludoviko December 5, 2010 at 11:42:18 PM UTC

http://en.wikipedia.org/wiki/Moresnet

Hans07 December 11, 2010 at 8:35:46 PM UTC

We don't need one more national language. We need a language for international communication. That language must be easier than the national languages. I studied English for 8 years and the result was not very good. In Europe we spend a great deal on translating and studying. Billions. Esperanto is very easy (10 times easier!) and it is neutral.

aandrusiak December 11, 2010 at 8:41:07 PM UTC

Above all, you shouldn't preach your easy language. That proves once more the sectarian character of this movement.

ludoviko December 12, 2010 at 1:49:19 AM UTC

I agree that preaching Esperanto is not always a good idea.

Apart from that, it is worth distinguishing between the community of people who speak Esperanto and the Esperantist movement - and, within the latter, between people who promote Esperanto in a moderate way and others who promote it in an almost exaggerated way that recalls the behaviour of a sect.

I know the world would be much easier to understand if we knew that all the inhabitants of country A were intelligent, those of country B nasty and all the people of country C kind - but that is not reality. Likewise, people who speak Esperanto have quite different characters...

Manfredo December 8, 2010 at 4:06:45 PM UTC

I don't think it is a disgrace. But it does show that one should not underestimate (German: unterschätzen) Esperanto. :-)

aandrusiak December 8, 2010 at 5:31:22 PM UTC

What's unterschätzen?

ludoviko December 8, 2010 at 6:57:39 PM UTC

unterschätzen: to underestimate (French: sous-estimer)

ludoviko December 16, 2010 at 10:52:33 PM UTC

Translation: I don't think it is a disgrace. Still, it shows that Esperanto should not be underestimated. :-)
