Wall (7,146 threads)
Tips
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.

Some notable figures from the past few days:
- Esperanto overtook French in number of sentences. It is more than a third of the way to the top spot.
- Russian reached 20,000 sentences ^^
- Dutch overtook Arabic again; Hungarian and Hebrew are both coming up fast.
- Swedish finally broke out of a slump and reached 1,000 sentences, and Persian is very close to that mark.

Thanks for keeping track of this! ^^

Italian has also passed 10,000 sentences, but nobody notices :P

I saw it! But I didn't post a notice on the Wall. I thought someone else had already done so ^^
Spanish has also overtaken Polish. ^^

I urge French speakers to step up their efforts, because otherwise the Esperanto fans will overtake you. It would be a real shame to see a toy language overtaking French.

I don't think it is a shame. But it does show that one shouldn't underestimate (unterschätzen) Esperanto. :-)

What's unterschätzen?

unterschätzen: to underestimate (sous-estimer in French)

In French, that is: I don't think it is a shame, but it does show that Esperanto should not be underestimated. :-)

I'll remove the Esperanto duplicates first, and then we'll talk about it again :p

If you'd like to remove duplicates in Esperanto, I have prepared a quick list of the sentences that are exact duplicates.
Please see http://www.is.titech.ac.jp/~zakirov8/epo-dup.html .
There is also a link to a list in TSV format.
The only issue I see is that many of the duplicate sentences already have lots of translations, so we need not only to delete the duplicates but also to relink their translations.
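
A minimal sketch of the cleanup described above: merge exact duplicates and relink their translations to the sentence that is kept. The data layout (a dict from sentence id to a (language, text) pair, and a set of two-element frozensets for links) and the rule of keeping the lowest id are assumptions for illustration, not Tatoeba's actual schema.

```python
from collections import defaultdict


def merge_exact_duplicates(sentences, links):
    """Merge sentences with identical (language, text) and relink translations.

    sentences: dict of sentence id -> (language, text)   (assumed layout)
    links:     set of frozenset({id_a, id_b})            (assumed layout)
    Returns (kept_sentences, merged_links).
    """
    # Group sentence ids by their exact (language, text) pair.
    groups = defaultdict(list)
    for sid in sorted(sentences):
        groups[sentences[sid]].append(sid)

    # Map every sentence to the lowest id in its group (assumed keep rule).
    canonical = {}
    for ids in groups.values():
        for sid in ids:
            canonical[sid] = ids[0]

    # Rewrite each link so that it points at the kept sentences.
    merged_links = set()
    for link in links:
        a, b = tuple(link)
        a, b = canonical[a], canonical[b]
        if a != b:  # drop links that collapse onto a single sentence
            merged_links.add(frozenset((a, b)))

    kept = {sid: sentences[sid] for sid in set(canonical.values())}
    return kept, merged_links
```

With this layout, a duplicate's translations end up attached to the surviving sentence instead of being lost when the duplicate is deleted.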

It is nice to remove duplicates, but wouldn't it be even nicer to think about why they are created?
I think people usually just don't see that an Esperanto translation already exists, because it is an indirect translation of the second degree or more, so they go ahead and translate.
If it is possible to write a script that finds duplicate sentences, wouldn't it be possible to write something that shows every translation already in the translation chain? This would reduce the work of eliminating duplicates to nearly nothing...

Unfortunately, as discussed before, we can't show the whole translation graph because conventional database systems are really bad at this kind of operation. With the current system and every possible optimization, the best we can do is a chain two degrees deep.
In theory it would be possible, but it would be slow as hell.
That's why we've started building our own database server for our specific needs, to make this possible.
So in the future it will be possible.
http://static.tatoeba.org/425123.html is a page shot of the version I have on my computer (don't pay attention to how ugly it is); as you can see, it shows every translation, whatever the depth.
And anyway, our database will be able to detect duplicates on the fly. :)
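
To make "every translation, whatever the depth" concrete, here is a small in-memory sketch of a breadth-first traversal of the translation graph. It assumes links are available as a plain list of sentence id pairs, which is an illustrative layout only; it says nothing about how the new database server does it, and it does not remove the cost of doing the same thing with SQL joins.

```python
from collections import deque


def translations_by_depth(links, start):
    """Return {sentence_id: depth} for every sentence reachable from start.

    links: iterable of (id_a, id_b) pairs, treated as undirected (assumed layout).
    Depth 1 is a direct translation, depth 2 a translation of a translation, etc.
    """
    # Build an undirected adjacency map from the link pairs.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Standard breadth-first search, recording the depth of each sentence.
    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)

    del depth[start]  # the start sentence is not a translation of itself
    return depth
```

In memory each link is visited a constant number of times; the difficulty described above comes from expressing the same walk as repeated joins in a relational database.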

* Total number of sentences linked *
How about indicating the (approximate) total number of linked sentences? This could be calculated once a day/week/month and might be of some help. So on http://tatoeba.org/eng/sentences/show/93453 we would see "+2 hidden translations" (below) or "(There is a) total of 4 translations" (above), as shown on http://tatoeba.org/eng/sentences/show/333724 .
Everyone who wants to translate would know that there are already some hidden translations, so they should be careful and look them up before risking adding a duplicate that will be deleted later anyway.
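
One way such a periodic count could be computed is with a union-find pass over all links, sketched below. The input layout (a list of sentence ids and a list of id pairs) and the function names are assumptions for illustration; the "hidden" figure is simply the chain size minus the sentence itself minus the translations already shown.

```python
from collections import Counter


def component_sizes(sentence_ids, links):
    """Return {sentence_id: number of sentences in its translation chain}."""
    parent = {sid: sid for sid in sentence_ids}

    def find(x):
        # Follow parents to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the two chains joined by each link.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    sizes = Counter(find(sid) for sid in sentence_ids)
    return {sid: sizes[find(sid)] for sid in sentence_ids}


def hidden_count(chain_size, shown_translations):
    """Translations in the chain that the sentence page does not display."""
    return chain_size - 1 - shown_translations
```

For a chain of five sentences where the page shows two translations, `hidden_count(5, 2)` gives 2, matching the "+2 hidden translations" wording suggested above.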

* Identification of translation chain and language *
How about assigning a second identification to every sentence which denotes the translation chain (graph) and the language? So in the example http://tatoeba.org/eng/sentences/show/93453 the first sentence, the Japanese one, would get the identification 93453-jpn, the second, the English one, would get 93453-eng, the French one 93453-fra, the Chinese one 93453-cmn and the last, the Esperanto one 93453-epo. A second Esperanto translation would get 93453-epo2.
If everyone then had to tell the system the planned target language as a first step before translating, the system could check whether one or more translations already exist and show them. If the second identification had already been assigned, this database lookup would not take long.
Assigning these second identifications would perhaps take a bit of work, but it would more or less eliminate the problem of duplicates.
In effect, this procedure means doing the time-consuming search of the complete translation graph in the database once and afterwards just reading the stored result.
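
The proposal can be sketched as follows, assuming the chain of every sentence has already been computed once and stored. The `chain_of` and `lang_of` mappings, the function names, and the sentence ids other than 93453 are hypothetical and only serve the example.

```python
from collections import defaultdict


def assign_chain_keys(chain_of, lang_of):
    """Assign keys such as '93453-epo' and '93453-epo2'.

    chain_of: dict of sentence id -> chain id   (assumed precomputed)
    lang_of:  dict of sentence id -> language code
    Returns (keys, index), where index maps (chain id, language) -> sentence ids.
    """
    keys = {}
    index = defaultdict(list)
    for sid in sorted(chain_of):
        chain, lang = chain_of[sid], lang_of[sid]
        index[(chain, lang)].append(sid)
        n = len(index[(chain, lang)])
        # The first sentence in a language gets no suffix, later ones get 2, 3, ...
        keys[sid] = f"{chain}-{lang}" if n == 1 else f"{chain}-{lang}{n}"
    return keys, index


def existing_translations(index, chain, target_lang):
    """Look up the sentences already present in the chain for target_lang."""
    return list(index.get((chain, target_lang), []))
```

The lookup itself is a single dictionary access. The stored keys are only valid until a link is added or a sentence is deleted, at which point the affected chains have to be recomputed.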

This system would be hell to maintain:
1 - computers are fast at dealing with numbers, but become slow when dealing with characters
2 - it would be easy if it were all about trees, but unfortunately we're dealing with graphs, so your proposal brings the following problems:
* we would need to update it when we delete a sentence
* the same when we merge two graphs by adding a link
Moreover, it still doesn't solve the real problem, which is traversing the graph, as you would still need to traverse it to discover that there is already an epo2, and so on.

To be honest, before you propose another solution:
we've been thinking about this for a year, and there's no simple solution to this problem with the current architecture. As we have few developers, I prefer to spend my free time on the new version rather than on finding and developing yet another workaround, which would only delay the new version that will solve these problems in a smart way.

OK, let's wait for the new version. Thank you for your explanations.

- How is the programming progressing?
- Is there a solution in sight about the problem of the hidden translations?
- If there are not enough programmers, should we try to find programmers for Tatoeba?

This looks nice. Maybe it could be used only by those who want to translate, if it is slow.

In fact, what I've shown was made with the new version.
It's hell to code with a normal database, and unfortunately we only have one server, so if the server takes 10 seconds to generate my page, people who don't care will also have to wait during those 10 seconds.

So through a collateral effect, it would affect performance for everyone, not only for those who want it.

The famous collateral effect :-|
OK, I see. So we shall wait for the new database.
And in the meantime, maybe we could spread the enthusiasm for adding more translation links. They help a bit.

So yes, it's possible; the script was easier to do, and was done as a temporary solution while we finish the new version.

Are there any?

3000 ^^

Oh good, that gives me hope ;)

That is hope for four days, since Esperanto already has 1,600 more sentences than French, and about 400 Esperanto sentences are currently being added per day.

It's a pity that an artificial language is more widespread on the project than a living language (when that is far from being the case in the real world). This actually calls the seriousness of the Tatoeba project into question.

Let's look at the facts first: in Hungary, Esperanto ranks eighteenth among the languages people have mastered, http://www.nepszamlalas.hu/eng/...ad01_13_0.html . There are currently more than 135,000 articles in the Esperanto Wikipedia, http://eo.wikipedia.org , which puts it in twenty-third place compared with the other editions, http://stats.wikimedia.org/EO/Sitemap.htm . The Chinese provide information to the world in about ten languages, including Esperanto, http://esperanto.china.org.cn . So quite a few national languages rank behind Esperanto...
Since people who speak Esperanto generally also speak many other languages (they are probably more polyglot than people in other language communities), it is natural that they take an interest in a project like Tatoeba. I don't see any disadvantage for the project.

What bothers me about Esperanto is that it is representative of only a limited number of languages (Romance, Germanic, Slavic, Greek and isolates, plus agglutinative languages for its structure), both in its substrate and in its morphology. Yet each language is tied to particular cognitive mechanisms (ways of orienting oneself in space...).
Of the 3,000 to 7,000 languages currently counted, Esperanto would represent, let's say, about a hundred? And on that basis people want to promote it as a universal language? That really shows very little regard for 95% of existing languages.

I too would like to have a language based on more languages. But it seems that with each language added, it becomes harder to learn, for everyone.
Esperanto is far from an ideal solution; it is only the best one known (or one of the best among constructed languages). Esperanto is much closer to quite a few languages than, for example, English, French or German are. So from this point of view, it is a better candidate for a universal language than the national languages.
And it is quite clear that Esperanto can be learned in a third of the time needed to reach the same level in a national language. So in the same amount of time one speaks Esperanto much better. That is why quite a few Chinese, Japanese, Vietnamese and other people learn Esperanto.

Are you joking or are you serious, aandrusiak?

Besides, the "toy language" has a habit of overtaking other languages. When Esperanto was published in 1887, about five people spoke it; Esperanto was therefore one of the last of about 7,000 languages at that time. Today Esperanto is generally found somewhere among the top 15 to 35 languages, sometimes among the top 50.
So Esperanto has already overtaken more than 6,900 languages in only 123 years. I have the impression that no other language in the whole history of humanity has made such progress in just one century.

Fortunately, this artificial language will never become the national language of a country, at least not unless all the Esperanto fans get together and buy an island to live on and speak their language, declaring it the national language of their Esperanto Paradise.

We don't need one more national language. We need a language for international communication. That language must be easier than the national languages. I studied English for 8 years and the result was not very good. In Europe we spend a great deal on translating and studying. Billions. Esperanto is very easy (10 times easier!) and it is neutral.

Above all, you shouldn't preach your easy language. That proves once again the sectarian character of this movement.

I agree that preaching Esperanto is not always a good idea.
Apart from that, it is worth distinguishing between the community of people who speak Esperanto and the Esperantist movement, and even within the latter between people who promote Esperanto in a moderate way and others who promote it in an almost exaggerated way that recalls the behaviour of a sect.
I know the world would be much easier to understand if we knew that all the inhabitants of country A were intelligent, those of country B nasty and all the people of country C kind, but that is not reality. Likewise, people who speak Esperanto have quite different characters...



GOODBYE

Bug?
Home page http://tatoeba.org/eng/home
More latest comments (show more...) http://tatoeba.org/eng/sentence_comments/index
Filter by language http://tatoeba.org/eng/sentence_comments/index/hun
(2) second page on the top link http://tatoeba.org/eng/sentence...dex/hun/page:2
Press End key/Go down, (3) third page or any
http://tatoeba.org/eng/sentence...s/index/page:3
...The language filter is now missing. The bottom links are not updated according to the language filter.
Sorry if this has already been posted, or if the Wall is not the best place to submit it.

It's a bug. Thanks for reporting :) It will be fixed soon.

[not needed anymore- removed by CK]

I have a Mac, and it seems to work well. But I'm really not sure if the rendering is 100% correct as I can't read the script.
Anyway CK, you need a font!
http://sites.google.com/site/macmalayalam/
http://www.prokerala.com/malayalam/

[not needed anymore- removed by CK]

I think it's just because you don't have the right font, because on computers (I don't know exactly about Mac, but this is the case on Linux, Windows, etc.) character rendering behaves as follows:
1 try to display the character with the font specified by the software
2 if the font is not present or the character can't be rendered by this font, a set of rules selects fallback fonts
3 if no font can render the character, display a box
So even if the CSS were using a font with no Malayalam characters, your OS would have used another one that has them.
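
The three steps above can be sketched roughly as follows. This is a toy model, not how any particular OS implements font fallback: the font names and the `coverage` table (font name to the code points it can draw) are invented for the example.

```python
def pick_font(char, requested, fallbacks, coverage):
    """Return the first font that can draw char, or None if no font can.

    coverage: dict of font name -> container of supported code points
              (hypothetical data; real systems read this from the font files).
    """
    # Step 1: the requested font; step 2: each fallback font in order.
    for name in [requested, *fallbacks]:
        if ord(char) in coverage.get(name, ()):
            return name
    return None  # step 3: nothing can draw it


def render(text, requested, fallbacks, coverage, box="\u25a1"):
    """Replace every character that no font can draw with a box."""
    return "".join(
        ch if pick_font(ch, requested, fallbacks, coverage) else box
        for ch in text
    )


# Hypothetical fonts: one covers basic Latin, one covers the Malayalam block.
coverage = {
    "Latin Only": range(0x20, 0x7F),
    "Malayalam Fallback": range(0x0D00, 0x0D80),
}
ma = "\u0d2e"  # MALAYALAM LETTER MA
print(pick_font(ma, "Latin Only", ["Malayalam Fallback"], coverage))
print(render("a" + ma, "Latin Only", [], coverage))
```

The second call has no fallback font installed, which corresponds to the "all boxes" symptom: the text is encoded correctly, but nothing on the machine can draw it.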

Maybe this table answers your question (sorry if I didn't completely get what you were asking... :)
http://en.wikipedia.org/wiki/He...isting_support

In what way are they not displaying correctly? The language appears to be Malayalam, which uses its own script. On my Android phone, it's all boxes. On my XP PC, I see the letters, but can't be sure if they are connected correctly without learning more. I don't have a Mac so I can't directly answer your question.

It's Unicode encoding; maybe you don't have the right fonts for Malayalam?

They display fine for me. (Firefox, Windows 7)

Tag auto-completion script turned off?
It looks like the auto-completion script for entering tags on sentences has gone. I'd actually got used to it as well.

What about now?

Looks like it's working now.

How do I write the circumflexed letters here? I tried sx, but it was not converted into the proper circumflexed Esperanto letter. hans

There are many programs and tools. Here are a few:
http://esperantilo.org/
http://members.aon.at/aldone/konvertileto.html
https://addons.mozilla.org/de/firefox/addon/3684/
https://addons.mozilla.org/de/firefox/addon/4016/
http://de.wikipedia.org/wiki/Es...g#Das_X-System
http://www.akueck.de/eoskribo.htm
http://www.apple.com/downloads/...ardlayout.html
http://www.esperanto.mv.ru/Ek/
I hope this helps you.
If you use Linux (Ubuntu, for example), you wouldn't have such problems. ;)
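
For reference, the x-system conversion itself (cx, gx, hx, jx, sx, ux to the accented letters) is small enough to sketch. This is an illustrative converter, not the code of any of the tools linked above, and it deliberately does not handle the h-system (ch, gh, ...).

```python
import re

# The six x-system digraphs and the letters they stand for.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}
_PAIR = re.compile(r"[cghjsu]x", re.IGNORECASE)


def x_to_unicode(text):
    """Convert x-system digraphs to Esperanto letters, keeping capitalisation."""
    def replace(match):
        pair = match.group(0)
        letter = X_SYSTEM[pair.lower()]
        # An upper-case first character gives an upper-case accented letter.
        return letter.upper() if pair[0].isupper() else letter

    return _PAIR.sub(replace, text)


print(x_to_unicode("ehxosxangxo cxiujxauxde"))
```

A converter like this can be run on text before it is pasted into the site.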

And Transliterator for Firefox:
https://addons.mozilla.org/de/firefox/addon/883/
For many languages, including Esperanto

[not needed anymore- removed by CK]

I'd like to add that translating proverbs is different from translating simple sentences, which is what we mostly do here, so proverbs make an interesting (if somewhat difficult) exercise.
Please also use the list "Proverbaro Esperanta" to find items to translate:
http://tatoeba.org/epo/sentences_lists/edit/153

I've just added 1,000 new MP3 recordings of Dutch sentences, recorded by a kind user, Ramses (his nickname here).
I really want to thank him for providing us with so many new recordings,
so I think he deserves a little advertising for his website:
http://www.spanish-only.com/
http://tatoeba.org/eng/sentence...nly-with-audio

Just an idea:
Tatoeba could integrate a spaced repetition system so that you could study the sentences. You can already do this by exporting a list to Anki, but it's clumsy and doesn't detect changes to the sentences. A built-in system would be able to catch updates and keep your progress associated with your Tatoeba account.
It would be nice if you could add either a sentence or a link to the list to be studied. This way, when you are starting (or for really hard sentences), you could have a translation, but eventually, you would be able to move away from that, which will help you comprehend the sentence more like a native speaker would.
Although I have little experience with PHP, I would be interested in helping code such a system. :-)

For this, I think that in a while our project will be important enough for us to go directly to the Anki developers and see whether we can build something together. They are also a free project, so I don't think it would be hard to convince them.
As swift with the API, we can perfectly well imagine an Anki plugin that works this way:
* add a special tag [tatoeba:42], and the plugin replaces it with sentence number 42 by calling the API (using a cache for offline use), so you always have the latest version of the sentence. This would be easier and smarter to code than reinventing the wheel.
As for the export system, it is meant for creating decks, and even if it's clumsy, at least it exists, and it means your data is not "jailed" in Tatoeba.
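
A rough sketch of the tag idea, written as a standalone function rather than a real Anki plugin. The `fetch_sentence` callable stands in for the not-yet-available Tatoeba API and is purely hypothetical, as is the choice of a plain dict as the offline cache.

```python
import re

TAG = re.compile(r"\[tatoeba:(\d+)\]")


def expand_tags(card_text, fetch_sentence, cache):
    """Replace each [tatoeba:N] tag with the text of sentence N.

    fetch_sentence: callable(int) -> str, a stand-in for the API call
                    (hypothetical; expected to raise OSError when offline).
    cache:          dict of sentence id -> last text seen, used when offline.
    """
    def replace(match):
        sentence_id = int(match.group(1))
        try:
            # Online: always refresh the cache with the latest version.
            cache[sentence_id] = fetch_sentence(sentence_id)
        except OSError:
            # Offline and never seen before: leave the tag untouched.
            if sentence_id not in cache:
                return match.group(0)
        return cache[sentence_id]

    return TAG.sub(replace, card_text)
```

Because the card stores only the tag, a correction made on Tatoeba would show up the next time the card is rendered online, while the cache keeps the card usable offline.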

Actually, I think that Anki even has an online interface to stacks that you have synced with the server. It would probably be less work to write something for Anki to interface with Tatoeba than to write a new SRS on Tatoeba. Reinventing wheels and all that.

Yep, they have; that was part of my thinking. Once we have decided on a special tag for Tatoeba (as they've done with [sound: pathforthesound] for playing MP3s), it should not be too hard to plug it into the API. And as the API server we've made is open source, they can even run one themselves; there's a script to sync it once a week with our database, while waiting for real-time sync and an official API server.

How is the API structured? What kinds of queries can you make? And how long until it is up and running? :-)

With the new database's API, anyone can use information in the Tatoeba corpus this way. No idea when it might be on line, though.