Wall (6,960 threads)
* How many sentences should someone write in a language he or she does not really master? *
On http://blog.tatoeba.org/2010/02...n-tatoeba.html we can read:
"However, if you are not translating into your native language (which you can), you are forgiven for not writing native-like sentences. It's a collaborative project after all, and someday (hopefully) a native speaker will come accross your contribution and see if it sounds right to them or not."
I am not happy with encouraging people to write sentences in languages other than their native one(s). It is OK for a few sentences, but it creates problems when someone writes lots of sentences in foreign languages he or she has not really mastered. This often means lots of errors, and then others have to spend their time correcting them. That is fine for five or ten sentences, but not for dozens.
So I would suggest changing the text in the introduction quoted above, at least to discourage the writing of lots of sentences in a foreign language.
I think there is still enough work left for every native speaker in translating into his or her own language, and in adopting and correcting sentences.
I completely agree with you.
"I completely agree with you."
With whom? About what?
How about this:
"However, if you are not translating into your native language (which you can, but it is not encouraged), you are forgiven for not writing native-like sentences. It's a collaborative project after all, and someday (hopefully) a native speaker will come across your contribution and see if it sounds right to them or not."
?
I'm not sure it will change much though, because people who translate a lot into a foreign language and make a lot of mistakes are most likely people who haven't read the contributor's guide =/
I personally don't think the problem comes from translating into a foreign language. It's first of all a problem of skills and of how much people care. Some people translate very poorly even into their own language. That's because they don't have the skills, or they are not putting in enough effort, or they are not rigorous enough. On the other hand, some people contribute excellent translations into languages they've been learning for only a few years (perhaps even a few months). That's because they are good at languages, and they care, and they take the time to make sure they provide good contributions.
So nativity is not really the main issue in my opinion. The real issues would be:
1) The notifications. Being notified of your mistakes is really crucial, and the only way to be notified quickly and efficiently so far is through email. But sometimes users provide a non-working email address, or one they don't check often, or the notification goes into their spam folder, or the notification system is broken. It can be very frustrating for those who suggested corrections to feel like the owner doesn't care about correcting his mistakes.
2) The "reward". People can see how many sentences there are in each language, people can see how many new sentences are added each day. So there's a "reward" for quantity... but nothing for quality. We need to display the number of orphan sentences left, we need to display the number of sentences corrected each day.
3) The permissions. We need to give users permissions more progressively. For instance, at the beginning, a user would only be able to comment on sentences, to point out mistakes and suggest corrections. Once they have successfully suggested corrections for 10 sentences in a certain language, they will be allowed to translate sentences into that language.
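To make point 3 concrete, here is a minimal sketch of how such progressive permissions might be tracked. It is only an illustration: the `Contributor` class, its method names and the `CORRECTIONS_NEEDED` constant are hypothetical and not part of Tatoeba's code; only the threshold of 10 accepted corrections per language comes from the proposal above.

```python
from collections import Counter

# Threshold taken from the proposal above; the name is hypothetical.
CORRECTIONS_NEEDED = 10


class Contributor:
    """Hypothetical model of a user whose rights grow per language."""

    def __init__(self, name):
        self.name = name
        # language code -> number of corrections accepted in that language
        self.accepted_corrections = Counter()

    def record_accepted_correction(self, lang):
        self.accepted_corrections[lang] += 1

    def can_comment(self):
        # Everyone may comment and suggest corrections from the start.
        return True

    def can_translate_into(self, lang):
        # Translating is unlocked separately for each language.
        return self.accepted_corrections[lang] >= CORRECTIONS_NEEDED
```

With this sketch, a user with 10 accepted Esperanto corrections could translate into Esperanto but still only comment on Japanese sentences.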
The new text contains:
"someday (hopefully) a native speaker will come across your contribution and see if it sounds right to them or not."
I am not happy with the idea that it will be corrected "some day". I would very much prefer a system of fairly quick checking. I think Tatoeba shouldn't say about its own project that it is no problem for wrong sentences to stay around for a long time...
Maybe at the beginning, when contributing in a foreign language, people should add something like a "native check" tag.
On the other hand, when we see one strange sentence, we should check whether there is a whole series of them. That is what I did today...
About the rest, I prefer others comment first.
Well, you have quick control when the community around the language is active, like Esperanto :)
But it's impossible to have quick control with languages that have very few contributors (take Uyghur for instance, or even Japanese) and the "someday" applies mostly to those languages.
I can replace "someday" by "sooner or later", if you feel it's more appropriate.
=> "sooner or later a native speaker will come across your contribution"
"sooner or later a native speaker will come across your contribution"
I am sorry, I am not in favour of that. I would much prefer:
"If you write in a foreign language, please make sure a native speaker checks your sentence."
I think we have a responsibility to make sure that wrong sentences stay here only for a very short time. And every contributor should understand that he or she is responsible as well.
Okay, I understand your point. How about:
"If you are not translating into your native language (which you can), you are forgiven for not writing native-like sentences. But in this case, please make sure you find a native speaker to check your sentences so that your mistake get corrected more quickly."
?
That sounds good to me. Maybe "... so that your possible mistakes get corrected more quickly."
Then the "language exchange project" of saeb (below) is perfectly ok.
When we add sentences we see: "Important! You are about to add a translation to the sentence above..."
Maybe there should be also something like: "When you are adding sentences you are not sure about (because e.g. it is a foreign language for you), please make sure you find a native speaker to check your sentences."
I think the main attraction to tatoeba right now is the potential for learning. If we start stopping users from adding sentences in languages they don't fully master, we'll be taking away an integral part of why some people are here...for example qahwa has added almost 170 sentences in arabic and I've corrected a good portion of them (we've got some sort of lang exchange going)...if ppl like her are not allowed to add sentences, I'm afraid they're just gonna leave (and we really need [japanese] natives!)
but I generally agree...moderators should at least be able to mass-tag a user's sentences in a certain language with NNC
The NNC idea is nice, even if I don't know what it means. Maybe you should use the long form. Probably it is not "North Nottinghamshire College" nor "National Navigation Company" nor "Nearly New Car" nor "National Nutrition Council" nor "National Numismatic Certification"... http://en.wikipedia.org/wiki/NNC and Google :-)
Trying to become serious again: I don't want to discourage people from writing in foreign languages. I just want to discourage early learners from writing more than the others are able and willing to correct. So, please go on with your language exchange - this is just wonderful. If a wrong sentence stays there for a day, no problem. If it's rather months, I don't feel happy.
oh sorry^^' I was referring to the @Needs Native Check tag that's already in use
>If a wrong sentence stays there for a day, no problem. >If it's rather months, I don't feel happy.
The thing is tatoeba has a whole jap-eng corpus(150k sent.) chock full of mistakes, archaic, and flat out unnatural sentences that sometimes are beyond hope.
We've got a massive backlog of @delete/@change/@NNC tags that need to be dealt with...quality has been an issue for years(?) now...IMHO a collective effort of systematic 'white-listing' of sentences by adding OK and a hypothetical 'native-created' tags is the way to go...discouraging learners IMHO is just barking at the issue^^'
this might be out of place but CK's 'crusade'(as a handsome contributor called it ;P) needs more support
I supported it some days ago by writing to the Japanese Esperanto mailing list (in Esperanto) and encouraging them to join the Tatoeba project. It seems they discussed it and added more links to Tatoeba - but I don't understand Japanese... :-(
I'm personally for people only translating between languages they are proficient in. If people want to integrate this into their language learning by teaming up with others who check their sentences, then that's fine.
The emphasis should, however, be on accurate sentences. I don't think that will drive users away, but I base that mostly on the highly dubious assumption that everyone thinks like me.
I use Tatoeba to look up sentences in a language I'm learning, but contribute in languages that I'm proficient in. Most of my contributions are in Icelandic which I would like to support to have a good presence here, for the benefit of other language learners; despite the fact that it won't benefit me directly.
But I don't think I'm alone in thinking that way. I think there is plenty of empirical data that indicates people are not only willing to contribute, but actively interested in contributing, to projects that benefit a larger group. So, no; I don't think we need to lure people in by allowing them to post half-baked sentences in the belief that someone will catch their mistakes.
PS. It should be mentioned that such a collaborative feedback service already exists: lang-8.com. Furthermore, it would be easy to set up a second site through the API with a dedicated learning environment centred around translating sentences for Tatoeba.
A question to moderators: What should we do with punctuation variation?
In some languages, such as Russian, punctuation can be used to transmit subtle differences in meaning, mainly not of the phrase itself, but of the intention of the speaker. It can be that the meaning is sufficiently different to warrant adding multiple variants with subtly different translations.
A harder question is about Esperanto, where punctuation is intentionally underspecified, and so speakers of the different language backgrounds tend to use punctuation differently, so there are cases where difference in punctuation comes not from the difference in the intended meaning, but from the difference in speaker background.
What should we do about it? Leave the variant that came up first? Argue about "more desirable" punctuation (it can be hard to find a compromise)? Add near-duplicate phrases which differ only in punctuation?
These are hard questions, indeed. I'm in favour of leaving it up to the contributor. Should one opt for a single sentence, the alternative variants can be mentioned in comments. If one adds multiple sentences, these can be linked to in the comments (and later by qualified links when we get the new database).
I've done both and set up tags to identify sentences with that information in the comments. See tags that start with "cmt…" and "lnk…".
[not needed anymore- removed by CK]
I see the rationale for a rating system, but don't think it would solve the perceived problem.
Consider a language that is spoken in an area large enough to permit different variations to have formed (which happens to be the case in every language I know). Knowing which variant is more common isn't going to help the people speaking accurately in areas where less common, but equally valid, variants are used.
Furthermore, the collection of Tatoeba members can hardly be used as a representative sample of usage.
I'm therefore in favour of a simple OK/not-OK system where OK sentences can be further qualified with tags (e.g. local variants, archaic/colloquial/slang/etc.).
I've discussed with sysko the idea of notes on sentences (separate from comments) and shared pages where one can link to from multiple sentences. These could be used to further describe usage. This is a lot more work to set up, but a single-dimensional rating system would imply more information than it can convey.
As a similar issue I think that a usefulness rating would be helpful. I'm particularly thinking about some of the Tanaka Corpus sentences.
For example
"It is an apple" is a perfectly good English sentence but says little about apples.
"Eve bit into the apple with a crunch as the juice ran down her face." contains more potential information about apples but might not be the best sort of example in a dictionary.
I think it's worth exploring the usefulness of a given example to second language learners through a rating system as most of Tatoeba's sentences are destined for dictionary or learning resources.
(apologies if I've missed other threads about this, but I find topics on the Wall by accident. I'd like to bang the drum for a forum again :)
For duplicates, I'm going to test another solution to handle this in a semi-automatic way
I will replicate the Tatoeba database on my personal computer
I will run on it a slower but safer script (one that I couldn't run here without slowing down tatoeba.org for several hours) that will output all the modifications that need to be made to the database
I will run this output script on tatoeba.org
so this way it should be OK.
Some notable figures from the past few days:
- Esperanto overtook French in number of sentences. It is more than a third of the way to the top spot.
- Russian reached 20,000 sentences ^^
- Dutch overtook Arabic again; Hungarian and Hebrew are both coming up fast.
- Swedish finally broke out of a slump and reached 1,000 sentences, and Persian is very close to that mark.
Thanks for keeping track of this! ^^
Italian has passed 10,000 sentences too, but nobody notices :P
I saw it! But I didn't post a notice on the Wall. I thought someone else had already done so ^^
Spanish has even overtaken Polish. ^^
I urge the French speakers to speed up their efforts, because otherwise the Esperanto fans will overtake you. It would be a real shame to see a toy language overtake French.
[In German:] I don't think it is a shame. But it does show that one should not underestimate ("unterschätzen") Esperanto. :-)
What's unterschätzen?
unterschätzen: to underestimate (French: sous-estimer)
In French: I don't think it is a shame. But all the same it shows that Esperanto should not be underestimated. :-)
I'll remove the Esperanto duplicates first, and then we'll talk again :p
If you'd like to remove duplicates in esperanto, I prepared a quick list of the phrases which are exact duplicates.
Please see http://www.is.titech.ac.jp/~zakirov8/epo-dup.html .
There is also a link to a list in tsv format.
The only issue I see is that many of the duplicate phrases already have lots of translations, so what we need is not only deleting the duplicates, but also relinking their translations.
It is nice to remove duplicates, but wouldn't it be even nicer to think about why they are created?
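The relinking concern can be illustrated with a small sketch. This is not the script actually used on tatoeba.org; the data layout (a dict of sentences and a set of undirected links), the sentence ids and the function name `plan_duplicate_merge` are all assumptions made for illustration. The idea is to keep one copy of each exact duplicate, move the links of the other copies onto it, and only then list those copies for deletion.

```python
from collections import defaultdict


def plan_duplicate_merge(sentences, links):
    """Plan the merge of exact duplicates.

    sentences: dict of sentence id -> (language code, text)
    links: set of frozenset({a, b}) translation links
    Returns (ids to delete, new set of links).
    """
    # Group sentence ids that share the same language and exact text.
    groups = defaultdict(list)
    for sid, (lang, text) in sentences.items():
        groups[(lang, text)].append(sid)

    # Map every duplicate to the copy that is kept (lowest id, by assumption).
    replace = {}
    for ids in groups.values():
        keep = min(ids)
        for sid in ids:
            if sid != keep:
                replace[sid] = keep

    # Relink: point every link at the kept copy, dropping self-links.
    new_links = set()
    for link in links:
        a, b = (replace.get(s, s) for s in link)
        if a != b:
            new_links.add(frozenset((a, b)))
    return sorted(replace), new_links


# Hypothetical data: sentences 2 and 3 are the same Esperanto sentence.
sentences = {
    1: ("eng", "Hello."),
    2: ("epo", "Saluton."),
    3: ("epo", "Saluton."),
    4: ("fra", "Bonjour."),
}
links = {frozenset((1, 2)), frozenset((3, 4))}
to_delete, new_links = plan_duplicate_merge(sentences, links)
```

Here sentence 3 would be deleted, and its French translation would end up linked to sentence 2, so no translation is lost.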
Usually I think people just don't see that there is already an Esperanto translation, because it is an indirect translation of second or more degree - so they go ahead and translate.
If it is possible to create a script for duplicate sentences, wouldn't it be possible to create something to show every translation already in the translation chain? This would reduce the work for eliminating duplicates to nearly nothing...
unfortunately, as discussed before, the reason we can't show the whole translation graph is that normal database systems are really bad at this kind of operation. So the best we can do with the current system, with all the possible optimizations, is a chain of depth 2.
In theory it would be possible, but it would be slow as hell.
That's the reason why we've started to build our own database server for our specific needs, to make this possible.
So in the future it will be possible
http://static.tatoeba.org/425123.html (it's a page shot of the version I have on my computer, don't pay attention to how ugly it is); as you can see there, we show every translation, whatever the depth.
And anyway our database will be able to detect duplicates on the fly. :)
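For readers wondering what "depth" means here, the following is a rough in-memory sketch of walking a translation graph breadth-first. It says nothing about how Tatoeba's database or the new server actually work; the adjacency-dict layout, the sentence ids and the function name `translations_by_depth` are assumptions. In memory this is cheap; the difficulty described above is doing the equivalent inside a relational database.

```python
from collections import deque


def translations_by_depth(links, start, max_depth=None):
    """Return {sentence id: depth} for sentences reachable from start.

    links: dict of sentence id -> set of directly linked sentence ids
    max_depth: stop expanding beyond this depth (None = whole graph)
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if max_depth is not None and depth[current] >= max_depth:
            continue  # do not look past the allowed depth
        for neighbour in links.get(current, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)
    del depth[start]  # the starting sentence is not its own translation
    return depth


# Hypothetical chain of translations: 1 - 2 - 3 - 4
links = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
visible = translations_by_depth(links, 1, max_depth=2)
everything = translations_by_depth(links, 1)
```

With `max_depth=2`, sentence 4 stays hidden, which is exactly the situation in which someone might add a duplicate translation without knowing it.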
* Total number of sentences linked *
How about indicating the (approximate) total number of sentences linked? This could be calculated once a day/week/month and might be of some help. So on http://tatoeba.org/eng/sentences/show/93453 we would see "+2 hidden translations" (below) or "(There is a) total of 4 translations" (above). (As shown at http://tatoeba.org/eng/sentences/show/333724 )
Everyone who wants to translate would know there are already some hidden translations, so: be careful, look them up before risking adding a duplicate which will be deleted later anyway.
* Identification of translation chain and language *
How about assigning a second identification to every sentence which denotes the translation chain (graph) and the language? So in the example http://tatoeba.org/eng/sentences/show/93453 the first sentence, the Japanese one, would get the identification 93453-jpn, the second, the English one, would get 93453-eng, the French one 93453-fra, the Chinese one 93453-cmn and the last, the Esperanto one 93453-epo. A second Esperanto translation would get 93453-epo2.
If everyone then had to tell the system the planned target language as a first step before translating, the system could check whether a translation (or two translations...) already exists and show it or them. If the second identification had already been assigned, this database lookup would not take long.
Perhaps it would take a bit of work to assign these second identifications, but it would more or less eliminate the problem with duplicates.
In a way, this procedure would mean running the time-consuming search for the complete translation graph in the database once, and later just using the stored result.
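A minimal sketch of what assigning such second identifications could look like, assuming the whole set of sentences and links fits in memory. It groups linked sentences with a union-find structure and labels each one with a chain id plus language code. Using the smallest sentence id of each chain as the chain id is an assumption made here (the proposal uses the id of the first sentence), and the ids, data layout and function name `chain_labels` are hypothetical.

```python
from collections import defaultdict


def chain_labels(languages, links):
    """Label every sentence as '<chain id>-<lang>' (or '<lang>2', ...).

    languages: dict of sentence id -> language code
    links: iterable of (a, b) translation links
    """
    parent = {sid: sid for sid in languages}

    def find(x):
        # Follow parents up to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as root so it serves as the chain id.
            parent[max(ra, rb)] = min(ra, rb)

    counters = defaultdict(int)
    labels = {}
    for sid in sorted(languages):
        root = find(sid)
        lang = languages[sid]
        counters[(root, lang)] += 1
        n = counters[(root, lang)]
        labels[sid] = f"{root}-{lang}" if n == 1 else f"{root}-{lang}{n}"
    return labels


# Hypothetical sentences: one chain with two Esperanto translations,
# plus one unlinked English sentence.
languages = {10: "jpn", 11: "eng", 12: "fra", 13: "epo", 14: "epo", 20: "eng"}
links = [(10, 11), (11, 12), (12, 13), (11, 14)]
labels = chain_labels(languages, links)
```

Note that labels computed this way are only a snapshot: they would have to be recomputed whenever a link is added or a sentence is deleted.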
this system would be hell to maintain
1 - computers are fast at dealing with numbers, but become slow when it comes to dealing with characters
2 - it would be easy to do if it were all about trees, but unfortunately we're dealing with graphs, so your proposal brings the following problems
* we will need to update it when we delete a sentence
* the same when we merge two graphs by adding a link
and moreover it still doesn't solve the problem, which is traversing the graph, as you will still need to traverse it to discover that there is already an epo2, and so on
to be honest, before you propose another solution:
we've been thinking about it for a year, and there's no simple solution to this problem with the current architecture, and as we're only a few developers, I prefer to focus my free time on the new version rather than trying to find and develop a new workaround, which would only increase the time before we get this new version, which will solve these problems in a smart way
OK, let's wait for the new version. Thank you for your explanations.
- How is the programming progressing?
- Is there a solution in sight about the problem of the hidden translations?
- If you are not enough programmers, should we try to find programmers for Tatoeba?
This looks nice. Maybe it could be used only by those who want to translate, if it is slow.
in fact what i've shown has been made with the new version
it's hell to code with a normal database, and unfortunately we only have one server, so if the server takes 10 seconds to generate my page, during those 10 seconds people who don't care about this feature will also need to wait 10 seconds.
so by a collateral effect it will affect everyone's performance, not only that of those who want it.
The famous collateral effect :-|
OK, I see. So we shall wait for the new database.
And in the meantime, maybe we could spread the enthusiasm about putting more translation links. They help a bit.
so yep, it's possible, and the script was easier to do and was done as a temporary solution while we wait to finish this new version.
How many are there?
3000 ^^
Oh good, that gives me hope ;)
That's hope for four days, since Esperanto already has 1,600 more sentences than French, and currently about 400 Esperanto sentences are being added per day.
It's a pity that an artificial language is more widespread in the project than a living language (when that is far from the case in the real world). It actually calls the seriousness of the Tatoeba project into question.
Let's look at the facts first: in Hungary, Esperanto ranks eighteenth among the languages people have mastered, http://www.nepszamlalas.hu/eng/...ad01_13_0.html . Currently there are more than 135,000 articles in the Esperanto Wikipedia http://eo.wikipedia.org , which puts it in twenty-third place compared with the other editions, http://stats.wikimedia.org/EO/Sitemap.htm . The Chinese provide information to the world in about ten languages, including Esperanto, http://esperanto.china.org.cn . So quite a few national languages rank behind Esperanto...
Since people who speak Esperanto generally also speak many other languages (they are probably more polyglot than people in other language communities), it is natural that they take an interest in a project like Tatoeba. I don't see any disadvantage for the project.
What bothers me about Esperanto is that it is representative of only a limited number of languages (Romance, Germanic, Slavic, Greek and isolates, and agglutinative languages for the structure), both in its substrate and in its morphology. Yet each language is tied to particular cognitive mechanisms (ways of orienting oneself in space...).
Of the 3,000 to 7,000 languages currently recorded, Esperanto would represent, let's say, a hundred? And on that basis people want to promote it as a universal language? That really shows very little regard for the 95% of existing languages.
I too would like a language based on more languages. But it seems that with each added language it becomes harder to learn, for everyone.
Esperanto is far from an ideal solution; it is only the best one known (or one of the best among constructed languages). Esperanto is much closer to quite a few languages than, for example, English, French or German are. So from this point of view, it is preferable to national languages as a candidate for a universal language.
And it is quite clear that one can learn Esperanto in a third of the time needed to reach the same level in a national language. So in the same amount of time one speaks Esperanto much better. That is why quite a few Chinese, Japanese, Vietnamese, etc. learn Esperanto.
Are you joking or are you serious, aandrusiak?
Besides, the "toy language" has a habit of overtaking other languages. When Esperanto was published in 1887, about five people spoke the language; Esperanto was therefore one of the last of about 7,000 languages at that time. Today Esperanto is generally found somewhere among the top 15 to 35 languages, sometimes among the top 50.
So Esperanto has already overtaken more than 6,900 languages in only 123 years. I have the impression that no other language in the whole history of humanity has made such progress in only one century.
Fortunately, this artificial language will never become the national language of a country, at least not unless all the Esperanto fans get together and buy an island to live on, speak their language, and declare it the national language of their Esperanto Paradise.
We don't need one more national language. We need a language for international communication. That language must be easier than the national languages. I learned English for 8 years and the result was not very good. In Europe we spend a lot on translating and studying. Billions. Esperanto is very easy (10 times easier!) and it is neutral.
Above all, don't preach your easy language. That proves once again the sectarian character of this movement.
I agree that it is not always a good idea to preach Esperanto.
Apart from that, it is worth distinguishing between the community of people who speak Esperanto and the Esperantist movement, and even within the latter between people who promote Esperanto in a moderate way and others who promote it in an almost exaggerated way that recalls the behaviour of a sect.
I know the world would be much easier to understand if we knew that all the inhabitants of country A were intelligent, those of country B nasty and all the people of country C kind, but that is not reality. Likewise, people who speak Esperanto have quite different characters...
GOODBYE
Bug?
Home page http://tatoeba.org/eng/home
More latest comments (show more...) http://tatoeba.org/eng/sentence_comments/index
Filter by language http://tatoeba.org/eng/sentence_comments/index/hun
(2) second page on the top link http://tatoeba.org/eng/sentence...dex/hun/page:2
Press End key/Go down, (3) third page or any
http://tatoeba.org/eng/sentence...s/index/page:3
...Language filter now missing. The bottom links are not updated according to the language filter.
Sry if it is already posted, or the Wall is not the best place to submit this.
It's a bug. Thanks for reporting :) It will be fixed soon.
[not needed anymore- removed by CK]
I have a Mac, and it seems to work well. But I'm really not sure if the rendering is 100% correct as I can't read the script.
Anyway CK, you need a font!
http://sites.google.com/site/macmalayalam/
http://www.prokerala.com/malayalam/
[not needed anymore- removed by CK]
I think it's just because you don't have the right font, because on computers (I don't know exactly about Mac, but on Linux/Windows/etc. this is the case) character rendering behaves as follows:
1 try to display the character with the font specified by the software
2 if the font is not present or the character can't be rendered by this font, then there's a set of rules for using some fallback fonts
3 if no font can render this character, then display a box
so even if the CSS were using a font which has no Malayalam characters, your OS would have used another one which has them.
Maybe this table answers your question (sorry if I didn't completely get the meaning of your question... :)
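The three steps above can be sketched roughly as follows. This is a simplified illustration, not how any particular operating system implements font fallback: the font names and coverage tables are hypothetical, and real systems consult the fonts' own character maps. The only hard fact used is that the Unicode Malayalam block is U+0D00 to U+0D7F.

```python
def pick_font(char, requested, fallbacks, coverage):
    """Return the first font that covers char, or None (a box is drawn).

    coverage: dict of font name -> list of (first, last) code point ranges
    """
    code_point = ord(char)
    # Step 1: the requested font; step 2: the fallback fonts, in order.
    for font in [requested, *fallbacks]:
        for first, last in coverage.get(font, ()):
            if first <= code_point <= last:
                return font
    # Step 3: no installed font can render the character.
    return None


# Hypothetical fonts and coverage tables, for illustration only.
COVERAGE = {
    "SiteFont": [(0x0020, 0x007E)],           # basic Latin only
    "FallbackMalayalam": [(0x0D00, 0x0D7F)],  # Unicode Malayalam block
}

latin = pick_font("a", "SiteFont", ["FallbackMalayalam"], COVERAGE)
malayalam = pick_font("\u0d2e", "SiteFont", ["FallbackMalayalam"], COVERAGE)
missing = pick_font("\u0d2e", "SiteFont", [], COVERAGE)
```

In this sketch `missing` is `None`, which corresponds to the boxes people see when no Malayalam font is installed.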
http://en.wikipedia.org/wiki/He...isting_support
In what way are they not displaying correctly? The language appears to be Malayalam, which uses its own script. On my Android phone, it's all boxes. On my XP PC, I see the letters, but can't be sure if they are connected correctly without learning more. I don't have a Mac so I can't directly answer your question.
it's unicode encoding, maybe you don't have the right fonts for malayalam ?
They display fine for me. (Firefox, Windows 7)
Tag auto-completion script turned off?
It looks like the auto-completion script for entering tags on sentences has gone. I'd actually got used to it, too.