Wall - Tatoeba

Wall (7,122 threads)

sweetanavaleria15 October 18, 2011 at 4:48:43 AM UTC

can i learn the writng and pronuncation of every language?

sacredceltic October 18, 2011 at 9:21:17 AM UTC

and even the writi*ng

jono1001 October 18, 2011 at 8:31:52 AM UTC

I have some new blog entries. Have a look: http://chinesemandarinlearner.blogspot.com/

bober October 16, 2011 at 2:41:15 PM UTC

Does anybody know what software is used to turn Chinese sentences into pinyin? Why aren't pinyin and furigana available to download together with the Chinese and Japanese sentences?

I'm working on a Chinese and Japanese online dictionary and this would be extremely useful. Thanks.

sysko October 16, 2011 at 6:13:29 PM UTC

As I said in one of the topics below, the pinyin is generated by software I wrote myself for Tatoeba's needs. It's independent of Tatoeba itself, so it can be used separately. I plan to make it open source, but I haven't yet found the time to put it on GitHub or anywhere else public, and for the moment the code has no documentation at all.

Romanizations in general (not only for Chinese and Japanese, but also Shanghainese, Cantonese, etc.) are not available for download because they are generated on the fly (I know that's not "efficient", but it's due to some legacy code).

In the future it will be possible to have them along with the sentence itself, but not yet.

bober October 16, 2011 at 10:45:47 PM UTC

OK, I'm gonna write it myself. I forgot that's why I started working on that dictionary: to solve this kind of problem. Thanks anyway.

Great job with Tatoeba, I'm glad the Tanaka Corpus gets updated. I link to your pages within every example sentence on Tangorin. Keep up the good work.

sysko October 17, 2011 at 12:48:28 PM UTC

Actually, if you feel like compiling it yourself, I can mail you an archive with the source of the software I've coded. It would be a pity to reinvent the wheel, especially as I will make it open source in the end anyway. It's simply that I have a really hard time finding free time these days (and as Tatoeba brings me no money, I unfortunately can only work on it in my free time).

bober October 17, 2011 at 1:08:08 PM UTC

That would be awesome. bober@tangorin.com

Can you tell me if your script takes into consideration the tone changes that depend on a word's position in a sentence? Or is it just pulling readings from CEDICT and replacing words?

I know how it feels working on something that doesn't contribute to your income. Although I have one small ad on Tangorin, it only recently started bringing in enough revenue to pay for hosting.
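
To make the distinction in the question above concrete, here is a minimal sketch of the two approaches: a plain dictionary lookup, and a lookup followed by a position-dependent tone rule. Everything in it is an assumption for illustration. The `READINGS` table is a hypothetical stand-in for a real dictionary such as CEDICT, the function names are invented, and nothing here reflects how sysko's software actually works. The only tone rule shown is a simplified third-tone sandhi (a third tone becomes a second tone when another third tone follows).

```python
# Hypothetical per-character readings in numbered pinyin; a real tool would
# load a full dictionary such as CEDICT instead of this tiny table.
READINGS = {"你": "ni3", "好": "hao3", "我": "wo3", "他": "ta1", "很": "hen3"}


def lookup_readings(text):
    """Replace each known character by its dictionary reading (no context)."""
    return [READINGS[ch] for ch in text if ch in READINGS]


def apply_third_tone_sandhi(syllables):
    """Turn a third tone into a second tone when another third tone follows.

    Simplified rule: it ignores word boundaries and prosody, which affect
    how longer runs of third tones are actually spoken.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result


plain = lookup_readings("你好")
print(plain)                           # dictionary readings only
print(apply_third_tone_sandhi(plain))  # readings after the sandhi rule
```

A pure lookup keeps both syllables of 你好 in the third tone, while the second step rewrites the first one. A real converter would also need word segmentation and readings for characters with several pronunciations.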

Nero October 16, 2011 at 3:46:34 AM UTC

What is the rule here for using macrons in Latin?

cntrational October 16, 2011 at 7:25:52 AM UTC

I think it would be alright to use diacritical marks, if you wanted to.

Nero October 16, 2011 at 8:40:46 AM UTC

I'm going to. It will be beneficial to the corpus I believe.

jakov October 18, 2011 at 2:04:32 PM UTC

I never learned it like that at school in Austria, but I feel it would be okay if the macrons are easily removable. For Latin it's only a matter of adding macrons on top of the normal letters, so I guess a "remove all macrons" script should be easily feasible.
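
A "remove all macrons" script along these lines could be sketched as follows. This is only an illustration using Python's standard `unicodedata` module, not anything Tatoeba is known to run, and the function name `strip_macrons` is made up. The idea is to decompose each letter into base letter plus combining marks, drop the combining macron (U+0304), and recompose.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def strip_macrons(text):
    """Remove macrons but leave every other diacritic untouched."""
    decomposed = unicodedata.normalize("NFD", text)
    without_macrons = decomposed.replace(MACRON, "")
    return unicodedata.normalize("NFC", without_macrons)


print(strip_macrons("Salvē."))
print(strip_macrons("Salvē.") == "Salve.")
```

Comparing sentences on their macron-free form would also make "Salvē." and "Salve." look identical, which is one possible way to approach the duplicate problem raised elsewhere in this thread.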

alexmarcelo October 16, 2011 at 4:39:40 AM UTC

I don't use them.

http://tatoeba.org/eng/sentence...lexmarcelo/lat

Diacritical marks don't exist in Latin; they were introduced to ease pronunciation. I don't really recommend them.

Nero October 16, 2011 at 4:45:15 AM UTC

That's why I think they would be helpful here, because this site is intended for beginners.

CK October 16, 2011 at 11:58:36 AM UTC, edited October 30, 2019 at 5:26:57 AM UTC

[not needed anymore- removed by CK]

Nero October 16, 2011 at 5:12:12 PM UTC

Yes you are right. But we are constructing a multi-lingual sentence dictionary, and Latin dictionaries contain these macrons.

However, I did realise that this could bloat the corpus.

If I add "Salvē." and someone else adds "Salve.", we have two duplicates that won't be automatically joined.

Vortarulo October 16, 2011 at 5:56:47 PM UTC

> Yes you are right. But we are constructing a multi-lingual sentence dictionary, and Latin dictionaries contain these macrons.

Same goes for other languages like Russian, where dictionaries usually mark the accent (because it's quite important for pronunciation there), but no one writes it that way.
And this could be said about Arabic too, where at least many dictionaries indicate the vowels. Dictionaries of Serbo-Croatian and the Baltic languages also often contain the tonal accents, but no one ever writes them out in texts. I think these are comparable cases.

Also: I'd prefer to see macrons in Latin too, but that makes it a little difficult for some people to add sentences; plus, many people who do know Latin ignore the lengths altogether, and for many words they wouldn't know where the long or short vowels are, except in the "important" endings.

Nero October 16, 2011 at 6:12:53 PM UTC

Yes, it would be nice if there were some automatic system like for Chinese.

alexmarcelo October 16, 2011 at 5:05:40 AM UTC

Hmm... so how about Arabic? Beginners can't read without the short vowels. We would have to include them, too.

بِنْت = بنت

That wouldn't be easy...

Nero October 16, 2011 at 5:09:02 AM UTC

We could do it in Latin and not in Arabic.

alexmarcelo October 16, 2011 at 5:19:58 AM UTC

I think it should be optional. The reason? Many people know Latin, but cannot pronounce it, so these people wouldn't be able to participate.

Nero October 16, 2011 at 5:22:30 AM UTC

I agree, although these people could try it and learn how to pronounce it.

alexmarcelo October 16, 2011 at 5:29:30 AM UTC

Is everyone allowed to record sentences in
- Latin?
- Esperanto?
- Toki Pona?

It would be interesting to have audio for these languages.

Vortarulo October 16, 2011 at 11:48:56 AM UTC

As for Latin, one should be careful. There are dozens of different "national" pronunciations, like the Italians pronouncing Cicero as /ˈʧiʧero/ and funny things like that. Germans say /ˈʦɪʦɐʁoː/, French people /siseˈʀo/, I think.

If we were to record extinct languages like Latin, I think we should first agree on the kind of pronunciation. Best thing by far in my opinion is the reconstructed, classical pronunciation, where the name was /ˈkɪkɛroː/ or /ˈkikeroː/ (I'd have to read up on the vowel qualities again).
I have a whole book on that topic, so I could help with questions, but I think Wikipedia is quite helpful there, too.

alexmarcelo October 16, 2011 at 5:56:00 PM UTC

I agree. Classical Latin was used during the Golden Age of Latin literature (that's what I use). Although many people don't like this reconstructed pronunciation, it would be the best choice.

Demetrius October 21, 2011 at 11:23:40 AM UTC

The problem with reconstructed pronunciation is that it differs from scholar to scholar. For example, is _sonus medius_ in l*bet, opt*mus, lacr*ma, max*mus [ɨ], [ʉ], or [y]?

It is unlikely that we will ever know this. From our sources we just know it’s _medius_...

(I tend to think it’s unlikely to be [y] since it was rarely spelled with y. But I may be biased towards [ɨ] since it exists in my mother tongue. ^^")

Vortarulo October 21, 2011 at 1:25:18 PM UTC

Well, an ancient scholar's description of the sounds of Latin isn't the only source we have. Personally, I believe that [ɪ] might be meant, but I don't know either.

sysko October 16, 2011 at 6:15:46 PM UTC

When I was in high school, in Latin class we pronounced it /ˈkɪkɛroː/ or /ˈkikeroː/ (I'm not good at IPA). I don't know where you got the idea that French people would pronounce it /siseˈʀo/; maybe from some French people who have never learnt Latin?

Vortarulo October 16, 2011 at 6:29:06 PM UTC

Oh?
Well, I vaguely remembered a recording back in Encarta Encyclopedia, where "Cicero" was pronounced by a German, Italian and French person. I remember the French speaker was a woman and pronounced it like "Sisséreaut" (or so). ;)

But maybe that's not standard, and maybe it only wanted to show how different people WOULD pronounce the word "Cicero" when they saw it.

sysko October 16, 2011 at 6:39:33 PM UTC

Actually, nowadays we write it "Cicéron", which sounds more or less like what you said, but I'm pretty sure that in Latin class it is pronounced "correctly". But yep, I get what you wanted to explain :)

As for the other languages "without native speakers", they certainly have an IPA description. For them, as long as a sentence is pronounced "correctly", even with a local accent, that's okay, but I think it would be up to other speakers to "judge" that.

Vortarulo October 16, 2011 at 6:48:19 PM UTC

Now *that* makes sense! While repeating the *.wav file in my head, I remember it had a nasal sound at the end, which I found so weird and misplaced that I ignored it. Yeah, that file there definitely had Cicéron.

So yeah, Esperanto, Lojban, Toki Pona and also Klingon could be pronounced by anyone capable of pronouncing the words correctly. Users should also be strict and complain when someone's accent makes him pronounce different consonants (as I, for instance, sometimes pronounce Esperanto "sed" like "zet" due to my German accent, when I don't pay attention). I agree. ;)

I could pronounce Klingon quite well... but my voice isn't deep enough, haha. :D

alexmarcelo October 17, 2011 at 11:44:16 PM UTC

Seriously, it would be amazing to have recordings for Klingon! I think you should record some! ;)

I'll send sysko some Latin recordings, so you tell me what you think. If you like them, I can record more.

Demetrius October 21, 2011 at 11:26:03 AM UTC

In our Latin class we pronounced it /'tsɨtsero/, using a Mediaeval pronunciation. :P

sacredceltic October 16, 2011 at 9:46:53 PM UTC

I learned to pronounce it /siseˈʀo/ in my Latin classes in France. I think it's a matter of generation... to be confirmed...

Nero October 16, 2011 at 5:33:50 AM UTC

I believe so. You'd have to talk to TRANG about it though.

treskro3 October 16, 2011 at 12:04:33 AM UTC

Are there any options for differentiating between Mandarin Chinese in simplified or traditional characters? I know that all of them are currently being translated under the PRC flag, which is labelled Simplified Chinese, regardless of what character set is being used. It's not too big of an issue, but for the sake of accuracy it seems strange to have a text in traditional characters labelled as simplified.

Regarding inclusion of languages, I was wondering about the absence of Hokkien while Teochew is present in the list. I'm not trying to put it down or anything, but the two are pretty much at the same level of taxonomy, both being part of Southern Min with Hokkien having almost 30 million more speakers.

And, one final question: how are pronunciation guides for languages like Mandarin, Cantonese, and Japanese added? Are they generated automatically or must they be added manually somehow?

sysko October 16, 2011 at 12:38:12 AM UTC

A Mandarin Chinese sentence is labelled "Mandarin Chinese" (the fact that we use the PRC flag is another issue), and the script used doesn't change the fact that it's Mandarin. If you pay attention, there's an icon specifying the script used:
http://flags.tatoeba.org/img/si...ed_chinese.png
or this one:
http://flags.tatoeba.org/img/tr...al_chinese.png

We accept both scripts, because people in Taiwan or Hong Kong use the traditional script when writing Mandarin, books written before the simplification reform in mainland China are in traditional characters, etc.

For Hokkien/Teochew, actually the problem is the same as with Shanghainese and Suzhouese.

For the moment we're using the international standard ISO 639-3 (alpha-3 codes) to draw the "line" between languages. This makes it easier to keep an objective point of view because, as you may know, many Chinese people consider Cantonese only a "dialect" of Mandarin, which is not correct linguistically speaking, and the same happens with a lot of languages around the world. According to the standard, Teochew and Hokkien both fall under the code "nan" for the Min Nan language http://www.ethnologue.com/show_...e.asp?code=nan , so yes, it would have been more accurate to name it "Min Nan" in Tatoeba; we were not aware of that when someone proposed adding Teochew around a year and a half ago. For the moment it's the only "dialect" of Min Nan we have. A user proposed starting Hokkien some months ago, but I was quite busy while we were discussing this problem, and somehow we lost contact.

More generally, in Tatoeba we add a language only when we have somebody with a sufficient level in it (either a native speaker or someone with pretty strong knowledge of the language, to avoid the "I find this language cool and discovered 5 sentences on a website" case). We have no limit on the number of languages we can support (the goal is to have them all), but as there are more than 6000 languages, it would really clutter the interface if we were to add them all now, so we prefer to add a language when we start to have sentences in it. So for Hokkien, if you're a native speaker or have strong knowledge of it, we can add it. It would be a bit special, though, as we already have Teochew with the same ISO code, so maybe I can just rename it "Min Nan"; sentences in the Teochew dialect of Min Nan would be tagged as such, and the same for Hokkien. A new ISO standard will soon be released, this time with a code for each dialect of each language, and at that point we will be able to do something smarter and tell not only the language but also the dialect a sentence belongs to.

As for the pronunciation guides, they are added automatically and for the moment they can't be edited (but in the future it will be possible to do so).

treskro3 October 16, 2011 at 1:13:52 AM UTC

Thanks for the clarification. I wouldn't object to having Min Nan as a language with Teochew and Hokkien included underneath, but I guess it's not my decision to make. I am a native speaker, but I'm only partially proficient in the language so there's no need to make any special moves for me. At the moment, however, it seems as though there are only 6 sentences in Teochew, so... I mean obviously it's still up to you.

Demetrius October 21, 2011 at 11:37:52 AM UTC

How different are Teochew and Hokkien? Are they mutually intelligible?

treskro3 October 21, 2011 at 2:59:49 PM UTC

They're similar enough to both be classified as part of Southern Min, but they are not mutually intelligible.

paula_guisard October 14, 2011 at 8:44:03 PM UTC

I'm not sure if it has already been discussed, but what is the community's opinion on being able to create tags more freely? I mean, for many languages, if we search for a verb, sometimes we are not able to find any matching sentences because of verb conjugation. For example: imagine I'm confused about the use of the expression "to see red" and I type "see red" in the search bar, but the only sentence available is "she SAW red". Then I won't be able to find it, especially if I don't know that the past tense of "see" is "saw". Wouldn't it be useful to be able to tag our sentences with the infinitive form of their verbs? What does everybody think about it?

sysko October 14, 2011 at 9:08:37 PM UTC

Actually, it's more about improving the search engine's ability to recognize the infinitive form of verbs than about improving tagging, because right now an advanced user can already create any tag without restriction (though normal users can't). Otherwise we would end up with hundreds of thousands of tags (number of languages we have * number of verbs in each language), which would uselessly clutter the tag list.

paula_guisard October 19, 2011 at 7:40:07 PM UTC

Hm, I get it now. Well, it's just that I was actually thinking of a whole new tagging system, different from the one we use today. Instead of having fixed tags that we select from a list, it would work pretty much like a website search engine. When we create a website, we can give it tags that search engines take into account. It would pretty much work like that. So, imagine I've got a sentence, like:

> He woke me up

I could include the tags: "wake up; woke up; wake someone up", for example. Whenever the search engine was used, it would look for the search term in the sentences as well as in the hidden tags created by users when they added the sentence.

Of course, changing the search engine would take a huge effort. I just thought it would be more helpful.

Of course, it's up to the community. I'm just trying to learn everyone's thoughts on the matter. :)
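
The hidden-tag idea in this post could be sketched roughly as below. This is a toy illustration only: the `(id, text, hidden_tags)` layout and the `search` function are hypothetical, and Tatoeba's real search engine is not shown or implied here. The search simply looks for the query both in the sentence text and in the tags attached when the sentence was added.

```python
def search(query, sentences):
    """Return ids of sentences whose text or hidden tags contain the query."""
    needle = query.lower()
    hits = []
    for sentence_id, text, hidden_tags in sentences:
        haystacks = [text] + list(hidden_tags)
        if any(needle in haystack.lower() for haystack in haystacks):
            hits.append(sentence_id)
    return hits


# Hypothetical data: (id, sentence text, hidden tags chosen by the author).
SENTENCES = [
    (1, "She saw red.", ["see red"]),
    (2, "He woke me up.", ["wake up", "woke up", "wake someone up"]),
    (3, "The wall is red.", []),
]

print(search("see red", SENTENCES))  # found through the hidden tag only
print(search("wake up", SENTENCES))
```

The alternative suggested earlier in the thread, teaching the search engine itself to recognize infinitives, would not need any per-sentence tags; this sketch only shows the tag-based variant.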

sacredceltic October 20, 2011 at 9:15:45 AM UTC

That would mean trusting all users (including pranksters and saboteurs) to tag their sentences in a relevant way...
Or else, as with the content of sentences, we would need to be able to debate personal tags, which would create a parallel discussion system...

Personally, I don't believe in the wisdom of crowds. I rather believe in the crass stupidity of crowds, which human history abundantly teaches us. Crowds invented Stalinism, Nazism, Maoism, the Khmer Rouge, the Armenian genocide, the one in Rwanda, and more recently we saw a crowd trample itself during a simple parade in Germany (as happens frequently in stadiums and public places everywhere...)
Coluche, a French comedian, used to say that with more than two people, you're a bunch of idiots. I'm not far from sharing that opinion.

I have already expressed myself on this subject many times but, at the risk of repeating myself, I'll do it once more:
Just as bad money drives out good, bad information drives out good, and this is also true of sentences and their translations.
So if you spend enough time with enough people, given that enlightened or expert people are a minority, they will always be overwhelmed by ignorance, incompetence and stupidity.
In the long run, I even predict that the number of erroneous sentences and translations on Tatoeba will exceed the number of correct ones. That may already be the case...
But at least, thanks to the (sometimes heated) debates, most erroneous sentences more or less, sooner or later (rather later), end up being corrected.

As for personal tags, I would be much more doubtful. People would cling to them at all costs and the indexing would become a complete mess.
All the "free" tagging sites I know end up as a jumble of badly written tags, full of spelling mistakes, redundant, unmanaged and ultimately unusable, especially across several languages...
An example to convince you: http://fr.forvo.com/tags/

You can see that even the current list of Tatoeba tags, although reserved for trusted contributors and purged several times, is already quite messy http://tatoeba.org/fre/tags/view_all and the reason is that, apart from the tags used for administering sentences and their corrections, nobody agrees on what tags should be and what they should be for (there is in fact no debate on the subject, because everyone views the status quo according to their own interpretation.)

Scott October 20, 2011 at 5:08:05 PM UTC

I think that the tag system works pretty well, but tags should be classified. The corpus quality is good in my opinion, though moderators should probably be a bit more active in applying corrections (in French at least).

Here are some of the links for moderators:

http://tatoeba.org/eng/tags/for...e_spelling/eng
http://tatoeba.org/fre/tags/for...rs/@change/eng
http://tatoeba.org/eng/tags/for...ge_grammar/eng

Overall, I agree with sacredceltic that opening the tag system to everyone or having "personal tags" would be a mess.

sacredceltic October 20, 2011 at 5:46:07 PM UTC

There are currently perfectly useless tags, because they give contributors no general information but are used as private codes by some people; these should rather be handled as lists...

The principle of tags is that they are visible to everyone, so reading them must be useful to everyone; otherwise the public space gets cluttered and it introduces confusion, all the more so since they are in a single language that not everyone necessarily understands...
Any tag that is not intended to inform everyone, or that does not do so clearly enough (abbreviations, untranslatable or hard-to-understand words), should therefore be removed and its current use handled some other way.

sysko October 20, 2011 at 6:09:03 PM UTC

Yes. In the new tag system (I can't remember whether it's already usable on tato.sysko.fr, but in any case I have coded it), only already existing tags can be added. (It will of course be possible to ask me for new ones; eventually there will be a form for that, to automate things on my side.)

This will avoid useless tags and redundant tags, make them easier to translate, etc.

Many others (including what the user above proposes) will move to "meta" fields that work on a key-value principle, as follows:

key          value
author       Victor Hugo
"raw" form   je avoir mangé un pomme
grammar      S V C

(these are only examples, no need to discuss their actual content)
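
As a rough illustration of the two ideas in this post (a closed list of tags plus free key-value "meta" fields), here is a minimal sketch. The record layout, the `ALLOWED_TAGS` list and both function names are assumptions made for the example; the actual code mentioned in the post is not public in this thread, so nothing here describes it.

```python
# Hypothetical curated tag list: in the scheme described above, only tags
# that already exist could be attached to a sentence.
ALLOWED_TAGS = {"proverb", "quotation"}


def add_tag(sentence, tag):
    """Attach a tag only if it is already on the curated list."""
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    sentence["tags"].add(tag)


def find_by_meta(sentences, key, value):
    """Return the sentences whose meta field `key` equals `value`."""
    return [s for s in sentences if s["meta"].get(key) == value]


# Field names and values mirror the examples in the post; the record
# layout itself is an assumption.
SENTENCES = [
    {"id": 1, "tags": set(),
     "meta": {"raw form": "je avoir mangé un pomme", "grammar": "S V C"}},
    {"id": 2, "tags": set(), "meta": {"author": "Victor Hugo"}},
]

add_tag(SENTENCES[1], "quotation")
print([s["id"] for s in find_by_meta(SENTENCES, "author", "Victor Hugo")])
```

Keeping free-form information in meta fields means it never shows up in the shared tag list, which is the cluttering problem discussed in this thread.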

sacredceltic October 20, 2011 at 6:26:48 PM UTC

The main thing is that it doesn't clutter the visual space. The point of a tag is of course to draw attention to something important (for example the fact that the sentence is a quotation, a proverb or a rhyme, which must be taken into account when translating).

But too much information kills information. If there are too many tags, the eye is no longer drawn to the essential, and that confuses newcomers, who don't quite understand what they are for...

sungkhum October 14, 2011 at 12:17:29 AM UTC

Does anyone know if there is a site like Tatoeba but for words rather than sentences? I've seen wiktionary, but it really is not as user friendly as Tatoeba's interface and so I've had trouble getting people involved. It would be great if Tatoeba could have a sister site for word to word (dictionary type) translations as well - especially because the data can be so easily accessed through downloading .csv files.

sysko October 14, 2011 at 4:07:44 AM UTC

If you mean something that is the same as Tatoeba, that is:

1 - multilingual, with no single "source" language
2 - able to interlink entries, so that people can easily create an A-to-C dictionary by validating translations if there's already an A-to-B and a B-to-C dictionary
3 - free as in free speech (bab.la is not, is it?), which permits reuse in other projects rather than keeping the data captive on one website
4 - with the source code of the platform also free, so that contributors can either help improve it or "fork" it if one day the admin goes crazy

No, there's no such platform yet (actually not even one covering only points 1, 2 and 3). The closest, if you think the "open culture" aspect is important, is Wiktionary, with all the drawbacks we know it has.

So last year, while I was trying to work out a code architecture for the new version of Tatoeba, I made a test project like the one you describe: http://redmine.sysko.fr/project...ict/repository (don't pay attention to the name "shanghainesedict"; at first I only wanted to make a Shanghainese dictionary website, and it soon turned out that it could offer the same features as Tatoeba for a dictionary).

Of course it will not be a simple "copy paste" of Tatoeba code with only a change in the content, but for the moment I'm focusing on the new version of Tatoeba itself, so this new project is not going to appear for some months.
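
Point 2 above (deriving an A-to-C dictionary from existing A-to-B and B-to-C dictionaries) can be sketched as a simple composition step. The function name, the data layout and the sample entries are all assumptions for illustration, and the output is only a list of candidates, which is why the post insists on validation by contributors.

```python
from collections import defaultdict


def compose(a_to_b, b_to_c):
    """Propose A->C pairs by chaining A->B and B->C entries.

    The result is only a set of candidates: a pivot word with several
    senses can link unrelated words, so each pair still needs validation.
    """
    candidates = defaultdict(set)
    for a_word, b_words in a_to_b.items():
        for b_word in b_words:
            for c_word in b_to_c.get(b_word, ()):
                candidates[a_word].add(c_word)
    return dict(candidates)


# Hypothetical entries: French -> English and English -> German.
FR_EN = {"chat": {"cat"}, "chauve-souris": {"bat"}}
EN_DE = {"cat": {"Katze"}, "bat": {"Fledermaus", "Schläger"}}

for word, proposals in sorted(compose(FR_EN, EN_DE).items()):
    print(word, sorted(proposals))
```

In this made-up data, English "bat" covers both the animal and the sports equipment, so "Schläger" is proposed for "chauve-souris" even though only "Fledermaus" is right. A human validation step would reject the wrong candidate.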

sungkhum October 14, 2011 at 8:16:23 AM UTC

Totally understand something like that would take some time. I do think it would be a great resource. I will take a look at the code and see if I can get it to work on my server - I'm not a programmer, but perhaps I can give some feedback that would be helpful.

I work with English and Khmer, so that is my main motivation (I have a mediawiki site as a dictionary right now, but it isn't user friendly and it is limited to English and Khmer, rather than any language http://dictionary.sbbic.org) - there aren't many resources out there in Khmer, so I have been looking around for a long time for something that could take a large dictionary but make it open (as in anyone can download it and use it for whatever they want) as well as collaborative and user friendly (unlike wiktionary). Your site is as close as I have found anywhere.

So thanks for your hard work - it is great to see a site like this thriving.

-Nathan

slomox October 14, 2011 at 3:45:09 PM UTC

>Of course it will not be a simple "copy paste" of tatoeba
>code with only a change in the content

It won't? The Tatoeba software is Open Source and available for download, isn't it? You just need to install an instance of it and put in words instead of whole sentences. Or am I missing something?

paula_guisard October 14, 2011 at 3:28:59 AM UTC

I like using http://bab.la
Even though it's a normal dictionary (one that focuses on words rather than sentences), it shows them in several different contexts so you can choose the best option for you.

sungkhum October 14, 2011 at 8:09:34 AM UTC

Cool, I had never seen bab.la, but it doesn't look open source (meaning the content cannot be downloaded, unless I am mistaken). But it does look like a good resource.

sungkhum October 14, 2011 at 12:21:59 AM UTC

Actually I just found this: https://www.assembla.com/spaces...Wr3Bj7ab7jnrAJ

Is this the source of the backend of Tatoeba? If we hosted a dictionary site using the Tatoeba backend, would that be an acceptable thing (unless of course there is already one that exists)?

Thanks,
Nathan

Snout October 13, 2011 at 4:03:19 PM UTC

Hi! I don't know if this is the right place to suggest this (or if it has already been discussed), but I was thinking it would be helpful when correcting sentences to see at a glance which language a sentence was translated from, for example with a little flag icon next to each sentence.

sacredceltic October 13, 2011 at 4:34:00 PM UTC

And what if it is linked to several languages?

Snout October 13, 2011 at 4:37:05 PM UTC

Yes, but we always translate from one particular sentence, don't we?

sacredceltic October 13, 2011 at 4:51:58 PM UTC

Yes and no. If you are referring to the concept of an "original" sentence, I dispute that this concept exists.
If you are referring to the sentence I start from to make my translation at a given moment (which may itself be the nth translation derived from some other "original" sentence), then you are right in 95% of cases, but not 100%.
For example, I frequently translate sentences in several languages into a single one at once and link them all within the same minute, or create duplicates on purpose so that the linking is done automatically later, if the sentences are too far apart for me to link them myself...
So what should be displayed as the source language in those cases?
The more sentences and languages Tatoeba contains, the more identical sentences exist without our being aware of them at any given moment.

Example:
I translate a sentence E from Esperanto into sentence F in French.
A few minutes earlier, someone had already translated the Russian sentence R, a translation of sentence E, into French. So there are now two sentences F. I cannot be aware of this because I filter out Russian, which doesn't interest me, and in any case, since the translation from Russian would be an indirect, second-degree translation, I could only see it when I am on E, not on my F linked to E.
One F is translated from Esperanto, the other from Russian. What should be displayed when these two sentences get merged, that is, whenever the deduplication procedure runs or any corpus maintainer decides to merge them?
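
The E/R/F example can be played through in a small sketch. The data layout (dicts with "lang", "text", "translated_from" and "links") and the function `merge_duplicates` are hypothetical and only meant to illustrate the argument; they do not describe Tatoeba's actual deduplication procedure. Merging two identical French sentences unites their links, and the merged sentence ends up with two source languages rather than one.

```python
def merge_duplicates(sentences):
    """Merge sentences with identical (lang, text), uniting their links.

    `sentences` is a hypothetical list of dicts with the keys "lang",
    "text", "translated_from" (a language code or None) and "links"
    (a set of ids of directly linked sentences).
    """
    merged = {}
    for s in sentences:
        key = (s["lang"], s["text"])
        if key not in merged:
            merged[key] = {"lang": s["lang"], "text": s["text"],
                           "sources": set(), "links": set()}
        if s["translated_from"] is not None:
            merged[key]["sources"].add(s["translated_from"])
        merged[key]["links"] |= s["links"]
    return list(merged.values())


# The two French sentences F from the example: one made from the Esperanto
# sentence E, the other from the Russian sentence R.
corpus = [
    {"lang": "fra", "text": "F", "translated_from": "epo", "links": {"E"}},
    {"lang": "fra", "text": "F", "translated_from": "rus", "links": {"R"}},
]

result = merge_duplicates(corpus)
print(len(result), sorted(result[0]["sources"]), sorted(result[0]["links"]))
```

After the merge there is a single sentence F with both "epo" and "rus" as sources, so a single "translated from" flag would have to pick one arbitrarily or show several.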

alexmarcelo October 13, 2011 at 4:38:20 PM UTC

Something similar has been discussed here:
http://tatoeba.org/wall/show_me...3#message_7113

sungkhum October 13, 2011 at 12:24:29 AM UTC

How do I make a public list? I want to add the Khmer language to this site, so I am trying to follow the steps in the FAQ, but I don't see a place to make a public list for Khmer.
Thanks!

sungkhum October 13, 2011 at 12:42:16 AM UTC

I figured it out: click "Browse", then "Browse by list", and create a new list. Once there, tick the "make public" checkbox.

-Nathan

Vortarulo October 13, 2011 at 1:54:33 AM UTC

I'm happy to see Khmer being added by someone!

And welcome to Tatoeba! :)

sungkhum October 13, 2011 at 5:29:20 AM UTC

Thanks!

sacredceltic October 13, 2011 at 3:39:27 PM UTC

Welcome!

sysko October 12, 2011 at 2:57:23 AM UTC

100,000 sentences in French \o/ Right, now off to catch my plane to China ;-)

sacredceltic October 12, 2011 at 8:40:42 AM UTC

*** 100,000 sentences in French ***
The French language now has 100,000 sentences on Tatoeba! Congratulations to all French speakers on Tatoeba!

Among the top 40 languages on Tatoeba:
French is the language that most often ranks second as a translation language and, after German, the one that next most often ranks third.

French is the first language of translation for Esperanto (ahead of German), mainly thanks to the tireless and very high-quality work of GrizaLeono, who has single-handedly translated almost 1/4 of the French sentences into the international language. Thanks to him, whose native language is Flemish and who masters French and Esperanto so well!

French is the second language of translation for:

English
Japanese
German
Italian
Chinese
Hebrew
Vietnamese
Occitan

French is the third language of translation for:

Spanish
Turkish
Ukrainian
Arabic

Scott October 12, 2011 at 4:30:54 PM UTC

\o/

alexmarcelo October 12, 2011 at 5:58:21 PM UTC

Congratulations, everyone!
