
done

The software works word by word, not character by character, and given the number of "words" I use (around 100k), I think it's really reliable. It's the same technique Wikipedia uses (and their word list is far shorter than mine), and most of the possible ambiguity disappears once you treat the text as "words" (otherwise it would be ambiguous for a human reader too).
That said, some "errors" are still possible, in the rare case of single-character "words" and also because of mistakes in the word list itself. So by correcting the word list, and by allowing manual edits of the generated "other script" version (which will be possible in the near future), we should easily be able to reach 99.9% accuracy. (I think we're at around 95~98% for the moment.)
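
To make the word-by-word idea concrete, here is a minimal sketch. It is not the actual Tatoeba code: the tiny `WORD_LIST` is made up for illustration, and greedy longest-match segmentation is an assumption, since the post does not say how the segmentation is done. It only shows why treating the text as words removes an ambiguity that exists at the character level (发 becomes 發 in one word and 髮 in another).

```python
# Hypothetical word list mapping simplified -> traditional (illustration only).
WORD_LIST = {
    "头发": "頭髮",  # "hair": here 发 becomes 髮
    "发现": "發現",  # "to discover": here 发 becomes 發
    "我": "我",
    "了": "了",
}


def segment(text, words=WORD_LIST):
    """Split text into words using greedy longest-match (an assumed strategy)."""
    max_len = max(len(w) for w in words)
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if size == 1 or chunk in words:
                result.append(chunk)
                i += size
                break
    return result


def convert(text, words=WORD_LIST):
    """Convert word by word; words missing from the list are left unchanged."""
    return "".join(words.get(w, w) for w in segment(text, words))


print(segment("我发现了头发"))
print(convert("我发现了头发"))
```

A single-character word that is not in the list passes through unchanged, which is exactly the kind of case where a wrong result can slip in and a manual edit is needed.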

Chinese sentences have a little icon in the icon bar that shows whether they're in simplified or traditional script.

Oh yep, sure. In fact the software I've made only segments words; the words themselves and their pinyin are already stored, so it's just a per-entry correction to make :) I will do that. If you see other entries that are missing this correction, can you post them in this thread? :)

You can still contact the author to ask whether he is the author of these sentences, or at least whether he holds the rights to them, so that he can give you a copy of them relicensed under CC-BY or a compatible licence. To be honest, the GPL was not made for text, books and so on; there are more suitable licences for this kind of data (such as CC-BY / CC-BY-SA, or the GNU FDL if you want to stay with GNU's licences), so I do think the guy chose the GPL more as a way of saying "my data are open" than for the exact terms of the licence. So it costs nothing to ask :) (1000 Shanghainese sentences come from a copyrighted book whose author gave me explicit authorization to use them under CC-BY.)

But it's still not a different language, and it can still be written in both scripts. If a sentence is specific to a part of China, then it will be tagged accordingly.

Hi, glad to see a new Chinese user here.
If you take a look, you will see that each Chinese sentence comes with both scripts, and when you add a sentence, the other script version is automatically generated. Moreover, as I said, these are 2 different scripts, not 2 different languages.

The technical limitation is 500 characters (due to the way we store sentences). As for the "guideline" part, I agree with what Pahramp said.

It depends on what you mean by an "annotation system" :)

I will be in Macao on the 16th

For Chinese I've made a sentence analyser, so it should be possible too :) And anyway, if we can do that automatically for 90% of the languages we support, it's already far better than nothing :)

=your sentence will only search for an exact match of "your" plus sentence, not for "your sentence"

="your sentence starting with an equals sign and enclosed in quotes; it's case-insensitive"

It should work again :)

http://yoyodyne.cc/tatoeba/
\o/
Thanks to everyone, this year with you was wonderful
See you in 2011

(Ah, on that subject, you need to fix one small thing: for people like sacredceltic and me who had already put in non-breaking spaces, our sentences now have two spaces, a narrow non-breaking one and a regular non-breaking one.)
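
As a rough illustration of the cleanup this would need, here is a small sketch. It is not Tatoeba's code: the function name is hypothetical, and keeping the narrow no-break space (U+202F) rather than the regular one (U+00A0) is an assumption. It collapses any run of two or more of these spaces into a single narrow one.

```python
import re

NBSP = "\u00A0"         # regular no-break space
NARROW_NBSP = "\u202F"  # narrow no-break space

# Any run of two or more no-break spaces of either kind.
_NBSP_RUN = re.compile("[\u00A0\u202F]{2,}")


def collapse_nbsp(sentence: str) -> str:
    """Replace doubled no-break spaces with one narrow no-break space.

    Which of the two to keep is an assumption made for this sketch.
    """
    return _NBSP_RUN.sub(NARROW_NBSP, sentence)


fixed = collapse_nbsp("Ça va" + NARROW_NBSP + NBSP + "?")
print(repr(fixed))
```

A sentence that already has a single no-break space is returned unchanged.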

My bad, an error while manipulating the server; it should be back now ^^
Thanks for letting me know :)

有 (yes, there is) :p

In the sentence comments, yep, but not in the sentence text itself. Because the data of Tatoeba can be reused for any purpose, it is important to keep the sentences as pure as possible. But since I plan, with one of my friends, to focus on tools for sentence analysis once we have finished a first release of Tatoeba, I think I will add a field somewhere to hold such information in a more specific place than the "all purpose" comments.

Thanks to you too. I think you've spent more time adding all the Cantonese sentences (in addition to all the Mandarin sentences) than I have coding this software :)
For the list, a .txt file with the following format would be perfect:
word[tab]jyutping
word2[tab]jyutping2
:)
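
For reference, a file in that format could be read with something like the sketch below. This is only an illustration, not the import code actually used: the function name is hypothetical, and skipping blank lines and rejecting lines without a tab are assumptions.

```python
def parse_jyutping_list(lines):
    """Parse 'word<TAB>jyutping' lines into a dict (hypothetical helper)."""
    entries = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        word, tab, jyutping = line.partition("\t")
        if not tab:
            raise ValueError(f"line {number}: expected word[tab]jyutping")
        entries[word.strip()] = jyutping.strip()
    return entries


sample = ["你好\tnei5 hou2\n", "\n", "多謝\tdo1 ze6\n"]
print(parse_jyutping_list(sample))
```

With a real file, the same function can be fed an open file handle, for example `open("list.txt", encoding="utf-8")`, where the file name is just a placeholder.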