
> that's cheating
Of course it is. ;)
But what I meant is that dictionaries often provide example sentences. It depends on the dictionary. And technically half of the Tatoeba sentences could easily end up in a dictionary. That's not a reason to delete them.

> rather than word I will say "one sementical unit"
What is a semantical unit? A seme?
Then "Buy" has 2 of these.
> that if an entry in tatoeba can also be
> found in a dictionnary, (delta the flexion)
In WWWJDIC you can find most phrases from Tatoeba.
WWWJDIC is a dictionary.
So by that logic:
All Japanese phrases should be deleted.
IMHO everything needs moderation. Boracasli lacks it, but forbidding all one-word sentences isn't any better.

But Buy is an example of imperative mood.
In many languages it would have a T-V distinction...

Actually, I don't understand why "Buy" is less important than "Cat is not human". >_<

Can you give clearer guidelines?
IMHO sentences shouldn't be deleted simply because you suspect they were taken from a dictionary.
Also consider polysynthetic languages, where a great many very useful phrases can be said in one word. For example, in a Chukchi phrasebook I’ve found the following single-word sentences:
«Титэтгивик?» means «How much?»
«Тантыԓянвыԓьын?» means «Is the road good?»
Do you think they are also out of the scope of this project?

But it is useful for natural language processing:
a) automatic translators,
b) sentence classification
Tatoeba is a text corpus. Programmers can write an algorithm, but they need a text corpus to make it work.
For example, in Tatoeba bad sentences have tags "rude", "offensive", "XXX".
Using a simple algorithm[1] and Tatoeba sentences, anyone can write a program that can look at any sentence in the same language and say: "It's rude" or "It's not rude". It can then be used, for example, to hide some text from children.
Or, for example, it's possible to create a program that detects a language using Tatoeba data.
Or check whether the text is optimistic or pessimistic.
Or even to create automatic translators. (But for these, a lot of text is necessary. For many languages we have too few sentences for this... now :))
And many other things... Practically all programs working with language need a text corpus!
Tatoeba is not the only corpus; there are many of them. But Tatoeba is better because:
* It's free,
* It's multilingual (usually corpora support only 1 language, or 2, not more)
[1] For example, you can use a naive Bayesian classifier for this.
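
To make [1] a bit more concrete, here is a minimal sketch of a naive Bayesian classifier in plain Python. Everything in it is an assumption made for illustration: the `NaiveBayes` class, the whitespace `tokenize` helper, the add-one smoothing and the six toy sentences are invented here and are not taken from Tatoeba. A real program would load the tagged sentences from a Tatoeba export and would need a tokenizer suited to the language.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Very rough tokenizer: lowercase and split on whitespace.
    return sentence.lower().split()


class NaiveBayes:
    def __init__(self):
        self.label_counts = Counter()            # sentences seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()                  # all words seen in training

    def train(self, labelled_sentences):
        for sentence, label in labelled_sentences:
            self.label_counts[label] += 1
            for word in tokenize(sentence):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, sentence):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) + sum of log P(word | label), with add-one smoothing
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in tokenize(sentence):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy data, NOT real Tatoeba sentences or tags.
training = [
    ("you are an idiot", "rude"),
    ("shut up you fool", "rude"),
    ("get lost idiot", "rude"),
    ("the cat is on the table", "not rude"),
    ("thank you for the link", "not rude"),
    ("the weather is nice today", "not rude"),
]

classifier = NaiveBayes()
classifier.train(training)
print(classifier.classify("you fool"))
print(classifier.classify("the cat is nice"))
```

The same class should work for other labels too: train it on (sentence, language) pairs and it becomes a crude language detector, although character n-grams would probably do better than whole words for that.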

=))
We don't. :) On the wall, there may be discussion. But if it's a sentence, it's 100% OK.
We need different sentences! ^^
And we do have patriotic sentences. :)
See:
http://tatoeba.org/eng/sentences/show/467460
http://tatoeba.org/eng/sentences/show/485186

Cool, thank you. ^^

زنده باد زبان فارسی
:)
Can you add this as a sentence please? :)

Cool!

Now I don’t know if I know what I’ve said.
ö (It’s my new way of writing :o)

What the?..
What is it supposed to mean?

0

IMHO their license is too restrictive.

IMO, this kind of metadata is not fit for tags.

Thank you for the link! ^^

+1

Wiktionary is hard to edit for an average user.
It’s hard to find a balance between a dictionary that a computer can parse and one that an average human being can easily edit. The Wiktionary is *much* *more* *complicated* than Tatoeba.
Also, although it’s exportable and parseable, I haven’t seen any program that presents the exported data in the form of a bilingual dictionary.
All in all, I believe the Wiki engine is not fit for creating dictionaries.
I think we’ll run into the problem of a dictionary later:
1. Now we have some tags [verb_of_motion, Genitive] that are better fit as tags for words, not for sentences. => We need tags for words.
2. We can’t tag all the words, or force users to do it, since it’s too much work. => We need a morphology analyser.
3. Morphological data about words needs a dictionary. Wiktionary is hard to edit for an average user and rarely exported. => We need something more lightweight.
So I believe one day something like a Tatoeba dictionary will emerge.
Also, there is a problem: which language edition of Wiktionary to choose? The explanations are different, but the translations in all Wiktionaries in fact duplicate each other.

By the way, there is a secret copy of Tatoeba. ;) It is blue.

Well, it depends on the language.
Arabic script for Uyghur shows all the vowels.