Demetrius's messages on the Wall

Demetrius, 17 September 2010, 3:52:06 PM UTC

> that's cheating
Of course it is. ;)

But what I meant is that dictionaries often provide example sentences. It depends on the dictionary. And technically half of the Tatoeba sentences could easily end up in a dictionary. That's not a reason to delete them.

Demetrius, 17 September 2010, 2:12:42 PM UTC

> rather than word I will say "one sementical unit"
What is a semantical unit? A seme?
Then "Buy" has 2 of these.

> that if an entry in tatoeba can also be
> found in a dictionnary, (delta the flexion)
In WWWJDIC you can find most phrases from Tatoeba.
WWWJDIC is a dictionary.

It means:
All Japanese phrases should be deleted.

IMHO everything needs moderation. Boracasli lacks it, but forbidding all 1-word sentences isn't any better.

Demetrius, 16 September 2010, 9:37:33 AM UTC

But "Buy" is an example of the imperative mood.

In many languages it would have a T-V distinction...

Demetrius, 16 September 2010, 8:59:39 AM UTC

Actually, I don't understand why "Buy" is less important than "Cat is not human". >_<

Demetrius, 16 September 2010, 8:15:41 AM UTC

Can you give clearer guidelines?

IMHO sentences shouldn't be deleted simply because you suspect they were taken from a dictionary.

Also consider polysynthetic languages, where a great many very useful phrases can be said in one word. For example, in a Chukchi phrasebook I’ve found the following single-word sentences:
«Титэтгивик?» means «How much?»
«Тантыԓянвыԓьын?» means «Is the road good?»

Do you think they are also out of the scope of this project?

Demetrius, 15 September 2010, 11:39:51 AM UTC

But it is useful for natural language processing:
a) automatic translators,
b) sentence classification

Tatoeba is a text corpus. Programmers can write an algorithm, but they need a text corpus to make it work.

For example, in Tatoeba bad sentences have tags "rude", "offensive", "XXX".

Using a simple algorithm[1] and Tatoeba sentences, anyone can write a program that can look at any sentence in the same language and say: "It's rude" or "It's not rude". It can then be used, for example, to hide some text from children.

Or, for example, it's possible to create a program that detects a language using Tatoeba data.

Or check whether the text is optimistic or pessimistic.

Or even to create automatic translators. (But for these, a lot of text is necessary. For many languages we have too few sentences for this... now :))

And many other things... Practically all programs working with language need a text corpus!

Tatoeba is not the only corpus; there are many of them. But Tatoeba is better because:
* It's free,
* It's multilingual (usually corpora support only 1 language, or 2, not more)

[1] For example, you can use a naive Bayesian classifier for this.
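
To make the footnote concrete, here is a minimal sketch of a naive Bayes "rude / not rude" classifier in Python. It is only an illustration: the training sentences are invented stand-ins for tagged Tatoeba sentences, and the names `tokenize`, `NaiveBayes`, `train` and `classify` are hypothetical helpers, not part of any Tatoeba tool.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Deliberately crude tokenizer: lowercase, split on whitespace.
    return sentence.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes over words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of sentences
        self.vocab = set()

    def train(self, examples):
        for sentence, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(sentence):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, sentence):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(sentence):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data (assumption): a real run would use tagged corpus sentences.
examples = [
    ("shut up you idiot", "rude"),
    ("you are a stupid idiot", "rude"),
    ("get lost you fool", "rude"),
    ("thank you very much", "not rude"),
    ("the cat is sleeping", "not rude"),
    ("have a nice day", "not rude"),
]

classifier = NaiveBayes()
classifier.train(examples)
print(classifier.classify("you idiot"))
print(classifier.classify("thank you for the nice day"))
```

With only six training sentences this is a toy, of course. The same idea should carry over to language detection if the labels are languages and the features are character n-grams instead of words, but that variant is not shown here.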

Demetrius, 15 September 2010, 11:01:46 AM UTC

=))

We don't. :) On the wall, there may be discussion. But if it's a sentence, it's 100% OK.

We need different sentences! ^^

And we do have patriotic sentences. :)

See:
http://tatoeba.org/eng/sentences/show/467460
http://tatoeba.org/eng/sentences/show/485186

Demetrius, 15 September 2010, 9:50:26 AM UTC

Cool, thank you. ^^

Demetrius, 15 September 2010, 9:48:33 AM UTC

زنده باد زبان فارسی
:)

Can you add this as a sentence please? :)

Demetrius, 14 September 2010, 5:37:17 PM UTC

Cool!

Demetrius, 14 September 2010, 2:17:02 PM UTC

Now I don’t know if I know what I’ve said.

ö (It’s my new way of writing :o)

Demetrius, 13 September 2010, 12:20:43 AM UTC

What the?..

What is it supposed to mean?

Demetrius, 13 September 2010, 12:19:28 AM UTC

Demetrius, 12 September 2010, 9:55:23 PM UTC

IMHO their license is too restrictive.

Demetrius, 12 September 2010, 9:53:05 PM UTC

IMO, this kind of metadata is not fit for tags.

Demetrius, 12 September 2010, 9:47:36 PM UTC

Thank you for the link! ^^

Demetrius, 12 September 2010, 9:44:27 PM UTC

Demetrius, 12 September 2010, 9:05:01 PM UTC

Wiktionary is hard to edit for an average user.

It’s hard to find a balance between a computer-parsable dictionary and one that is easy for an average human being to edit. Wiktionary is *much* *more* *complicated* than Tatoeba.

Also, although it’s exportable and parseable, I haven’t seen any program that presents the exported data in the form of a bilingual dictionary.

All in all, I believe the Wiki engine is not fit for creating dictionaries.

I think we’ll run into the problem of a dictionary later:
1. Now we have some tags [verb_of_motion, Genitive] that are better fit as tags for words, not for sentences. => We need tags for words.
2. We can’t tag all the words, or force users to do it, since it’s too much work. => We need a morphology analyser.
3. Morphological data about words needs a dictionary. Wiktionary is hard to edit for an average user and rarely exported. => We need something more lightweight.

So I believe one day something like Tatoeba dictionary will emerge.

Also, there is a problem: which language edition of Wiktionary to choose? The explanations are different, but the translations in all Wiktionaries in fact duplicate each other.

Demetrius, 12 September 2010, 8:54:37 PM UTC

By the way, there is a secret copy of Tatoeba. ;) It is blue.

Demetrius, 12 September 2010, 11:01:30 AM UTC

Well, it depends on the language.

Arabic script for Uyghur shows all the vowels.


Demetrius's messages on the Wall (442 total)
