Tatoeba
Demetrius's wall messages (442 total)

Demetrius, September 17, 2010, 3:52:06 PM UTC

> that's cheating
Of course it is. ;)

But what I meant is that dictionaries often provide example sentences. It depends on the dictionary. And technically half of the Tatoeba sentences could easily end up in a dictionary. That's not a reason to delete them.

Demetrius, September 17, 2010, 2:12:42 PM UTC

> rather than word I will say "one sementical unit"
What is a semantical unit? A seme?
Then "Buy" has 2 of these.

> that if an entry in tatoeba can also be
> found in a dictionnary, (delta the flexion)
In WWWJDIC you can find most phrases from Tatoeba.
WWWJDIC is a dictionary.

It means:
All Japanese phrases should be deleted.

IMHO everything needs moderation. Boracasli lacks it, but forbidding all the 1-word sentences isn't any better.

Demetrius, September 16, 2010, 9:37:33 AM UTC

But Buy is an example of imperative mood.

In many languages it would have a T-V distinction...

Demetrius, September 16, 2010, 8:59:39 AM UTC

Actually, I don't understand why "Buy" is less important than "Cat is not human". >_<

Demetrius, September 16, 2010, 8:15:41 AM UTC

Can you give clearer guidelines?

IMHO sentences shouldn't be deleted simply because you suspect they were taken from a dictionary.


Also consider polysynthetic languages, where a great many very useful phrases can be said in one word. For example, in a Chukchi phrasebook I’ve found the following single-word sentences:
«Титэтгивик?» means «How much?»
«Тантыԓянвыԓьын?» means «Is the road good?»

Do you think they are also out of the scope of this project?

Demetrius, September 15, 2010, 11:39:51 AM UTC

But it is useful for natural language processing:
a) automatic translators,
b) sentence classification

Tatoeba is a text corpus. Programmers can write an algorithm, but they need a text corpus to make it work.



For example, in Tatoeba bad sentences have tags "rude", "offensive", "XXX".

Using a simple algorithm[1] and Tatoeba sentences, anyone can write a program that can look at any sentence in the same language and say: "It's rude" or "It's not rude". It can then be used, for example, to hide some text from children.

Or, for example, it's possible to create a program that detects a language using Tatoeba data.

Or check whether the text is optimistic or pessimistic.

Or even to create automatic translators. (But for these, a lot of text is necessary. For many languages we have too few sentences for this... now :))

And many other things... Practically all programs working with language need a text corpus!


Tatoeba is not the only corpus; there are many of them. But Tatoeba is better because:
* It's free,
* It's multilingual (usually corpora support only 1 language, or 2, not more)


[1] For example, you can use a naive Bayesian classifier for this.
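
The "rude / not rude" idea above can be sketched roughly like this. It is only a minimal illustration of a naive Bayes classifier with add-one smoothing, not a tested tool: the toy sentences, the label names, and the helper names (`tokenize`, `train`, `classify`) are all made up for the example, and a real program would load tagged sentences from Tatoeba data instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return sentence.lower().split()


def train(labelled_sentences):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of sentences
    vocabulary = set()
    for text, label in labelled_sentences:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def classify(sentence, model):
    """Return the label with the highest log-probability for the sentence."""
    word_counts, label_counts, vocabulary = model
    total_sentences = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Prior: share of training sentences carrying this label.
        score = math.log(label_counts[label] / total_sentences)
        total_words = sum(word_counts[label].values())
        for word in tokenize(sentence):
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data standing in for tagged Tatoeba sentences.
TOY_CORPUS = [
    ("you are an idiot", "rude"),
    ("shut up you fool", "rude"),
    ("get lost idiot", "rude"),
    ("the cat is sleeping", "not rude"),
    ("thank you very much", "not rude"),
    ("the weather is nice today", "not rude"),
]

model = train(TOY_CORPUS)
print(classify("you fool", model))
print(classify("the cat is nice", model))
```

With so few sentences the result means very little; the point is only that the algorithm is short, and the corpus is the part that takes work.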

Demetrius, September 15, 2010, 11:01:46 AM UTC

=))

We don't. :) On the wall, there may be discussion. But if it's a sentence, it's 100% OK.

We need different sentences! ^^

And we do have patriotic sentences. :)

See:
http://tatoeba.org/eng/sentences/show/467460
http://tatoeba.org/eng/sentences/show/485186

Demetrius, September 15, 2010, 9:50:26 AM UTC

Cool, thank you. ^^

Demetrius, September 15, 2010, 9:48:33 AM UTC

زنده باد زبان فارسی
:)

Can you add this as a sentence please? :)

Demetrius, September 14, 2010, 5:37:17 PM UTC

Cool!

Demetrius, September 14, 2010, 2:17:02 PM UTC

Now I don’t know if I know what I’ve said.

ö (It’s my new way of writing :o)

Demetrius, September 13, 2010, 12:20:43 AM UTC

What the?..

What is it supposed to mean?

Demetrius, September 13, 2010, 12:19:28 AM UTC

0

Demetrius, September 12, 2010, 9:55:23 PM UTC

IMHO their license is too restrictive.

Demetrius, September 12, 2010, 9:53:05 PM UTC

IMO, this kind of metadata is not fit for tags.

Demetrius, September 12, 2010, 9:47:36 PM UTC

Thank you for the link! ^^

Demetrius, September 12, 2010, 9:44:27 PM UTC

+1

Demetrius, September 12, 2010, 9:05:01 PM UTC

Wiktionary is hard to edit for an average user.

It’s hard to find a balance between a computer-parsable dictionary and one that is easy to edit for an average human being. Wiktionary is *much* *more* *complicated* than Tatoeba.

Also, although it’s exportable and parseable, I haven’t seen any program that presents the exported data in the form of a bilingual dictionary.

All in all, I believe the Wiki engine is not fit for creating dictionaries.



I think we’ll run into the problem of a dictionary later:
1. Now we have some tags [verb_of_motion, Genitive] that are better fit as tags for words, not for sentences. => We need tags for words.
2. We can’t tag all the words, or force users to do it, since it’s too much work. => We need a morphology analyser.
3. Morphological data about the words needs a dictionary. Wiktionary is hard to edit for an average user and rarely exported. => We need something more lightweight.

So I believe one day something like Tatoeba dictionary will emerge.

Also, there is a problem: which language edition of Wiktionary to choose? The explanations are different, but the translations in all Wiktionaries in fact duplicate each other.

Demetrius, September 12, 2010, 8:54:37 PM UTC

By the way, there is a secret copy of Tatoeba. ;) It is blue.

Demetrius, September 12, 2010, 11:01:30 AM UTC

Well, it depends on the language.

Arabic script for Uyghur shows all the vowels.