Tatoeba
blay_paul, June 12, 2010 at 10:17:37 AM UTC

MeCab maintainer (管理人)

Could we get a volunteer to take charge of MeCab / MeCab dictionary improvements? There are some things that look like easy fixes (一晩中 ひとばんじゅう) and some things that are just weird glitches (ご覧 has no okurigana). I don't really have the time to look into it myself so I think another person is needed.

JimBreen, June 17, 2010 at 7:52:36 AM UTC

As I have commented, adding to MeCab's dictionary won't fix all these. You really have to fix it on a case-by-case basis. The WWWJDIC indices would help here, although not all sentences are indexed.

blay_paul, June 17, 2010 at 8:14:05 AM UTC

Yes, but _some_ of them should be fixable via dictionary. Looking at the list
http://tatoeba.org/eng/sentences_lists/show/113
I'd suspect that most of them could be.

JimBreen, June 17, 2010 at 8:29:31 AM UTC

Looking at the first one (アマゾン川は延々と北ブ流れている) I'd say it's a mistake by whoever programmed the MeCab interface in Tatoeba. アマゾン川 is handled correctly by MeCab:
アマゾン川 名詞,固有名詞,一般,*,*,*,アマゾン川,アマゾンガワ,アマゾンガワ

The second one (母にひと月に一度手紙を書きます) is the same. MeCab handles ひと月 OK, but Tatoeba ignores it.
ひと月 名詞,一般,*,*,*,*,ひと月,ヒトツキ,ヒトツキ

I think someone is not noticing the kanji at the end of the string returned.

Which dictionary is being used, BTW?
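
For anyone wanting to check this by hand, here is a rough sketch of pulling the reading out of a MeCab output line like the two quoted above. It assumes the nine-field IPADIC-style feature layout seen in those lines (reading in the eighth field); other dictionaries may order the fields differently, and the function names are made up for illustration. It is not Tatoeba's actual interface code.

```python
# Sketch only: assumes IPADIC-style features, where the 8th comma-separated
# field (index 7) is the katakana reading. Function names are hypothetical.

READING_INDEX = 7  # assumption: IPADIC layout; other dictionaries may differ


def parse_mecab_line(line):
    """Split one MeCab output line into (surface, reading).

    MeCab separates surface and features with a tab; the lines quoted in
    this thread show a space, so any whitespace is accepted here.
    Returns None for the reading when the field is absent (for example,
    for unknown words, which carry fewer feature fields).
    """
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        return (line.strip(), None)
    surface, feature_str = parts
    features = feature_str.split(",")
    if len(features) > READING_INDEX and features[READING_INDEX] != "*":
        return (surface, features[READING_INDEX])
    return (surface, None)


def kata_to_hira(text):
    """Convert katakana (U+30A1..U+30F6) to hiragana; leave the rest as is."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


if __name__ == "__main__":
    samples = [
        "アマゾン川\t名詞,固有名詞,一般,*,*,*,アマゾン川,アマゾンガワ,アマゾンガワ",
        "ひと月\t名詞,一般,*,*,*,*,ひと月,ヒトツキ,ヒトツキ",
    ]
    for sample in samples:
        surface, reading = parse_mecab_line(sample)
        print(surface, kata_to_hira(reading) if reading else "(no reading)")
```

If the reading comes back as None for a word MeCab plainly knows, the field index is probably wrong for the dictionary in use, which is why the dictionary question matters.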