Logs

- date unknown
四月一日です。
- date unknown
linked to 245523

Sentence #168964

jpn
四月一日です。
四月[しがつ] 一[いち] 日[にち] です[] 。[]

Comments

raeldor
Jan 10th 2013, 02:12
The Japanese kana reading is given as ichinichi instead of tsuitachi. Is this kana generated automatically, or are you able to correct it? Do I need to submit corrections to the Mecab project or something? I would have expected Mecab to be more accurate in these cases, since ichinichi would rarely be used after the gatsu kanji. Weird.
bunbuku
Jan 10th 2013, 03:39
Unfortunately, I'm not able to correct these furigana mistakes.
And you're right, it should be read ついたち in this case.
raeldor
Jan 10th 2013, 03:49
May I ask how the kana is generated please? Are you using Mecab?
CK
Jan 10th 2013, 03:53
>May I ask how the kana is generated please? Are you using Mecab?
It is automatically generated and shouldn't be trusted too much.
For study purposes, it's better to use something like Rikaichan or Rikaikun.
(You don't want to waste your time learning something that is wrong.)

According to this "history" item:

2010-04-02:
Switched to MeCab for handling Japanese Furigana and Romaji. Romaji now only shows up on mouseover rather than being displayed on the page.
raeldor
Jan 10th 2013, 04:30
Given Mecab's popularity, and that its algorithm is supposed to be the best, I'm a little surprised it falls on its a$$ at something this simple.
raeldor
Jan 10th 2013, 04:38
I understand Mecab is trained on sample data. It would be useful if Tatoeba could accept changes to kana. That way it could become a good source of training data for morphological analyzers.
sharptoothed
Jan 10th 2013, 10:43
> I'm a little surprised it falls on its a$$ at something this simple.
It's not as simple as it may seem, unfortunately. Japanese morphology is rather complex for machine analysis and requires good dictionaries based on a representative corpus. Currently, only a few dictionaries are available for MeCab (I know of IPADIC, JUMADIC and NAISTDIC). Depending on the dictionary, MeCab may produce different results. Take a look at the "best" result from each:
IPADIC
四月 名詞,副詞可能,*,*,*,*,四月,シガツ,シガツ
一 名詞,数,*,*,*,*,一,イチ,イチ
日 名詞,接尾,助数詞,*,*,*,日,ニチ,ニチ
です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
。 記号,句点,*,*,*,*,。,。,。
JUMADIC
四 名詞,数詞,*,*,四,よっ,*
月 接尾辞,名詞性名詞助数辞,*,*,月,がつ,*
一 名詞,数詞,*,*,一,ひと,*
日 接尾辞,名詞性名詞助数辞,*,*,日,にち,*
です 判定詞,*,判定詞,デス列基本形,だ,です,*
。 特殊,句点,*,*,。,。,*

As we can see, the latter is even less accurate than the one we have on Tatoeba. Actually, with IPADIC it's possible to get a more accurate result but, from MeCab's point of view, it's not the best one:
四月 名詞,副詞可能,*,*,*,*,四月,シガツ,シガツ
一日 名詞,固有名詞,地域,一般,*,*,一日,ヒトイチ,ヒトイチ
です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
。 記号,句点,*,*,*,*,。,。,。

I hope the dictionaries and algorithms will continue to improve but, for the time being, we have what we have.
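For anyone curious how output like the IPADIC listing above could end up as the bracketed furigana line shown for this sentence, here is a minimal Python sketch. It is not Tatoeba's actual code. The function names are made up for illustration, and it assumes that the reading is the eighth comma-separated IPADIC feature and that kana-only tokens get empty brackets.

```python
# Sample analysis in IPADIC format, copied from the listing above.
# Real MeCab output separates surface and features with a tab;
# splitting on any whitespace handles both forms.
MECAB_IPADIC_OUTPUT = """\
四月 名詞,副詞可能,*,*,*,*,四月,シガツ,シガツ
一 名詞,数,*,*,*,*,一,イチ,イチ
日 名詞,接尾,助数詞,*,*,*,日,ニチ,ニチ
です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
。 記号,句点,*,*,*,*,。,。,。
"""


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana code points (ァ..ヶ) down to their hiragana equivalents."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def has_kanji(text: str) -> bool:
    """True if the text contains a character from the main CJK ideograph block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def parse_tokens(mecab_output: str) -> list[tuple[str, str]]:
    """Return (surface, reading) pairs from IPADIC-style output lines."""
    tokens = []
    for line in mecab_output.splitlines():
        line = line.strip()
        if not line or line == "EOS":
            continue
        surface, features = line.split(None, 1)
        fields = features.split(",")
        # Assumption: the reading is the eighth feature (index 7).
        reading = fields[7] if len(fields) > 7 else ""
        tokens.append((surface, reading))
    return tokens


def to_furigana(tokens: list[tuple[str, str]]) -> str:
    """Render tokens as surface[reading], leaving brackets empty for kana-only tokens."""
    parts = []
    for surface, reading in tokens:
        kana = katakana_to_hiragana(reading) if has_kanji(surface) else ""
        parts.append(f"{surface}[{kana}]")
    return " ".join(parts)


print(to_furigana(parse_tokens(MECAB_IPADIC_OUTPUT)))
```

Because 一 and 日 arrive as two separate tokens, each gets its own reading, so a step like this has no way of producing ついたち on its own.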
CK
Jan 10th 2013, 11:12
If you're interested in other kinds of "furigana mistakes" on this website, you can see all the furigana_mistake tagged sentences.

http://tatoeba.org/eng/tags/sho...rigana_mistake
raeldor
Jan 10th 2013, 15:19
It was mentioned in another thread that adding meta-data to the sentences is perhaps on the cards. This could be a fantastic mechanism for overriding the default furigana in the case of mistakes.
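
As a rough illustration of that idea, an override could be as simple as a lookup keyed by sentence number, consulted before falling back to the generated furigana. The sketch below is purely hypothetical: the names `FURIGANA_OVERRIDES` and `furigana_for` are invented here, and nothing in it reflects how Tatoeba actually stores its data.

```python
# Hypothetical table of human-corrected furigana, keyed by sentence number.
FURIGANA_OVERRIDES = {
    168964: "四月[しがつ] 一日[ついたち] です[] 。[]",
}


def furigana_for(sentence_id: int, generated: str) -> str:
    """Prefer a manual correction if one exists, else keep the generated furigana."""
    return FURIGANA_OVERRIDES.get(sentence_id, generated)


generated = "四月[しがつ] 一[いち] 日[にち] です[] 。[]"
print(furigana_for(168964, generated))
```

Sentences without an entry in the table would simply keep whatever the analyzer produced.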
