Sentence #168964


Comments

raeldor, 10 January 2013, 02:12:52 UTC

The Japanese kana reading is ichinichi instead of tsuitachi. Is this kana generated automatically, or are you able to correct it? Do I need to submit corrections to the Mecab project or something? I would have expected Mecab to be more accurate in these cases, since ichinichi would rarely be used after the gatsu kanji. Weird.

bunbuku, 10 January 2013, 03:39:00 UTC

Unfortunately, I'm not able to correct these furigana mistakes.
Additionally you're right, it should be read ついたち in this case.

raeldor, 10 January 2013, 03:49:33 UTC

May I ask how the kana is generated please? Are you using Mecab?

raeldor, 10 January 2013, 04:30:48 UTC

Given Mecab's popularity and the algorithm that's supposed to be the best, I'm a little surprised it falls on its a$$ at something this simple.

raeldor, 10 January 2013, 04:38:08 UTC

I understand Mecab is trained on sample data. It would be useful if Tatoeba could accept changes to kana. That way it could become a good source of training data for morphological analyzers.

sharptoothed, 10 January 2013, 10:43:28 UTC

> I'm a little surprised it falls on its a$$ at something this simple.
Unfortunately, it's not as simple as it may seem. Japanese morphology is rather complex for machine analysis and requires good dictionaries based on a representative corpus. Currently, only a few dictionaries are available for MeCab (I know of IPADIC, JUMADIC and NAISTDIC). Depending on the dictionary, MeCab may produce different results. Take a look at the "best" results:
IPADIC
四月 名詞,副詞可能,*,*,*,*,四月,シガツ,シガツ
一 名詞,数,*,*,*,*,一,イチ,イチ
日 名詞,接尾,助数詞,*,*,*,日,ニチ,ニチ
です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
。 記号,句点,*,*,*,*,。,。,。
JUMADIC
四 名詞,数詞,*,*,四,よっ,*
月 接尾辞,名詞性名詞助数辞,*,*,月,がつ,*
一 名詞,数詞,*,*,一,ひと,*
日 接尾辞,名詞性名詞助数辞,*,*,日,にち,*
です 判定詞,*,判定詞,デス列基本形,だ,です,*
。 特殊,句点,*,*,。,。,*

As we can see, the latter is even less accurate than the one we have on Tatoeba. Actually, with IPADIC it's possible to find a more accurate result but, from MeCab's point of view, it's not the best:
四月 名詞,副詞可能,*,*,*,*,四月,シガツ,シガツ
一日 名詞,固有名詞,地域,一般,*,*,一日,ヒトイチ,ヒトイチ
です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
。 記号,句点,*,*,*,*,。,。,。

I hope the dictionaries and algorithms will continue to improve but, for the time being, we have what we have.
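
For anyone who wants to see how the kana falls out of the analyses quoted above, here is a minimal Python sketch. It assumes each line is a surface form, whitespace, and then comma-separated features with the reading in the eighth field, as in the IPADIC output in this comment. The function names are made up for illustration, and the sketch only reads the quoted text; it does not call MeCab or reflect how Tatoeba itself generates furigana.

```python
def kata_to_hira(text):
    """Convert katakana to hiragana by shifting code points; leave other characters alone."""
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def parse_ipadic_line(line):
    """Split one quoted output line into (surface, reading).

    Assumes the IPADIC-style layout shown in the comment: the reading
    is the eighth comma-separated feature (index 7).
    """
    surface, features = line.split(None, 1)
    fields = features.split(",")
    return surface, fields[7]


def sentence_reading(lines):
    """Concatenate the per-token readings and return them as hiragana."""
    return kata_to_hira("".join(parse_ipadic_line(line)[1] for line in lines))


# The "best" IPADIC analysis quoted above.
IPADIC_BEST = [
    "四月 名詞,副詞可能,*,*,*,*,四月,シガツ,シガツ",
    "一 名詞,数,*,*,*,*,一,イチ,イチ",
    "日 名詞,接尾,助数詞,*,*,*,日,ニチ,ニチ",
    "です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス",
    "。 記号,句点,*,*,*,*,。,。,。",
]

# The alternative IPADIC analysis that keeps 一日 as a single token.
IPADIC_ALTERNATIVE = [
    "四月 名詞,副詞可能,*,*,*,*,四月,シガツ,シガツ",
    "一日 名詞,固有名詞,地域,一般,*,*,一日,ヒトイチ,ヒトイチ",
    "です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス",
    "。 記号,句点,*,*,*,*,。,。,。",
]

print(sentence_reading(IPADIC_BEST))         # しがついちにちです。
print(sentence_reading(IPADIC_ALTERNATIVE))  # しがつひといちです。
```

Neither analysis yields ついたち, which is why the reading has to be corrected outside the analyzer.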

raeldor, 10 January 2013, 15:19:23 UTC

It was mentioned in another thread that adding metadata to the sentences is perhaps on the cards. This could be a fantastic mechanism for overriding the default furigana in the case of mistakes.

raeldor, 20 August 2018, 04:50:40 UTC

Hi. Hate to keep harping on here. This has been reported as incorrect by another user. Interestingly the sentence on this site now shows as 'ついたち', which is correct. Unfortunately the Japan Indexes file still shows as 'いちにち'. Are you not using the Japan Indexes file for furigana? What's the relationship now between this site and the Japan Indexes if you're not using them anymore? Thanks.

JimBreen, 20 August 2018, 05:26:40 UTC

AFAIK the furigana is generated on the spot. The indices are static and maintained by hand. I've corrected this one to ついたち. If you see any others in error let me know directly.

raeldor, 6 May 2021, 04:21:45 UTC

Hi Jim,

Long time no hear. I'm finding a few more places where the index appears to be incorrect. Are you still the go-to man to fix these? For example in 186382, it's tagged...

我々 が どの位{どのくらい} 原子力 に 頼る{頼っている} か 落ちつく{落ち着いて} 考える{考えて} 見る[05]{みよう}

Where 落ち着いて with two kanji is tagged to 落ちつく with one kanji instead of 落ち着く with two kanji.

Are you still fixing these by hand? Thanks!

JimBreen, 6 May 2021, 05:33:53 UTC

It's not actually an error. The indexing relates to the dictionary entry, which has several surface forms: 落ち着く; 落ちつく; 落着く; 落ち付く; 落付く. The linking form happens to be 落ちつく. There are 88 sentences linked to this entry, and we use the same linking form regardless of the form that actually appears in the text, i.e. the {...} part.
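
To make the distinction between the linking form and the {...} form concrete, here is a rough Python sketch that splits an index line like the one quoted above. It assumes only the pieces visible in this thread: space-separated tokens, each a dictionary linking form, optionally followed by a sense number in [..] and the form that actually appears in the sentence in {..}. The real index format may have further fields that this does not handle, and the names `IndexToken` and `parse_index_line` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class IndexToken(NamedTuple):
    headword: str         # linking form of the dictionary entry
    sense: Optional[int]  # sense number from [..], if present
    surface: str          # form as it appears in the sentence text


# headword, optional [sense], optional {surface}
TOKEN_RE = re.compile(r"^([^\[\]{}]+)(?:\[(\d+)\])?(?:\{([^}]*)\})?$")


def parse_index_line(line: str) -> List[IndexToken]:
    """Parse one index line into tokens, under the assumptions stated above."""
    tokens = []
    for raw in line.split():
        match = TOKEN_RE.match(raw)
        if match is None:
            raise ValueError(f"unrecognised index token: {raw!r}")
        head, sense, surface = match.groups()
        # With no {..} part, the linking form is also the surface form.
        tokens.append(IndexToken(head, int(sense) if sense else None, surface or head))
    return tokens


EXAMPLE = "我々 が どの位{どのくらい} 原子力 に 頼る{頼っている} か 落ちつく{落ち着いて} 考える{考えて} 見る[05]{みよう}"

for token in parse_index_line(EXAMPLE):
    print(token.headword, token.sense, token.surface)
```

For the token in question this gives the headword 落ちつく with the surface 落ち着いて, so the spelling used in the sentence is kept in the {...} part even though the link goes through a different surface form of the same entry.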

Metadata

Lists

Sentence text

License: CC BY 2.0 FR

History

We cannot determine yet whether this sentence was initially derived from translation or not.

四月一日です。

added by an unknown member, date unknown

linked by an unknown member, date unknown

linked by Raizin, 23 October 2015

linked by Aiji, 4 October 2017

linked by Yorwba, 6 August 2019

#10045487

linked by Endrio, 18 May 2021

#10045487

unlinked by Horus, 2 July 2021

linked by Horus, 2 July 2021