Tatoeba
blay_paul, February 21, 2010 at 1:45:36 PM UTC

Seriously - romaji editing now. ;-)

I don't think there's any point in waiting for "a serious Japanese contributor". Most of the romaji errors are very obvious and either I, or half a dozen or so regulars here, would be well able to correct them if they had the chance.

I would go as far as to say that it would be better not to have romaji AT ALL rather than leave them in the current state.

TRANG, February 21, 2010 at 6:16:09 PM UTC

Then it may be no romaji at all... But I want opinions from more users first. Is it better to have no romaji at all, or is it better to have something even if it's not 100% correct?

I know Nemo is against romaji as well, but if I added it, it was because more than two people had requested it in the past.

Regarding editable romaji, I'd rather avoid having people to waste time on correcting romaji which is why I don't want to make it editable.
Most of the time it's a systematic error that can be found in more than 100 other sentences. If I made romaji editable, you'd have to edit them one by one.
You'd also have to make sure everyone agrees on the romanization rules and follows them, which is again more work.

I think it's better to improve the software (not necessarily KAKASI) to the point where it can't get any better. It would save time for so many other people in the world...

Perhaps there is someone out there who is actively developing an open source Japanese parser and furigana-romaji converter. I haven't had time to search, but if you do find one (and by "you" I mean anyone who is reading this), by all means, let me know.

Nemo, February 21, 2010 at 8:13:59 PM UTC

I'm for having all the romaji on the site be accurate. If the best way to do that is deleting all of the romaji, then I'd say do that. If you really want an accurate romaji representation, it will probably need to be written ad hoc. I don't think this should be too hard though, so long as it is written for this project specifically, and it is done soon. This site is currently composed mostly of the Tanaka Corpus, so far as I am aware, so almost every word in the Japanese examples should also be present in EDICT, which gives the reading of every word in kana. If there are multiple readings, I would just make the output something like:
僕は市場へ行った
*** boku wa (shijyou | ichiba) e itta

That way the edge cases could be fixed. It's still a lot of work, but it's doable. (In this case the difference is irrelevant, but in many cases it could matter.) You could then dump every database entry beginning with *** into a text file. I believe EDICT even has the readings listed in order of frequency, so if you wanted to you could have it just guess the first one every time, and fixing the few that got put in incorrectly would not be a huge ordeal. I would recommend keeping some automatic conversion in place, and storing things in the database as:
僕は市場へ行った
ぼくはしじょうへいった
and having the conversion take place from the kana to romaji on-the-fly. Also, force those editing the romaji to use kana. Basically introduce a learning curve that will discourage those who don't know better from thinking they do. Also, changes in romanization could be implemented very easily. I personally use wapuro romaji whenever I do, which is rare still, but I know this is less than ideal for learning.
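
A minimal sketch of the store-kana, convert-on-the-fly idea described above, in Python. The table, the function names `kana_to_romaji` and `format_readings`, and the spelling choices (Hepburn-style consonants, long vowels left as typed, so じょう comes out as "jou" rather than the "jyou" used above) are all assumptions for illustration, not part of Tatoeba or KAKASI. It covers plain hiragana only and ignores cases such as ん before a vowel.

```python
# Build a hiragana -> romaji table row by row (illustrative, hiragana only).
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "g": "がぎぐげご",
    "s": "さしすせそ", "z": "ざじずぜぞ", "t": "たちつてと",
    "d": "だぢづでど", "n": "なにぬねの", "h": "はひふへほ",
    "b": "ばびぶべぼ", "p": "ぱぴぷぺぽ", "m": "まみむめも",
    "r": "らりるれろ",
}
KANA = {k: c + v for c, row in ROWS.items() for k, v in zip(row, "aiueo")}
# Irregular spellings and the kana outside the regular rows.
KANA.update({
    "し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu",
    "じ": "ji", "ぢ": "ji", "づ": "zu",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "わ": "wa", "を": "wo", "ん": "n",
})
SMALL_Y = {"ゃ": "a", "ゅ": "u", "ょ": "o"}


def kana_to_romaji(kana: str) -> str:
    """Convert a hiragana string to romaji, one syllable at a time."""
    out = []
    geminate = False  # set when the previous character was a small っ
    i = 0
    while i < len(kana):
        ch = kana[i]
        if ch == "っ":
            geminate = True
            i += 1
            continue
        syl = KANA.get(ch, ch)  # unknown characters pass through unchanged
        # Merge an i-column kana with a following small ゃ/ゅ/ょ (きゃ -> kya).
        if i + 1 < len(kana) and kana[i + 1] in SMALL_Y and syl.endswith("i"):
            stem = syl[:-1]
            glide = "" if stem in ("sh", "ch", "j") else "y"
            syl = stem + glide + SMALL_Y[kana[i + 1]]
            i += 1
        # A small っ doubles the first consonant of the next syllable.
        if geminate and syl[0] not in "aiueo":
            syl = syl[0] + syl
        geminate = False
        out.append(syl)
        i += 1
    return "".join(out)


def format_readings(readings: list[str]) -> str:
    """Render one reading plainly, several as '(a | b)' for later review."""
    roma = [kana_to_romaji(r) for r in readings]
    return roma[0] if len(roma) == 1 else "(" + " | ".join(roma) + ")"


print(kana_to_romaji("ぼくはしじょうへいった"))   # bokuhashijouheitta
print(format_readings(["しじょう", "いちば"]))      # (shijou | ichiba)
```

Because the converter sees only kana, the particle は still comes out as "ha"; telling it apart from an ordinary は needs word boundaries and part-of-speech information.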

Nemo, February 21, 2010 at 8:24:15 PM UTC

My whole post is a waste of time, lol. The software you are using has an output to kana mode, which would not be subject to the pitfalls that romaji is. I suggest we use that. Kana is not that difficult to learn, and there's no sense in learning grammar/sentences before kana anyway.

Nemo, February 21, 2010 at 8:33:34 PM UTC

We need post editing, haha. JUMAN does exactly what you need. It converts from kanji to hiragana, and labels each word with what it is. So, if it says は is a 助詞 (particle), you can output wa, and the same for all of the others. I'm not sure that it outputs romaji (The sample set-up does not), but with kana and part of speech, romaji is just a lookup table away.
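
As a rough illustration of the "lookup table away" point, here is a Python sketch that takes (reading, part of speech) pairs and romanizes particles specially. The token shape is a simplification invented for this example and is not JUMAN's actual output format; `SYLLABLES` covers only the kana in the sample sentence, and the names `spell` and `romanize` are hypothetical.

```python
# Particles whose pronunciation differs from their kana spelling.
PARTICLE_ROMAJI = {"は": "wa", "へ": "e", "を": "o"}

# Tiny syllable table covering only the example below;
# a real table would span all kana.
SYLLABLES = {
    "ぼ": "bo", "く": "ku", "は": "ha", "へ": "he", "い": "i",
    "ち": "chi", "ば": "ba", "た": "ta", "を": "wo",
}


def spell(kana: str) -> str:
    """Spell a kana reading syllable by syllable, doubling after small っ."""
    out, double = [], False
    for ch in kana:
        if ch == "っ":
            double = True
            continue
        syl = SYLLABLES[ch]
        out.append(syl[0] + syl if double else syl)
        double = False
    return "".join(out)


def romanize(tokens: list[tuple[str, str]]) -> str:
    """Romanize (reading, part-of-speech) pairs, one word per token."""
    words = []
    for reading, pos in tokens:
        if pos == "助詞" and reading in PARTICLE_ROMAJI:
            words.append(PARTICLE_ROMAJI[reading])  # particle: wa / e / o
        else:
            words.append(spell(reading))            # ordinary word
    return " ".join(words)


# 僕は市場へ行った, with the reading いちば assumed for 市場.
tokens = [
    ("ぼく", "名詞"), ("は", "助詞"), ("いちば", "名詞"),
    ("へ", "助詞"), ("いった", "動詞"),
]
print(romanize(tokens))               # boku wa ichiba e itta
print(romanize([("はは", "名詞")]))    # haha (は inside a noun stays "ha")
```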

blay_paul, February 21, 2010 at 8:55:46 PM UTC

> I'd rather avoid having people to waste time on
> correcting romaji which is why I don't want to make
> it editable.

That's basically another way of saying that the romaji isn't important. If the romaji isn't important I'd rather it wasn't there than be there and often incorrect. ;-)

Having a kana version or furigana would be a nice alternative. Kana would get rid of the
o / wo
e / he
wa / ha
confusion. Note that a combination of Edict and the Index information could be used to generate pretty-much-correct furigana or kana. (Not that easy, but doable)

JeroenHoek, March 1, 2010 at 9:39:11 AM UTC

I agree with Paul that furigana might be preferable to broken rōmaji. Learning the basics of kana shouldn't take you more than a month or two; after that, kanji readings become the hard part. Furigana should, in my opinion, eliminate the need for rōmaji for learners of Japanese.

Rōmaji is mostly useful for transcribing Japanese for a public that cannot read any Japanese at all. Also, the rōmaji generated by Kakasi is wāpuro-rōmaji.