Tatoeba
nickyeow, September 6, 2014 at 9:02:39 PM UTC

I think this has been discussed before, but is it possible to make the romanisation of Cantonese sentences editable?

It is simply impossible for computer-generated romanisation to be 100% accurate. To illustrate the problem, the final particle 呀 can be pronounced as aa1, aa3, aa4 and aa6 depending on the tone you want to convey...

AlanF_US, September 8, 2014 at 3:52:35 AM UTC

Yes, this has been discussed. It would take me a while to find the thread, though.

User55521, September 8, 2014 at 4:42:48 PM UTC, edited September 8, 2014 at 4:53:43 PM UTC

Here are related tickets in the issue tracker:
https://github.com/Tatoeba/tatoeba2/issues/264
https://www.assembla.com/spaces...ctivity/ticket [old ticket]
https://github.com/Tatoeba/tatoeba2/issues/87 [related problem: Arabic transcription]

Happy Mid-Autumn Festival, by the way! :)

gillux, September 8, 2014 at 7:33:23 PM UTC

I actually started to draft a possible implementation in that ticket: https://github.com/Tatoeba/tatoeba2/issues/77

It would be great if we could list all the languages that could benefit from having editable alternative scripts, so we can implement a solution that could easily be ported to other languages. We basically need to know:
— the language;
— its alternative script(s), how they derive from the main script and what they are used for (a link to the Wikipedia page should be enough);
— whether the script(s) can be computer-generated with 100 % accuracy;
— if not, what are the tools out there that can generate a partially accurate script;
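
The checklist above could be captured as one record per language. The following is only a minimal sketch, not the implementation drafted in the ticket: the class name `AlternativeScript`, its field names, the helper `needs_manual_editing`, and the two sample entries (including the choice of Jyutping and the Wikipedia links) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlternativeScript:
    """One row of the proposed survey (hypothetical structure)."""
    language: str          # language code, e.g. ISO 639-3
    script: str            # name of the alternative script or reading aid
    reference_url: str     # link describing the script and its use
    fully_automatic: bool  # can it be generated with 100 % accuracy?
    tools: List[str] = field(default_factory=list)  # partial generators, if known


def needs_manual_editing(entries: List[AlternativeScript]) -> List[AlternativeScript]:
    """Return the entries whose script cannot be generated reliably."""
    return [e for e in entries if not e.fully_automatic]


# Illustrative entries only; the tools list is left empty on purpose.
SURVEY = [
    AlternativeScript("yue", "Jyutping romanisation",
                      "https://en.wikipedia.org/wiki/Jyutping", False),
    AlternativeScript("jpn", "furigana",
                      "https://en.wikipedia.org/wiki/Furigana", False),
]

editable = [e.language for e in needs_manual_editing(SURVEY)]
```

Keeping `fully_automatic` as an explicit flag means the same editing feature can be switched on per language rather than being written separately for Cantonese, Japanese, and so on.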

tommy_san, September 16, 2014 at 10:56:30 AM UTC

Glad to hear that!
I think every language would require some kind of reading aid at least partially to show how to read, say, "2014" or "Louis XIV".
You also need to keep in mind that sometimes multiple readings are possible (e.g. 何 and 明日 in Japanese). Quality of reading aids should be controlled much the same way as the sentences, since there are some readings that are theoretically possible but unlikely.

gillux, September 16, 2014 at 12:23:19 PM UTC

> I think every language would require some kind of reading aid at least partially to show how to read, say, "2014" or "Louis XIV".

That makes sense. Actually I have the feeling that the way furigana are implemented is kinda wrong because they are treated as an alternative script, just like e.g. romanization of Chinese. As a result, Japanese sentences are displayed twice (once with furigana and once without), which is a total waste of space and a bad way of presenting sentences. Implementing reading aids as a different concept would both solve this bad presentation and make it available for any language.

On the other hand, although reading aids may help with pronunciation, they are nonexistent in Latin-based languages, so we won’t be showing the reality of these languages by attaching reading aids. This could e.g. trick Japanese learners into thinking that English can actually have reading aids just like Japanese uses furigana.

> You also need to keep in mind that sometimes multiple readings are possible (e.g. 何 and 明日 in Japanese). Quality of reading aids should be controlled much the same way as the sentences, since there are some readings that are theoretically possible but unlikely.
Do you mean we should allow multiple readings? I’m not sure it’s a good idea.

tommy_san, September 16, 2014 at 12:46:18 PM UTC

> On the other hand, although reading aids may help with pronunciation, they are nonexistent in Latin-based languages, so we won’t be showing the reality of these languages by attaching reading aids.

Ah, do you mean something like this?

Zweitausenddreizehn
Tom wurde 2013 in Boston geboren.

That would look weird indeed. I haven't thought about a concrete way to display it. There should be a neater way.


> Do you mean we should allow multiple readings? I’m not sure it’s a good idea.

Yes. If you don't want to allow them, you need to allow multiple sentences that look the same. Otherwise people would end up arguing about which reading is the best.

#3404030 Tom was born in Boston in 2013. (two thousand thirteen)
#12345678 Tom was born in Boston in 2013. (two thousand and thirteen)
#12345679 Tom was born in Boston in 2013. (twenty thirteen)

gillux, September 16, 2014 at 1:57:10 PM UTC

I get your point. I agree that we should allow multiple readings, but then things get complicated.

If one can come up with a neat way to display these readings for Latin-based languages (like a tooltip or something), how are we going to display multiple readings for Japanese?

What will be the order of the readings (which one we put on the top or bottom of the list)?

What reading should we use for transliterations (like romanization)? We just can’t say that one of the readings is “the main one” because people are likely not to agree.

And there may be other issues I’m not thinking about yet.

Alternatively, we could say that we only allow one reading, which is up to the owner. It doesn’t mean it’s the only way to read, but it’s how the owner would read it.

tommy_san, September 16, 2014 at 2:36:23 PM UTC

> Alternatively, we could say that we only allow one reading, which is up to the owner. It doesn’t mean it’s the only way to read, but it’s how the owner would read it.

Maybe that's better. One reason is that the meaning of a sentence can change according to the readings.
映画館は人気(にんき/ひとけ)がなかった。
明日来るお客さんって何人(なんにん/なにじん)なの?

But I think we should be able to edit the readings of sentences owned by others when they're wrong. And here again arises the problem "Is this (automatically generated) reading really absolutely wrong? I'd never read it this way, but maybe some native speakers do." We'll need a sensible way to handle it.
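
One possible shape for the "single reading, chosen by the owner but correctable by others" idea is sketched below. It is an assumption-laden illustration, not Tatoeba's data model: the names `Reading`, `Sentence`, `needs_review`, and `set_reading`, and the user names `alice` and `bob`, are all hypothetical. A reading with no author stands for a machine-generated one, which stays flagged until a person confirms or replaces it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reading:
    text: str
    author: Optional[str] = None  # None means machine-generated (assumption)


@dataclass
class Sentence:
    text: str
    owner: str
    reading: Reading
    history: List[Reading] = field(default_factory=list)  # earlier readings

    def needs_review(self) -> bool:
        """A machine-generated reading has not been checked by anyone yet."""
        return self.reading.author is None

    def set_reading(self, new_text: str, editor: str) -> None:
        """Replace the current reading, keeping the old one for later disputes."""
        self.history.append(self.reading)
        self.reading = Reading(new_text, editor)


# The 人気 example from this thread: the generator picks にんき,
# and another member corrects it to ひとけ.
s = Sentence("映画館は人気がなかった。", owner="alice",
             reading=Reading("えいがかんはにんきがなかった。"))
flagged_before = s.needs_review()
s.set_reading("えいがかんはひとけがなかった。", editor="bob")
flagged_after = s.needs_review()
```

Keeping the replaced readings in `history` leaves room to revisit the "is this reading really wrong?" question without losing what the generator or the owner originally proposed.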