*** Editable transcriptions progress ***
Lately, I’ve been working on the editable transcriptions feature and set it up on http://dev.tatoeba.org/ (don’t be alarmed by the big blue announcement text; it’s another work in progress by Trang). I tried to address the comments from the previous thread¹. You’re welcome to check it out and comment.
Notable changes:
• Design update, new icons, (hopefully) better usability.
• Added a button to edit (instead of clicking on the text).
• Added a button to verify transcription (instead of having to click edit → OK).
• Replaced the single toggle-transcription button with per-transcription toggle buttons.
• Helpful errors displayed on invalid transcriptions (only furigana has strict validation for the moment).
• Various bug fixes.
¹ https://tatoeba.org/wall/show_m...#message_22870
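For readers wondering what strict validation of furigana could look like, here is a minimal sketch in Python. It is not the code running on the site: it only assumes the `base{reading}` notation that appears in the examples in this thread, and the function name `check_furigana` and the error messages are made up for illustration.

```python
import re

# Assumed notation (taken from examples in this thread): base{reading}
SEGMENT = re.compile(r"([^{}]+)\{([^{}]*)\}")
# Hiragana and katakana Unicode blocks.
KANA = re.compile(r"^[\u3040-\u309F\u30A0-\u30FF]+$")

def check_furigana(text):
    """Return a list of error messages; an empty list means no problem found."""
    errors = []
    # Strip every well-formed base{reading} group; any brace left is stray.
    leftover = SEGMENT.sub("", text)
    if "{" in leftover or "}" in leftover:
        errors.append("stray or unbalanced brace")
    # Each reading must be non-empty and written in kana only.
    for base, reading in SEGMENT.findall(text):
        if not KANA.match(reading):
            errors.append(f"reading for {base!r} must be kana, got {reading!r}")
    return errors
```

Note that a check like this only catches malformed notation. It cannot tell whether a reading is actually the right one for the sentence.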
I really liked that! It would help both students and some translators a lot. But won't it be implemented for the Slavic and Semitic languages, or any other language that's not "romanized" (that don't use the Roman script) such as Greek?
It could. Actually, the way I implemented it makes it very easy to do it for other languages as well. However, whether adding romanization for “any other language that's not "romanized" (that don't use the Roman script)” is a good thing or not is something members seem to have different opinions about. See for example this thread: https://tatoeba.org/wall/show_m...#message_21405
> whether adding romanization for “any other language that's not "romanized" (that don't use the Roman script)” is a good thing...
I think that for different languages we could add different data that is relevant to that particular language. For Russian, for example, we could provide a version of a sentence with stress marks. Romanization is just one of the options.
Stress marks are a good idea, too. For example, Serbian has several types of stress, so they might be very helpful.
1. It would be nice if there was a button to verify a transcription with a single click.
2. Since the machine-generated readings of Japanese numerals are almost always wrong, you may as well remove them. It's quite troublesome to turn "3{さん}0{ぜろ}分{ふん}" into "30分{さんじゅっぷん}".
3. Chinese automatic transcriptions could be improved a little.
For example, "fǎyǔ zhōng ,“soleil” shì tàiyáng de yìsi 。" should be turned into "Fǎyǔ zhōng, "soleil" shì tàiyáng de yìsi."
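The Chinese example above boils down to three small fixes: use ASCII punctuation, drop the space before punctuation, and capitalize the first letter. A rough Python sketch of that cleanup follows. It is only an illustration of the requested output, not the site's transcription code, and `tidy_pinyin` is a hypothetical name.

```python
import re

# Full-width and curly punctuation mapped to ASCII equivalents.
PUNCT = {
    "\uFF0C": ",",   # full-width comma
    "\u3002": ".",   # ideographic full stop
    "\u201C": '"',   # left curly quote
    "\u201D": '"',   # right curly quote
}

def tidy_pinyin(text):
    """Normalize punctuation, spacing and initial capitalization."""
    for wide, ascii_char in PUNCT.items():
        text = text.replace(wide, ascii_char)
    # Remove whitespace that precedes a comma or a full stop.
    text = re.sub(r"\s+([,.])", r"\1", text)
    # Make sure a comma is followed by a space.
    text = re.sub(r",(?=\S)", ", ", text)
    text = text.strip()
    # Capitalize the first character only; the rest is left untouched.
    return text[:1].upper() + text[1:]
```

This does nothing about tone or word-segmentation errors, which need a proper look at the transcription itself.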
> 1. It would be nice if there was a button to verify a transcription with a single click.
I moved the warning icon to the left and made it a click-to-review button.
> 2. Since the machine-generated readings of Japanese numerals are almost always wrong, you may as well remove them. It's quite troublesome to turn "3{さん}0{ぜろ}分{ふん}" into "30分{さんじゅっぷん}".
Done.
> 3. Chinese automatic transcriptions could be improved a little.
I know. This is yet to be done.
When I visited the site yesterday, I didn't notice the X mark next to the warning message that auto-generated transcriptions might be wrong. A text like "Don't show again" might be more user-friendly.
Would it be possible to add such a feature to other languages such as Hakka Chinese, Sumerian, or Khmer? Khmer might be harder as not even Google Translate does it well, though there is a site that gives good transliterations (though not really IPA): dictionary.tovnah.com
As for Sumerian, there is a system we use, and you can see it on the Wikipedia article: https://en.wikipedia.org/wiki/C...dian_Cuneiform
Though the "Primary" transliterations are not always accurate (à la Japanese-style)
I created an article on the wiki for new transcription requests: http://en.wiki.tatoeba.org/arti...iption-request
I’m still not satisfied with the permission system but I don’t know how to improve it.
The current system is as follows:
• New transcriptions can be added by anyone.
• When a transcription was last edited by someone other than the sentence owner, only the transcription author, the sentence owner, corpus maintainers and admins may edit it.
• When a transcription was last edited by the owner of the sentence, only the sentence owner, corpus maintainers and admins may edit it.
So the current system is rather open, in order to allow transcriptions to be added to sentences of inactive contributors, or of contributors who don’t want to bother transcribing their sentences or just can’t because they don’t know the transcription script. But it also allows vandalism, and we don’t have any countermeasures yet. I also feel it’s a bit too complex for newcomers to understand.
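To make the three rules easier to discuss, here they are written out as a small Python sketch. This is not the actual Tatoeba code: the `User` class, the role names and `can_edit_transcription` are hypothetical, and "transcription author" is assumed to mean whoever last edited the transcription.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical role names standing in for corpus maintainers and admins.
PRIVILEGED_ROLES = {"corpus_maintainer", "admin"}

@dataclass
class User:
    name: str
    role: str = "contributor"

def can_edit_transcription(user: User, sentence_owner: str,
                           last_editor: Optional[str]) -> bool:
    """last_editor is None when the sentence has no transcription yet."""
    # Rule 1: new transcriptions can be added by anyone.
    if last_editor is None:
        return True
    # The sentence owner, corpus maintainers and admins may always edit.
    if user.role in PRIVILEGED_ROLES or user.name == sentence_owner:
        return True
    # Rule 2: if someone other than the owner edited last, that person
    # (assumed to be the transcription author) may edit again.
    if last_editor != sentence_owner:
        return user.name == last_editor
    # Rule 3: the owner edited last, so nobody else may edit.
    return False
```

For example, with a sentence owned by `alice` and a transcription last edited by `bob`, `can_edit_transcription(User("bob"), "alice", "bob")` is true, while the same call for a third contributor `carol` is false. Once `alice` edits it herself, `bob` is locked out too.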