I would like to suggest that the automatic transliteration of Georgian be improved. At present it's not fully coherent: it partially conforms to the IPA alphabet and partially doesn't (for obscure reasons), and in the case of ქ and ყ it is completely misleading. Here are my suggestions:
ა a
ბ b
გ ɡ (currently: g)
დ d
ე e
ვ v
ზ z
თ tʰ
ი i
კ k’
ლ l
მ m
ნ n
ო o
პ p’
ჟ ʒ
რ r
ს s
ტ t’
უ u
ფ pʰ
ქ kʰ (currently: q)
ღ ɣ (currently: gh)
ყ q’ (currently: kh)
შ ʃ
ჩ ʧʰ (currently: ch)
ც ʦʰ (currently: ʦ)
ძ ʣ (currently: dz)
წ ʦ’ (currently: ts)
ჭ ʧ’ (currently: tch)
ხ x
ჯ ʤ
ჰ h
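For anyone who wants to try the suggested mapping, here is a minimal sketch in Python. It only encodes the table proposed above as a character-by-character lookup; the names `PROPOSED_TABLE` and `transliterate` are made up for illustration and are not how Tatoeba actually implements its transliteration.

```python
# Hypothetical sketch: the table proposed above as a plain lookup.
# This is NOT Tatoeba's actual implementation; names are illustrative.
PROPOSED_TABLE = {
    "ა": "a", "ბ": "b", "გ": "ɡ", "დ": "d", "ე": "e", "ვ": "v",
    "ზ": "z", "თ": "tʰ", "ი": "i", "კ": "k’", "ლ": "l", "მ": "m",
    "ნ": "n", "ო": "o", "პ": "p’", "ჟ": "ʒ", "რ": "r", "ს": "s",
    "ტ": "t’", "უ": "u", "ფ": "pʰ", "ქ": "kʰ", "ღ": "ɣ", "ყ": "q’",
    "შ": "ʃ", "ჩ": "ʧʰ", "ც": "ʦʰ", "ძ": "ʣ", "წ": "ʦ’", "ჭ": "ʧ’",
    "ხ": "x", "ჯ": "ʤ", "ჰ": "h",
}


def transliterate(text: str) -> str:
    """Replace each Georgian letter by its proposed symbol.

    Characters that are not in the table (spaces, punctuation,
    digits, Latin letters) are passed through unchanged.
    """
    return "".join(PROPOSED_TABLE.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("თბილისი"))
```

Because Georgian script maps one letter to one sound in this table, a plain per-character lookup is enough here; no context rules are assumed.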
Thank you for reporting this, Johannes. I know nothing about Georgian, but according to these tables¹, our current transliteration table looks like a mix of IPA and the « National system (2002) ». I think we should stick to one or the other. In your opinion, which would be the most helpful for Georgian learners, IPA or the national system?
1. https://en.wikipedia.org/wiki/R...teration_table
I wonder why Georgian sentences have transliterations in the first place. Do native speakers use both scripts as in Persian and Uzbek?
Back in 2010, someone wrote on the Wall that we need transliterations of Sanskrit, Urdu, Yiddish, Armenian, Persian, Kazakh, Iraqi Arabic, Bengali, Greek, Uzbek, Egyptian Arabic, Korean, Hebrew, Tatar, Serbian, Bulgarian, Cantonese, Belarusian, Uighur, Hindi, Arabic, Ukrainian and Russian sentences. Would you agree?
[not needed anymore- removed by CK]
Thanks for your replies. I don't know whether a transliteration is necessary at all. I haven't seen other instances of transliteration on Tatoeba yet. But I stumbled over the obvious inconsistencies of the current system. And as it sticks mostly to IPA signs, I suggested doing so consistently (with the exception, perhaps, of e and o, which in Georgian are pronounced as open vowels). The so-called national transcription system is meant for standardizing the rendering of Georgian names in an international context, but I don't think that it's the best option for a linguistic purpose.
Those who know Georgian definitely need neither transcription nor transliteration :-). But those who don't know it, I guess, would be glad to have either. The same goes for other non-Latin script languages.
Japanese speakers would be glad to have katakana transcriptions for any non-Japanese language.
(ジャパニーズ・スピーカーズ・ウッド・ビー・グラッド・トゥー・ハヴ・カタカナ・トランスクリプションズ・フォア・エニー・ノン・ジャパニーズ・ラングエッジ。)
http://blogs.c.yimg.jp/res/blog...g_1?1368627399
> someone wrote that we ... need transliterations of Sanskrit, Urdu, Yiddish, Armenian, Persian, Kazakh, Iraqi Arabic, Bengali, Greek, Uzbek, Egyptian Arabic, Korean, Hebrew, Tatar, Serbian, Bulgarian, Cantonese, Belarusian, Uighur, Hindi, Arabic, Ukrainian and Russian sentences. Would you agree?
Do we NEED transliterations? Probably not. Would they be useful? To establish that, we should ask the following questions:
(1) Is it feasible to produce automatically produced transliterations of high quality?
(2) If transliterations cannot be automatically produced, is it feasible to expect contributors to supply or correct them?
(3) Is there a sizable community of people who would be helped by their existence?
(4) Is the benefit of adding transliterations substantial enough to justify the work that might be done in other areas of Tatoeba instead?
If I asked myself these questions with regard to Hebrew (the one language from the list that I would be qualified to address), the answers would all be "no". I suspect the answers would be the same for other languages in the list.
Umm, I might be missing something, but why do we need transliterations for any language (except maybe for languages using a logographic writing system)?
[not needed anymore- removed by CK]
> However, if the transliterations aren't 100% correct, they can be misleading and possibly bad for language learners.
I don’t think we can consider transliterations as bad for language learners. It’s just a tool. Bad things can happen when you misuse a tool, but you can’t blame the tool, only the people using it. I think people should just consider that any transliteration can’t be 100% correct and use it with caution. Just like a translation can’t convey all the meaning, a transliteration can’t convey all the writing. You know, even Hepburn can’t distinguish 景気 and ケーキ.
> Of course, it helps lower-level language learners to read a
> language they wouldn't otherwise be able to read.
Isn't the writing system, along with the pronunciation, the first thing someone learning a language should learn (except in cases where the language does not have its own writing system, as sabretou mentioned below)?
> why do we need transliterations for any language (except maybe for languages using a logographic writing system)?
Because Tatoeba may be used by any combination of people speaking language X and learning language Y. A transcription may help as long as Y uses a writing system unintelligible for most people speaking X. For instance there are many people who can’t read the Latin alphabet, and for them a transcription into their native writing system would be as helpful as a transcription of say Japanese into Latin is to you.
A transcription that I think every language would benefit from is IPA. In theory, it could be used as a bridge to produce transcriptions for any combination of languages.
Someone who knows Arabic and is trying to learn English would learn the Latin alphabet first, wouldn't they? I don't think a beginner-level learner of English without any knowledge of the alphabet would be able to get any kind of help from Tatoeba, even with the transliteration. But then again, I am not an expert on language teaching methods.
As for the Japanese transliteration, Kana is good enough, I think.
The only languages which need a transliteration system are the ones using the Chinese characters and those which do not have their own writing system.
BTW, I am not getting into any kind of heated debate. I am just sharing my thoughts. :D
I share your point of view. Speaking about Arabic, I think a good automatic transliteration is simply impossible. Arabic sentences are written without vowels, so you have to get accustomed to reading them properly. Sometimes we can find Arabic texts with harakat, but very seldom. Besides, some vowels change according to the word order, the grammatical structure (existence of particles) and the type of verb.
As for Russian, everyone who learns this language starts with the alphabet, which has some letters similar to English ones. It would be much more useful to read the Russian letters as they are than to try to transliterate them.
I think the point is that we can't and don't have to and perhaps shouldn't provide everything that could be useful for learners.
I agree with CK that maybe Tatoeba.org shouldn't be useful for absolute beginners who haven't even mastered the script of the language, since some of them would surely contaminate the corpus with untrustworthy sentences and translations.
I understand Tatoeba.org as a workplace. We collect information that needs to be provided by native speakers. If machines can't properly read sentences, it makes sense that we manually add reading aids. I guess transliterations in languages like Uzbek can be useful for contributors. If you find other kinds of transcriptions (or whatever) useful, you can make your own website using our data.
> BTW, I am not getting into any kind of heated debate. I am just sharing my thoughts. :D
Me, too. ☺
It's useful for languages like Chinese or Japanese, of course. But in Uzbek's case, for example, it has to do with the lack of a commonly accepted script. Sometimes there is debate on which script a language should be written in, and even if one is declared 'official', the other might still be reasonably prevalent. This was a common problem across the former Soviet Union states, I believe. It's also present in South Asia, where, for example, Konkani is written in the Devanagari, Kannada and Roman scripts.
In cases like these, it might be helpful to include both (or more) commonly used scripts, as potential learners will likely be learning one or both of them. In Chinese's case as well, it is only reasonable to assume that a learner of Chinese will also learn pinyin; and for Japanese, the kana syllabary.
However, we do not need a transliteration for every script into every other script. Not only is that entirely impractical, it is far outside the scope of this project. Scripts and other systems can be reasonably found without having to force Tatoeba to use them.
When considering learners of a particular language, we must only consider the language that is being learned, not the native language of the learner. That business can be left to textbooks, guides, translators and other learning tools.
> why do we need transliterations for any language?
That was my initial question.
http://tatoeba.org/wall/show_me...#message_21417
Different people would find different kinds of transliterations or transcriptions useful. Let's take the Russian sentence "Он очень похож на своего отца." (#374741) as an example.
(1) If you're learning the Cyrillic alphabet right now, or if you want to learn Russian (or just a few sentences) without the Cyrillic alphabet, you might want a transliteration.
"On ochen' pohozh na svoego otca."
"On očen' pochož na svoego otca."
Note that these transliterations have nothing to do with the pronunciation. For example, the "g" in "svoego" is pronounced [v].
Rōmaji in Japanese falls in here. We write "ashita" (明日/あした), for example, but we actually pronounce [aɕta] without "i".
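To make the point about (1) concrete, here is a small Python sketch of such a letter-by-letter transliteration. The two tables are reconstructed only from the two example lines above and cover only the letters that occur in that sentence; the names `SCHEME_A`, `SCHEME_B` and `transliterate` are hypothetical. Note that the code never looks at pronunciation, which is exactly why "svoego" keeps its "g".

```python
# Hypothetical sketch: two letter-by-letter romanizations of Russian.
# The tables are partial (only the letters of the example sentence)
# and were inferred from the two example lines in this post.
SCHEME_A = {
    "а": "a", "в": "v", "г": "g", "е": "e", "ж": "zh", "н": "n",
    "о": "o", "п": "p", "с": "s", "т": "t", "х": "h", "ц": "c",
    "ч": "ch", "ь": "'",
}

# Same as SCHEME_A except for three letters written with diacritics
# or a different digraph.
SCHEME_B = {**SCHEME_A, "ж": "ž", "х": "ch", "ч": "č"}


def transliterate(text: str, table: dict) -> str:
    """Map each Cyrillic letter through `table`, keeping capitalization.

    Anything missing from the table (spaces, punctuation, letters the
    partial table does not cover) is copied unchanged.
    """
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


if __name__ == "__main__":
    sentence = "Он очень похож на своего отца."
    print(transliterate(sentence, SCHEME_A))
    print(transliterate(sentence, SCHEME_B))
```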
(2) You might welcome a rough phonetic transcription written for the speakers of your language.
"ohn OH-cheen' pah-KHOZH nuh svuh-ee-VOH aht-TSAH." (for English speakers)
"on ótschinʲ pachósh na swajiwó atzá." (for German speakers)
"オン・オーチン・パホージュ・ナ・スヴァイヴォー・アッツァ。" (for Japanese speakers)
Needless to say, this kind of thing is inaccurate and misleading in most cases and there's often no standardized way of transcription.
(3) If you know the Cyrillic alphabet and the pronunciation rules, you'd be able to pronounce a sentence more or less correctly if it has accent marks (and "ё"). This is what sentences in many textbooks look like.
"Он о́чень похо́ж на своего́ отца́."
Furigana in Japanese, niqqud in Hebrew, tashkil in Arabic and macrons in Latin would be categorized here.
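One nice property of (3) is that the plain sentence can be recovered mechanically from the annotated one. Here is a minimal Python sketch, assuming the stress marks are stored as the Unicode combining acute accent (U+0301) placed after the vowel; the function names are made up for illustration.

```python
import unicodedata

# Assumption: stress is encoded as U+0301 COMBINING ACUTE ACCENT
# following the stressed vowel, as is common in textbook-style text.
COMBINING_ACUTE = "\u0301"


def strip_stress(text: str) -> str:
    """Return `text` without stress marks.

    The text is decomposed first so that any precomposed accented
    letters are handled too, then recomposed so that letters such as
    "ё" and "й" come back as single code points.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = decomposed.replace(COMBINING_ACUTE, "")
    return unicodedata.normalize("NFC", without_marks)


def count_stress_marks(text: str) -> int:
    """Count the stress marks in `text` (after decomposition)."""
    return unicodedata.normalize("NFD", text).count(COMBINING_ACUTE)


if __name__ == "__main__":
    stressed = "Он о\u0301чень похо\u0301ж на своего\u0301 отца\u0301."
    print(strip_stress(stressed))
    print(count_stress_marks(stressed))
```

Going the other way (adding the marks) is the part that would need a dictionary or manual work.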
(4) If you're a serious student, you'd be happy if there were IPA transcriptions.
"[on ˈot͡ɕɪnʲ pɐˈxoʐ nə svəjɪˈvo ɐtˈt͡sa]" (Is this correct?)
I agree with other members that (2) is not for Tatoeba.
I'm also not sure if (1) is wanted, except when more than one script is used in parallel by native speakers. If you add this kind of transliteration, it should probably be as discreet as the rōmaji is now.
(3) might be worth considering, though it would require some or much manual work. I like this option because it doesn't use a foreign script.
The IPA (4) is great, but the problem is most native speakers cannot write it and most learners cannot read it.
In addition to these, when a sentence uses a non-phonetic script (for example Arabic numerals) and when there's a way to write it in another way ("two thousand and fifteen"), it would be nice to have that information in the corpus.