Tatoeba
Jens_Odo, January 10, 2015 at 6:10:28 PM UTC (edited January 10, 2015 at 6:13:43 PM UTC)

I would like to suggest that the automatic transliteration of Georgian be improved. At present it is not fully consistent: it partly conforms to the IPA and partly does not (for obscure reasons), and in the case of ქ and ყ it is completely misleading. Here are my suggestions:
ა a
ბ b
გ ɡ (currently: g)
დ d
ე e
ვ v
ზ z
თ tʰ
ი i
კ k’
ლ l
მ m
ნ n
ო o
პ p’
ჟ ʒ
რ r
ს s
ტ t’
უ u
ფ pʰ
ქ kʰ (currently: q)
ღ ɣ (currently: gh)
ყ q’ (currently: kh)
შ ʃ
ჩ ʧʰ (currently: ch)
ც ʦʰ (currently: ʦ)
ძ ʣ (currently: dz)
წ ʦ’ (currently: ts)
ჭ ʧ’ (currently: tch)
ხ x
ჯ ʤ
ჰ h
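The proposal assigns exactly one symbol to each letter, so it can be expressed as a plain lookup table. A minimal sketch, assuming Python 3.9+; the names `PROPOSED_IPA` and `transliterate` are hypothetical and are not Tatoeba's actual code. The apostrophe marking ejectives is copied from the list above, and characters outside the table (spaces, punctuation) pass through unchanged.

```python
# Hypothetical sketch: the proposed Georgian -> IPA table above, as a dict.
PROPOSED_IPA = {
    "ა": "a", "ბ": "b", "გ": "ɡ", "დ": "d", "ე": "e", "ვ": "v",
    "ზ": "z", "თ": "tʰ", "ი": "i", "კ": "k’", "ლ": "l", "მ": "m",
    "ნ": "n", "ო": "o", "პ": "p’", "ჟ": "ʒ", "რ": "r", "ს": "s",
    "ტ": "t’", "უ": "u", "ფ": "pʰ", "ქ": "kʰ", "ღ": "ɣ", "ყ": "q’",
    "შ": "ʃ", "ჩ": "ʧʰ", "ც": "ʦʰ", "ძ": "ʣ", "წ": "ʦ’", "ჭ": "ʧ’",
    "ხ": "x", "ჯ": "ʤ", "ჰ": "h",
}


def transliterate(text: str, table: dict[str, str] = PROPOSED_IPA) -> str:
    """Replace each Georgian letter with its proposed IPA value; keep anything else."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("საქართველო"))  # prints sakʰartʰvelo
```

A per-letter table is the simplest possible design; any context-dependent rule (such as pronouncing e and o as open vowels only in certain positions) would need more than a dictionary lookup.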

gillux, January 11, 2015 at 9:42:57 AM UTC

Thank you for reporting this, Johannes. I know nothing about Georgian, but according to these tables¹, our current transliteration table looks like a mix of IPA and the "National system (2002)". I think we should stick to one or the other. In your opinion, which would be more helpful for Georgian learners: IPA or the national system?

1. https://en.wikipedia.org/wiki/R...teration_table

tommy_san, January 11, 2015 at 5:59:17 PM UTC

I wonder why Georgian sentences have transliterations in the first place. Do native speakers use two scripts, as with Persian and Uzbek?

Back in 2010, someone wrote on the Wall that we need transliterations of Sanskrit, Urdu, Yiddish, Armenian, Persian, Kazakh, Iraqi Arabic, Bengali, Greek, Uzbek, Egyptian Arabic, Korean, Hebrew, Tatar, Serbian, Bulgarian, Cantonese, Belarusian, Uighur, Hindi, Arabic, Ukrainian and Russian sentences. Would you agree?

CK, January 11, 2015 at 6:25:38 PM UTC (edited October 30, 2019 at 7:31:29 AM UTC)

[not needed anymore- removed by CK]

Jens_Odo, January 11, 2015 at 8:53:40 PM UTC

Thanks for your replies. I don't know whether a transliteration is necessary at all; I haven't seen any other transliterations on Tatoeba yet. But I stumbled over the obvious inconsistencies of the current system, and since it mostly uses IPA symbols, I suggested applying IPA consistently (perhaps with the exception of e and o, which in Georgian are pronounced as open vowels). The so-called national transcription system is intended rather for standardizing the rendering of Georgian names in an international context; I don't think it's the best option for linguistic purposes.

coolmona, January 14, 2015 at 2:57:15 PM UTC

Those who know Georgian definitely need neither transcription nor transliteration :-). But those who don't know it would, I guess, be glad to have either. The same goes for other languages with non-Latin scripts.

tommy_san, January 14, 2015 at 3:39:11 PM UTC (edited January 14, 2015 at 5:36:44 PM UTC)

Japanese speakers would be glad to have katakana transcriptions for any non-Japanese language.

(ジャパニーズ・スピーカーズ・ウッド・ビー・グラッド・トゥー・ハヴ・カタカナ・トランスクリプションズ・フォア・エニー・ノン・ジャパニーズ・ラングエッジ。)

http://blogs.c.yimg.jp/res/blog...g_1?1368627399

AlanF_US, January 12, 2015 at 1:11:21 AM UTC (edited January 12, 2015 at 2:28:55 AM UTC)

> someone wrote that we ... need transliterations of Sanskrit, Urdu, Yiddish, Armenian, Persian, Kazakh, Iraqi Arabic, Bengali, Greek, Uzbek, Egyptian Arabic, Korean, Hebrew, Tatar, Serbian, Bulgarian, Cantonese, Belarusian, Uighur, Hindi, Arabic, Ukrainian and Russian sentences. Would you agree?

Do we NEED transliterations? Probably not. Would they be useful? To establish that, we should ask the following questions:
(1) Is it feasible to produce high-quality transliterations automatically?
(2) If transliterations cannot be produced automatically, is it feasible to expect contributors to supply or correct them?
(3) Is there a sizable community of people who would be helped by their existence?
(4) Is the benefit of adding transliterations substantial enough to justify diverting work from other areas of Tatoeba?

If I asked myself these questions with regard to Hebrew (the one language from the list that I would be qualified to address), the answers would all be "no". I suspect the answers would be the same for other languages in the list.

tanay, January 14, 2015 at 5:43:47 PM UTC

Umm, I might be missing something, but why do we need transliterations for any language (except maybe for languages using a logographic writing system)?

CK, January 14, 2015 at 6:10:05 PM UTC (edited October 30, 2019 at 7:31:21 AM UTC)

[not needed anymore- removed by CK]

gillux, January 14, 2015 at 9:49:51 PM UTC

> However, if the transliterations aren't 100% correct, they can be misleading and possibly bad for language learners.

I don’t think we can consider transliterations bad for language learners. It’s just a tool. Bad things can happen when you misuse a tool, but you can’t blame the tool, only the people using it. I think people should simply keep in mind that no transliteration can be 100% correct, and use it with caution. Just as a translation can’t convey all of the meaning, a transliteration can’t convey all of the writing. You know, even Hepburn can’t distinguish 景気 and ケーキ.

tanay, January 15, 2015 at 11:33:09 AM UTC

> Of course, it helps lower-level language learners to read a
> language they wouldn't otherwise be able to read.

Isn't the writing system, along with the pronunciation, the first thing someone learning a language should learn (except in cases where the language does not have its own writing system, as sabretou mentioned below)?

gillux, January 14, 2015 at 9:37:03 PM UTC

> why do we need transliterations for any language (except maybe for languages using a logographic writing system)?

Because Tatoeba may be used by any combination of people speaking language X and learning language Y. A transcription can help whenever Y uses a writing system that most speakers of X cannot read. For instance, many people can’t read the Latin alphabet, and for them a transcription into their native writing system would be as helpful as a transcription of, say, Japanese into the Latin alphabet is to you.

One transcription that I think every language would benefit from is IPA. In theory, it could serve as a bridge for producing transcriptions for any combination of languages.
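The bridge idea can be sketched as two steps: each source language gets one converter into IPA, and each target script gets one converter out of IPA, so n source languages and m target scripts need n + m converters instead of n × m direct ones. A minimal sketch, assuming Python 3.9+; all tables are hypothetical toy fragments (the Georgian values come from Jens_Odo's proposed table at the start of the thread, while the plain-ASCII respelling is invented purely for illustration). A real system would also need context-sensitive rules that a per-letter table cannot express.

```python
# Hypothetical sketch of IPA as a pivot: source script -> IPA segments -> target script.
# All tables are toy fragments for illustration, not real Tatoeba data.

# Source side: Georgian letters -> IPA segments (values from Jens_Odo's proposal).
GEORGIAN_TO_IPA = {"ს": "s", "ა": "a", "ქ": "kʰ", "რ": "r", "თ": "tʰ",
                   "ვ": "v", "ე": "e", "ლ": "l", "ო": "o"}

# Target side: IPA segments -> a made-up plain-ASCII respelling (assumption).
IPA_TO_ASCII = {"s": "s", "a": "a", "kʰ": "k", "r": "r", "tʰ": "t",
                "v": "v", "e": "e", "l": "l", "o": "o"}


def to_ipa(text: str, table: dict[str, str]) -> list[str]:
    """Convert source text to a list of IPA segments, one per letter."""
    return [table[ch] for ch in text]


def from_ipa(segments: list[str], table: dict[str, str]) -> str:
    """Render a list of IPA segments in a target script."""
    return "".join(table[seg] for seg in segments)


def converters_needed(n_sources: int, n_targets: int, pivot: bool = True) -> int:
    """Count converters to write with a pivot (n + m) or without one (n * m)."""
    return n_sources + n_targets if pivot else n_sources * n_targets


if __name__ == "__main__":
    segments = to_ipa("საქართველო", GEORGIAN_TO_IPA)
    print(from_ipa(segments, IPA_TO_ASCII))  # prints sakartvelo
    print(converters_needed(10, 10), converters_needed(10, 10, pivot=False))  # prints 20 100
```

Keeping the pivot as a list of segments rather than a string avoids having to re-split multi-character symbols such as "kʰ" on the way out.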

tanay, January 15, 2015 at 12:28:50 PM UTC

Someone who knows Arabic and is trying to learn English would learn the Latin alphabet first, wouldn't they? I don't think a beginner-level learner of English with no knowledge of the alphabet would get much help from Tatoeba, even with transliteration. But then again, I am not an expert on language teaching methods.

As for the Japanese transliteration, Kana is good enough, I think.

The only languages that need a transliteration system are those that use Chinese characters and those that do not have their own writing system.

BTW, I am not getting into any kind of heated debate. I am just sharing my thoughts. :D

odexed, January 15, 2015 at 1:17:52 PM UTC (edited January 15, 2015 at 1:22:59 PM UTC)

I share your point of view. Speaking of Arabic, I think a good automatic transliteration is simply impossible. Arabic sentences are normally written without short vowels, so you have to get used to reading them correctly. Text with harakat (vowel marks) does exist, but it is rare. Besides, some vowels change according to word order, grammatical structure (the presence of particles), and the type of verb.
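To illustrate the problem: three different words, written with full harakat, collapse to the same consonant skeleton كتب once the vowel marks are removed, which is what ordinary unvocalized text looks like. A minimal sketch, assuming Python 3.9+ and the standard `unicodedata` module; the harakat are Unicode combining marks (general category Mn), so filtering that category removes them. The word list is a textbook example chosen for illustration, and `strip_harakat` is a hypothetical helper name.

```python
import unicodedata

# Fully vocalized spellings of three different words sharing the consonants k-t-b.
VOCALIZED = {
    "kataba (he wrote)": "كَتَبَ",
    "kutiba (it was written)": "كُتِبَ",
    "kutub (books)": "كُتُب",
}


def strip_harakat(text: str) -> str:
    """Remove combining marks (Unicode category Mn), which include the harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    for gloss, word in VOCALIZED.items():
        # Every line ends in the same unvocalized skeleton.
        print(gloss, "->", strip_harakat(word))
```

Going from vocalized to unvocalized text is trivial; the reverse direction, which an automatic transliterator would need, has several valid answers for the same input.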

As for Russian, everyone who learns the language starts with the alphabet, which shares some letters with English. It would be much more useful to read Russian letters as they are than to try to transliterate them.

tommy_san, January 15, 2015 at 1:52:20 PM UTC

I think the point is that we can't and don't have to and perhaps shouldn't provide everything that could be useful for learners.

I agree with CK that maybe Tatoeba.org shouldn't be useful for absolute beginners who haven't even mastered the script of the language, since some of them would surely contaminate the corpus with untrustworthy sentences and translations.

I understand Tatoeba.org as a workplace. We collect information that needs to be provided by native speakers. If machines can't properly read sentences, it makes sense that we manually add reading aids. I guess transliterations in languages like Uzbek can be useful for contributors. If you find other kinds of transcriptions (or whatever) useful, you can make your own website using our data.

> BTW, I am not getting into any kind of heated debate. I am just sharing my thoughts. :D

Me, too. ☺

sabretou, January 14, 2015 at 9:55:56 PM UTC

It's useful for languages like Chinese or Japanese, of course. But in Uzbek's case, for example, it has to do with the lack of a commonly accepted script. Sometimes there is debate over which script a language should be written in, and even if one is declared 'official', the other might still be reasonably prevalent. This was a common problem across former Soviet states, I believe. It's also present in South Asia, where, for example, Konkani is written in Devanagari, Kannada and Roman scripts.

In cases like these, it might be helpful to include both (or more) commonly used scripts, as potential learners will likely be learning one or both of them. In the case of Chinese, too, it is only reasonable to assume that a learner will also learn pinyin; and for Japanese, kana.

However, we do not need a transliteration from every script into every other script. Not only is that entirely impractical, it is far outside the scope of this project. Resources for other scripts and systems can easily be found elsewhere without forcing Tatoeba to use them.

When considering learners of a particular language, we must only consider the language that is being learned, not the native language of the learner. That business can be left to textbooks, guides, translators and other learning tools.

tommy_san, January 15, 2015 at 1:58:55 AM UTC

> why do we need transliterations for any language?

That was my initial question.
http://tatoeba.org/wall/show_me...#message_21417

Different people would find different kinds of transliterations or transcriptions useful. Let's take the Russian sentence "Он очень похож на своего отца." (#374741) as an example.

(1) If you're learning the Cyrillic alphabet right now, or if you want to learn Russian (or just a few sentences) without the Cyrillic alphabet, you might want a transliteration.
"On ochen' pohozh na svoego otca."
"On očen' pochož na svoego otca."
Note that these transliterations have nothing to do with the pronunciation. For example, the "g" in "svoego" is pronounced [v].
Japanese rōmaji falls into this category. We write "ashita" (明日/あした), for example, but actually pronounce it [aɕta], without the "i".
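Both transliterations of sentence #374741 above are letter-for-letter substitutions, which is exactly why they ignore pronunciation: the table maps "г" to "g" regardless of context, so "своего" keeps its "g". A minimal sketch, assuming Python 3.9+; `SCHEME_A`, `SCHEME_B` and `romanize` are hypothetical names, and the tables contain only the letters this sentence needs (a real scheme would cover the whole alphabet). Scheme B differs from scheme A only in ч, х and ж, as in the two example lines.

```python
# Hypothetical per-letter tables reproducing the two example transliterations.
# Only the letters of sentence #374741 are covered; other characters pass through.
SCHEME_A = {"о": "o", "н": "n", "ч": "ch", "е": "e", "ь": "'", "п": "p",
            "х": "h", "ж": "zh", "а": "a", "с": "s", "в": "v", "г": "g",
            "т": "t", "ц": "c"}
SCHEME_B = {**SCHEME_A, "ч": "č", "х": "ch", "ж": "ž"}


def romanize(text: str, table: dict[str, str]) -> str:
    """Substitute each Cyrillic letter using `table`, preserving capitals."""
    out = []
    for ch in text:
        latin = table.get(ch.lower(), ch)
        if ch.isupper():
            # Capitalize only the first Latin letter, so "Ч" -> "Ch", not "CH".
            latin = latin[:1].upper() + latin[1:]
        out.append(latin)
    return "".join(out)


if __name__ == "__main__":
    sentence = "Он очень похож на своего отца."
    print(romanize(sentence, SCHEME_A))  # prints On ochen' pohozh na svoego otca.
    print(romanize(sentence, SCHEME_B))  # prints On očen' pochož na svoego otca.
```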

(2) You might welcome a rough phonetic transcription written for the speakers of your language.
"ohn OH-cheen' pah-KHOZH nuh svuh-ee-VOH aht-TSAH." (for English speakers)
"on ótschinʲ pachósh na swajiwó atzá." (for German speakers)
"オン・オーチン・パホージュ・ナ・スヴァイヴォー・アッツァ。" (for Japanese speakers)
Needless to say, this kind of thing is inaccurate and misleading in most cases and there's often no standardized way of transcription.

(3) If you know the Cyrillic alphabet and the pronunciation rules, you'd be able to pronounce a sentence more or less correctly if it carries stress marks (and "ё" is written as such). This is what sentences in many textbooks look like.
"Он о́чень похо́ж на своего́ отца́."
Furigana in Japanese, niqqud in Hebrew, tashkil in Arabic and macrons in Latin would be categorized here.
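In Unicode, the stress marks of option (3) are ordinary combining acute accents (U+0301) placed after the stressed vowel, because Unicode has no precomposed forms for Russian vowels with an acute accent. This means the annotated sentence still contains the plain spelling, which can be recovered by deleting the marks. A minimal sketch, assuming Python 3.9+; `strip_stress` and `stressed_vowels` are hypothetical helper names.

```python
COMBINING_ACUTE = "\u0301"

# Sentence #374741 with stress marks: each mark follows the stressed vowel.
ACCENTED = "Он о\u0301чень похо\u0301ж на своего\u0301 отца\u0301."


def strip_stress(text: str) -> str:
    """Delete combining acute accents, recovering the ordinary spelling."""
    return text.replace(COMBINING_ACUTE, "")


def stressed_vowels(text: str) -> list[str]:
    """Return the letters that carry a stress mark, in order."""
    return [prev for prev, ch in zip(text, text[1:]) if ch == COMBINING_ACUTE]


if __name__ == "__main__":
    print(strip_stress(ACCENTED))     # prints Он очень похож на своего отца.
    print(stressed_vowels(ACCENTED))  # prints ['о', 'о', 'о', 'а']
```

Because the marks are layered on top of the native script, this kind of reading aid could sit alongside the plain sentence without replacing it.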

(4) If you're a serious student, you'd be happy if there were IPA transcriptions.
"[on ˈot͡ɕɪnʲ pɐˈxoʐ nə svəjɪˈvo ɐtˈt͡sa]" (Is this correct?)

I agree with other members that (2) is not for Tatoeba.
I'm also not sure if (1) is wanted, except when native speakers use more than one script in parallel. Where this kind of transliteration is shown, it should probably be as discreet as the rōmaji is now.
(3) might be worth considering, though it would require some or much manual work. I like this option because it doesn't use a foreign script.
The IPA (4) is great, but the problem is most native speakers cannot write it and most learners cannot read it.

In addition to these, when a sentence uses a non-phonetic notation (for example Arabic numerals) that can also be written out another way ("two thousand and fifteen"), it would be nice to have that information in the corpus.
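A generator can spell out one reading of a number, which shows both what could be automated and why storing the intended reading is still useful. A minimal sketch, assuming Python 3.9+ and British-style wording with "and"; `spell` is a hypothetical helper limited to 0 through 9999. It produces "two thousand and fifteen" for 2015, but as a year, 2015 is also commonly read "twenty fifteen", and only the sentence's author knows which reading is meant.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def under_100(n: int) -> str:
    """Spell 0-99, hyphenating compound tens ("ninety-nine")."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def spell(n: int) -> str:
    """Spell 0-9999 in British English style ("two thousand and fifteen")."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only supports 0-9999")
    if n < 100:
        return under_100(n)
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest:
        parts.append("and " + under_100(rest))
    return " ".join(parts)


if __name__ == "__main__":
    print(spell(2015))  # prints two thousand and fifteen
```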