Wall (7,068 threads)

Hi,
Do we provide transliteration or full vocalisation for languages such as Arabic?
I have only seen partial vocalisation (for disambiguation) and no transliteration.
If I understand correctly, Mandarin Chinese is the only language featuring transliterations and Japanese can also have furigana.

It would indeed be very nice if Arabic featured full vocalisation, just as Chinese and Japanese offer such aids as well.

Thanks for the support, @Waldelfe,
By "vocalisation" I mean adding the vowel diacritics (Arabic حَرَكات ḥarakāt) that make the pronunciation of Arabic words unambiguous, in case you or anyone else reading the thread doesn't know.
Example sentence:
How do I get to the train station?
Unvocalised Arabic text: كيف أصل إلى محطة القطار؟
Vocalised: كَيْفَ أَصِلُ إِلَى مَحَطَّةِ الْقِطَار؟
Transliteration: kayfa ʾaṣilu ʾilā maḥaṭṭati l-qiṭār(i)?
There is more than one way to transliterate Arabic, so the above is just one example.
The level of vocalisation can vary, especially on final vowels or their absence (ʾiʿrāb - إِعْرَاب).
E.g. مَحَطَّةُ الْقِطَارِ (maḥaṭṭatu l-qiṭāri) in the nominative case is more detailed than مَحَطَّة الْقِطار (maḥaṭṭat al-qiṭār), where the case endings and any diacritics considered "obvious" are left out.
Let's have a discussion on this. I contribute Arabic translations on Wiktionary, and the vast majority of Arabic word and sentence translations there have diacritics and automated transliterations. It should be doable on Tatoeba as well, although it's not possible to do it automatically. It will require some effort (and can be error-prone!).

Hello, thank you for bringing this up.
I did a quick search for Arabic vocalizers.
It looks like we could use some of these tools
https://github.com/linuxscout/mishkal
https://github.com/Barqawiz/Shakkala
to automatically generate vocalized versions of the sentences while allowing for manual correction when needed. This would also make it possible to find Arabic sentences by searching for their words in vocalized form, if that's of any use.
I am also interested in the cultural background and contemporary usage of vocalized Arabic. For example, I know that Japanese reading aids (furigana) are used in signs aimed at kids, children's books, teenager books like manga, occasionally in "young adult" literature, and otherwise in any text that displays very rare or ambiguous Chinese characters. What is the current usage of vocalized Arabic?
Also, you mentioned that the vocalized form may vary. Does it only vary in the quantity of extra information (diacritics) added to the original word, or can different people also use different "orthographies" to vocalize the same word?
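
To make the search idea concrete, here is a minimal Python sketch, not anything Tatoeba actually runs. It assumes that "diacritics" means the Unicode marks U+064B to U+0652 (tanwīn, short vowels, shadda, sukūn) plus the dagger alif U+0670; the names `HARAKAT`, `strip_diacritics` and `matches` are made up for illustration. Both the query and the sentence are reduced to their bare letters before comparing, so a vocalized query can find an unvocalized sentence and vice versa.

```python
# Hypothetical helper names; the set of marks below is an assumption,
# not an exhaustive list of every Arabic combining mark.
HARAKAT = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Return the text with the vowel marks listed in HARAKAT removed."""
    return "".join(ch for ch in text if ch not in HARAKAT)


def matches(query: str, sentence: str) -> bool:
    """True if the query occurs in the sentence once both are unvocalized."""
    return strip_diacritics(query) in strip_diacritics(sentence)


if __name__ == "__main__":
    # A vocalized query against an unvocalized sentence.
    print(matches("كِتَاب", "الكتاب الكبير"))
```

Plain substring matching is a simplification: a real search would tokenize into words and would still need a human-checked vocalized form for display.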

First of all, these tools may be helpful but they are not reliable. No tool can produce a reliable vocalised Arabic text from an unvocalised one. They can help to reduce the typing and manual input effort, but it would all have to be checked by a knowledgeable human. No harm in researching, though.
Vocalised Arabic is used in religious texts, especially the Qur'an; otherwise, its usage is very similar to that of Japan's furigana, mainland China's pinyin and Taiwan's bopomofo (zhuyin fuhao). I own a few vocalised readers, dictionaries and textbooks. Notably, the Oxford and Larousse Arabic dictionaries, the Russian author Ilya Frank's book of Arabic Joha adventures, and Lingualism's bilingual PDF books in MSA have full vocalisations and audio.
On the various levels of vocalisation: let's say we want to write "the large book" - الكتاب الكبير (al-kitāb al-kabīr), also al-kitābu al-kabīr(u) or al-kitābu l-kabīr(u).
1. Full vocalisation in the nominative case can be الْكِتَابُ الْكَبِيرُ - al-kitābu al-kabīru or al-kitābu l-kabīru, with markings for cases - ʾiʿrāb - https://en.wikipedia.org/wiki/ʾIʿrab. This can be even more pedantic with الْكِتَابُ ٱلْكَبِيرُ, marking the first alif in the second word as silent (a less common diacritic). The final vowels used in the ʾiʿrāb are often omitted for various reasons. Vowels are often left unmarked when they are considered obvious by native speakers or were introduced earlier for learners. E.g. كِتَاب (kitāb) can be written as كِتاب.
2. Some diacritics are seldom used or even missing on Arabic keyboards, e.g. هٰذَا (hāḏā, "this") with a dagger alif (a vertical stick). So in a vocalized text you may see هَذا, which would be haḏā, an incorrect shortening caused by the lack of the diacritic (or unwillingness to use it).
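
The difference between the two points above (leaving marks out versus using a different mark) can be checked mechanically. The following is only an illustrative Python sketch under my own assumptions: the marks considered are U+064B to U+0652 plus the dagger alif U+0670, and the function names `split_marks` and `is_reduced_form` are invented here. A spelling counts as a reduced form of another if the base letters are identical and every mark it carries also appears on the same letter in the fuller spelling.

```python
# Assumed set of marks: tanwin, short vowels, shadda, sukun, dagger alif.
DIACRITICS = frozenset(chr(c) for c in range(0x064B, 0x0653)) | {"\u0670"}


def split_marks(text):
    """Split text into [base character, set of marks attached to it] pairs."""
    units = []
    for ch in text:
        if ch in DIACRITICS and units:
            units[-1][1].add(ch)      # attach the mark to the preceding letter
        else:
            units.append((ch, set()))  # start a new base character
    return units


def is_reduced_form(partial, full):
    """True if `partial` only omits marks that `full` has, and changes none."""
    a, b = split_marks(partial), split_marks(full)
    if len(a) != len(b):
        return False
    return all(x == y and mx <= my for (x, mx), (y, my) in zip(a, b))


if __name__ == "__main__":
    print(is_reduced_form("كِتاب", "كِتَاب"))  # marks omitted only
    print(is_reduced_form("هَذا", "هٰذَا"))    # fatha replaces the dagger alif
```

With this definition, كِتاب is accepted as a lighter spelling of كِتَاب, while هَذا is rejected against هٰذَا because it puts a different mark on the first letter instead of merely dropping one.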

Oh, by the way, @gillux: I don't have issues with using English personal or city names in Arabic, and the problems with transliterations are exaggerated.
So "Tom" is written as توم in Arabic, or تُوم with vocalisation, and no final case ending is added to foreign names like this. It would produce "tūm", but Arabs recognise foreign names and pronounce known personal or geographical names by imitating the foreign pronunciation (sometimes adjusting it to Arabic phonology). It depends on the speaker, of course, and there may be multiple ways of saying foreign names or loanwords.

Hello, the search problem is solved now, thank you for reporting it and sorry for the inconvenience!

Hi, the system has been throwing errors today; we are not able to search or translate. I even tried logging out and logging in again.

An error occurred: "68023405f1d53"

Hello everyone! 🙂
I got an error while loading the Tatoeba page.
The error code is: 68022d82ce05a.
Can anyone help me with this?

An error occurred during the search. If it keeps happening, please contact us and report the error code "6801fb7d1ba2d".

Hi,
I am Rusydy, and I would like to add my native language, Mandar, which is spoken in Indonesia, to Tatoeba. Following the instructions in this [wiki](https://en.wiki.tatoeba.org/art...age-request#), I created a [list in Tatoeba](https://tatoeba.org/en/sentence...s/show/173571) and sent a private message on the Tatoeba website about a month ago, but I have not received a response.
Today, I read the [wiki for adding a new language](https://github.com/Tatoeba/tato...-new-language) on GitHub. Therefore, I am wondering if I can open a new PR to add my endangered native language.
Thank you,
Rusydy

Good job, Rusydy! It's so sad to see languages disappear along with all the specific cultural knowledge that goes with them. I'm sure Tatoeba will be the right place to preserve your memory.

If I may give you a warning, there's a hidden flaw in this type of collective global endeavour: "fitting to the norm/vibe". Each culture, and consequently each language, possesses its own cultural specificities. So please do not try to "fit" the vibe of the "global culture" by submitting sentences that are merely translations of known sentences from modern popular languages. Please also submit sentences that are entirely specific to your language and culture, so that translators will have to really dig into it in order to translate them properly, if they ever can. That will pinpoint the specific value of your language and culture. Learning what Tom and Mary do in Boston in your language is not that interesting anyway and doesn't give credit to your unique cultural contribution.

Indeed. There's a very useful site for learning Kanji and Japanese terms (JPDB)... that uses Tom as its go-to name, and it bugs me so much. I could be learning various Japanese names, or even imprinting on a couple specific baseline Japanese names, but instead I get TOMU for what seems to be no good reason. I prefer to see native sentences with native names.

You can filter out sentences based on such criteria or their authors. That’s what I do.

On JPDB, you can filter out example sentences? Or are you talking about this site here?
I mean the rest of the sentence typically teaches me something useful; it's just the out-of-place name that's jarring. "今井先生は教えている" is fully native and feels right; "スミス先生は教えている" feels off because it's injecting a non-Japanese character into the action for no useful reason. (And one whose name breaks the Japanese phonology to boot.)
On this site, it makes sense when it's a translation of a sentence where the characters are named natively to the original sentence. But I always like to see whichever characters are native to the sentence's original language.

you'll never guess where it gets its sentences from :p (it's this website)
the reasoning behind using one name for everything is that it helps avoid duplicates, e.g. "tom eats pears" vs "akira eats pears", which would otherwise have to be translated twice and would disconnect potentially useful indirect translations
but yeah it definitely has its flaws, and not just cultural; e.g. i've heard that the second wildcard "mary" doesn't decline at all in russian which reduces the information in russian translations
my preferred solution is to instead be creative enough with my sentences that there's no way they'll be duplicates of existing ones :p (which i know doesn't work for all types of contributors, but i like what i like)

Dear Rusydy, it is great that you are requesting a new language, but once it's approved here on Tatoeba, make sure to write sentences in that language, at least over a thousand phrases. There are many languages that have fewer than 20 phrases, which doesn't make sense to me.
The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

Hello, CK told me that the Horus system takes care of exact duplicate sentences, but that hasn't happened so far. Please delete sentence #13151024; it was a typo.

✹✹ Stats & Graphs ✹✹
Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
