I have problems adding Ottoman Turkish sentences written in Latin script. The punctuation order gets corrupted and it looks weird. The system only works properly when using Arabic script with Ottoman Turkish.
Some languages are written in more than one script, like Azerbaijani (Latin, Arabic and Cyrillic), Kurdish (Latin and Arabic) or Serbian (Latin and Cyrillic).
I guess there are Serbian sentences written in both Latin and Cyrillic scripts on Tatoeba. It isn't a problem when both scripts have the same direction, as in Serbian, but if the direction is different, it becomes difficult to use the script other than the 'default' one.
Can't something be done about it?
I've noticed that the 'other language' flag is direction-neutral. It allows both left-to-right and right-to-left scripts. So I think it should be possible to implement this for other languages that can be written in multiple scripts.
Also, it would be really helpful if simplified and traditional Chinese could be somehow separated. A lot of otherwise simple sentences take me a while to translate because I have to change them to simplified Chinese on Google Translate first. Just some kind of tag in advanced search, or a separate category?
That's a different issue, but I agree. It would be useful. When studying a language that can be written in multiple scripts, one may need to view sentences written only in a particular script like Chinese sentences written in traditional Chinese or Berber sentences written in the Tifinagh script. Currently, it's not possible to separate/filter sentences in such a way.
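To illustrate the kind of filtering described above, here is a minimal sketch, not how Tatoeba actually works. It guesses the dominant script of a sentence from the first word of each letter's Unicode character name. The function name `dominant_script` is hypothetical, and the name-prefix trick is only a rough heuristic. Note that it cannot tell simplified from traditional Chinese, since both use the same Han characters block; that would need a character mapping table.

```python
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Guess the main script of `text` (hypothetical helper, heuristic only).

    Uses the first word of each letter's Unicode name, e.g.
    'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and spaces
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"

# Example: keep only the Cyrillic sentences from a mixed list.
sentences = ["Zdravo svete.", "Здраво свете."]
cyrillic_only = [s for s in sentences if dominant_script(s) == "CYRILLIC"]
```

With something like this, a search filter could offer "Tifinagh only" or "Cyrillic only" without contributors having to tag each sentence by hand.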
Actually we used to automatically generate the sentences in the other script for Chinese.
Like this sentence for instance: https://tatoeba.org/eng/sentences/show/7776103
If the sentence is in simplified, it would have the traditional version in grey. If it was in traditional, it would have the simplified version in grey.
I don't think we've ever decided to remove this feature so it must have broken at some point...
Hm, when I look up sentences it appears for a couple but not for most. Like maybe one or two out of every ten sentences?
Yeah, I'm not sure why some have it and some don't. But I know for sure that we used to have it on all Chinese sentences, automatically generated.
This is fixed now. All the Mandarin Chinese sentences have the alternate script.
Related GitHub issue: https://github.com/Tatoeba/tatoeba2/issues/1784
As you pointed out, the current implementation assumes that Ottoman Turkish is written right-to-left using Arabic script.
I had a look at the English Wikipedia article about the Ottoman Turkish language, and I am a bit confused because it says that this language switched to the Latin script as it evolved into modern Turkish. Can you elaborate about the contemporary use of Arabic vs. Latin to write Ottoman Turkish?
One way to quickly solve the display problem is to set the direction of Ottoman Turkish to "auto". Another, much more complex way is to implement multiple-script support and auto-conversion between the scripts, but only if it's worth it, that is to say: there are actually native speakers using both Latin and Arabic, we want to be able to find sentences written in Arabic by searching in Latin and vice versa, the conversion can be partly or fully automated, etc.
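For readers wondering what an "auto" direction does, here is a minimal sketch of the usual idea, similar in spirit to HTML's `dir="auto"`: look for the first strongly directional character and use its direction. This is an illustration only, not Tatoeba's actual code. The function name `detect_direction` and the left-to-right fallback are assumptions.

```python
import unicodedata

def detect_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strongly directional character.

    Hypothetical helper; falls back to 'ltr' when the text has no
    strong character (only digits or punctuation, for example).
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi == "L":           # Latin, Cyrillic, etc.
            return "ltr"
    return "ltr"

# An Ottoman Turkish sentence in Latin script would come out as 'ltr',
# while the same sentence in Arabic script would come out as 'rtl'.
```

Because the direction is decided per sentence, the final punctuation ends up on the correct side in both scripts without changing the sentence's language.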
As you found out, the direction of sentences of "unknown" language is set to automatic. That said, this is not a reason to set the language of your Ottoman Turkish sentences written in Latin script to "unknown", just because they look better. I strongly discourage you from doing this because then these sentences are excluded from the Ottoman Turkish corpus, they won't show up in searches and statistics, which prevents contributors/learners of Ottoman Turkish from finding them. What's worse, since *only you* know their actual language, if for some reason you forget about them or stop contributing, these sentences will never be assigned to the correct language and will be permanently lost.
>I had a look at the English Wikipedia article about the Ottoman Turkish language, and I am a bit confused because it says that this language switched to the Latin script as it evolved into modern Turkish. Can you elaborate about the contemporary use of Arabic vs. Latin to write Ottoman Turkish?
Thanks for your reply, gillux. Have you seen the GitHub issue? I tried to explain this there. Also, there are some other languages affected by this issue.
The Turkish language reform consisted of a script reform and the replacement of loanwords. They are different things. Allowing Ottoman Turkish sentences in the Latin script will increase contributions in the old language and its readability. Currently, almost all 'Ottoman Turkish' sentences on Tatoeba are simply transliterations of modern Turkish into the Arabic script. They're not wrong, but they don't truly reflect the old language. If one looked here to compare Ottoman Turkish and modern Turkish, they would assume that the only difference is the alphabet.
>One way to quickly solve the display problem is to set the direction of Ottoman Turkish to "auto".
This sounds good to me. If doing it would display sentences in both Arabic and Latin scripts correctly and wouldn't cause any unintended consequences, why not?
> I strongly discourage you from doing this because then these sentences are excluded from the Ottoman Turkish corpus, they won't show up in searches and statistics
I created only one pair set as 'unknown' for demonstration. I'm adding romanized Ottoman Turkish sentences to the Turkish corpus for now. I will change them back to Ottoman Turkish once a solution is found.
> Have you seen the GitHub issue?
Sorry, I didn’t. I commented there too.
> Allowing Ottoman Turkish sentences in the Latin script will increase contributions in the old language and its readability.
I see. Let me try to understand the situation. Can you tell me if the following is correct?
1. Ottoman Turkish is not a living language any more (there are no native speakers alive).
2. Native speakers of Ottoman Turkish used the Arabic script only.
3. Most of the people who understand Ottoman Turkish are native speakers of Turkish.
4. Native speakers of Turkish are unfamiliar with the Arabic script.
If that is correct, I believe it makes sense to convert Ottoman Turkish from Arabic to Latin, but not the other way around, because Latin is no more than a reading aid for native speakers of Turkish. In other words, I think all Ottoman Turkish sentences should stay in Arabic only, while we only attach Latin as a transcription of them.
> I created only one pair set as 'unknown' for demonstration.
I see. Next time, please use https://dev.tatoeba.org/ instead for demonstration purposes.
Yes, they're all correct, gillux. Even if there were some very old people using Ottoman Turkish as the primary language today, they, too, would use the Latin script to be understood.
I'm not asking for Ottoman Turkish sentences to be converted to Latin anyway. If one wants to add sentences in the Arabic script, it's perfectly fine. I simply want users to be allowed to use the Latin script, too. The Arabic script is consonantal. That makes it rather difficult to read and use unknown words and expressions compared to the Latin script. Perhaps that's why almost all Ottoman Turkish sentences here are transliterations of modern Turkish.