Wall (7,124 threads)

My warmest Easter wishes to you.
Happy Easter!

Happy Easter to you too!!

Hello, are any of the admins around? Some people have been posting unrelated things.

If you want to report a spammer, please send a private message to TatoebaAdmins (or, if you can't remember that username, any individual admin). Please do not write Wall posts with links to spammers' profiles, messages, or sentences, since this will bring them more attention and encourage them to write more spam.

Oh, I used team@tatoeba.org and didn't get an answer. So I used the wrong address. Thanks for letting me know!

That address works, too. I see that you wrote an e-mail three days ago. The messages were hidden pretty soon after that. Thanks for reporting the problem.

When you search for a Korean word on Tatoeba, such as 오늘, sentences including 오늘은 won't appear. Korean is a language that usually uses many kinds of suffixes. Is it possible for the search engine to recognize words with different suffixes?

You can use a * symbol to represent any number of characters. E.g. 오늘* will find 오늘 followed by any suffix: https://tatoeba.org/en/sentence...%EB%8A%98*&to=
You can also use it at the beginning, e.g. *십시오 for polite requests: https://tatoeba.org/en/sentence...C%EC%98%A4&to=
Or somewhere in the middle of a word if you want.
This and other search engine features are explained on the wiki: https://en.wiki.tatoeba.org/art...ow/text-search
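To make the wildcard behaviour concrete, here is a minimal Python sketch. It is not Tatoeba's search engine; it only imitates the idea with the standard `fnmatch` module, under the assumption that a sentence is split into words at spaces and each word is compared against the pattern. The sample sentence and the `matching_words` helper are made up for illustration.

```python
from fnmatch import fnmatchcase


def matching_words(sentence: str, pattern: str) -> list[str]:
    """Return the space-separated words of `sentence` that match `pattern`.

    `*` in the pattern stands for any number of characters (including none).
    """
    return [word for word in sentence.split() if fnmatchcase(word, pattern)]


# Illustrative sentence, not taken from the corpus.
sentence = "오늘은 날씨가 좋습니다"

print(matching_words(sentence, "오늘"))     # exact word only: 오늘은 does not match
print(matching_words(sentence, "오늘*"))    # 오늘 followed by any suffix
print(matching_words(sentence, "*습니다"))  # any word ending in 습니다
```

The first search finds nothing because 오늘은 is a different word from 오늘 when words are compared whole, which is exactly the situation described in the question.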

Well, what I would really like is for Tatoeba to treat Korean like Chinese and Japanese... Korean has too many compounds and suffixes, so treating each chunk between spaces as an independent word is impractical. It also means the 'required vocabulary' has to include every form of the same word...

I understand, and I wish we had better support for Korean, too. But I would like to point out that the handling of Chinese and Japanese is different but not that great either. Chinese and Japanese characters are all considered independently and the search engine does not recognize word boundaries. This leads to the limitation described here: https://en.wiki.tatoeba.org/art...ord-boundaries
Now, we could certainly enable the same behavior for Korean characters too, if you think it would be beneficial overall despite that limitation. If you'd like to help evaluate such a change, we could enable it on our testing server and get your feedback.
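A rough Python sketch of the difference between the two approaches follows. It is only an illustration under simplifying assumptions (the real search engine is not implemented this way, and the helper names are hypothetical): "word" mode splits on spaces, "character" mode treats every character as its own token, and a query matches when its tokens appear consecutively in the sentence.

```python
def word_tokens(text: str) -> list[str]:
    """Split on whitespace: each space-delimited chunk is one token."""
    return text.split()


def char_tokens(text: str) -> list[str]:
    """Treat every non-space character as its own token."""
    return [ch for ch in text if not ch.isspace()]


def contains_sequence(tokens: list[str], query: list[str]) -> bool:
    """True if `query` occurs as a run of consecutive tokens in `tokens`."""
    n = len(query)
    return n > 0 and any(
        tokens[i:i + n] == query for i in range(len(tokens) - n + 1)
    )


# Word mode: 민국 is not a whole word of 대한민국, so it is not found.
print(contains_sequence(word_tokens("대한민국"), word_tokens("민국")))

# Character mode: the characters 민, 국 appear consecutively, so it is found.
print(contains_sequence(char_tokens("대한민국"), char_tokens("민국")))

# The flip side: with no word boundaries, 京都 (Kyoto) is also "found"
# inside 東京都 (Tokyo Metropolis), which is a false match.
print(contains_sequence(char_tokens("東京都"), char_tokens("京都")))
```

The last line shows one well-known kind of false match that character-level search can produce; whether that trade-off is acceptable for Korean is what testing would need to show.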

Yes, that would be nice! Then "학생이에요." (I am a student) could be found by searching for either "학생" (student) or "에요" (to be)? And "대한민국" (Republic of Korea) could be found with either "민국" (republic) or "대한" (Korea), right? Also, if I add "월" (month) or "술래잡기" (tag) to my required vocabulary, and a sentence includes "5월" (May) or "술래잡기와" (와 means "and"), will it also be considered to include that word? I also think this suits Korean better than other languages, because Korean commonly uses 2,500+ separate Unicode characters. That would keep the results accurate (it's not like matching "apple" when you search for "a", which is what would happen if you enabled it for English).

I have temporarily configured Korean to be treated like Chinese and Japanese on the testing server: https://dev.tatoeba.org/fr/sent...C%EA%B5%AD&to=
Note that the testing server only contains a subset of what is on tatoeba.org, and it is separate. Feel free to add whatever Korean sentences you want on the testing server, so you can test out how the search behaves. (You’ll have to create a new account there.) Newly added sentences should appear in search results within 15 minutes.
Once it is confirmed that this change overall improves search in Korean, we can bring it to tatoeba.org, too.

Hi,
Do we provide transliteration or full vocalisation for languages such as Arabic?
I have only seen partial vocalisation (for disambiguations) and no transliteration.
If I understand correctly, Mandarin Chinese is the only language featuring transliterations and Japanese can also have furigana.

It would indeed be very nice if Arabic featured full vocalisation, just as Chinese and Japanese offer such aids as well.

Thanks for the support, @Waldelfe,
By "vocalisation" I actually mean providing vowels (diacritics) (Arabic حَرَكات ḥarakāt), which makes the pronunciation of Arabic words unambiguous, just in case you or someone reading the thread doesn't know.
Example sentence:
How do I get to the train station?
Unvocalised Arabic text: كيف أصل إلى محطة القطار؟
Vocalised: كَيْفَ أَصِلُ إِلَى مَحَطَّةِ الْقِطَار؟
Transliteration: kayfa ʾaṣilu ʾilā maḥaṭṭati l-qiṭār(i)?
There is more than one way to transliterate Arabic, so the above is just one example.
The level of vocalisation can vary, especially on final vowels or their absence (ʾiʿrāb - إِعْرَاب).
E.g. مَحَطَّةُ الْقِطَارِ (maḥaṭṭatu l-qiṭāri) in the nominative case is more detailed than مَحَطَّة الْقِطار maḥaṭṭat al-qiṭār or when something is considered "obvious" and doesn't require consistent diacritics.
Let's have a discussion on this. I contribute Arabic translations on Wiktionary and the vast majority of Arabic word and sentence translations have diacritics and automated transliterations. It should be doable in Tatoeba as well, although it's not possible to do it automatically. It will require some effort (can also be error-prone!).

Hello, thank you for bringing this up.
I did a quick search for Arabic vocalizers.
It looks like we could use some of these tools
https://github.com/linuxscout/mishkal
https://github.com/Barqawiz/Shakkala
to automatically generate vocalized versions of the sentences while allowing for manual correction when needed. This would also make it possible to find Arabic sentences by searching for their words in vocalized form, if that's of any use.
I am also interested in the cultural background and contemporary usage of vocalized Arabic. For example, I know that Japanese reading aids (furigana) are used in signs aimed at kids, children's books, teenager books like manga, occasionally in "young adult" literature, and otherwise in any text that displays very rare or ambiguous Chinese characters. What is the current usage of vocalized Arabic?
Also, you mentioned that the vocalized form may vary. Does it only vary in the amount of extra information (diacritics) added to the original word, or could it also vary in that different people use a different "orthography" to vocalize?

First of all, these tools may be helpful, but they are not reliable. No tool can produce a reliable vocalised Arabic text from an unvocalised one. They can help to reduce the typing and manual input effort, but it would all have to be checked by a knowledgeable human. No harm in researching, though.
Vocalised Arabic is used in religious texts, especially the Qur'an. Otherwise, the usage is very similar to Japan's furigana, mainland China's pinyin and Taiwan's bopomofo (zhuyin fuhao). I possess a few vocalised readers, dictionaries and textbooks. Notably, the Oxford and Larousse Arabic dictionaries, Russian author Ilya Frank's Arabic Joha adventures book, and the Lingualism bilingual PDF books in MSA have full vocalisation and audio.
On the various levels of vocalisation: let's say we want to write "the large book" - الكتاب الكبير (al-kitāb al-kabīr), also al-kitābu al-kabīr(u) or al-kitābu l-kabīr(u).
1. Full vocalisation in the nominative case can be الْكِتَابُ الْكَبِيرُ - al-kitābu al-kabīru or al-kitābu l-kabīru with markings for cases - ʾiʿrāb - https://en.wikipedia.org/wiki/ʾIʿrab. This can be even more pedantic with الْكِتَابُ ٱلْكَبِيرُ, marking the first alif in the second word as silent (a less common diacritic). The final vowels used in the ʾiʿrāb are often omitted for various reasons. Vowels are often unmarked when they are considered obvious by native speakers or were introduced earlier for learners. E.g. كِتَاب (kitāb) can be written as كِتاب.
2. Some diacritics are seldom used or even missing on Arabic keyboards, e.g. هٰذَا (hāḏā, "this") with a dagger alif (a vertical stick). So, in a vocalised text you may see هَذا, which would be haḏā, an incorrect shortening due to the lack of the diacritic (or unwillingness to use it).
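One practical consequence of these varying levels: the reverse direction (vocalised to unvocalised) is easy to automate, so differently vocalised spellings of the same word can at least be compared on their bare letters. Below is a minimal Python sketch, assuming that only the common ḥarakāt (U+064B to U+0652) and the dagger alif (U+0670) should be removed; the `strip_harakat` name is hypothetical and this is not something Tatoeba is known to do.

```python
# Code points treated as vowel marks here (an assumption of this sketch):
# U+064B..U+0652 = tanwin, fatha, damma, kasra, shadda, sukun;
# U+0670 = dagger (superscript) alif.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_harakat(text: str) -> str:
    """Remove vowel marks, leaving the bare (unvocalised) letters."""
    return "".join(ch for ch in text if ch not in HARAKAT)


fully_marked = "كِتَاب"   # kitāb with both short vowels marked
partly_marked = "كِتاب"   # same word, second vowel left unmarked

# Both spellings reduce to the same bare skeleton.
print(strip_harakat(fully_marked) == strip_harakat(partly_marked))

# The dagger-alif spelling and the keyboard-friendly spelling of "this"
# also reduce to the same letters.
print(strip_harakat("هٰذَا") == strip_harakat("هَذا"))
```

Letters such as أ, إ and ٱ are left untouched on purpose, since they are separate characters rather than removable marks.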

Oh, by the way @gillux. I don't have issues with using English personal or city names in Arabic and problems with transliterations are exaggerated.
So "Tom" is written as توم in Arabic, or تُوم with a vocalisation and no final case ending is added to foreign names like this. It would produce "tūm" but Arabs recognise foreign names and pronounce them imitating the foreign pronunciation for known personal or geographical names (sometimes adjusting to the Arabic phonology). It depends on the speaker, of course and there may be multiple ways of saying foreign names or loanwords.

Hello, the search problem is solved now, thank you for reporting it and sorry for the inconvenience!

Hi, the system has been throwing errors today; we are not able to search or translate. I even tried logging out and logging in again.

An error occurred: "68023405f1d53"

Hello everyone! 🙂
I got an error while loading the Tatoeba page.
The error code is: 68022d82ce05a.
Can anyone help me with this?

An error occurred during the search. If it keeps happening, please contact us and report the error code "6801fb7d1ba2d".