
Wall (4682 threads)

OsoHombre
14 hours ago - 14 hours ago
I have noticed that many of my sentences were copied and re-published for the sole purpose of having Arab names like Sami and Layla replaced with the now legendary Tom and Mary. Here is what was previously said about this issue:

About nearly-identical sentences (near-duplicates):

Here is what Trang (the owner of Tatoeba) said:

"It is okay to create near duplicates to illustrate the linguistic properties of a language. It is *not* okay to create near duplicates just for the sake of enforcing a certain set of names. Although I understand it is not meant with bad intentions, it is still borderline spam.
If you really have a good reason for it (you or anyone), then please, before you proceed to create tens of thousands of near duplicates by merely changing the names, explain what is the problem you are facing that forces you to create these near duplicates. I'm pretty sure we can come up with a solution that doesn't result in so much intentional noise pollution in the corpus." (Trang)

Source: https://tatoeba.org/eng/wall/index#message_28304

Therefore, I ask Tatoeba's admins to take the relevant measures to prevent this from happening in the future. If I was granted the right (and everybody here should enjoy it) to use Arab names in my sentences, then why would my sentences be massively copied and modified just to insert Tom and Mary in them? I thought that this issue was solved once and for all, but much to my surprise and disappointment, I have seen this problem come back and I don't see why. Could some admin explain this to me and take measures to stop it once and for all?
verdastelo9604
2 days ago
Are there any other Chinese contributors? I think we should contribute in Traditional Chinese as much as possible, because the converter is more precise from traditional to simplified.
al_ex_an_der
yesterday
That is a good idea. I wish you great success.
nickyeow
19 hours ago
+1
modene1
20 hours ago
Super resource! Wish I'd discovered it earlier.

Thanks to all the contributors.
CK
6 days ago
** Stats - 2017-05-20 - Native Speaker Sentence Counts **

http://tatoeba.byethost3.com/stats-170520.html
sharptoothed
7 days ago
** Tatoeba Stats **

Tatoeba stats, graphs & charts have been updated.

http://tatoeba.j-langtools.com/allstats/
Guybrush88
7 days ago
as always, thanks :)
maaster
6 days ago
Thanks for the charts!
tadaa25
7 days ago
Could you give links to some users who have nice collections of favorite sentences (lots of funny, interesting sentences etc)? English sentences preferably. It's fun to translate some unusual sentences sometimes, and add them to my own list.
mailohilohi
7 days ago
You might want to have a look at my favorite sentences. http://tatoeba.org/eng/favorite...er/mailohilohi
tadaa25
7 days ago
Thanks! Good stuff. :)
I don't understand why more people don't use this feature... Maybe I'm just really weird.
raggione
8 days ago - 8 days ago
This morning I could access Tatoeba neither through Firefox (after an update) nor through Windows Explorer. Both blocked it for security reasons, and Firefox, which I usually work with, expressly states that no exceptions can be made. So I'm left to work with Android on my tablet, which is not very comfortable to use. Maybe somebody could sort this out.

The explanation Firefox gives is that the security certificate expired on May 21 at 00:11.
PaulP
7 days ago
Same here, but I can access it with Chrome.
bill
7 days ago
+1

Google Chrome and Mozilla Firefox:
http://imgur.com/a/v4Bh4
TRANG
7 days ago
Our SSL certificate had expired.

It should be fixed now :)
raggione
7 days ago
Thanks, TRANG. It's working fine now.
bill
7 days ago - 7 days ago
Wow, that was fast!

Thanks a lot, Trang.
raggione
8 days ago
I also wish for the possibility of writing words in italics in comments.
Lazovic
22 days ago
I wonder why the British flag is used instead of the American one. Most of the sentences are in American English, and it might be good if the "symbol" of this language became the American flag, not the British one.
Ricardo14
22 days ago - 22 days ago
+1

Interesting fact: the current language icon represents not only England/English, but also Wales/Welsh, Scotland/Scots and Ireland/Irish, since they are part of Great Britain.
AlanF_US
22 days ago
The language originated in the British Isles, so if an icon has to be chosen arbitrarily, it seems fair to me that it should be the UK flag.
Ricardo14
21 days ago
But it became "famous" and widely used because of the American people, no? Besides, one might think that all the sentences here are in British English.
OsoHombre
17 days ago - 15 days ago
I don't think it would be a good idea to treat North American and UK English as two separate languages. North Americans and the British use tens of thousands of words and idioms with the same meanings and connotations. The two varieties are mutually intelligible and share the same dictionaries (the Cambridge and Oxford dictionaries include a huge number of common Americanisms), so I don't think it's OK to distinguish between them. If some want to change the language flag, I can understand that, but we can't split the two varieties. Otherwise the website would have tens of thousands, if not hundreds of thousands, of similar sentences under different language labels. And if some day Australians, the Scottish, the Irish, Caribbeans, Nigerians, the Guyanese, South Africans, and people from every place where English is spoken as a first language with a local flavor (including Texans, New Yorkers, etc.) asked for a separate flag for their local varieties of English, the website would contain millions of similar sentences with different labels. No matter how different all those varieties may be, expressions like 'Good morning', 'How are you?' and 'I love you' are probably all said and spelt in the very same way, and who would like to see the same sentence repeated 20 or 30 times under different labels? I think that if a sentence is "too local", using words that are definitely specific to North American or British English, this should only be indicated with a tag.
Aiji
15 days ago
I agree 120%.

I suggest using tags for sentences that are solely used in Australia or Scotland, etc., as is done in the French corpus for Belgian or Quebec expressions (otherwise half of Africa would have a French flag).
Solid_Rock
20 days ago
I think it would be better to have the two split, just like Portugal's and Brazil's flags.
Ricardo14
20 days ago
Seems fair.
beggi
2017-03-22 16:33
Adding all the diacritical marks to Arabic words seems essential for those who, like me, want to learn the language here. Can something be done about this?
Gulo_Luscus
2017-03-22 20:08
First of all, welcome. If you write about these matters in English, you will get clearer and more definite answers from the admins or from other members interested in Arabic. I don't know anything about this language, so I can't help, but I can write up your question in English.


beggi asks if it is possible to add all Arabic diacritics to words, for they are needed to learn the language.

odexed
2017-03-22 20:18 - 2017-03-22 20:20
> add all Arabic diacritics to words

This is not how native speakers write in Arabic. Newspapers and books (except the Holy Quran) are written without diacritics.
beggi
2017-03-22 20:36
Sure, they don't add them, since they already know them. But I am not a native Arabic speaker, and I want to know how to write ...
odexed
2017-03-22 20:40
The point here is that Tatoeba prefers the most natural form for the sentences.

> We want sentences that a native speaker would actually use.
http://en.wiki.tatoeba.org/arti...how/guidelines
beggi
2017-03-22 22:04
Not a practical preference for Arabic learners like me... Thank you for the answers.
OsoHombre
2017-03-26 10:29
I wish diacritics were systematically added. They are systematically used in the Quran to indicate how to precisely read the Holy Book, because any error in the pronunciation could change the meaning of a word, and misinterpretation of the meanings of sacred texts might be a very bad thing. However, diacritics aren't systematically used in normal writing (media, books, etc.) and this is one of the major obstacles for the promotion of the learning of Arabic by non-natives.
OsoHombre
2017-03-26 10:21
It is true that Arabic diacritics are very helpful for Arabic-language beginners. However, it is difficult to oblige ordinary contributors to systematically 'diacritize' (add tashkil = add diacritics) every single word they write. It's time-consuming.

Yet if we want to make Tatoeba a really helpful website for Arabic-language learners, we should think about a system that I've observed is used for Chinese: the feature that lets people read a sentence in Chinese pinyin (Latin phonetic alphabet for Chinese) in addition to reading it in Chinese characters. It would be great if Tatoeba adopted the same system for Arabic with a feature that shows the same sentence with Arabic diacritics, but I think that, contrary to the feature used for Chinese pinyin, this cannot be automatically done for Arabic, because Arabic diacritization represents vowels (and sometimes tense consonants), but the occurrence of many of these vowels (especially at the end of an Arabic word) is governed by grammatical rules. Therefore, developing a program that automatically diacritizes Arabic sentences means that the program has to know Arabic grammatical rules. This is not only very complex to prepare, but its complexity would also lead to many potential errors, and it would take years of continuous improvement before we saw that program perform its function properly on Tatoeba. Whew!!! Just talking about that makes me tired. It has to be done by someone, somewhere, some time, but I think that Tatoeba's volunteers aren't prepared to carry this out from start to finish.

Another solution would be having Arabic sentences diacritized by Tatoeba contributors. But before a sentence is diacritized, it needs to be really correct, because this diacritization involves literally re-writing the whole sentence by adding the diacritics manually. In this case, a sentence really has to be correct to be worth re-typing.
One last solution that has just crossed my mind is the transliteration of Arabic sentences in the Latin alphabet. There is already a Latin alphabet that's used by specialists to transliterate Arabic words and names (especially proper nouns), similar to the one used by Encyclopaedia Britannica (look at the map on this page: https://www.britannica.com/place/Saudi-Arabia). This is a partial, quick, and practical solution that would give people an idea of how to read an Arabic sentence or phrase, but the accuracy of such a transliteration system also needs to be checked by a native speaker (especially the word endings that are determined by grammatical rules).


gillux
2017-03-27 13:11
> I think that, contrary to the feature used for Chinese pinyin, this cannot be automatically done for Arabic, because Arabic diacritization represents vowels (and sometimes tense consonants), but the occurrence of many of these vowels (especially at the end of an Arabic word) is governed by grammatical rules.

What you’re describing is exactly the same as Japanese and furigana (readings of characters). Yet some people have developed a piece of software, Mecab, that performs a morphological analysis of Japanese and finds the right readings most of the time. Tatoeba makes use of Mecab, shows the readings, and allows contributors to manually edit them when Mecab gets it wrong.

I don’t think any of us is going to develop such software for Arabic diacritics, but if you can find one that is free to use, it should be rather easy to integrate with Tatoeba, because we already have this system called transcriptions. More about that: http://en.wiki.tatoeba.org/arti...iption-request

The rationale for furigana was helping learners of Japanese, so I don’t see why we wouldn’t have a similar thing for Arabic, as long as the technology allows it.
Pfirsichbaeumchen
2017-03-23 03:01
[Suggestion]

I agree it is a hindrance for Arabic beginners, who need diacritics to learn. Japanese and Chinese come with Furigana and Pīnyīn, respectively. It would certainly help to have a feature to display (or not display) full diacritics for Arabic.
odexed
2017-03-23 06:31 - 2017-03-23 06:55
The problem here is that it's not possible to put diacritics right automatically. And I don't think anybody would contribute sentences with a full set of diacritics (most Arabic sentences here, except for those from our experienced contributors, don't even have proper punctuation).
Pfirsichbaeumchen
2017-03-23 07:06
Naturally people would have to go to the trouble of adding them manually. It is the same problem with Japanese. No machine can do that job reliably, yet the feature exists and is used by some.

I think the same was suggested for Hebrew a while ago, and there was the same resistance.
odexed
2017-03-23 07:48 - 2017-03-23 07:57
I'm not resisting, I'm just pondering your suggestion. I myself am a student of Arabic and I know how hard it is to read it when you don't know the words yet. But I think it's bad to learn something wrong (I wouldn't trust this feature). In Arabic, the diacritics on a word differ depending on its position in the sentence and on the preceding words, so it's not very useful to only know how to pronounce standalone words.
I'd rather suggest making audio recording simpler (just by clicking a button, without having to use any programs). This way people could contribute audio as fast as they translate, so we wouldn't have this problem.
OsoHombre
2017-03-26 10:35
@odexed

Yes, I'm with you regarding making audio recordings simpler. I've read about that and it's a real conundrum (peppered with some bureaucracy).
odexed
2017-03-23 08:10
By the way, I think your suggestion would make more sense for Russian (if somebody needs the feature for displaying the accents).
AlanF_US
2017-03-23 11:53
Another issue worth thinking about is whether characters with diacritics are treated like characters without them for the purpose of search, the same way that uppercase characters are treated like lowercase characters. This has to be set up manually. I did this for Russian with the stress mark (acute sign), and I also did it for Hebrew with the vowels and final consonants. Currently, there's nothing like this in place for Arabic.
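
To illustrate the kind of folding described in this post, here is a minimal Python sketch that removes Arabic diacritics before comparing a query with a sentence. The function names and the exact set of marks (U+064B to U+0652, plus the superscript alef U+0670) are assumptions made for illustration. This is not Tatoeba's actual search configuration.

```python
import re

# Arabic tashkil: fathatan..sukun (U+064B-U+0652) plus superscript alef (U+0670).
# Which marks to ignore is an assumption, not Tatoeba's real setting.
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def fold_for_search(text: str) -> str:
    """Return a search key with Arabic diacritics removed and case folded."""
    return ARABIC_DIACRITICS.sub("", text).casefold()


def matches(query: str, sentence: str) -> bool:
    """True if the folded query occurs inside the folded sentence."""
    return fold_for_search(query) in fold_for_search(sentence)
```

With this folding, a query typed without diacritics finds a fully diacritized sentence, in the same way that a lowercase query finds an uppercase word.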
odexed
2017-03-23 12:24
I wouldn't change the current settings because sometimes diacritics in Arabic are indispensable because they could change the meaning.
For example, دَرَسَ means to study something and دَرَّسَ - to teach someone
OsoHombre
2017-03-26 10:41
@odexed:

Another example:

نَزَلَ - to descend, to go down

and:

نَزْل - hotel

OsoHombre
2017-03-26 10:40
@AlanF_US

This is another problem, indeed. A user might be frustrated if they can't find an Arabic word in a search engine simply because the word isn't diacritized properly. However, there is another risk. The simplest solution would be 'turning off' the search engine's sensitivity to diacritics, but if a search engine doesn't take diacritics into consideration, a search might yield many irrelevant results. For example, the word 'درس' may either mean:

دَرَسَ - to study

or:

دَرْس - lesson


odexed
2017-03-26 10:51 - 2017-03-26 12:56
If I don't know how to put diacritics on a word I can always look it up in my dictionary. On the other hand, I'd get frustrated while looking for examples with "دَرَّسَ" if there are thousands of sentences with "دَرْس" and "دَرَسَ"
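
One possible compromise between the two concerns in this exchange can be sketched as follows: a query that contains diacritics is matched exactly, while a bare query is matched against sentences with their diacritics stripped. This is only an illustration under assumptions. The `search` function, the mark range, and the two-mode rule are hypothetical and do not describe how Tatoeba's search works.

```python
import re
import unicodedata

# Assumed set of Arabic diacritics (tashkil) to consider.
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove the assumed set of Arabic diacritics from text."""
    return ARABIC_DIACRITICS.sub("", text)


def search(query: str, sentences: list[str]) -> list[str]:
    """Exact match for diacritized queries, folded match for bare ones."""
    # NFC puts stacked marks (e.g. shadda + fatha) in one canonical order.
    q = unicodedata.normalize("NFC", query)
    if ARABIC_DIACRITICS.search(q):
        # The user typed diacritics, so respect them.
        return [s for s in sentences if q in unicodedata.normalize("NFC", s)]
    # No diacritics in the query: ignore them in the sentences too.
    return [s for s in sentences if q in strip_diacritics(s)]
```

Under this rule, a fully diacritized query returns only sentences containing that exact form, while the bare spelling returns every sentence that shares the same letters. The cost is that a diacritized query would miss sentences that were contributed without diacritics.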
Ricardo14
2017-03-23 19:40
+1

Tatoeba is also a great tool for learners. There should be a way to do this, I think.
OsoHombre
2017-03-26 10:31
@Pfirsichbaeumchen:

If there are any concrete suggestions, then I volunteer to take part in them. After all, I'm here for the promotion of the Arabic language (although I don't have much free time at any moment of the year and I may be absent for prolonged periods at a time).
cueyayotl
2017-03-27 14:14
I fully support such a system, where an alternate reading (in gray) for an Arabic sentence can be seen if and only if someone has manually submitted a version of the sentence with diacritics. In other words, if only a normal sentence is submitted, we will only see that sentence (in black), but if that person (or another user) submits the same sentence with diacritics (not as a new sentence, but through an extra button that can be activated for Arabic), then we will see the "normal" sentence in black and the sentence with diacritics in gray.
I fully support this kind of interface for all Arabic, all Chinese, Cyrillic-written Slavic, Philippine, etc. languages as well.
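
The interface proposed in this post could be modeled roughly as below: a sentence carries an optional, manually submitted diacritized reading, which is shown in gray only when it exists. This is a minimal sketch under assumptions. The `Sentence` class, its method names, and the rule that a submission must keep the same base letters are hypothetical and are not part of Tatoeba's code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed set of Arabic diacritics (tashkil).
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


@dataclass
class Sentence:
    text: str                          # the sentence as contributed
    diacritized: Optional[str] = None  # optional manual reading

    def submit_diacritized(self, candidate: str) -> None:
        """Accept a reading only if it differs from the text by diacritics."""
        base = ARABIC_DIACRITICS.sub("", self.text)
        if ARABIC_DIACRITICS.sub("", candidate) != base:
            raise ValueError("diacritized version changes the base letters")
        self.diacritized = candidate

    def display_lines(self) -> list[tuple[str, str]]:
        """Return (color, text) pairs: black always, gray only if submitted."""
        lines = [("black", self.text)]
        if self.diacritized is not None:
            lines.append(("gray", self.diacritized))
        return lines
```

The check in `submit_diacritized` is one way to keep the alternate reading from drifting into a different sentence, since only the diacritics may differ from the original text.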
OsoHombre
17 days ago
Cueyayotl and all the others:

In case you create such a feature, I'll volunteer to re-write all of Tatoeba's (correct) Arabic sentences using diacritics.

Pfirsichbaeumchen
15 days ago - 15 days ago
That is a very generous offer, Oso, that would greatly help beginners of Arabic.

@TRANG: Can it be done?
OsoHombre
15 days ago
Thank you for your encouraging message, Pfirsichbaeumchen. In fact, this will also help Arabs themselves. There are massive literacy programs in some Arab countries, and these people may also be considered as beginners in the standard form of their own language. It would be my pleasure to make such a contribution to the Tatoeba project.