lbdx's messages on the Wall (233 total)

lbdx, 14 February 2024, 5:56:52 PM UTC

The number of monthly sentence owners fell from 350-400 between 2012 and 2016 to 250-300 between 2017 and 2023. I don't have the details by language.

lbdx, 14 February 2024, 5:48:48 PM UTC

Trang, thank you for reopening the debate on this important issue.

My view on this has evolved slightly. I now think it would be simpler and more understandable to also include derived sentences in this rate limit of 3,000 sentences per language per month. Sentence counts would be reset at the beginning of each month. Once the limit has been reached, the user would not be allowed to add any more sentences until the following month. I prefer a monthly rate limit because it doesn't penalise users who don't contribute every day or every week.
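As a rough illustration of the monthly limit described above, here is a minimal Python sketch. It assumes the quota is counted per user and per language; the class and method names are hypothetical and are not part of Tatoeba's code.

```python
from collections import defaultdict
from datetime import date

MONTHLY_LIMIT = 3000  # sentences per language per month, as proposed above


class MonthlyRateLimiter:
    """Hypothetical helper: counts additions per (user, language, month)."""

    def __init__(self, limit=MONTHLY_LIMIT):
        self.limit = limit
        self.counts = defaultdict(int)

    def try_add(self, user, lang, day):
        """Record one added sentence (original or derived) if the quota allows it."""
        # The month is part of the key, so counters start from zero each month.
        key = (user, lang, day.year, day.month)
        if self.counts[key] >= self.limit:
            return False  # blocked until the following month
        self.counts[key] += 1
        return True


limiter = MonthlyRateLimiter()
print(limiter.try_add("some_user", "eng", date(2024, 2, 14)))
```

Because the counter is keyed by calendar month, a user who contributes only a few days a month is treated the same as one who contributes daily.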

Note that I'm not against the occasional import of other corpora into Tatoeba as long as they are lexically balanced and composed of sentences that are useful for language learners.

lbdx, 5 February 2024, 4:02:02 PM UTC

According to linguists, Berber is not a language but a group of languages [1]. Consequently "ber" is an ISO 639-5 language code but not an ISO 639-3 language code. That is probably why Berber has been declined by Wikipedia.

Tatoeba also does not accept languages that do not have an ISO 639-3 code, but an exception was made for Berber. In hindsight, this was probably not a good idea. It creates overlap and harmful competition with other Berber languages' corpora such as Kabyle.

[1] https://en.wikipedia.org/wiki/Berber_languages

lbdx, 4 February 2024, 10:46:11 AM UTC

The years 2017 and 2018 were years in which Tatoeba's main English-speaking contributor added hundreds of thousands of sentences in bulk. These sentences were mostly built according to syntactic patterns and used wildcards to avoid creating paraphrases that differ only in their named entities. These massive additions have greatly reduced the lexical diversity of the English corpus and increased the proportion of sentences containing pervasive words from 20% to 40%. This sudden change coincides with a sharp drop in the number of active contributors to Tatoeba.

The introduction of rate limits for sentence additions would prevent such a flood from happening again.

lbdx, 3 February 2024, 9:41:48 AM UTC (edited 3 February 2024, 9:54:18 AM UTC)

** Pruned/Rebalanced Lists **

Rebalanced lists are lexical filters that provide a more varied and balanced view of the Tatoeba Corpus. They prohibit a word from occurring more than 10 times as often as in a reference corpus. Long sentences of more than 15 words have little success with translators and are therefore systematically pruned. The most recent sentences are pruned before older ones. The words targeted are usually pervasive named entities that are used extensively by a few Tatoebans, and not relevant across languages.
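To make the pruning rules above more concrete, here is a minimal Python sketch. It is only one possible reading of the description, not the actual implementation: it assumes that lower sentence ids are older, that words missing from the reference corpus are left uncapped, and that tokens are split on whitespace. The function name `rebalance` and the `ref_freq` mapping are hypothetical.

```python
from collections import Counter

MAX_WORDS = 15  # longer sentences are pruned systematically
MAX_RATIO = 10  # allowed overrepresentation versus the reference corpus


def rebalance(sentences, ref_freq):
    """Return the ids of the sentences that survive pruning.

    sentences: list of (sentence_id, text) pairs.
    ref_freq:  word -> relative frequency in the reference corpus (assumed input).
    """
    short = [(sid, text.lower().split()) for sid, text in sentences
             if len(text.split()) <= MAX_WORDS]
    total_tokens = sum(len(words) for _, words in short)
    used = Counter()
    kept = []
    # Oldest first (assumption: lower id means older), so recent sentences
    # are the ones rejected once a word reaches its cap.
    for sid, words in sorted(short, key=lambda pair: pair[0]):
        within_caps = all(
            word not in ref_freq
            or used[word] + words.count(word) <= MAX_RATIO * ref_freq[word] * total_tokens
            for word in words
        )
        if within_caps:
            used.update(words)
            kept.append(sid)
    return kept
```

A real implementation would need proper tokenization for each language, which matters especially for Japanese and Mandarin Chinese.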

10 major languages on Tatoeba are currently supported:
- English: https://tatoeba.org/en/sentence...=1&orphans=any
- French: https://tatoeba.org/en/sentence...=1&orphans=any
- German: https://tatoeba.org/en/sentence...=1&orphans=any
- Italian: https://tatoeba.org/en/sentence...=1&orphans=any
- Japanese: https://tatoeba.org/en/sentence...=1&orphans=any
- Mandarin Chinese: https://tatoeba.org/en/sentence...=1&orphans=any
- Portuguese: https://tatoeba.org/en/sentence...=1&orphans=any
- Russian: https://tatoeba.org/en/sentence...=1&orphans=any
- Spanish: https://tatoeba.org/en/sentence...=1&orphans=any
- Turkish: https://tatoeba.org/en/sentence...=1&orphans=any

All rebalanced lists are updated automatically every Saturday.

lbdx, 17 December 2023, 9:04:46 AM UTC (edited 17 December 2023, 11:37:53 AM UTC)

> a lot of contributors just translate without seeing the errors

It's tricky to spot errors in a language you don't fully master. That's why it's so important to visit the sentence page before translating it and check whether any problems have been reported. It is also essential to assess the reliability of the authors you are translating.

To determine whether a wording is commonly used, I recommend taking a look at the number of exact matches in Google Books.

Many thanks to the corpus maintainers who volunteer their time to correct all these errors.

lbdx, 16 December 2023, 11:29:15 AM UTC (edited 16 December 2023, 11:35:03 AM UTC)

Monthly updates adopt a weekly schedule ✨

Every Saturday morning, all my lists are now automatically updated from the cloud 🤖

- Tatominer https://tatominer.netlify.app
- Spread by Tatoebans ✨ https://tatoeba.org/en/sentences_lists/show/170280
- Pruned English ✂️ https://tatoeba.org/en/sentences_lists/show/171182
- Rated as 'not OK' 🔴 https://tatoeba.org/en/sentences_lists/show/170380
- Rated as 'unsure' 🟠 https://tatoeba.org/en/sentences_lists/show/170383
- JMdict - Japanese 🇯🇵 https://tatoeba.org/en/sentences_lists/show/171073
- JMdict - English 🇬🇧 https://tatoeba.org/en/sentences_lists/show/171072
- Tatolead https://tatolead.netlify.app

More information about these tools at my profile page: https://tatoeba.org/en/user/profile/lbdx

lbdx, 7 December 2023, 4:21:35 PM UTC (edited 7 December 2023, 4:22:33 PM UTC)

No doubt it will join your 61,000 sentences that are already on the list :):
https://tatoeba.org/en/sentence...rd_count_min=1

lbdx, 7 December 2023, 3:40:47 PM UTC (edited 7 December 2023, 3:42:29 PM UTC)

No, because #12172884 is derived from one of mhr's German sentences, and then post-linked to your sentence.

lbdx, 7 December 2023, 2:26:33 PM UTC

"Spread by Tatoebans" is a multilingual list of sentences that have already significantly spread on Tatoeba.

The sentences of this subset tend to be more universal and to have a more dependable wording and spelling than other sentences on Tatoeba.

To enter the list, a sentence must have at least two links to sentences in other languages and from other contributors. Orphan sentences and post-linking are not taken into account.

This means that an original sentence must appeal to at least two speakers of two different languages to be selected. On the other hand, a derived sentence only needs to be retranslated once (by a third member into a third language).
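The selection rule above can be sketched in Python as follows. This is my reading of the criteria, not the script that builds the list: the `Sentence` type and `has_spread` function are hypothetical, post-links are assumed to be filtered out beforehand, and the two qualifying links are assumed to differ from each other in both language and contributor.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Sentence:
    lang: str
    owner: Optional[str]  # None marks an orphan sentence


def has_spread(sentence, direct_links):
    """direct_links: sentences linked by direct translation (post-links excluded)."""
    if sentence.owner is None:
        return False  # assumption: orphan sentences cannot enter the list
    eligible = [s for s in direct_links
                if s.owner is not None
                and s.owner != sentence.owner
                and s.lang != sentence.lang]
    # Two links are needed, in two different languages, by two different members.
    return any(a.lang != b.lang and a.owner != b.owner
               for a, b in combinations(eligible, 2))


original = Sentence("eng", "alice")
links = [Sentence("fra", "bob"), Sentence("deu", "carol")]
print(has_spread(original, links))
```

For a derived sentence, the link back to its source counts as one of the two, which is why a single retranslation by a third member into a third language is enough.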

lbdx, 2 December 2023, 12:58:21 PM UTC

** December 2023 Updates **

- Tatominer https://tatominer.netlify.app
- Tatolead https://tatolead.netlify.app
- Spread by Tatoebans ✨ https://tatoeba.org/en/sentences_lists/show/170280
- Rated as 'not OK' 🔴 https://tatoeba.org/en/sentences_lists/show/170380
- Rated as 'unsure' 🟠 https://tatoeba.org/en/sentences_lists/show/170383
- JMdict - Japanese 🇯🇵 https://tatoeba.org/en/sentences_lists/show/171073
- JMdict - English 🇬🇧 https://tatoeba.org/en/sentences_lists/show/171072

More information about these tools at my profile page: https://tatoeba.org/en/user/profile/lbdx

lbdx, 29 November 2023, 4:13:37 PM UTC (edited 29 November 2023, 4:15:11 PM UTC)

> What's the number of times a word should appear in the corpus for it to be filtered out by this algorithm?

If he wasn't blocked, this question might have interested Amastan. About two-thirds of his English sentences—142,564 to be exact—have been "pruned" to build this filter list. It might be tempting to stop injecting his favorite pervasive words just below thresholds...

But wait, it seems that Amastan is contributing as always:
- account created recently
- native speaker of Kabyle/Berber/French/English/Arabic
- translates almost exclusively Amastan's English sentences

Please admins, don't let Amastan keep up his vandalism under a new identity!

lbdx, 29 November 2023, 7:02:53 AM UTC (edited 29 November 2023, 7:46:47 AM UTC)

# Small tip

Note that thanks to gillux's work, you can download any list of sentences (translated into any language) directly from tatoeba.org: https://tatoeba.org/en/sentence...wnload/171446. Go to the page of the list of your choice and click on the "Download this list" icon.

You can then import this list into Anki, keeping just one translation per sentence. And if you also want audios, you can generate them automatically using the HyperTTS add-on: https://ankiweb.net/shared/info/111623432.
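For the "one translation per sentence" step, a small Python sketch could look like the one below. It assumes the downloaded file is tab-separated with the sentence id, the sentence text and one translation per row; that column layout is an assumption, so check your own download before using it. Both function names are hypothetical.

```python
def one_translation_per_sentence(tsv_text):
    """Keep only the first translation found for each sentence id."""
    seen = set()
    cards = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        # Assumed layout: id, sentence text, translation text.
        if len(fields) < 3 or fields[0] in seen:
            continue
        seen.add(fields[0])
        cards.append((fields[1], fields[2]))  # front and back of the card
    return cards


def to_anki_text(cards):
    """Build tab-separated text, which Anki can import as a text file."""
    return "\n".join(f"{front}\t{back}" for front, back in cards)


sample = "1\tHello.\tBonjour.\n1\tHello.\tSalut.\n2\tThanks.\tMerci.\n"
print(to_anki_text(one_translation_per_sentence(sample)))
```

Keeping the first translation is an arbitrary choice; you may prefer to pick the one written by the most reliable author.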

lbdx, 28 November 2023, 12:32:40 PM UTC (edited 28 November 2023, 12:38:35 PM UTC)

The "Pruned English" list provides a more varied and balanced view of Tatoeba. It gathers all sentences of at most 15 words that do not contain any word or sequence of words algorithmically classified as "pervasive". The pervasiveness of a text fragment is a function of its frequency, overrepresentation and informativeness. This simple filter eliminates almost half of the English sentences.

The pervasive words detected are: tom, mary, ziri, sami, yanni, rima, layla, skura, mennad, algeria, boston, berber, french, fadil, algiers, kabylie, kabyle, boldi, tatoeba, baya, algerian, benedito, edmundo, flavio, damiano, nuja, kalman, swim, fyodor, janos, leonid, adriano, miroslav, gabor, dmitri, gustavo, martino, gunter, esperanto, walid, algerians, rodrigo, oleg, lukas, tobias, bicycle, elias, igor, claudio, isabella, lorenzo, alberto, boris, santiago, amelia, ivan, yuri, karl, mosque, vladimir, farid, chess, medlars, bejaia, pietro, quran, windshield, yidir, lajos, bouteflika, giraffes, tebboune, silya, mina, coronavirus, couscous, taninna, taller, hijab, pona, sahara, figs, jayjay, salima, fluently, heathers, kabyles, hurried, berbers, suitcases, fluent, yazid, bakir, shahada, ewe, hyena, toki, homesick, swam, dania, dung, ticklish, centipede, sociopaths, marika, punctual, tagalog, saxophone, raining, stefan, eaten, o'clock, carlos, giraffe, yen, rained, lojban, jugurtha, kyoto, snowing, daphnis, barking, fuji, snowed, hokkaido, yiddish, uranus, islam, maltese.

The pervasive sequences of 3 non-pervasive words are: 'to do that', 'said that he', 'do that by', "don't think that", 'told me that', 'that he thought', 'able to do', 'that by himself', 'go to australia', 'he thought that', 'said that they', 'do that today', 'i wonder whether', 'need to do', 'me that he', 'that by herself', 'said that she', 'needed to do', 'do that again', 'told me he', 'said he thought', "that he didn't", 'do that for', "didn't know that", "didn't do that", 'should do that', 'do that anymore', 'they said that', "said he didn't", 'could do that', 'needs to do', 'me that they', 'told me they', "didn't think that", 'they said they', 'that they were', 'told me she', "won't do that", 'me that she', 'thought that you', 'not to do', "didn't seem to", "didn't need to", 'seemed to be'.
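Once the pervasive words and sequences are known, the filter itself is simple. Here is a minimal Python sketch that uses only a small excerpt of the two lists above. The tokenizer is a simplifying assumption, and the detection step (frequency, overrepresentation, informativeness) is not shown.

```python
import re

MAX_WORDS = 15
# Small excerpts of the two lists above, for illustration only.
PERVASIVE_WORDS = {"tom", "mary", "boston", "o'clock"}
PERVASIVE_TRIGRAMS = {("to", "do", "that"), ("told", "me", "that"),
                      ("don't", "think", "that")}


def tokenize(text):
    """Lowercase and keep letters, digits and apostrophes (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_kept(text):
    """True if the sentence would stay in the pruned list under these rules."""
    words = tokenize(text)
    if len(words) > MAX_WORDS:
        return False
    if any(word in PERVASIVE_WORDS for word in words):
        return False
    trigrams = zip(words, words[1:], words[2:])
    return not any(trigram in PERVASIVE_TRIGRAMS for trigram in trigrams)


print(is_kept("The cat sleeps on the sofa."))
print(is_kept("I don't think that it matters."))
```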

Feel free to browse this list at https://tatoeba.org/en/sentence...&unapproved=no

lbdx, 4 November 2023, 2:28:32 PM UTC

** November 2023 Updates **

- Tatominer https://tatominer.netlify.app
- Tatolead https://tatolead.netlify.app
- Spread by Tatoebans ✨ https://tatoeba.org/en/sentences_lists/show/170280
- Rated as 'not OK' 🔴 https://tatoeba.org/en/sentences_lists/show/170380
- Rated as 'unsure' 🟠 https://tatoeba.org/en/sentences_lists/show/170383
- Pruned English ✂️ https://tatoeba.org/en/sentences_lists/show/171182
- JMdict - Japanese 🇯🇵 https://tatoeba.org/en/sentences_lists/show/171073
- JMdict - English 🇬🇧 https://tatoeba.org/en/sentences_lists/show/171072

More information about these tools at my profile page: https://tatoeba.org/en/user/profile/lbdx

lbdx, 7 October 2023, 3:56:54 PM UTC

** October 2023 Updates **

- Tatominer https://tatominer.netlify.app
- Tatolead https://tatolead.netlify.app
- Spread by Tatoebans ✨ https://tatoeba.org/en/sentences_lists/show/170280
- Rated as 'not OK' 🔴 https://tatoeba.org/en/sentences_lists/show/170380
- Rated as 'unsure' 🟠 https://tatoeba.org/en/sentences_lists/show/170383
- Pruned English ✂️ https://tatoeba.org/en/sentences_lists/show/171182
- JMdict - Japanese 🇯🇵 https://tatoeba.org/en/sentences_lists/show/171073
- JMdict - English 🇬🇧 https://tatoeba.org/en/sentences_lists/show/171072

More information about these tools at my profile page: https://tatoeba.org/en/user/profile/lbdx

lbdx, 2 September 2023, 12:26:17 PM UTC

** September 2023 Updates **

- Tatominer https://tatominer.netlify.app
- Tatolead https://tatolead.netlify.app
- Spread by Tatoebans ✨ https://tatoeba.org/en/sentences_lists/show/170280
- Rated as 'not OK' 🔴 https://tatoeba.org/en/sentences_lists/show/170380
- Rated as 'unsure' 🟠 https://tatoeba.org/en/sentences_lists/show/170383
- Pruned English ✂️ https://tatoeba.org/en/sentences_lists/show/171182
- JMdict - Japanese 🇯🇵 https://tatoeba.org/en/sentences_lists/show/171073
- JMdict - English 🇬🇧 https://tatoeba.org/en/sentences_lists/show/171072

More information about these tools at my profile page: https://tatoeba.org/en/user/profile/lbdx

lbdx, 5 August 2023, 3:08:27 PM UTC

Je suis ravi de te rendre service. (I'm delighted to be of service to you.) #8708717

lbdx, 5 August 2023, 2:54:00 PM UTC

** August 2023 Updates **

I've just updated a few things that I built for Tatoeba:
- Tatominer https://tatominer.netlify.app
- Tatolead https://tatolead.netlify.app
- Spread by Tatoebans ✨ https://tatoeba.org/en/sentences_lists/show/170280
- Rated as 'not OK' 🔴 https://tatoeba.org/en/sentences_lists/show/170380
- Rated as 'unsure' 🟠 https://tatoeba.org/en/sentences_lists/show/170383
- Pruned English ✂️ https://tatoeba.org/en/sentences_lists/show/171182
- JMdict - Japanese 🇯🇵 https://tatoeba.org/en/sentences_lists/show/171073
- JMdict - English 🇬🇧 https://tatoeba.org/en/sentences_lists/show/171072

More information about these tools at my profile page: https://tatoeba.org/en/user/profile/lbdx

lbdx, 31 July 2023, 1:41:49 PM UTC (edited 31 July 2023, 4:53:18 PM UTC)

On Tatoeba, the vandalism of a few slowly outweighs the genuine efforts of the many 😢