Wall

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.


sharptoothed, November 9, 2025 at 4:57:41 PM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

November 9, 2025 at 9:29:31 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

sacredceltic, November 5, 2025 at 6:35:38 PM UTC

It looks like the way default languages work for inserted sentences has changed.
No matter how many times I select "auto-detect", every sentence I insert in English is immediately identified as French, which is perfectly stupid.

gillux, November 6, 2025 at 10:53:12 AM UTC

Nothing has changed in that respect, except that the model the language detection relies on is updated every week based on the Tatoeba corpus (modulo sentences tagged @wrong flag). The model has never been perfect, especially on short sentences.

LeviHighway, November 7, 2025 at 1:50:41 PM UTC

Can I learn more about the model? When I add Mandarin sentences, the model always detects them as Cantonese. I know Mandarin and Cantonese are extremely close, so I never use the Detect function at all.

Thanuir, November 9, 2025 at 7:30:06 AM UTC

If you have a larger and a smaller language that are very similar, and you add a sentence to the smaller one, the algorithm may consider it closer to the sentences of the larger language.

If the sentence contains features specific to the smaller language (which the larger one lacks), this happens less often.
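
The effect described in this post can be reproduced with a toy classifier. The sketch below is a minimal character-bigram naive Bayes detector. It is not Tatoeba's actual model, and the language names `big` and `small`, the function names, and the training sentences are all made up for illustration.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the list of character bigrams in a sentence."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(corpus):
    """Build a model from {language: [sentences]} (hypothetical toy format)."""
    counts = {lang: Counter(g for s in sents for g in bigrams(s))
              for lang, sents in corpus.items()}
    total = sum(len(sents) for sents in corpus.values())
    # Prior: share of the corpus that belongs to each language.
    priors = {lang: len(sents) / total for lang, sents in corpus.items()}
    vocab = {g for c in counts.values() for g in c}
    return counts, priors, vocab


def detect(sentence, model):
    """Return the language with the highest log-probability for the sentence."""
    counts, priors, vocab = model
    scores = {}
    for lang, c in counts.items():
        n = sum(c.values())
        score = math.log(priors[lang])
        for g in bigrams(sentence):
            # Add-one smoothing so unseen bigrams do not zero out a language.
            score += math.log((c[g] + 1) / (n + len(vocab)))
        scores[lang] = score
    return max(scores, key=scores.get)


# Made-up data: "big" has four times as many sentences as "small".
corpus = {
    "big": ["abab"] * 8,
    "small": ["abab", "abzz"],
}
model = train(corpus)
print(detect("abab", model))  # shared material only
print(detect("abzz", model))  # contains bigrams that only "small" uses
```

With this data, "abab" is assigned to `big` even though it also occurs in `small`, while "abzz" is assigned to `small` because it contains bigrams that `big` never uses.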

EugeneGS, November 9, 2025 at 9:13:00 AM UTC, edited November 9, 2025 at 1:05:39 PM UTC

Maybe there's also something wrong with the model architecture. I trained a few models myself — one on all Tatoeba data and one only on Mandarin and Cantonese — and both correctly detected about 97% of cases (checked on validation and full datasets).

What's strange is that the Tatoeba model seems to prefer Cantonese, even though it has fewer sentences than Mandarin.

Edit: I have tried another architecture with transformer layers (my first models had LSTM layers). After training on the whole Tatoeba database, it reached 82% accuracy.
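
When comparing models like these, a single overall accuracy figure can hide a bias toward one language. A minimal sketch of a per-language breakdown follows. The function name, the `(sentence, language)` data format, and the `predict` callable are assumptions for illustration and not part of any Tatoeba tool; `cmn` and `yue` are the ISO 639-3 codes for Mandarin and Cantonese.

```python
from collections import Counter


def per_language_accuracy(examples, predict):
    """Return (accuracy per language, counts of (true, guessed) mistakes).

    examples: iterable of (sentence, true_language) pairs.
    predict:  callable mapping a sentence to a language code.
    """
    correct = Counter()
    total = Counter()
    confusions = Counter()
    for sentence, lang in examples:
        guess = predict(sentence)
        total[lang] += 1
        if guess == lang:
            correct[lang] += 1
        else:
            confusions[(lang, guess)] += 1
    accuracy = {lang: correct[lang] / total[lang] for lang in total}
    return accuracy, confusions


# Hypothetical stand-in for a biased detector: it always answers "yue".
def always_cantonese(sentence):
    return "yue"


validation = [("sentence 1", "cmn"), ("sentence 2", "cmn"), ("sentence 3", "yue")]
accuracy, confusions = per_language_accuracy(validation, always_cantonese)
print(accuracy)
print(confusions)
```

A breakdown like this makes a preference for Cantonese show up as a low `cmn` accuracy and a large `("cmn", "yue")` confusion count, even when the overall figure looks acceptable.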

LeviHighway, November 7, 2025 at 2:31:54 PM UTC

Please enable Traditional-Simplified conversion for Literary Chinese. Currently, people are contributing in either Traditional or Simplified characters, so I think it needs conversion just like Mandarin Chinese.

LeviHighway, November 7, 2025 at 4:16:50 AM UTC

Does anyone know of a website that works like Tatoeba but for vocabulary? I know Glosbe, but it seems they don't ensure quality at all.

frpzzd, November 7, 2025 at 5:16:08 AM UTC

I second this question. I've also seen Glosbe but haven't contributed to it myself because (1) it is not open source, and (2) it does not (as far as I'm aware) allow for bulk data download.

This is probably not what you want because it is mainly between German and other European languages, but I like dict.cc because they allow you to download the dictionary data in its entirety (but it must be requested by email).
https://www.dict.cc

And of course, there is always Wiktionary.

What do you have in mind exactly with "ensuring quality"? Even here on Tatoeba there seems to be quite a bit of debate sometimes when it comes to correcting sentences.

LeviHighway, November 7, 2025 at 6:03:48 AM UTC

Well, Glosbe does not have a community/comment function, and I never managed to contact their staff. It's not organized at all: all contributions are kept, so it's a total mess. You have no control over anything, your contribution might be hidden at the bottom, etc. Tatoeba is much better. I don't like (Chinese) Wiktionary, because it's so complicated that contributing to one entry usually takes a day.

November 7, 2025 at 9:26:17 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

LeviHighway, November 7, 2025 at 4:27:41 AM UTC

The only Chinese corpus maintainer is inactive. How should we deal with the hundreds of Chinese sentences that need to be changed?
https://tatoeba.org/zh-cn/tags/...direction=desc

PaulP, November 6, 2025 at 5:03:01 AM UTC

Today the following question appeared in the Tatoeba Facebook group:

"Why was the Tatoeba app deleted from the Play Store?"

As far as I can remember, that app was a private initiative of one of our contributors, and nothing from the Tatoeba staff, right?

gillux, November 6, 2025 at 11:07:52 AM UTC

That’s correct, there was never an official Tatoeba app. By the way, I can’t access this group (probably because I don’t have a Facebook account), but I am curious how much the group is used.

PaulP, November 6, 2025 at 2:11:27 PM UTC

The group was rather popular back when Ryck Vernaut was its admin, but after he passed away nobody really supported it. I follow the activity, and when something special comes up (like now, about the app), I try to respond.

vowelharmony, November 5, 2025 at 7:29:25 PM UTC

Hello there! I wonder if it is possible to implement a way to report an account, rather than a sentence or a comment? There are a lot of spam accounts on the website and most of them have no contributions. I am sure that admins are tackling this problem, but I believe it would be better if we were able to report them as well.

Please keep in mind that I am not an experienced contributor. There may be a way to do it, but I wasn't able to find such a functionality. Thank you.

PaulP, November 6, 2025 at 4:59:29 AM UTC

You can report them to community-admins@tatoeba.org
The account is usually deleted within a few hours.

gillux, November 6, 2025 at 11:04:59 AM UTC

Thank you for offering to help with spam accounts. At the moment, so many are created every day that I don’t think dealing with them through user reports would be very helpful: you would have to go through hundreds, if not thousands, of them. This is why there is no functionality to report an account yet. But maybe you want to report some more specific accounts, such as old accounts that have contributions?

We are first trying to reduce the flow of spam accounts to a manageable level. You can follow the progress here https://github.com/Tatoeba/tatoeba2/issues/1613, and you are welcome to contribute to this discussion here or on GitHub.

vowelharmony, November 6, 2025 at 12:11:35 PM UTC

@PaulP, @gillux: Thank you for the assistance! I'm aware that one can report them directly to the admins, but as @gillux said, there are indeed a lot of spam accounts and it wouldn't be convenient to report all of them manually.

I still believe that the ability to report an account may be a useful functionality to have, at least after the number of spammers goes down to a somewhat manageable level. I will make sure to follow the discussion on GitHub as well.

LeviHighway, November 5, 2025 at 3:40:55 PM UTC

I think most of nonong's sentences should be deleted or rewritten. They're too long and when translating his sentences into other languages, they would usually go longer than the limit.

https://tatoeba.org/zh-cn/sentences/of_user/nonong

Vortarulo, November 5, 2025 at 4:21:22 PM UTC

Agreed. Also, the length of the entries keeps users from attempting to translate them. Content-wise they're good sentences, mostly. Is there a way to somehow automatically split them up into single sentences?

ZegPhig, November 5, 2025 at 6:44:31 PM UTC

It will not help. Firstly, it breaks the context. Long sentences may keep users from translating them, but sentences with more content are much more interesting for readers and language learners. Secondly, it will not change anything, because the limit is still not mentioned in any guidelines or site rules, and there is no character counter of any kind when we add sentences. So I don't even know when my text will be too long and get cut off, and this problem will happen again and again. Even if we could change that, there will always be someone who wants to write a text using all the permitted characters. So I think the number of permitted characters should be different for translations and original sentences (more for translations). To me that seems like the most correct solution.
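
The two ideas in this post, a visible character counter and a larger allowance for translations, could look roughly like the following. This is only a sketch: the limits are placeholder values, since the actual limit is not stated in this thread, and the function name is hypothetical.

```python
def remaining_characters(text, is_translation,
                         original_limit=500, translation_limit=750):
    """Return how many characters are left; a negative value means over the limit.

    Both limits are placeholder values chosen for illustration only.
    """
    limit = translation_limit if is_translation else original_limit
    # len() counts Unicode code points, not bytes.
    return limit - len(text)


print(remaining_characters("A short original sentence.", is_translation=False))
print(remaining_characters("A short translation.", is_translation=True))
```

A form could call such a function on every keystroke and show the result next to the text field, so contributors see the limit before their text is cut.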

AlanF_US, November 5, 2025 at 7:23:05 PM UTC

> I think most of nonong's sentences should be deleted or rewritten. They're too long and when translating his sentences into other languages, they would usually go longer than the limit.

Sentences are deleted only when they break site guidelines, such as when they:
- attack community members
- break copyright or license rules
- are of poor quality and cannot easily be fixed

Sentences are generally rewritten on an individual basis, to correct errors that have been pointed out in comments. Breaking up a collection of sentences, either manually or automatically, is infeasible for a variety of reasons.

Long sentences create a bunch of problems. They tend to go uncorrected, so they often contain errors, subtle or otherwise. As their length increases, so does the probability that they will contain something that is difficult or impossible to translate, rendering the entire content untranslatable. And, as you have noticed, translations are likely to be long as well. Even if your translation does fit within the maximum number of characters, it's quite possible that someone who translates your sentence will not be able to do so within the limit.

There is a simple solution: don't translate long sentences.

gillux, November 6, 2025 at 11:42:59 AM UTC

Tatoeba is not only for translations. It is also useful to find examples of usage in a single language only.

If other contributors are adding long sentences, it probably means they are useful to them. Maybe they are not useful to you, but it doesn’t mean they shouldn’t be on Tatoeba. What should or should not be on Tatoeba is a totally different thing than your own needs.

If you don’t want to see long sentences, you can filter them out from the search by setting a maximum number of words.
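
For anyone working with downloaded sentences rather than the site search, the same kind of filter can be approximated offline. The sketch below is an assumption about how one might do it, not the site's implementation: the function name is hypothetical, and counting whitespace-separated words does not suit languages written without spaces, such as Chinese or Japanese.

```python
def within_word_limit(sentences, max_words):
    """Keep only sentences with at most max_words whitespace-separated words."""
    return [s for s in sentences if len(s.split()) <= max_words]


examples = [
    "Hello world.",
    "This is a much longer example sentence.",
]
print(within_word_limit(examples, 3))
```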