Wall (7,209 threads)

Tips

Before asking a question, make sure you have read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.


[Hidden message, November 10, 2025, 9:21:50 AM UTC]

This message's content violates our rules, so it is hidden. It is visible only to admins and to the message's author.

sacredceltic, November 5, 2025, 6:35:38 PM UTC

It looks like the way default languages work for newly added sentences has changed.
Even though I select "auto-detect", every sentence I add in English is immediately identified as French, which is completely stupid.

gillux, November 6, 2025, 10:53:12 AM UTC

Nothing has changed on that front, except that the model behind language detection is updated every week from the Tatoeba corpus (excluding sentences tagged @wrong flag). The model has never been perfect, especially on short sentences.

LeviHighway, November 7, 2025, 1:50:41 PM UTC

Can I learn more about the model? When I add Mandarin sentences, the model always detects them as Cantonese. I know Mandarin and Cantonese are extremely close, so I never use the Detect function at all.

Thanuir, November 9, 2025, 7:30:06 AM UTC

If you have a larger and a smaller language that are very similar, and you add a sentence to the smaller one, the algorithm may consider it closer to the sentences of the larger language.

If the sentence contains features specific to the smaller language (which the larger one lacks), this happens less often.
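
The effect described in this post can be reproduced with a toy classifier. The following is a minimal sketch, assuming a character-level naive Bayes model with add-one smoothing; it is not Tatoeba's actual detector, and the two "languages" (`big`, `small`) and their sentences are made up for illustration.

```python
import math
from collections import Counter


def train(corpus):
    """Fit per-language character counts and priors.

    corpus: dict mapping a language label to a list of sentences.
    """
    total_sentences = sum(len(sentences) for sentences in corpus.values())
    model = {}
    for lang, sentences in corpus.items():
        counts = Counter(ch for sentence in sentences for ch in sentence)
        model[lang] = {
            # The prior grows with the language's share of the corpus.
            "log_prior": math.log(len(sentences) / total_sentences),
            "counts": counts,
            "n_chars": sum(counts.values()),
        }
    vocab = set()
    for entry in model.values():
        vocab.update(entry["counts"])
    return model, len(vocab)


def detect(model, vocab_size, sentence):
    """Return the label with the highest naive Bayes log-score."""
    scores = {}
    for lang, entry in model.items():
        score = entry["log_prior"]
        for ch in sentence:
            # Add-one smoothing keeps unseen characters from zeroing the score.
            likelihood = (entry["counts"][ch] + 1) / (entry["n_chars"] + vocab_size)
            score += math.log(likelihood)
        scores[lang] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus: both languages share "a" and "b";
# only the small one uses "z".
corpus = {"big": ["abc"] * 90, "small": ["abz"] * 10}
model, vocab_size = train(corpus)

shared_only = detect(model, vocab_size, "ab")   # only shared characters
with_marker = detect(model, vocab_size, "abz")  # contains the small-language marker
print(shared_only, with_marker)
```

In this toy setup, the sentence made only of shared characters is labeled `big` because the prior favors the larger corpus, while the sentence containing the marker character is labeled `small`.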

EugeneGS, November 9, 2025, 9:13:00 AM UTC (edited November 9, 2025, 1:05:39 PM UTC)

Maybe there's also something wrong with the model architecture. I trained a few models myself (one on all Tatoeba data and one only on Mandarin and Cantonese), and both correctly detected about 97% of cases (checked on the validation and full datasets).

What's strange is that the Tatoeba model seems to prefer Cantonese, even though it has fewer sentences than Mandarin.

Edit: I have tried another architecture with transformer layers (my first models had LSTM layers). After training on the whole Tatoeba database, it gave 82% accuracy.
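
A single accuracy figure can hide a bias toward one language, so it can help to break the score down per language. Below is a minimal sketch, assuming gold and predicted labels are available as parallel lists; the function name and the sample labels are made up for illustration and are not results from the models discussed here. `cmn` and `yue` are the ISO 639-3 codes for Mandarin and Cantonese.

```python
from collections import Counter, defaultdict


def per_language_report(gold, predicted):
    """Return overall accuracy, per-language recall, and a confusion table."""
    confusion = defaultdict(Counter)  # confusion[gold_label][predicted_label]
    for g, p in zip(gold, predicted):
        confusion[g][p] += 1
    correct = sum(row[g] for g, row in confusion.items())
    overall = correct / len(gold)
    # Recall: share of each language's sentences that were labeled correctly.
    recall = {g: row[g] / sum(row.values()) for g, row in confusion.items()}
    return overall, recall, confusion


# Hypothetical labels: 8 Mandarin and 2 Cantonese sentences,
# with 3 Mandarin sentences mislabeled as Cantonese.
gold = ["cmn"] * 8 + ["yue"] * 2
predicted = ["cmn"] * 5 + ["yue"] * 3 + ["yue"] * 2

overall, recall, confusion = per_language_report(gold, predicted)
print(overall, recall, dict(confusion["cmn"]))
```

With these made-up labels the overall accuracy is 0.7, but recall is 0.625 for `cmn` and 1.0 for `yue`, which is the kind of one-sided error the aggregate number does not show.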

frpzzd, November 9, 2025, 5:47:59 PM UTC (edited November 9, 2025, 5:48:08 PM UTC)

Is your model training/testing code available online anywhere? If so, I would love to take a look for my own edification, since I've been learning about such topics recently.

EugeneGS, November 9, 2025, 9:35:22 PM UTC

I've uploaded it on GitHub. The code can be used for pretty much any text classification task.
I honestly didn't expect anyone to be interested, so I'm glad you asked! Some comments in the code might not be super helpful, but if anything's unclear, feel free to reach out via private messages.

https://github.com/kilsense/Tex...2f07/main/LSTM

LeviHighway, November 9, 2025, 8:56:51 PM UTC

lol, I stand corrected: it's not *always* Cantonese, but it's pretty frequent. I noticed that most Cantonese sentences on Tatoeba are very long; I guess that affected the model.

Ooneykcall, November 9, 2025, 9:09:39 PM UTC

I've noticed some odd accounts adding many Cantonese sentences, usually long ones, often as translations from other languages, including Russian (which is how I noticed). I suspect their quality is questionable, but unfortunately there are currently no active native Cantonese speakers who could deal with that.

sharptoothed, November 9, 2025, 4:57:41 PM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

[Hidden message, November 9, 2025, 9:29:31 AM UTC]

This message's content violates our rules, so it is hidden. It is visible only to admins and to the message's author.

LeviHighway, November 7, 2025, 2:31:54 PM UTC

Please enable Traditional-Simplified conversion for Literary Chinese. Currently, people are contributing in either Traditional or Simplified characters, so I think it needs conversion, just like Mandarin Chinese.
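
For readers unfamiliar with script conversion, its simplest form is a character-by-character mapping. This is a minimal sketch with a three-character table chosen for the example; it is an assumption for illustration, not how Tatoeba implements conversion for Mandarin, and a real converter needs a full table plus handling for characters that do not map one-to-one.

```python
# Tiny illustrative table (hypothetical); a real converter needs thousands of entries.
TRAD_TO_SIMP = str.maketrans({"學": "学", "時": "时", "習": "习"})


def to_simplified(text):
    """Replace each Traditional character found in the table; leave the rest as-is."""
    return text.translate(TRAD_TO_SIMP)


converted = to_simplified("學而時習之")
print(converted)
```

The reverse direction is harder, because some Simplified characters correspond to several Traditional ones (for example, 发 stands for both 發 and 髮), so a plain one-to-one table is not enough there.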

LeviHighway, November 7, 2025, 4:16:50 AM UTC

Does anyone know of a website that works like Tatoeba but for vocabulary? I know Glosbe, but it seems they don't ensure quality at all.

frpzzd, November 7, 2025, 5:16:08 AM UTC

I second this question. I've also seen Glosbe but haven't contributed to it myself because (1) it is not open source, and (2) it does not (as far as I'm aware) allow for bulk data download.

This is probably not what you want because it is mainly between German and other European languages, but I like dict.cc because they allow you to download the dictionary data in its entirety (but it must be requested by email).
https://www.dict.cc

And of course, there is always Wiktionary.

What do you have in mind exactly with "ensuring quality"? Even here on Tatoeba there seems to be quite a bit of debate sometimes when it comes to correcting sentences.

LeviHighway, November 7, 2025, 6:03:48 AM UTC

Well, Glosbe does not have a community/comment function, and I never managed to contact their staff. It's not organized at all: all contributions are kept, so it's a total mess. You have no control over anything; your contribution might be hidden at the bottom, etc. Tatoeba is much better. I don't like (Chinese) Wiktionary because it's so complicated; contributing to one entry usually takes a day.

[Hidden message, November 7, 2025, 9:26:17 AM UTC]

This message's content violates our rules, so it is hidden. It is visible only to admins and to the message's author.

LeviHighway, November 7, 2025, 4:27:41 AM UTC

The only Chinese corpus maintainer is inactive. How should we deal with the hundreds of Chinese sentences that need to be changed?
https://tatoeba.org/zh-cn/tags/...direction=desc

PaulP, November 6, 2025, 5:03:01 AM UTC

Today the following question appeared in the Tatoeba Facebook group:

"Why was the Tatoeba app deleted from the Play Store?"

As far as I can remember, that app was a private initiative of one of our contributors, and nothing from the Tatoeba staff, right?

gillux, November 6, 2025, 11:07:52 AM UTC

That’s correct, there was never an official Tatoeba app. By the way, I can’t access this group (probably because I don’t have a Facebook account), but I am curious how much the group is used.

PaulP, November 6, 2025, 2:11:27 PM UTC

The group was rather popular back when Ryck Vernaut was its admin. But after he passed away, nobody really supported it. I follow the activity, and when something special comes up (like now, about the app), I try to respond.

vowelharmony, November 5, 2025, 7:29:25 PM UTC

hello there! I wonder if it is possible to implement a way to report an account, rather than a sentence or a comment? there are a lot of spam accounts on the website and most of them have no contributions. I am sure that admins are tackling this problem but I believe that it would be better if we were able to report them as well

please keep in mind that I am not an experienced contributor, there may be a way to do it but I wasn't able to find such a functionality. thank you

PaulP, November 6, 2025, 4:59:29 AM UTC

You can report them to community-admins@tatoeba.org
The account is usually deleted within a few hours.

gillux, November 6, 2025, 11:04:59 AM UTC

Thank you for offering your help to deal with spam accounts. At the moment, so many of them are created every day that I don't think it is very helpful to deal with them by means of user reports. You would have to go through hundreds, if not thousands, of them. This is why the functionality to report an account is not there yet. But maybe you want to report some more specific accounts, like old accounts that have contributions?

We are trying to first reduce the flow of spam accounts to a manageable level. You can follow the progress here https://github.com/Tatoeba/tatoeba2/issues/1613, and you are welcome to contribute to this discussion here or on GitHub.

vowelharmony, November 6, 2025, 12:11:35 PM UTC

@PaulP, @gillux: thank you for the assistance! I'm aware that one can directly report them to the admins but as @gillux said, there are indeed a lot of spam accounts and it wouldn't be convenient to manually report all of them

I still believe that the ability to report an account may be a useful functionality to have, at least after the number of spammers goes down to a somewhat manageable level. will make sure to follow the discussion on GitHub as well