Wall - Tatoeba

Wall (7366 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.

CK 5 hours ago April 24, 2026 at 8:31:01 AM UTC

** Early History of the Tatoeba Project **

http://a4esl.org/temporary/tatoeba/history/

This may be interesting to both old and new members.

It covers the project's history up to 2012. It's an old webpage that I made years ago.

LeviHighway 7 days ago April 17, 2026 at 7:47:48 AM UTC

I have had this idea for Tatoeba for a long time. I understand that it may be hard to achieve, or that it might not fit the purpose of Tatoeba, but I want to share it here:
I wish there were a "dictionary" function for Tatoeba. This wouldn't mean adding definitions or etymologies to words; it would be a translation dictionary.
Here's how I think it should work:
When you search for "Chinese" via this dictionary function, you might get a few sentences like this:
I am Chinese. - 我是中國人。
I speak Chinese. - 我說中文。
I wish contributors could tag how the word "Chinese" is translated into the target language. For example, you could mark:
I am [Chinese]. - 我是[中國人]。
I speak [Chinese]. - 我說[中文]。
This way, we could get statistics on how the word "Chinese" is translated into the target language. For example, when you look up "Chinese", it could tell you that "Chinese" is most often translated as "中國人", followed by "中文".
I think Tatoeba is the best place to do this. Wiktionary is too complicated, and it can never provide as many sentences as Tatoeba.
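
A rough sketch of how such statistics could be computed, assuming contributors mark matching words with square brackets as in the examples above. The sample data and the name `translation_counts` are made up for illustration; no such tagging feature or function exists in Tatoeba.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tagged pairs, using the bracket notation from the post above.
TAGGED_PAIRS = [
    ("I am [Chinese].", "我是[中國人]。"),
    ("I speak [Chinese].", "我說[中文]。"),
]

# Matches the text inside one pair of square brackets.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def translation_counts(pairs):
    """Count how often each tagged source word maps to each tagged target word."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        source_tags = BRACKETED.findall(source)
        target_tags = BRACKETED.findall(target)
        # Tags are paired by position; skip pairs whose tag counts differ.
        if len(source_tags) != len(target_tags):
            continue
        for src_word, tgt_word in zip(source_tags, target_tags):
            counts[src_word][tgt_word] += 1
    return counts


counts = translation_counts(TAGGED_PAIRS)
for rendering, n in counts["Chinese"].most_common():
    print(rendering, n)
```

With more tagged pairs, `counts["Chinese"].most_common()` would list the renderings from most to least frequent.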

zogwarg 1 day ago, edited 1 day ago April 23, 2026 at 8:04:08 AM UTC, edited April 23, 2026 at 8:04:54 AM UTC

I think there are many obstacles to making this work with a "tag" system:
- It would require a lot of extra work from contributors.
- For most language pairs, word-for-word matching is very poor anyway, e.g. [#3378275] > [#4919050]

Pragmatically, if you can understand both the source and target languages, you can already use advanced search to get a good idea of how a word gets translated on Tatoeba (Tatoeba is not necessarily representative):

https://tatoeba.org/en/sentence...rd_count_min=1

Interestingly, for "Chinese" the breakdown seems to be more like:
148 "中文"
63 "汉语"
58 "中国"
42 "中国人"
18 "漢語"
16 "汉字"
16 "中國人"
16 "中國"
14 "漢字"
11 "中"
10 "中餐"
7 "普通话"
4 "国"
2 "华人"
2 "中式"
1 "華"
1 "繁体字"
1 "简体字"
1 "汉"
1 "本地"
1 "國字"
1 "华"
1 "中菜"
1 "中华"
19 "other (mostly implicitly Chinese things)"
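
For anyone who wants to produce a similar breakdown from downloaded sentence pairs, here is a minimal sketch. It is not necessarily how the figures above were obtained: it simply checks which candidate string appears in each translation, trying longer candidates first so that "中国人" is not counted as "中国". The sample translations and the function name `tally_renderings` are made up for illustration.

```python
from collections import Counter


def tally_renderings(translations, candidates):
    """Count, per translation, the longest candidate string it contains."""
    # Longest first, so a longer rendering wins over a shorter one it contains.
    ordered = sorted(candidates, key=len, reverse=True)
    counts = Counter()
    for text in translations:
        match = next((c for c in ordered if c in text), "other")
        counts[match] += 1
    return counts


# Hypothetical translations of English sentences containing "Chinese".
sample = ["我是中国人。", "我说中文。", "他喜欢中餐。", "这是什么？"]
result = tally_renderings(sample, ["中文", "中国", "中国人", "中餐"])
for rendering, n in result.most_common():
    print(n, rendering)
```

A translation that contains none of the candidates lands in "other", which corresponds to the implicit cases in the last row of the list.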

LeviHighway 22 hours ago April 23, 2026 at 3:01:50 PM UTC

For sentences that have poor word-to-word matching, I think we could simply add a button to mark them as "implied" or something similar.

In the example of [#3378275] > [#4919050], we cannot match "Chinese" to anything, so we can mark it as "implied". However, we can still create matches like [Chinese characters] > [字] and [characters] > [字].

One of the reasons I'm suggesting this is that some people may not understand both languages well, so a mapping of how words are translated could help them understand the sentence structures more easily.

ssvb 1 day ago April 23, 2026 at 12:19:42 PM UTC

> I think Tatoeba is the best place to do this. Wiktionary is too complicated,

If Tatoeba implements your suggestions, it will become more complicated, possibly even as complicated as Wiktionary.

BTW, you are not required to provide etymology when contributing to Wiktionary. A lot of other information is also optional. You can try to figure out what the minimal bare-bones entry for a Chinese word looks like, and maybe discuss this with the other contributors if something is not clear. Start from here: https://en.wiktionary.org/wiki/...try_guidelines

I guess one of the challenges is that Wiktionary treats Chinese as a big family of many similar languages or subdialects (Mandarin, Cantonese, Wu and others), each having its own distinctive features. This may cause friction, because some people may favour erasing the differences for the sake of simplification and unification, while others may be interested in preserving their local dialect as precious cultural heritage.

> and it can never provide as many sentences as Tatoeba.

Wiktionary surely has stricter quality requirements for its content, so it indeed can't match Tatoeba's quantity of sentences. That said, Tatoeba's content is also categorized via tags, and it's possible to filter out dubious sentences (contributed by non-native speakers, etc.).

LeviHighway 22 hours ago, edited 22 hours ago April 23, 2026 at 2:48:16 PM UTC, edited April 23, 2026 at 2:50:59 PM UTC

I don’t frequently contribute to the English Wiktionary, which seems more streamlined due to its large community of maintainers. However, as a contributor to the Chinese Wiktionary, I find the infrastructure much more challenging. Many templates and modules are incomplete or overly complex; editing a single entry often takes hours and is prone to errors.

Furthermore, Chinese grammar is often a subject of intense debate, frequently leading to edit wars. I’ve also noticed inaccuracies in Korean and Chinese entries, particularly when they are translated from English by contributors who may not be proficient in the target language.

My point is that we could effectively build a translation dictionary within Tatoeba. For instance, when looking up a word, seeing both the keyword and its translation in bold—similar to how some dictionaries highlight equivalents—would greatly benefit learners in understanding sentence structures.

Regarding the preservation of Chinese languages, Tatoeba handles this exceptionally well. By treating Mandarin, Cantonese, Wu, and others as distinct entities, it helps maintain the linguistic integrity and "purity" of each language.

Vortarulo 10 days ago April 14, 2026 at 12:52:53 AM UTC

Is it acceptable for a member who is not fluent in a language (but understands it to a good extent) to use Google Translate (or similar) to translate sentences into that language?
For instance, could someone with A2-level knowledge of Polish use GT to translate sentences from English to Polish, proofread them for obvious mistakes, and then add them to the corpus?

LeviHighway 10 days ago April 14, 2026 at 1:58:14 AM UTC

I think machine translation is acceptable when you can ensure its quality. Not to mention that a lot of native speakers write bad translations, and those are not even as good as machine translations.

Thanuir 10 days ago April 14, 2026 at 6:06:02 AM UTC

I don't see this adding significant value to the database. I would rather recommend translating from Polish into your own native language, or into another language you know well enough to vouch for your own sentences, and linking Polish sentences you understand to sentences in other languages that you also understand.

EugeneGS 10 days ago April 14, 2026 at 6:15:41 AM UTC

I think it is acceptable. Machine translators (such as GT or DeepL) are pretty good nowadays. Also, it is not much of a problem if that person translates simple sentences or translates into a language that is quite similar to one they know well.

CK 10 days ago April 14, 2026 at 6:50:35 AM UTC

I would rather see people contributing in their own native languages.

Machine translation with AI has become much better than it used to be, so I can understand the temptation to use it.

Using AI to translate from a foreign language you know into your own native language might help you think of wordings you might not have otherwise thought of.

If you, as a native speaker, verify that it sounds natural, and if you know the source language well enough to verify that the meaning is the same, I think AI can be a useful tool.

ssvb 10 days ago April 14, 2026 at 7:19:16 AM UTC

Verifying that the meaning is the same is highly problematic. That's why I started translating sentences that carry the "by Arthur Conan Doyle" tag and were submitted by a native English speaker. The best part about them is that I can look up the context (the text before and after them) by searching for them on the Internet. This is much better than many of the other short sentences on Tatoeba, which are ambiguous regarding the gender of the speaker or other things.

ssvb 10 days ago April 14, 2026 at 7:27:00 AM UTC

> I started translating sentences with the "by Arthur Conan Doyle" tag

Oh, and to make things perfect, I would like to have a CC0 license for both the English sentences and my translations. The works of Arthur Conan Doyle are in the public domain now. Is this possible?

marafon 9 days ago April 14, 2026 at 8:58:08 PM UTC

I agree.

marafon 9 days ago April 15, 2026 at 11:06:47 AM UTC

Contributing in a language that is not your strongest
https://en.wiki.tatoeba.org/art...ow/non-native#

AlanF_US 9 days ago, edited 9 days ago April 15, 2026 at 12:47:25 PM UTC, edited April 15, 2026 at 12:56:03 PM UTC

Thanks for that link, @marafon. I also encourage people to consult the Rules and Guidelines ( https://en.wiki.tatoeba.org/art...ow/guidelines# ). To find it in the future, go to the bottom of any Tatoeba page and click on "Tatoeba Wiki". The first section on that page contains a link to the "Rules and Guidelines" page.

Tatoeba's mission is to serve as a source of high-quality sentences and high-quality translations. This is ensured by having them written, verified, and owned by humans who know the languages well enough to avoid introducing any mistakes, including subtle ones, not just the obvious ones mentioned by @Vortarulo.

If Tatoeba starts also acting as a consumer of sentences that do not come directly, transparently, and legally from humans, we run the very real risk of generating a cycle in which we pass these poor-quality or misappropriated sentences on to the people who get them from us.

Vortarulo 8 days ago April 15, 2026 at 5:26:31 PM UTC

Thanks for your answers, everyone, and especially for the link to the Guidelines, AlanF_US. I had looked for them but didn't find them. They also match my impression.

The question wasn't about me, by the way, but about a user who contributes quite a lot here, but apparently with mostly(?) GT-translated sentences. Perhaps I should point them to the Guidelines.

AlanF_US 8 days ago April 16, 2026 at 12:07:41 PM UTC

How do you know that the translations are from Google Translate?

ssvb 8 days ago April 16, 2026 at 1:30:54 PM UTC

> How do you know that the translations are from Google Translate?

It's not rocket science. Sometimes a translation is obviously wrong. And when the original sentence is fed into Google Translate, the bad translation precisely matches the Google Translate output.

PaulP 7 days ago April 17, 2026 at 5:23:59 AM UTC

> Sometimes a translation is obviously wrong. And when the original sentence is fed into Google Translate, the bad translation precisely matches the Google Translate output.

Yes, that's how I detect AI translations too.
And secondly, when a user adds 10 or more translations in different languages, it's hard to believe that they speak so many languages.

kumakyoo 7 days ago April 17, 2026 at 6:57:00 AM UTC

Just as a side note: I recently translated a sentence containing "Tatoeba" with DeepL, and in the translation "Tatoeba" was replaced by "wiktionary".

AlanF_US 6 days ago April 17, 2026 at 1:43:11 PM UTC

First of all, if you see that a user is contributing a bad translation, then regardless of what you think the source of that translation is, you should be leaving a comment on the translation and tagging it for action (for instance, "@change") if you can. If you see that the user habitually contributes bad translations, and contacting the user is either not feasible or has no effect, you should send a private message or an email to a corpus maintainer, an admin, or the admin team.

There are several ways to come to the conclusion that someone is using AI. One is that the person says so outright. Others are along the lines of what @ssvb and @PaulP have mentioned, namely that a bad translation happens to match, say, Google Translate's output. While this is not proof, if the evidence is pretty strong (for instance, if there are a lot of similar instances), it's also worth mentioning to the user and/or an administrator.

I urge people to read the Rules and Guidelines in general. I've submitted an issue to request that a link to the document be added to the footer on each Tatoeba page to make it easier to find.

Vortarulo 1 day ago April 22, 2026 at 10:50:46 PM UTC

I checked one of the more complicated sentences (it was in Latin, I believe), because it seemed a bit special. Then I saw that the Google Translate version was exactly the same... and so were the Greek, Italian, Galician, Portuguese, Spanish, Turkish, etc. versions.
I asked the user, and they admitted using GT.

AlanF_US 1 day ago April 22, 2026 at 11:37:38 PM UTC

> I asked the user, and they admitted using GT.

Then they should stop.

DostKaplan 11 days ago, edited 10 days ago April 12, 2026 at 2:21:20 PM UTC, edited April 14, 2026 at 9:57:27 AM UTC

What is the regexp to search for sentences (Turkish to English) containing " ten" (with a leading space)?

xxxxxx ten xxxxxxx ✅
xxxxxx'ten xxxxxxx 👎🏼
xxxxxxten xxxxxxxx 👎🏼

AlanF_US 6 days ago, edited 6 days ago April 17, 2026 at 2:13:35 PM UTC, edited April 17, 2026 at 2:19:59 PM UTC

I assume you're asking about an expression accepted by Tatoeba's integrated search function, which is similar but not identical to a regular expression ("regexp").

I don't believe there is a way to write an expression that finds sentences where "ten" is a standalone word but excludes sentences containing a word that ends with "ten" preceded by an apostrophe. The reason is that the tokenization and search split words not only at word boundaries indicated by spaces, but also at punctuation marks (including apostrophe), which are then discarded. One hacky way to get what you want would be to search for "=ten" and then use the browser's search function with "Whole Words" enabled to search through the results.

To get the full functionality you're looking for, I think you'd have to download the sentences you want and then use a tool (such as a text editor's search function with regular expression search enabled) that gives you this level of control.
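
To illustrate the tokenization point, here is a deliberately simplified tokenizer. It is only an assumption about the general behavior described above (split at spaces and punctuation, discard the punctuation), not Tatoeba's actual search engine code, and `simple_tokens` is a made-up name.

```python
import re


def simple_tokens(sentence):
    """Lowercase, split at every run of non-word characters, drop the separators."""
    # Apostrophes are non-word characters, so they split words and then vanish.
    return [token for token in re.split(r"\W+", sentence.lower()) if token]


print(simple_tokens("xxxxxx ten xxxxxxx"))   # standalone word
print(simple_tokens("xxxxxx'ten xxxxxxx"))   # suffix after an apostrophe
print(simple_tokens("xxxxxxten xxxxxxxx"))   # part of a longer word
```

Under this model the first two inputs yield the same token list, which is why an exact-word search cannot tell them apart, while the third never produces a "ten" token at all.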

DostKaplan 6 days ago April 17, 2026 at 3:15:51 PM UTC

"=ten" would be great if it works as expected (return sentences with standalone "ten"). Unfortunately, it also returns sentences containing "'ten" (with a leading apostrophe).

I want Sentence #4940054:
Onun güzel bir ten rengi var.

I don't want Sentence #5512986:
2013'ten beri buradayız.

Luckily there are only 3 pages of results, so I can just scroll through and eyeball them. But it would have been nice to be able to get only "ten" and not "'ten".
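
For a larger result set, the download-and-filter approach suggested earlier in the thread could look roughly like this. It is a sketch in Python that assumes the sentences are already available as plain strings; the pattern and the function names are illustrative, not a Tatoeba feature.

```python
import re

# "ten" not preceded by a letter, digit, or apostrophe (straight or curly),
# and not followed by a letter or digit.
STANDALONE_TEN = re.compile(r"(?<![\w'\u2019])ten(?!\w)", re.IGNORECASE)


def has_standalone_ten(sentence):
    """Return True if the sentence contains "ten" as a word of its own."""
    return STANDALONE_TEN.search(sentence) is not None


def filter_sentences(sentences):
    """Keep only the sentences with a standalone "ten"."""
    return [s for s in sentences if has_standalone_ten(s)]


sentences = [
    "Onun güzel bir ten rengi var.",  # wanted
    "2013'ten beri buradayız.",       # not wanted: suffix after an apostrophe
]
print(filter_sentences(sentences))
```

The negative lookbehind does the work that the site search cannot: it rejects a match when the character just before "ten" is an apostrophe or part of a word.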
