Wall - Tatoeba

Wall (7 373 threads)

LeviHighway, 1 hour ago (1 May 2026, 02:15:26 UTC)

https://tatoeba.org/zh-tw/sentences/show/2399340

What happened to this sentence? It says "Licensing issue" and cannot be translated.

kylewii, 1 day ago (29 April 2026, 13:03:44 UTC)

I'm wondering how to add furigana to an existing Japanese sentence. Some sentences have no furigana, and I can't find a way to add it.

LeviHighway, 2 hours ago (1 May 2026, 01:24:25 UTC)

You have to become an Advanced Contributor to put furigana on Japanese sentences.

How to Become an Advanced Contributor: https://en.wiki.tatoeba.org/art...d-contributors

LeviHighway, 3 days ago (27 April 2026, 09:29:27 UTC)

Personal opinion: Please avoid overusing words like “this,” “that,” “he,” and “she” when writing sentences. Many languages do not have natural, all-purpose equivalents for these terms. I’m not saying they can’t be used, but rather encouraging more restrained use.

In addition, sentences such as “He eats this,” “She saw that,” or “He told she that he likes her” carry almost no informational value on their own. Nearly any verb can be combined with these generic words to generate endless sentences, but their practical usefulness is extremely low.

frpzzd, 3 days ago (27 April 2026, 09:55:07 UTC)

I don't see this as a problem at all.

Many languages DO have (in my opinion) words that are fairly good matches for English words such as "this", "that", "he/she", etc. If language X has no natural way of translating these terms in a way that roughly mirrors their dependence on an implied context, then perhaps such sentences should simply remain untranslated into language X. Or, as is also common practice here, such sentences could be linked to multiple possible translations in language X, in the case that language X requires greater specificity on a certain contextual point than English. (For instance, many English sentences using the 2nd person are translated into Russian using both ты and вы.)

I don't think the goal of Tatoeba should be to make sure that all sentences are translated into all possible languages, and in fact it is near impossible (or requires enormous creativity) to translate some sentences into other languages without destroying crucial bits of humor, wordplay, irony or ambiguity.

I also disagree strongly that your examples lack informational value. The informational value is just dependent on the context, making them ambiguous and non-self-contained, which is the case for much natural human speech. Is "He told *her that he likes her" really that much less interesting of a sentence to you than "Tom told Mary that he likes her"?

LeviHighway, 3 days ago (27 April 2026, 10:31:36 UTC)

My view is that sentences like “The cat drinks milk” or “The kid ate a hamburger” are far more useful than sentences that rely heavily on pronouns like “he/she” and vague references like “this/that.” As I mentioned, almost any verb can be combined with these generic words to produce endless sentences. As a result, examples such as “He eats that,” “This can be combined with that,” “This can be considered that way,” or “He refutes that” do little to enrich the diversity of the corpus.
I understand your point that not every sentence needs to be translated. However, I am concerned about having a large number of sentences that cannot be translated into my language. For instance, when a learner searches for a verb, if many of the results are sentences “generated” using “this/that,” they may either find no translation in their language or encounter translations that sound unnatural. Moreover, such examples do not help learners understand what kinds of words the verb is typically used with.
Here is an example I particularly dislike [#6760017]: “I wasn’t surprised when Tom told me I didn’t need to do that.” I have no idea in what context this could be used. It would be much clearer and more helpful to make the sentence more specific, for example: “I wasn’t surprised when my manager told me I didn’t need to rewrite the entire document.”

LeviHighway, 3 days ago (27 April 2026, 10:40:44 UTC)

Additionally, many verbs have multiple distinct translations in other languages. For example, verbs like “wear” and “play” correspond to several different verbs in Chinese and Korean, each used with different types of objects. If “this/that” is used in translation, it is not only completely unnatural, but also makes no sense, as learners cannot tell which verb should be used with which kind of object.

ssvb, 3 days ago (27 April 2026, 13:54:25 UTC)

Yes, there are many sentences with ambiguous translation on Tatoeba due to verbs, nouns and other words. For example, the https://tatoeba.org/en/sentences/show/5746251 ("Never approach a cougar.") sentence may have different translations depending on whether the "cougar" is a human or an animal in this context. There are also machine translated sentences contributed by non-native speakers, etc. Not every sentence is equal. But we can just ignore the undesired sentences and translate the others.

I think that having an ignore list for sentences would be a very useful feature. Just to be sure that we never get the same ambiguous sentence as a possible candidate for translation again.

Guybrush88, 3 days ago (27 April 2026, 15:39:41 UTC)

> Here is an example I particularly dislike [#6760017]: “I wasn’t surprised when Tom told me I didn’t need to do that.” I have no idea in what context this could be used.

I have a different view on this about Italian, since "I wasn't surprised" can be both masculine and feminine in Italian, so both versions can be helpful when searching for such language pairs (I guess for other languages as well, such as French)

Ooneykcall, 3 days ago, edited 3 days ago (27 April 2026, 14:48:22 UTC; edited 27 April 2026, 14:48:34 UTC)

Looks like the vision you have for Tatoeba is not what other people mostly see it as. Ambiguity isn't a problem since we understand that all valid (i.e. occurring in natural speech/writing) sentences are acceptable, and, naturally, many of them are context-dependent. Two sentences being linked as translations does not at all imply that the translations must be unique; in many cases, multiple translations are possible, and Tatoeba allows us to link them all.

Like Franklin said, if language B requires greater specificity than language A in some situations so a sentence using context-dependent words like pronouns cannot be translated 'literally', you may like to add multiple possible translations, each meant for different contexts. I imagine this would be a good way to show learners that this kind of sentence cannot be translated into language B as it is, but must be made more specific in some way, depending on what exactly the pronoun (or some other context-dependent word) refers to.

LeviHighway, 3 days ago (27 April 2026, 14:58:57 UTC)

What I’m discussing is really not an issue of ambiguity. I’m just somewhat tired of the practice of generating large numbers of sentences by relying on generic pronouns. What I dislike is when users repeatedly combine “this/that” and “he/she” with only one (or maybe zero) word that actually carries meaning.

For example, if there are currently no sentences containing “eat” on Tatoeba, I wouldn’t want anyone to just keep adding sentences like “He eats this.”, “He eats that.”, “She eats this.”, “She eats that.”, “We eat these.”, “We eat those.” and so on.

Ooneykcall, 3 days ago (27 April 2026, 15:09:11 UTC)

The issue with certain users generating a ton of similar basic sentences with only minor variations is real, but I suppose mostly in the past, since most such sentences have already been generated. There's no problem with using pronouns in general though, since they occur all the time in natural speech. The sentence you cited as eliciting a particularly strong dislike (#6760017) is actually perfectly fine in my opinion, given that it is decently complicated, having three clauses: [I wasn't surprised] [when Tom told me] [I didn't need to do that]. Generating a hundred clones of that sentence with only some words replaced is not something I approve of, but on its own it's a perfectly valid and useful sentence.

ssvb, 3 days ago (27 April 2026, 15:18:10 UTC)

> if language B requires greater specificity than language A in some situations so a sentence using context-dependent words like pronouns cannot be translated 'literally', you may like to add multiple possible translations, each meant for different contexts.

This is not very practical in many cases. If language A is English, then there are already far too many sentences. Providing every possible translation for simplistic sentences that lack context would be a monumental effort. I originally came to Tatoeba from the Clozemaster language learning website in 2023, because I noticed that it frequently offered somewhat odd Belarusian sentences that implied that "Tom" or "he" had feminine grammatical gender. Since they had been imported from Tatoeba, I tried to challenge this on Tatoeba as rather misleading for language learners, but was accused of being a "homophobe". Yet I still think that the unusual grammatical gender should preferably be used only in sentences where the other words clearly confirm the homosexual/transsexual context, rather than in something generic like "Tom is a teacher." or "He visited a dentist yesterday."

Ooneykcall, 3 days ago (27 April 2026, 15:36:11 UTC)

No need/obligation to provide every possible translation; providing some is enough. I do create/link multiple (2~4) translations of one sentence fairly regularly. The list of translations offered isn't necessarily meant to be exhaustive; all that matters is that they are all valid in some realistic context.

Specifically constructed unusual implied contexts that next to no native speakers would think of upon encountering a standalone sentence like that are supposed to be discouraged, as per the rules and guidelines (the 'Unexpectedly saw the train leave' example, where 'Unexpectedly' functions as a name, except no one has a name like that so it's needlessly confusing). Using 'Tom' as a feminine name would certainly qualify as needlessly confusing (even if the whole situation was clearly spelt out, and certainly if it was 'implied'), but there have been some users who liked to be edgy like that. I reckon all of them are currently gone though, as far as major Tatoeba languages go. Obviously, I can't speak for less popular languages that barely have contributors, whose input is thus left unchecked.

Thanuir, 2 days ago (28 April 2026, 07:21:41 UTC)

I wouldn't start banning sentences, but I agree that a sentence is usually more interesting if it mentions "toaster" rather than "it", and any name other than Tom. This leads to a more diverse database.

Kiwi_Sandwich, 3 days ago (27 April 2026, 21:02:41 UTC)

Is there a field, like @transcription, to match text in translations? Since words can have multiple meanings, this would be useful to filter sentences that only have translations with certain words.

Example search of Spanish sentences with English translations:
@text papa @translation dad
@text papa @translation pope
@text papa @translation potato

LeviHighway, 3 days ago (27 April 2026, 23:10:56 UTC)

https://en.wiki.tatoeba.org/art...ow/text-search

Limit matches to transcriptions or alternative scripts or sentence text

Some languages can be written in different scripts (such as traditional/simplified Chinese, or Latin/Cyrillic Uzbek). Others also have transcriptions (such as Pinyin Chinese or Japanese furigana).

By default, keywords will be searched everywhere: sentence text, alternative script and transcription. This means a sentence might come up in the results just because the transcription is matching.

You can control exactly what is searched by using the @text and @transcription prefixes, respectively targeting the sentence text and what’s under the sentence text.

To search for Japanese sentences containing かな in the furigana:

@transcription かな

To search for Japanese sentences containing 国 in sentence text and くに in the furigana:

@text 国 @transcription くに

To search for Japanese sentences containing 国 in sentence text but NOT くに in the furigana:

@text 国 -@transcription くに

To search for Chinese sentences containing 著 in sentence text and zháo in Pinyin:

@text 著 @transcription zhao2
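
For anyone who builds these queries in a script, the prefix syntax quoted above can be assembled with a small helper. This is only a sketch: `build_query` and its parameter names are made up for illustration and are not part of Tatoeba or its search engine; the only syntax it relies on is the `@text`, `@transcription` and leading `-` forms shown in the wiki excerpt.

```python
def build_query(text=None, transcription=None, exclude_transcription=None):
    """Assemble a search string from optional field constraints.

    Hypothetical helper: it only concatenates the prefixes shown in the
    wiki excerpt and does not talk to Tatoeba itself.
    """
    parts = []
    if text:
        parts.append(f"@text {text}")
    if transcription:
        parts.append(f"@transcription {transcription}")
    if exclude_transcription:
        # A leading minus negates the field match, as in the wiki example.
        parts.append(f"-@transcription {exclude_transcription}")
    return " ".join(parts)


print(build_query(text="国", transcription="くに"))
print(build_query(text="国", exclude_transcription="くに"))
```

The resulting strings can be pasted into the search box as they are.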

Kiwi_Sandwich, 3 days ago (28 April 2026, 00:06:43 UTC)

Thank you for your reply. However, that is not what I am asking.

I read the “How to Search for Text” section in Tatoeba’s Wiki, and also the “Field search operator” section of the Manticore documentation, where it gives an example “@title hello @body world”.

The @transcription field does not find matches in translations; as far as I can tell, it only works in languages where the sentence can be written in different scripts, such as Chinese or Japanese.

If I use “@text papa @transcription potato” to try to find Spanish sentences with English translations, there are no results (because Spanish doesn’t have a transcription field), even though there are many sentences with “papa” in the source and “potato” in the translation.

If I just search for “papa”, the results include sentences where that word means “dad,” where it means “pope,” and where it means “potato.” I’m wondering if it’s possible to filter those results based on the specific translation that I’m looking for.
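
To make the request concrete, here is a rough sketch of the filter I have in mind, applied offline to sentence pairs that have already been loaded into memory. Everything in it is an assumption for illustration: `filter_pairs` is a made-up function, the three sentence pairs are invented rather than taken from Tatoeba, and no `@translation` field exists in the site search.

```python
import re


def filter_pairs(pairs, source_word, translation_word):
    """Keep (source, translation) pairs where both words occur as whole words."""
    src = re.compile(rf"\b{re.escape(source_word)}\b", re.IGNORECASE)
    tgt = re.compile(rf"\b{re.escape(translation_word)}\b", re.IGNORECASE)
    return [(s, t) for s, t in pairs if src.search(s) and tgt.search(t)]


# Invented example data, not real Tatoeba sentences.
pairs = [
    ("Mi papa trabaja mucho.", "My dad works a lot."),
    ("El papa vive en Roma.", "The pope lives in Rome."),
    ("Me comí una papa.", "I ate a potato."),
]

for source, translation in filter_pairs(pairs, "papa", "potato"):
    print(source, "-", translation)
```

All three source sentences contain "papa", but only the pair whose translation contains "potato" is kept.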

LeviHighway, 3 days ago (28 April 2026, 02:49:33 UTC)

Sorry, I misunderstood what you meant. Indeed, the current advanced search can only filter by things like the language of the translation, whether the link is direct or indirect, whether it is an orphan, whether it is unapproved, whether it is contributed by a native speaker, and whether it has audio. There is no option at all to filter by the specific words contained in the translation.

As you pointed out, papa could be translated as “dad,” “pope,” or even “potato,” so learners may need to filter sentences that include a particular meaning. That’s why I suggested adding a dictionary feature in another post[1], although I realize that would be extremely difficult to implement.

So if we could also filter translations by the words they contain, that would indeed be a very good way to address this kind of issue.

[1] https://tatoeba.org/zh-tw/wall/...#message_41826

ssvb, 3 days ago (27 April 2026, 13:18:05 UTC)

Tatoeba does not seem to support CC0 to CC0 translation pairs. For example, I'm not allowed to change the license of https://tatoeba.org/en/sentences/show/13876837 to CC0 even though the original sentence https://tatoeba.org/en/sentences/show/9737299 is CC0 licensed.

Additionally, it would be great to have quotations from classic English writers (whose works are already in the public domain worldwide) licensed as CC0 on Tatoeba, such as the sentences with the "by Arthur Conan Doyle" tag.

gillux, 4 days ago (26 April 2026, 04:48:07 UTC)

If some of you would like to financially support my work on Tatoeba, I have created a Liberapay account for that: https://liberapay.com/gillux

Of course, you can also donate to the Tatoeba association https://tatoeba.org/donate and the money will support the whole project instead, including infrastructure and legal costs.

CK, 6 days ago (24 April 2026, 08:31:01 UTC)

** Early History of the Tatoeba Project **

http://a4esl.org/temporary/tatoeba/history/

This may be interesting to both old and new members.

It covers the project's history up to 2012. It's an old webpage that I made years ago.

LeviHighway, 6 days ago (24 April 2026, 13:48:28 UTC)

The picture of the early version of Tatoeba shows something I mentioned in another post on the Wall.
http://a4esl.org/temporary/tato...ly-version.jpg

In the picture, the English word "power" is highlighted, and the Japanese word for "power", which is "力", is also highlighted. I assume the other highlighted words are also words for "power".

This is exactly the function I mentioned in this post:
https://tatoeba.org/zh-tw/wall/...#message_41826

Can you tell me more about what it was and how it worked?

gillux, 4 days ago (26 April 2026, 04:27:57 UTC)

I wasn’t there, so I can't say much about it, but from what I heard, this screenshot comes from an early version of Tatoeba, and the code has since been completely rewritten from scratch. That’s why our current code repository is named "tatoeba2" [1], that screenshot being "tatoeba 1", or simply "tatoeba" at that time.

In that early version, sentences were organized not as a graph but as a table, with only one sentence per language in each sentence group. I guess this model, despite its constraints, made the feature you want easier to provide.

[1] https://github.com/Tatoeba/tatoeba2

6 days ago (25 April 2026, 03:19:03 UTC)

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

LeviHighway, 13 days ago (17 April 2026, 07:47:48 UTC)

I have long had this idea for Tatoeba, though I understand that it would be hard to achieve and might not fit Tatoeba's purpose. Still, I want to share it here:
I wish Tatoeba had a "dictionary" function. I don't mean adding definitions or etymologies to words; it would be a translation dictionary.
Here's how I think it should work:
When you search for "Chinese" via this dictionary function, you might get a few sentences like these:
I am Chinese. - 我是中國人。
I speak Chinese. - 我說中文。
I would like contributors to be able to tag how the word "Chinese" is translated into the target language. For example, you could mark:
I am [Chinese]. - 我是[中國人]。
I speak [Chinese]. - 我說[中文]。
This way, we could gather statistics on how the word "Chinese" is translated into the target language. For example, when you look up "Chinese", it could tell you that "Chinese" is most often translated as "中國人", followed by "中文".
I think Tatoeba is the best place to do this. Wiktionary is too complicated, and it can never provide as many sentences as Tatoeba.
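
As a rough illustration of the statistics part, the bracket tagging above could be counted along these lines. This is a minimal sketch under my own assumptions: the square-bracket notation is just the one I used in this post, `count_tagged` is a made-up name, and tags are paired purely by their position in each sentence, which would not hold for every language pair.

```python
import re
from collections import Counter, defaultdict

# Matches one [tagged] span and captures the text inside the brackets.
TAG = re.compile(r"\[([^\]]+)\]")


def count_tagged(pairs):
    """Count how each tagged source word is rendered in the target sentence."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        source_tags = TAG.findall(source)
        target_tags = TAG.findall(target)
        # Assumption: the n-th tag in the source corresponds to the n-th
        # tag in the target.
        for src_word, tgt_word in zip(source_tags, target_tags):
            counts[src_word][tgt_word] += 1
    return counts


pairs = [
    ("I am [Chinese].", "我是[中國人]。"),
    ("I speak [Chinese].", "我說[中文]。"),
]

counts = count_tagged(pairs)
for translation, n in counts["Chinese"].most_common():
    print(translation, n)
```

With only these two example pairs, each translation is counted once; a ranking would emerge only from a much larger set of tagged sentences.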

zogwarg, 7 days ago, edited 7 days ago (23 April 2026, 08:04:08 UTC; edited 23 April 2026, 08:04:54 UTC)

I think there are many obstacles to making this work with a "tag" system:
- It would require a lot of extra work from contributors.
- For most language pairs, word-for-word matching is very poor anyway, e.g. [#3378275] > [#4919050].

Pragmatically, I think that if you can understand both the source and target languages, you can already use advanced search to get a good idea of how a word gets translated on Tatoeba (though Tatoeba is not necessarily representative):

https://tatoeba.org/en/sentence...rd_count_min=1

Interestingly, for "Chinese" the breakdown seems to be more like:
148 "中文"
63 "汉语"
58 "中国"
42 "中国人"
18 "漢語"
16 "汉字"
16 "中國人"
16 "中國"
14 "漢字"
11 "中"
10 "中餐"
7 "普通话"
4 "国"
2 "华人"
2 "中式"
1 "華"
1 "繁体字"
1 "简体字"
1 "汉"
1 "本地"
1 "國字"
1 "华"
1 "中菜"
1 "中华"
19 "other (mostly implicitly Chinese things)"
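
Several rows in this breakdown are simplified/traditional spellings of the same word, so the ranking shifts a little once they are merged. Here is a minimal sketch of that merge for the most frequent rows; the `VARIANTS` mapping is hand-written for this example (not something Tatoeba provides), and the remaining rows are left out for brevity.

```python
from collections import Counter

# Counts copied from the most frequent rows of the breakdown above.
raw_counts = {
    "中文": 148,
    "汉语": 63,
    "中国": 58,
    "中国人": 42,
    "漢語": 18,
    "汉字": 16,
    "中國人": 16,
    "中國": 16,
    "漢字": 14,
}

# Hand-written mapping from traditional to simplified spellings (an assumption).
VARIANTS = {"漢語": "汉语", "中國人": "中国人", "中國": "中国", "漢字": "汉字"}


def merge_variants(counts, variants):
    """Sum counts of spellings that map to the same canonical form."""
    merged = Counter()
    for word, n in counts.items():
        merged[variants.get(word, word)] += n
    return merged


merged = merge_variants(raw_counts, VARIANTS)
for word, n in merged.most_common():
    print(word, n)
```

After merging, "中文" stays first at 148, followed by "汉语" (81), "中国" (74), "中国人" (58) and "汉字" (30).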

LeviHighway, 7 days ago (23 April 2026, 15:01:50 UTC)

I think that for sentences with poor word-to-word matching, we could simply add a button to mark them as "implied" or something similar.

In the example of [#3378275] > [#4919050], we cannot match "Chinese" to anything, so we can mark it as "implied". However, we can still create matches like [Chinese characters] > [字] and [characters] > [字].

One of the reasons I suggest this idea is that some people may not understand both languages well, so a mapping of how words are translated could help them understand the sentence structures more easily.

ssvb, 7 days ago (23 April 2026, 12:19:42 UTC)

> I think Tatoeba is the best place to do this. Wiktionary is too complicated,

If Tatoeba implements your suggestions, then it will become more complicated, possibly even as complicated as Wiktionary.

BTW, you are not required to provide etymology when contributing to Wiktionary. A lot of other information is also optional. You can try to figure out what is the minimal barebone entry for a Chinese word and maybe discuss this with the other contributors if something is not clear. Start from here: https://en.wiktionary.org/wiki/...try_guidelines

I guess, one of the challenges is that Wiktionary treats Chinese as a big family of many similar languages or subdialects (Mandarin, Cantonese, Wu and the others), each having its own distinctive features. And this may cause friction, because some people may be in favour of eradicating the differences for the sake of simplification and unification, while the other people may be interested in preserving their local dialect as their precious cultural heritage.

> and it can never provide as many sentences as Tatoeba.

Wiktionary surely has stricter quality requirements for its content, so it indeed can't match Tatoeba's quantity of sentences. That said, Tatoeba's content is also categorized via tags, and it's possible to filter out dubious sentences (contributed by non-native speakers, etc.).

LeviHighway, 7 days ago, edited 7 days ago (23 April 2026, 14:48:16 UTC; edited 23 April 2026, 14:50:59 UTC)

I don’t frequently contribute to the English Wiktionary, which seems more streamlined due to its large community of maintainers. However, as a contributor to the Chinese Wiktionary, I find the infrastructure much more challenging. Many templates and modules are incomplete or overly complex; editing a single entry often takes hours and is prone to errors.

Furthermore, Chinese grammar is often a subject of intense debate, frequently leading to edit wars. I’ve also noticed inaccuracies in Korean and Chinese entries, particularly when they are translated from English by contributors who may not be proficient in the target language.

My point is that we could effectively build a translation dictionary within Tatoeba. For instance, when looking up a word, seeing both the keyword and its translation in bold—similar to how some dictionaries highlight equivalents—would greatly benefit learners in understanding sentence structures.

Regarding the preservation of Chinese languages, Tatoeba handles this exceptionally well. By treating Mandarin, Cantonese, Wu, and others as distinct entities, it helps maintain the linguistic integrity and "purity" of each language.

7 days ago (23 April 2026, 11:27:55 UTC)

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.
