Wall (5415 threads)

8 days ago
I've found some untranslated strings in the French UI and also a "missed capitalization"

8 days ago
Also: "Norwegian Nynorsk"
8 days ago
You can apply on Transifex to translate UI strings -
5 days ago
I don't have a Transifex account, but someone else could add that one:
Guadeloupean Creole French → Créole guadeloupéen

(Though I do not understand why all the language names begin with a capital letter in the French UI, because French normally does not capitalize language names.)
15 days ago
I know that it has been argued previously that it is best to try to use a small, more or less universal, set of personal names and placenames for the Tatoeba project, whilst others have preferred to use locally known or historically significant names from their own culture or country. While I don't wish to privilege some specific names above others, and a person's attachment to locally significant names is certainly understandable, perhaps even laudable, there are very practical advantages to consistently using a small set of names.

As an example, the names Sofia, Adamo and Lidia and the town of Bjalistoko are culturally relevant to Esperantists because the names are those of Esperanto founder L.L. Zamenhof's children, sadly murdered in Hitler's extermination camps, and the Polish town of Białystok is where Zamenhof lived for many years.

But Esperantists using the Tatoeba site, attempting to hew to some universality, will generally use the names Tomo or Toĉjo, Maria or Manjo, Johano or Joĉjo, which are regularly formed variations of the English names Tom, Mary and John, and Bostono, the Esperanto proper noun corresponding to Boston.

Similarly, there are traditional Arabic or Berber forms of names that are closely related to the names Tom, Mary and John, and the name Boston surely is findable on an Arabic world map. In my opinion there should be no problem with using these consistently in order to prevent the near-duplications that arise when we pick a different set of names like Sami, Layla, or Mennad, etc. Here are my suggestions; I wonder what others think:

Tom توماس
Mary مريم
John يوحنا
Boston بوسطن
14 days ago - 14 days ago
> there are very practical advantages to consistently
> using a small set of names

There are also practical advantages to *not* using them:

1. Names are words that can be translated. E.g. Russian names are translated in Belarusian (Елизавета/Yelizavieta becomes Лізавета/Lizavieta), while English names aren't (Elizabeth becomes Элізабэт/Elizabet; unless she's a queen, then she's Лізавета/Lizavieta). You're losing a great deal of information by forcing all names to be 'Tom' and 'Mary'.

2. Names are words that follow some language rules. Names are declined differently (e.g. Russian declines 'Darwin' and 'Pushkin' differently), spelt differently (e.g. Lithuanian surnames like Чюрлёнис 'Čiurlionis' are the only case when ю, я can follow ч in Russian), capitalised differently, get different suffixes and prefixes, etc. Again, all this is often lost during the unification.

Unifying names will make Tatoeba less useful. E.g. imagine creating a language detector. If you replace Shymkent with Boston, Tatoeba will have no examples of Шы in Russian, so a detector based on Tatoeba data will likely misdetect sentences about Shymkent as non-Russian.
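
To make the detector point concrete, here is a minimal sketch. The toy sentences and the function names are my own assumptions, not Tatoeba data or code, and a real detector would use far more than raw bigram presence. It only shows that a character pair which never appears in the training sentences is invisible to anything built on them:

```python
from collections import Counter

def char_bigrams(text):
    """Return a Counter of lowercase character bigrams in text."""
    lowered = text.lower()
    return Counter(lowered[i:i + 2] for i in range(len(lowered) - 1))

def unseen_bigrams(corpus, sentence):
    """List bigrams of `sentence` that never occur in any corpus sentence."""
    seen = Counter()
    for s in corpus:
        seen.update(char_bigrams(s))
    return sorted(b for b in char_bigrams(sentence) if b not in seen)

# Hypothetical toy corpora: one keeps the real city name,
# the other has been "unified" to Boston.
diverse = ["Я живу в Шымкенте."]
unified = ["Я живу в Бостоне."]
query = "Шымкент большой."

print(unseen_bigrams(diverse, query))
print(unseen_bigrams(unified, query))
```

With the unified corpus, the bigram шы is among the unseen ones; with the diverse corpus it is not.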

If we unify names, then why not unify other words too? Let's unify all actions to 'eat' and all objects to 'apple'. We'd have a lot of sentences like 'I don't mind eating an apple', 'I've never eaten an apple' and 'I'd like to eat an apple', etc. Surely this would reduce a lot of duplication!

This is reductio ad absurdum, but I believe it's not really different from the name unification. *Of course* we will get less duplication if we discard a lot of information! But that's hardly practical.
13 days ago
13 days ago
13 days ago
13 days ago
If we add together Shishir, Ricardo14, and soweli_Elepanto's votes, and throw in one for Seael and one for me, we get 1004 votes against restricting vocabulary to "Tom" and "Mary" (and "apple"). I have always thought that the people who favor the restrictions were in quite a small minority (fewer than five, let's say), but an influential one, because at least one of those people has, over the years,
- contributed a huge number of sentences in the best represented and most translated language (English),
- selected sentences with that restricted subset of names when compiling lists for other people to translate, and
- complained regularly when people wrote sentences that included names not in the subset

Just for the record, these restrictions were never a Tatoeba policy, and there's nothing to stop people from writing sentences with other names. (Of course, there's nothing to stop other people from writing or selecting only sentences that feature "Tom", "Mary", "Boston", and "apple".)
8 days ago
And +1 for me. Concerning East Slavic languages, “Mary” is an awful choice. It has a defective (read: incomplete) declension pattern. The inability to use local names compromises the entire Slavic scheme of syntactic encoding with cases. E.g. “Я описал Мэри” ‘I describe·PAST Mary·ANYCASE’ can mean ‘I described Mary to someone’ or ‘I described someone to Mary’. It should be banned.
13 days ago
I agree too.
13 days ago - 13 days ago
My feeling is that if a moderately-competent language learner knows how to make the appropriate changes, then we don't really need a lot of near-duplicate sentences with the only difference being the person's name.

Being a collection of sentences, we're not trying to take the place of textbooks.

Here are a few of the advantages of using a set of wildcards as described on

* We get more translations grouped together. That is, instead of "My name is George" being translated into German, "My name is Fred" translated into French, "My name is Dan" translated into Spanish, etc., we can get many languages grouped together with "My name is Tom." This also means that we can see indirectly-translated sentences that have the potential of being directly linked.

* If a Russian contributor contributes a sentence with the equivalent of "Tom", there is a chance that the same basic sentence already exists in another language and these will eventually be linked as people notice indirect translations.

* Using the wildcards also means that the same contributor doesn't accidentally submit basically the same sentence that he/she did earlier with a different name. The site doesn't let the exact same sentence be submitted again.

* For the same reason, if you add a translation to an existing sentence with the "Tom" equivalent already in the database, the sentence you are translating gets linked directly to that sentence, also showing any indirectly-linked translations that may already exist.

13 days ago - 12 days ago
> My feeling is that if a moderately-competent
> language learner knows how to make the
> appropriate changes

My feeling is that languages are often the *sole* examples of some linguistic phenomena.

For example, in English, the digraph ff at the beginning of a word might be capitalised as ff (not Ff):

I doubt a moderately-competent language learner knows this. Heck, I doubt many native speakers know this! This is very useful as sample data for all kinds of automatic capitalisation algorithms. But too bad for their authors, because Castle ffrench will be replaced with Boston, and Rose ffrench will be replaced with Mary.

Another example. What's the possessive case of Jesus, Jesus' or Jesus's? I assume I'd use Jesus' about a biblical Jesus, but what about people named Jesus from Spain? I consider myself a relatively competent English speaker, but I have no idea! Having examples with the possessive cases for names ending in -s would certainly help (especially if they had audio, because 's might have a vowel inserted sometimes).

More examples from other languages:

In Russian, there are two ways to say someone's: genitive case (комната Саши room Sasha's.GEN) and adjective (Сашина комната Sasha's.ADJ.FEM.SG room). However, adjectives are only formed from common local names, and sound strange with Tom and Mary. We have only 1 example of an adjective from Tom in the whole corpus, #922679. This makes it look like adjectives are no longer used, so a corpus with only Tom misrepresents Russian grammar.

In Belarusian, some people don't know how to decline many uncommon names. For example, in Belarusian, the genitive case of Зміцер/Źmicier is Змітра/Źmitra, but many people would say Зміцера/Źmiciera. Having diverse names would help people understand the declension of such words.


This is even worse with languages and cities. Often, some languages have special words for some cities and languages, while others don't. For example,

- Russian has an illogical way of forming words like 'people of <city>' (like Boston > Bostonians), e.g.: Boston - bostonTSY, Moskva - moskvICHI, Minsk - minCHANIE, Varshava - varshavIANIE, Kiev - kievLIANIE, Odessa - odessITY, Arkhangelsk - arkhangelOGORODTSY, etc. If you only keep Boston, you'll miss most of these forms.

- Portuguese treats 'wine of Boston' (vinho de Boston) and 'wine of Porto' (vinho do Porto) as the same construction, while other languages might have a separate word for Porto wine (English Port wine, Russian портвейн), but don't have a word for Boston wine.

- Russian language names can be placed either before the word язык 'language' (французский язык 'French language') or after it (язык хинди 'Hindi language'). Usually you can guess the position from the word form, but not always (коми язык 'Komi language' looks like Hindi, but is placed like French).

- Many languages have special words for '<language>-speaking', e.g. 'Francophone', 'Lusophone', but don't have words for other languages (Hindiphone??? Komiphone???)


All those things are not something we can expect from a 'moderately-competent language learner'.
13 days ago
Long story short (although the long story is important), some people want to bend the source to fit their tools, instead of adapting their tools. That's not new.

Placeholders inside the source are not good for corpora, whereas placeholders inside tools are so awesome and beautiful (because seeing inflections and stuff changing live is cool).

In many cases, redundancy does not equal waste. At around a few percent, it is quite negligible in my opinion.
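
As a rough illustration of "placeholders inside tools", here is a minimal sketch. The name list, the `{NAME}` marker and the function names are hypothetical, and inflected name forms are ignored entirely. It groups sentences that differ only by name while leaving the stored sentences untouched:

```python
import re
from collections import defaultdict

# Hypothetical name list; a real tool would need a per-language inventory
# and would have to handle inflected forms, which this sketch ignores.
NAMES = {"Tom", "George", "Fred", "Dan"}
NAME_RE = re.compile(r"\b(" + "|".join(sorted(NAMES)) + r")\b")

def template(sentence):
    """Return a copy of the sentence with known names replaced by a placeholder."""
    return NAME_RE.sub("{NAME}", sentence)

def group_by_template(sentences):
    """Group sentences that differ only in the name they use."""
    groups = defaultdict(list)
    for s in sentences:
        groups[template(s)].append(s)
    return dict(groups)

corpus = [
    "My name is George.",
    "My name is Fred.",
    "My name is Tom.",
    "Dan eats an apple.",
]
groups = group_by_template(corpus)
for key, members in groups.items():
    print(key, members)
```

The grouping lives in the tool, so the corpus keeps every original name and every form that comes with it.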
9 days ago
Pardon my beginner's question. I see sentences that could be linked. However, according to the rules, basic users are requested to ask an advanced user:

Thus my question: what is the best way to "ask an advanced user"? Does one post the request here on the wall? Does one send a message to an advanced user? If so, who would be the best person to ask?

9 days ago - 8 days ago
I updated the wiki page. It now says:

If you are an advanced contributor, you can do [the linking] yourself. If you are not, please leave a comment on the source or translation to ask an advanced contributor to do it for you. If you find that your requests are not being addressed, you can add the name of an advanced contributor, preceded by the @ symbol, to your comments. You can find advanced contributors on the "Community -> Languages of members" page. It is best to choose one who has been active recently.
8 days ago
Hey thanks! That helps a lot.
8 days ago
Glad to hear it! If you find anything else unclear, please let us know.
8 days ago
You can also add the sentence as a direct translation. The two copies of the sentence should be automatically merged.

Though in the case of French, they are only merged if the spaces (or lack thereof) before any exclamation or question marks also match.
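
For anyone checking their own sentences before submitting, here is a minimal sketch of a comparison that ignores that spacing difference. It is not how Tatoeba merges sentences; the function name and the example sentences are assumptions of mine:

```python
import re

# Any run of whitespace directly before ! or ?.
# In Python 3 str patterns, \s also covers no-break spaces.
SPACE_BEFORE_MARK = re.compile(r"\s+(?=[!?])")

def spacing_key(sentence):
    """Return the sentence with whitespace before ! and ? removed."""
    return SPACE_BEFORE_MARK.sub("", sentence.strip())

a = "Pourquoi pas ?"
b = "Pourquoi pas?"

print(a == b)
print(spacing_key(a) == spacing_key(b))
```

If two sentences have the same key but are not identical, the spacing before the final mark is the likely reason they were not merged.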
17 days ago
Here is a YouTube video that I just uploaded.

1,570 "Why" Sentences from the Tatoeba Corpus (2hrs 22mins) - Listen and Repeat

If you want to see existing translations and/or want to translate these sentences, you can use the following links.

These links will show the sentences in the same order that they are on the video.

9 days ago - 9 days ago
Last week, Guybrush88 focused on translating a lot of the sentences on this video into Italian, so a lot of the sentences on this video now have Italian subtitles.

Perhaps you would like to go there and choose Italian subtitles.

If anyone else would like to do the same for your language, just write to me after you've done a lot of them and let me know. You don't need to translate all of them. It's fairly easy for me to do after each Saturday's exports at
10 days ago - 10 days ago

Learn the usernames of the native speakers of the language that you are studying.

Get links for studying and translating their sentences.

This page will also show you the usernames of members with your native language.

This has been updated with sentence counts from the 2019-07-06 exported data.
12 days ago
Sorry for this naive question but I'd like to link Portuguese sentences into Spanish, English, French, German and Greek (if there's any translation available).

For now I have two options only

1 - search a specific query like "Eu quero" -

2 - On settings, choose only por, deu, fra, eng, spa, ell

So every time I want to link sentences, I have to either choose a specific query or limit the languages I'd like to link. Isn't there an easier way? Isn't there a way to let Advanced Search look for sentences in more than one language?
12 days ago
I don't really understand your problem, Ricardo14. I have set my settings to "eng, epo, fra, dan, nob, deu, por, ita". This means that I see only English, Esperanto, French, Danish, Norwegian Bokmål, German, Portuguese and Italian sentences and translations. I do not see Russian or Turkish or Chinese or other languages with which I am not familiar. If I wanted not to see Portuguese or Italian sentences, I would just delete the last two options from my list. So your option 2 is probably optimal for you.

I see that the great majority of sentences you have posted are in Portuguese, which is presumably your strongest language.

As you point out, you could, for example, choose to see only Greek sentences without a Portuguese translation. If you well understand the meaning of those untranslated sentences, you could furnish Tatoeba with good Portuguese-equivalent sentences, which would be a nice service.
11 days ago
but if I want to "fast-link" (only link), I'd have to make a query AND set up Tatoeba to the languages I want. Maybe there is an easier way but I don't know how...
11 days ago
It's only a 4-step process and is relatively fast.

Step 1.

Go to

Step 2.

Paste this into the "languages" field

por, spa, eng, fra, deu, ell

(Keep a copy of this somewhere, so you can just copy and paste it in the future.)

Step 3.

Make your searches.

Step 4.

After making all the searches you want, you can go back to and revert your language settings to have nothing.

11 days ago - 11 days ago
In this thread:

@Impersonator did a great job of describing the issues that are not covered when our proper nouns are limited to a small set of English names (John, Mary, Boston, and so on). He made me aware how much practice I'm missing in terms of declining names in Russian. As a result, I've added sentences containing Russian person and place names to a list of sentences that I'm hoping will be translated into Russian (and related languages):

In the same thread, @CK offered the argument that restricting names to a small set reduces the occurrence of near-duplicate sentences and fosters a more highly linked collection of sentences. I think there is a better way to avoid the accidental submission of near-duplicate sentences, namely to contribute distinctive sentences. That's what I've tried to do. But if the sentences, or parts of them, resemble other sentences that have been translated, I don't consider that a catastrophe. The mere fact that they contain these names means they are covering issues that we cannot address with "Tom and Mary" sentences.
15 days ago
#8005447 >> Esperanto is a key to peace. (peace = béke)
How should I understand it?
If everyone speaks Esperanto,
the Kingdom of God will come?

Short answer:

After a short period of 125 years, Esperanto is among the 100 most-used
of the 6,800 languages spoken in the world.

Esperanto, originally the Lingvo Internacia (International Language), is the most widespread international constructed language. In 1887 only a handful of people spoke Esperanto;
Esperanto had one of the smallest language communities in the world.
From the beginning it functioned as a language of alternative communication and of artistic creativity.
In 2012, the language became the 64th translatable by Google Translate.
In 2016, the language became translatable by Yandex Translate;
as of 2016, Esperanto appeared in lists of the most studied and best-known languages
in Hungary.<<<
The name of the language comes from the pseudonym "D-ro Esperanto", under which the Jewish physician Ludoviko Lazaro Zamenhofo published the basis of the language in 1887. The first version, the Russian one, received the censor's permission to be distributed
on the 26th of July; this date is considered the birthday of Esperanto.
He aimed to create an easily learnable, neutral language suitable for use in international communication, and he succeeded; the aim, however, is not to replace other, national languages.

Audio: Zamenhof's lecture at the 1st World Congress of Esperanto in Boulogne (1905)

Although no state has adopted Esperanto as an official language, Esperanto has nonetheless entered official education in several countries, such as Hungary and China. Esperanto is on the list of the 30 languages whose teaching has, somewhere in the world, earned the prestigious Eaquals quality mark.

The language is used by an international community numbering, according to various estimates, from one hundred thousand to two million speakers (depending on the level of proficiency); according to various estimates there are between approximately one thousand and several thousand native speakers.

Esperanto has gained some international recognition, for example two UNESCO resolutions and the support of well-known public figures. Today it is used for travel, correspondence, mutual understanding at international meetings and cultural exchanges, congresses, scientific discussions, original and translated literature, music, theatre, cinema, print and online reporting, and radio and television broadcasting.

The vocabulary of Esperanto derives mostly from Western European languages, while its syntax and morphology also show Slavic influence. The morphemes do not change and can be combined almost without limit to create words of diverse meaning, so Esperanto has much in common with the analytic languages, to which Chinese, for example, belongs; on the other hand, the internal structure of Esperanto to some degree mirrors the agglutinative languages, such as Japanese, Swahili or Turkish.

Zamenhof's goal was to create an easy-to-learn
and politically neutral language that
"transcends" the boundaries of national cultures
and promotes peace and international understanding between people.

Zamenhof's goal was to create an easy and flexible language
that would serve as a universal second language to foster peace
and international understanding, and to build a community of speakers,
as he correctly inferred that one could not have a language without
a community of speakers.
12 days ago
15 days ago
Not sure if this would be easy to implement here but I'll mention it just in case.

Would it be feasible to implement here a way to show the content of a sentence when placing / hovering the cursor over a sentence number? In other sites, something similar can be achieved when hovering over certain images or links: a little box containing some text pops up.

Such a feature would be very useful to check the sentence without having to click on it to see its content.
15 days ago
Do you mean that you want to be able to mouseover a sentence in search results to see all the translations? Or, do you mean translations, comments, ratings and tags?

The former would require the same amount of database interaction as using the advanced search with the setting "Show translations in: All languages," so perhaps doing that would be better since it's something already available.

The latter would require even more time connecting to the database, so the page would load even more slowly.
15 days ago
If you open a sentence, there is a Logs block on the right side of the page. So Seael probably wants this: a mouseover on a sentence number (the #number) will trigger an AJAX query, and after the server responds, a fixed div block with the sentence text will be created.
14 days ago
I think fjay69's got it but, as I am not 100% sure about it, I'll just use the sentence on top of the previous discussion as an example of what I mean.

I'd like to hover the cursor over #8005447 and then get a little box popping up near it, containing the following text: "Esperanto is a key to peace". That's all.
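
Just to illustrate the idea, here is a minimal sketch of one way such a box could be produced. It is not the AJAX approach fjay69 describes and not Tatoeba's code: it assumes a hypothetical `lookup` dictionary of sentence texts and simply puts the text in an HTML title attribute, which browsers show on hover:

```python
import html
import re

SENTENCE_REF = re.compile(r"#(\d+)")

def add_tooltips(comment, lookup):
    """Wrap each known #id in a span whose title attribute holds the sentence text."""
    def wrap(match):
        text = lookup.get(int(match.group(1)))
        if text is None:
            # Unknown sentence number: leave the reference as it is.
            return match.group(0)
        return '<span title="{}">{}</span>'.format(
            html.escape(text, quote=True), match.group(0)
        )
    # Escape the comment first so user text cannot inject markup.
    return SENTENCE_REF.sub(wrap, html.escape(comment))

# Hypothetical lookup table standing in for the sentence database.
lookup = {8005447: "Esperanto is a key to peace."}
rendered = add_tooltips("See #8005447 and #1", lookup)
print(rendered)
```

The texts are filled in when the page is rendered, so every reference costs a lookup up front, whereas the AJAX variant would only query when someone actually hovers.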
14 days ago - 14 days ago
:o And actually it is happening!!! 👏👏👏

Mmm... But it only seems to be working on this wall... 🤔

And in the comments, too!!! 👏👏👏 It was there where I mainly wanted them! Now I'm not truly sure if that feature already existed or if it has just been implemented, but it's great!