raeldor, May 5, 2021 at 7:32:31 AM UTC

Hi. Who is maintaining the Japanese indexes please? There are examples where the words appear to be tagged incorrectly. For example in 186382, it's tagged...

我々 が どの位{どのくらい} 原子力 に 頼る{頼っている} か 落ちつく{落ち着いて} 考える{考えて} 見る[05]{みよう}

Where 落ち着いて with two kanji is tagged to 落ちつく with one kanji instead of 落ち着く with two kanji.

bunbuku, May 5, 2021 at 8:06:09 AM UTC

おちつく could be written as 落ち着く or 落ち付く, and some people would write 落ちつく for sure.

Google search results:
落ち着いて is 147, 落ち付いて is 183, 落ちついて is 153.

All of them are correct; it just differs from person to person.

raeldor, May 5, 2021 at 11:57:57 AM UTC

I understand what you are saying, but my point was that the sentence uses the 落ち着く variation (as 落ち着いて) and should therefore be tagged with THAT version, NOT the 落ちつく version that it's currently tagged with.

Pfirsichbaeumchen, May 5, 2021 at 1:57:01 PM UTC, edited May 5, 2021 at 1:57:35 PM UTC

What do you mean by "tagged"? Are you talking about furigana? Are you coming from another project that is using our sentences? In any case, 落ち着く and 落ちつく should not be treated separately, as bunbuku has explained.

raeldor, May 5, 2021 at 10:08:31 PM UTC

The indices link words in the sentence to dictionary entries. You are correct that there are three ways to write this verb. But I would expect the index to link to the CORRECT one of those three. Currently it isn't, as you can see from the definition.
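For anyone who wants to look for this kind of mismatch themselves, here is a rough Python sketch. It assumes each index token has the shape `headword(reading)[sense]{surface}~`, with every part after the headword optional, which matches the line quoted above but may not cover every case in the real index files. The helper names (`parse_index`, `kanji_mismatches`) are made up for illustration, and the check is only a heuristic: it flags tokens whose form in the sentence contains a kanji that the linked headword lacks.

```python
import re
from typing import List, NamedTuple, Optional


class IndexToken(NamedTuple):
    headword: str           # dictionary form the token is linked to
    reading: Optional[str]  # "(...)" part, if present
    sense: Optional[int]    # "[..]" sense number, if present
    surface: Optional[str]  # "{...}" form as written in the sentence


# Assumed token shape: headword(reading)[sense]{surface}~
TOKEN_RE = re.compile(
    r"^(?P<headword>[^(\[{~]+)"
    r"(?:\((?P<reading>[^)]*)\))?"
    r"(?:\[(?P<sense>\d+)\])?"
    r"(?:\{(?P<surface>[^}]*)\})?"
    r"~?$"
)
# Basic CJK Unified Ideographs block only; enough for this example.
KANJI_RE = re.compile(r"[\u4e00-\u9fff]")


def parse_index(line: str) -> List[IndexToken]:
    """Split a space-separated index line into structured tokens."""
    tokens = []
    for raw in line.split():
        match = TOKEN_RE.match(raw)
        if match is None:
            raise ValueError(f"unrecognized index token: {raw!r}")
        sense = match.group("sense")
        tokens.append(
            IndexToken(
                headword=match.group("headword"),
                reading=match.group("reading"),
                sense=int(sense) if sense else None,
                surface=match.group("surface"),
            )
        )
    return tokens


def kanji_mismatches(tokens: List[IndexToken]) -> List[IndexToken]:
    """Return tokens whose surface form uses a kanji missing from the headword."""
    flagged = []
    for token in tokens:
        if token.surface is None:
            continue
        extra = set(KANJI_RE.findall(token.surface)) - set(KANJI_RE.findall(token.headword))
        if extra:
            flagged.append(token)
    return flagged


INDEX_LINE = "我々 が どの位{どのくらい} 原子力 に 頼る{頼っている} か 落ちつく{落ち着いて} 考える{考えて} 見る[05]{みよう}"
tokens = parse_index(INDEX_LINE)
for token in kanji_mismatches(tokens):
    print(token.headword, token.surface)  # prints: 落ちつく 落ち着いて
```

Tokens such as どの位{どのくらい} are not flagged, because the form in the sentence is written in kana and adds no kanji of its own.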

bunbuku, May 5, 2021 at 11:25:11 PM UTC

I think that kind of problem should be reported to the maintainers on the dictionary side.

raeldor, May 5, 2021 at 11:42:26 PM UTC

Hence my original question as to who is maintaining the indexes. Are they generated, or are they manually edited? Who is responsible for them and who do we report problems to? Is it Jim Breen?

small_snow, May 6, 2021 at 12:37:20 AM UTC, edited May 6, 2021 at 1:26:15 AM UTC

I hope the following comments will solve your problem. Thank you.

raeldor, May 6, 2021 at 4:19:48 AM UTC

Ah, thank you. I guess I've been here before, haha.

jungnet, May 3, 2021 at 1:39:09 PM UTC

I want to add many sentences in my conlang too.

Ricardo14, May 3, 2021 at 3:24:53 PM UTC

Hi there! Does that language have an ISO 639-3 code?

To request the addition of languages, please read the instructions here -

Keep in mind that we need

✔ As many sentences as possible in the target language;
✔ A suitable flag;
✔ The ISO 639-3 code;
✔ A list which contains sentences in this language (you can create one here -
✔ Any information about this language (e.g.: );

So, send this information and any question you have to or PM the Language Team -

Important: It may take weeks to have your language added on Tatoeba.

Cabo, April 30, 2021 at 8:41:22 AM UTC, edited April 30, 2021 at 8:45:33 AM UTC

I've never seen this message before on a Hungarian sentence: "We cannot determine yet whether this sentence was initially derived from translation or not."
The sentence where I found it: #6131864
Why is the data missing?

CK, April 30, 2021 at 9:40:13 AM UTC

Other sentences with nearby numbers have the same problem.

gillux, May 3, 2021 at 9:52:49 AM UTC, edited May 3, 2021 at 9:54:16 AM UTC

This message appears for a few sentences, especially ones created in the early days of Tatoeba. The sentence origin ("added as translation" or "original") was added to Tatoeba in August 2018 [1]. Sentences added after this date won’t have this problem, but for sentences added before this date, we calculated the origin by analyzing the logs. However, in some cases, part of the logs is missing and we don’t have enough information to determine the origin. If somebody knows for sure the origin of one of these sentences, we can also add the information manually.


Jondel, April 29, 2021 at 3:22:43 AM UTC

Some languages have a formal or bookish register and an informal or conversational one. Wouldn't it be helpful if we indicated that?

Thanuir, April 29, 2021 at 7:03:39 AM UTC

Tags are used for this, for example:

Language-specific tags are also a good idea if a particular language handles these matters in its own way.

sacredceltic, April 30, 2021 at 7:45:03 PM UTC

I sometimes choose to converse in a bookish style...

DJ_Saidez, April 30, 2021 at 10:26:17 PM UTC

That doesn't mean that everyone does that.

sacredceltic, April 30, 2021 at 10:37:00 PM UTC

I think most people do, at times...

Jondel, May 1, 2021 at 6:52:32 AM UTC

People here understand bookish-style speech, but it sounds funny, strange and out of place. Only non-natives speak that way, or it is used in formal dramas, plays, school, etc.

DJ_Saidez, May 1, 2021 at 7:54:20 AM UTC

Yes, I sometimes do it ironically or in a humorous way, but it's intentional, and I know it sounds weird.

ZegPhig, May 1, 2021 at 10:05:42 AM UTC

That depends on the traits of each language. For example, in Russian, the literary language can quite easily be used when you want to talk with someone, and that is normal. This is because our literary language often coincides with the spoken language. But I will make one reservation. The Russian literary language also has its own peculiarities that distinguish it from the spoken language and that are better used specifically in the literary language rather than in the spoken language. And I can say the same about the spoken language. It has its own traits.
And yet everything varies across languages, and each language has its own traits. What is normal in one language may, at the same time, not be normal in another.

Therefore I think that when we talk about a language, we should rely mainly on the opinions of native speakers and of the linguists who study that language.

DJ_Saidez, May 1, 2021 at 8:32:36 PM UTC, edited May 1, 2021 at 9:21:24 PM UTC

You have a good point. I was mostly referring to English since that’s the language I deal with the most. And in my experience with young native speakers my age, some of them might use (in spoken conversation) more sophisticated language sometimes, but almost never the written language that I believe Jondel is referring to. And written English and spoken English aren’t that far apart either, but they usually still sound misplaced in the other domain.

And precisely because of the criteria you mentioned, I don't think the self-declared expert “timaskelto” is a valid critic of what I said. 😒

I’m not invalidating those who want to learn written language. I’m just saying that they should be prepared to sound different from typical speakers, which could be good or bad.

gillux, May 3, 2021 at 9:29:47 AM UTC

As Thanuir said, tags are the proper way to add this kind of information to sentences. To add tags, you must become an "advanced contributor" first:

QAzaqQA, May 2, 2021 at 12:32:32 PM UTC

Is there a way to add sentences to Tatoeba in bulk?

Cabo, May 2, 2021 at 1:07:29 PM UTC

Are you an Albanian native speaker now?
You learn languages soooooo fast. What is your secret?

Thanuir, May 3, 2021 at 4:38:19 AM UTC

Contact the user CK or the other administrators.

gillux, May 3, 2021 at 9:22:34 AM UTC

No, but there are plans to implement this feature at some point. You can follow the progress here:

wolfgangth, May 2, 2021 at 12:00:32 PM UTC

Why can't I find some sentences from the Tatoeba download file when I use the search field of the Tatoeba web interface?

Example: Denmark has a prison.
But if I search for one of its translations, "La Danimarca ha una prigione.",
then "Denmark has a prison." is shown as a translation.

(There are several such sentences.)

brauchinet, May 2, 2021 at 1:40:11 PM UTC, edited May 2, 2021 at 1:40:33 PM UTC

I think the explanation is, once again, that the English sentence has no owner. "Orphaned" sentences are not included in the search by default.
You have to go to the "advanced search" and switch "verwaist / is orphan?" to "Beliebig / Any".
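To check from the download side whether a given sentence is an orphan, a small Python sketch along these lines might help. It assumes the detailed sentences export is tab-separated with the columns id, language, text, username, date added and date modified, and that an orphan has `\N` in the username column; please verify both against the download page before relying on it. The sample rows, ids, dates and the user name `example_user` are invented for illustration, and `search` only imitates the default "no orphans" behaviour described above; it is not how the site search is implemented.

```python
import csv
import io
from typing import Dict, List

# Assumed column layout of the detailed sentences export.
FIELDS = ["id", "lang", "text", "username", "date_added", "date_modified"]
ORPHAN_MARKER = r"\N"  # assumed placeholder for "no owner"

# Invented sample rows standing in for the real download file.
SAMPLE_EXPORT = "\n".join([
    "\t".join(["1", "eng", "Denmark has a prison.", ORPHAN_MARKER,
               "2021-01-01 00:00:00", "2021-01-01 00:00:00"]),
    "\t".join(["2", "ita", "La Danimarca ha una prigione.", "example_user",
               "2021-01-01 00:00:00", "2021-01-01 00:00:00"]),
])


def read_sentences(handle) -> List[Dict[str, str]]:
    """Read tab-separated rows into dictionaries keyed by FIELDS."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader]


def is_orphan(sentence: Dict[str, str]) -> bool:
    """A sentence counts as an orphan when it has no owner."""
    return sentence["username"] == ORPHAN_MARKER


def search(sentences: List[Dict[str, str]], query: str,
           include_orphans: bool = False) -> List[Dict[str, str]]:
    """Case-insensitive substring search that skips orphans unless asked not to."""
    hits = []
    for sentence in sentences:
        if query.lower() not in sentence["text"].lower():
            continue
        if is_orphan(sentence) and not include_orphans:
            continue
        hits.append(sentence)
    return hits


sentences = read_sentences(io.StringIO(SAMPLE_EXPORT))
default_hits = search(sentences, "Denmark has a prison")
any_hits = search(sentences, "Denmark has a prison", include_orphans=True)
print(len(default_hits), len(any_hits))  # prints: 0 1
```

With the real file, replace `io.StringIO(SAMPLE_EXPORT)` with the opened export file (UTF-8, `newline=""`).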

QAzaqQA, May 2, 2021 at 1:34:34 PM UTC, edited May 2, 2021 at 1:35:26 PM UTC

The Nuristani languages, formerly known as Kafiri languages, are one of the three groups within the Indo-Iranian language family, alongside the much larger Indo-Aryan and Iranian groups. They have approximately 130,000 speakers primarily in eastern Afghanistan and a few adjacent valleys in Khyber Pakhtunkhwa's Chitral District, Pakistan. The region inhabited by the Nuristanis is located in the southern Hindu Kush mountains, and is drained by the Alingar River in the west, the Pech River in the center, and the Landai Sin and Kunar rivers in the east. The languages were previously often grouped with Indo-Aryan or Iranian until they were finally classified as forming a third branch in Indo-Iranian.

Learn Nuristani Languages to Expert level.

Sentences in Nuristani languages follow SOV word order.

miroplan2000, May 1, 2021 at 8:53:21 AM UTC

lbdx, May 1, 2021 at 8:15:36 AM UTC

** Tatominer **

Thanks to Guybrush88, gillux, shekitten, AlanF_US, cojiluc, Julien_PDC, Cabo, danepo, quagliarella, Ivanovb, tsp_2, H_Liliom, carlosalberto, Shishir, small_snow, Esperantostern, JGGG, Johannes_S, GlossaMatik, Pfirsichbaeumchen, Nylez and alvations for their contributions that helped move the project forward this week.

Check out the most searched words that lack sentences or translations in your language at

QAzaqQA, April 29, 2021 at 10:48:35 AM UTC

Is Berber considered a macrolanguage here?

Shishir, April 29, 2021 at 11:08:06 AM UTC

It's considered a macrolanguage in the sense that it doesn't have an ISO 639-3 code but it's treated as one more language here, so that corpus won't be dismantled as you seem to want (I've seen your lists taking sentences from other users who consciously contributed to the Berber corpus). The languages that were added before the decision of adding only languages with ISO 639-3 code will stay and if people want to contribute to these languages they are free to do so.

I really don't get why you and Igider are so obsessed with the Berber corpus, it's there and it's not disturbing anyone, the users can choose if they want to contribute to it or to their Berber dialect/language. And considering people biased or politically involved for giving the users this choice seems quite unfair.

TRANG, April 29, 2021 at 7:02:34 PM UTC

You can read the following article to further understand the situation of Berber in Tatoeba:

sacredceltic, April 29, 2021 at 9:14:34 PM UTC

We also need to be less naive about languages.
Berber is a language that has been oppressed by Arab colonization for 13 centuries. The Maghreb governments (Algerian, Moroccan, Tunisian), led by "Arabizing" elites, do everything they can to DENY Berber culture, and what better way to DENY than by DIVIDING.
So, faced with the revival of Berber identity, they resolved to chop it into small pieces in order to weaken it. When they finally do acknowledge its existence, they invent, each time, a "different" community and language, because the unity of Berber culture in North Africa, which is thousands of years old, calls into question their power and their artificial borders...

samir_t, April 30, 2021 at 1:24:56 PM UTC, edited April 30, 2021 at 1:29:13 PM UTC

Just to offer some clarification. There is no doubt that, as you say, the powers in place, in this case in Algeria and Morocco, do everything they can to erase this culture and replace it completely with what they have imported in the way of pan-Arabism. But as for the Berber languages, while they are indeed oppressed, and not without France's help at the outset, they remain linguistically distinct, at least as much as today's Romance languages are. The best course would therefore be to promote each one in its own environment, since there is no real mutual intelligibility possible unless some are elevated at the expense of others.

sacredceltic, April 30, 2021 at 1:49:21 PM UTC

I don't really see what France did to oppress Berber culture. It was never in France's interest to impose Arabic on the Berbers, quite the contrary, especially since Arabia was in the sphere of influence of its British rival!
Berber culture has long been recognized and appreciated in France, and many French people speak Berber. I am even quite sure that there are more Berber speakers in France than speakers of Arabic, German or even English 🙄

samir_t, April 30, 2021 at 3:44:07 PM UTC, edited April 30, 2021 at 3:45:33 PM UTC

How France contributed to oppressing Berber culture: well, it is both simple and long to explain, since it does not date from yesterday. In Morocco, it was from the creation of the protectorate and the alliance with the sultan that they helped repress the Berber tribes (the Rif War, among others), while assisting him in founding the modern "Arab" Moroccan state with all the Arabness that implies. In Algeria, Napoleon III dreamed of founding an "Arab kingdom" that would be easier to control, and from the start of colonization France began by creating the Bureaux Arabes (, a kind of administrative outpost behind all the Arabization that followed in Algeria, going as far as Arabizing the names of our towns and villages in Kabylia and even some family names. That is why, although we are Kabyles, we are always taken for Arabs in France, because of a whole pile of false narratives spread over time, not to mention the pan-Arabists left at the head of so-called independent Algeria, who continued the work more actively, from Ben Bella to Bouteflika.
I do not deny that the French Berberists did a great deal of research on these languages, which they in fact referred to by their own names (Kabyle, Chleuh...). They were driven rather by scientific interest and by the need, as an empire, to understand the "natives", but they produced extensive documentation that we still owe to them today.

sacredceltic, April 30, 2021 at 7:15:43 PM UTC

Napoleon the Little was a cretin, a puppet of the English, who was soundly beaten by the Prussians and whom the French loathe.
It is because he was an Anglophile that he was an Arabophile. In that, he was not serving the interests of France at all, but those of England.

sacredceltic, April 30, 2021 at 2:27:23 PM UTC

In the 19th century a "Berberophile" movement was born in France, and it has continued to this day. Great French writers and intellectuals belonged to it, such as Charles de Foucauld. Many French people saw similarities between the Berbers and the French. They were compared to the Auvergnats.
Obviously, with the wars of independence and the seizure of power by the "Arabs", all of this was erased... by design. But Berber speakers and Berberophiles still exist after all these years and all these rifts.
It is not for nothing that the favorite dish of the French, year after year, remains couscous...