gillux's messages on the Wall (total 442)

2019-04-03 04:49
I understand your frustration. To answer the "why" question, the reason is that nobody has worked on improving this yet. It so happens that very few people work on Tatoeba and there are tons and tons of other things to improve and other problems to fix (some of which are far more critical, for example those that make the website inaccessible).

There is a workaround to reduce the discomfort until the situation improves. You can create sentences under CC-BY for a while (for example, a day), then switch the license of all your CC-BY sentences to CC0 in one go, by going to the page
2019-03-04 11:12
CK recently brought to my attention that Tatoeba’s Twitter and Facebook accounts are not used. Would anybody like to write news about Tatoeba? I’m thinking about having somebody in our community whose role would be to write "good news" or "updates" about Tatoeba on the Twitter and Facebook accounts, and maybe the blog.

Whenever a member has some valuable information (for example, new audio contributions from CK, new stats from sharptoothed, or a new feature from me or Trang), he/she could pass it on to the news writer. The news writer could also just pick information from the Wall on his/her own.
2019-02-27 10:13 - 2019-02-27 10:17
I dream that Tatoeba is a project I can be proud of when I’m showing it to my friends: "Do you know this website, Tatoeba?" "No, let me check it out." The homepage loads instantly. Everything’s localized, neat, beautiful, self-explanatory and easy to use from a smartphone or a computer. It shows some inspiring and featured example sentences. My friend tries to make a search. The results are very relevant and show up almost instantly.

I dream that Tatoeba is a worldwide reference among language enthusiasts. Most professional translators prefer it over closed-source solutions because the results are more diverse and accurate, and all of their colleagues are on it too. Popular dictionaries all include Tatoeba’s sentences to illustrate their definitions. Whenever people want to make a point whether a particular expression is correct or not, widely used or not, they don’t argue by showing Google’s number of results; they show Tatoeba’s results instead. Tatoeba no longer relies on the ISO to include a new language. It’s like the other way around: having a language listed on Tatoeba is a point that may convince the ISO folks to include it too.

I dream that Tatoeba is a key tool for most language teachers around the world to prepare their lessons. Just give Tatoeba a few grammatical concepts and vocabulary items to study, and it gives you the materials you need.

I dream that Tatoeba’s community is huge, diverse and everyone’s equal. There are many active members from all Asian countries, the Global South, and all the minorities on Earth are well represented. Countries that are threatening certain language minorities are constantly trying to block Tatoeba because they can’t stand that these languages are being listed as such on something as famous as Tatoeba. Tatoeba is regularly mentioned on the news whenever a language minority is being threatened.
2019-02-27 05:49
Yes, but see, Trang is probably the only person who expressed that, and yet it's not a dream, it’s a concept.

What do YOU think, CK? Which Tatoeba do you dream of?
2019-02-27 04:49
Dear Tatoeba contributors,

From this Friday, I will be working on Tatoeba again, thanks to our collaboration with Mozilla (thank you!). I will work to facilitate the use of sentences by Common Voice, but also to improve Tatoeba in general.

One of the ways I would like to achieve this goal is to first ask you what Tatoeba you are dreaming of. I think we focus too much on concrete details and forget to let ourselves dream. Yet dreams are one of the major forces driving us forward. Our Github is full of very concrete "little suggestions" and "little problems", but what are we really aspiring to in order to make Tatoeba a project useful to humanity?

I think that we, who are involved in Tatoeba in one way or another, all of us have in our heads a Tatoeba of our dreams, some big ideals, some big crazy ideas, a personal vision of how it should be, but that we refrain from expressing. So I am asking you to forget about the details, to forget about the quarrels, to think big and far, to let go a little and tell me frankly: which Tatoeba do you dream of?


2019-02-18 05:47
> Have you seen the GitHub issue?

Sorry, I didn’t. I commented there too.

> Allowing Ottoman Turkish sentences in the Latin script will increase contributions in the old language and its readability.

I see. Let me try to understand the situation. Can you tell me if the following is correct?
1. Ottoman Turkish is not a living language any more (there are no native speakers alive).
2. Native speakers of Ottoman Turkish used the Arabic script only.
3. Most of the people who understand Ottoman Turkish are native speakers of Turkish.
4. Native speakers of Turkish are unfamiliar with the Arabic script.

If that is correct, I believe it makes sense to convert Ottoman Turkish from Arabic to Latin, but not the other way around, because Latin is no more than a reading aid for native speakers of Turkish. In other words, I think all Ottoman Turkish sentences should stay in Arabic only, while we only attach Latin as a transcription of them.

> I created only one pair set as 'unknown' for demonstration.

I see. Next time, please use instead for demonstration purposes.
2019-02-17 13:10
As you pointed out, the current implementation assumes that Ottoman Turkish is written right-to-left using Arabic script.

I had a look at the English Wikipedia article about the Ottoman Turkish language, and I am a bit confused because it says that this language switched to the Latin script as it evolved into modern Turkish. Can you elaborate on the contemporary use of Arabic vs. Latin to write Ottoman Turkish?

One way to quickly solve the display problem is to set the direction of Ottoman Turkish to "auto". Another, much more complex way is to implement multiple script support and auto-convert between the scripts, but only if it's worth it, that is to say if there are actually native speakers using both Latin and Arabic, if we want to be able to find sentences written in Arabic by searching in Latin and vice versa, if the conversion can be partly or fully automated, etc.
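
To illustrate what "auto" means here: with automatic direction, the direction is usually guessed from the first strongly directional character of the text (this is how the HTML `dir="auto"` attribute behaves). Below is a minimal Python sketch of that heuristic. The function name `guess_direction` is made up for the example, and this is not Tatoeba's actual code.

```python
import unicodedata

def guess_direction(sentence: str) -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for char in sentence:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):  # right-to-left letters (Hebrew, Arabic, ...)
            return "rtl"
        if bidi_class == "L":          # left-to-right letters (Latin, Cyrillic, ...)
            return "ltr"
    # Digits, spaces and punctuation are not "strong": fall back to left-to-right
    return "ltr"

print(guess_direction("Merhaba dünya"))  # Latin script
print(guess_direction("مرحبا"))          # Arabic script
```

With such a rule, a sentence written in Latin script is displayed left-to-right and a sentence written in Arabic script right-to-left, whatever language it is assigned to.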

As you found out, the direction of sentences of "unknown" language is set to automatic. That said, this is not a reason to set the language of your Ottoman Turkish sentences written in Latin script to "unknown" just because they look better that way. I strongly discourage you from doing this, because these sentences are then excluded from the Ottoman Turkish corpus: they won't show up in searches and statistics, which prevents contributors and learners of Ottoman Turkish from finding them. What's worse, since *only you* know their actual language, if for some reason you forget about them or stop contributing, these sentences will never be assigned to the correct language and will be lost for good.
2018-11-05 20:35 - 2018-11-05 20:38
Once again your answer misses the point: you bring up practical features while I am talking about legal matters… Don’t take it the wrong way, but you seem unfamiliar with licensing issues, so I suggest you look into the subject seriously and settle clearly on a license *before* you start putting anything online. Perhaps have a look at the Creative Commons licenses.

Choosing a license amounts to choosing a guiding principle, a policy for distributing your dictionary. As things stand, I don’t know whether I will be allowed to copy and reuse the content of your dictionary or not. Yet in the digital age, when copying and reusing data is so easy and convenient, that leaves quite a big question mark.

Personally, I won’t give a cent to your project until the dictionary’s license is settled. Indeed, I think that your dictionary, however good it may be, would be of little use if other projects could not reuse it.

PS: I also advise you to find another legal adviser.
2018-11-04 22:31
I believe Grendayzer was suggesting that you build your site on the same engine as Wiktionary (namely MediaWiki), not that you contribute to Wiktionary. MediaWiki lets you moderate contributions before they are published, so it would only be a matter of configuration.

I tend to agree: there are probably site engines that already meet your needs, MediaWiki being just one of them.
2018-11-04 22:13
Besides, since you mention the possibility of contributing to the dictionary, the license question also applies to contributed content. It’s typically the "I accept" box that people tick blindly when signing up. Will my contributions belong to you by default, or will I have the right to have my name in the dictionary? And so on and so forth.
2018-11-04 22:11 - 2018-11-04 22:15
> As for licenses, I plan to create this dictionary with collaborative features

I wasn’t talking about practical features, but about the license in the legal sense of the term, that is, the rights and obligations of the people who will use your dictionary in one way or another. You will have to put a page on your site that explains the terms of use of the content. For example, on Tatoeba, it is

Let’s say I write a program that downloads your whole site, or part of it, or even just a single page. Then I extract the content (definitions, translations, examples, etc.) from the downloaded page or pages. From then on, what will I be allowed to do with this content? Will I be allowed to:
• use it just for myself?
• use it in an educational setting?
• use it for commercial purposes?
• reuse it inside another document written by me? If so, under what conditions (mentioning your name, etc.)?
• republish it in another medium?

Those are the kinds of questions a license answers. And you will have to answer them sooner or later, because people will want to use your dictionary in ways other than searching on the site (some will even do so with impunity, and that is where the license protects you legally). You can allow nothing at all, allow only certain things, or allow almost everything. There is no right or wrong answer, it’s your choice.

(Note that I talk about downloading pages to keep the explanation simple, but this could also be done by providing dictionary files or an API.)
2018-11-03 22:32
Hello Nicolas,

What a nice project, I hope it succeeds. I have a few questions.

When you say "Chinese", do you mean Mandarin only?

Under which license do you plan to make the dictionary available?
2018-09-27 00:02 - 2018-09-27 00:57
Note to contributors: I’ve improved the language autodetection feature, so it should work better now. It should also become more accurate over time.

Long story:

For those who don’t know, when you add a new sentence and select "autodetect" for the language, there is a tool called Tatodetect that guesses the language of your sentence. Tatodetect works by making a statistical analysis of the Tatoeba corpus to learn what words are used in what languages. So basically the more sentences there are in a given language, the more accurately Tatodetect can autodetect it.

However, there was a limitation: Tatodetect cannot learn from new sentences unless it performs a new (costly) analysis of the corpus. As a result, we had to manually start new analyses of the corpus every now and then, so that Tatodetect could learn from newly added sentences. The last analysis dated from June 2017. I ran a new one today and I automated this process. The corpus is now going to be re-analysed on a weekly basis.
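
To give an idea of the principle, here is a toy sketch in Python. It is not Tatodetect's actual code: the class name, the scoring and the sample sentences are all made up for illustration. It counts which words appear in which language, then picks the language whose known words best match the new sentence.

```python
from collections import Counter, defaultdict

class ToyDetector:
    """Toy word-count language guesser (hypothetical, not Tatodetect itself)."""

    def __init__(self):
        # word_counts[lang][word] = how many times the word was seen in that language
        self.word_counts = defaultdict(Counter)

    def learn(self, corpus):
        """Rebuild the statistics from scratch from (language, sentence) pairs."""
        self.word_counts = defaultdict(Counter)
        for lang, sentence in corpus:
            self.word_counts[lang].update(sentence.lower().split())

    def detect(self, sentence):
        """Return the best-matching language, or None if no word is known."""
        words = sentence.lower().split()
        scores = {}
        for lang, counts in self.word_counts.items():
            total = sum(counts.values())
            if total == 0:
                continue  # nothing learned for this language
            # Sum of the relative frequencies of the sentence's words
            scores[lang] = sum(counts[w] / total for w in words)
        best = max(scores, key=scores.get, default=None)
        return best if best is not None and scores[best] > 0 else None


corpus = [
    ("eng", "the cat is on the table"),
    ("eng", "I like the sea"),
    ("fra", "le chat est sur la table"),
    ("fra", "j'aime la mer"),
]
detector = ToyDetector()
detector.learn(corpus)
print(detector.detect("the sea is blue"))   # eng
print(detector.detect("la mer est bleue"))  # fra
```

Note that in this sketch, sentences added to the corpus after the call to `learn()` have no effect on `detect()` until `learn()` is run again on the whole corpus.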
2018-09-15 22:44
Thanks for the improvements, it feels quite usable already.

About the profile languages: how about just bringing them to the top of the list, like in the current dropdown? This way, I can still use the mouse or tap on a touchscreen to easily select one of my profile languages, while the person in your example won’t be confused by seeing only two options. You could also use a different background color for the profile languages, to make them stand out from the rest of the list.

Other than that, I find the line spacing a bit too large inside the list. After clicking on the field, the dropdown shows "Any language" + 4 languages (the last one slightly truncated), while I think there is enough space for 6 or 7 languages there. This would be a significant improvement if you implement what I said about the profile languages.
2018-09-14 07:46
That’s a good point. This could make that new dropdown harder to use on devices without a physical keyboard, for example.
2018-09-13 06:41
Great! It is definitely more comfortable to use.

A few comments:

The highlight is only shown when the language starts with the entered text. For example, using the English UI, typing "rus" highlights "Rus" in "Russian" and "Rusyn" but not in other entries, like "Belarusian".

The sorting of the suggested values could be improved. In the above example, I think "Russian" should show up above "Belarusian".
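
One possible way to get both behaviours, sketched in Python (the function names and the tiny language list are only for illustration, this is not the actual dropdown code): match the typed text anywhere in the name, ignoring case, and rank the names that start with it before the names that merely contain it.

```python
def suggest(query, language_names):
    """Return the names containing the query, prefix matches first, then alphabetical."""
    q = query.lower()
    matches = [name for name in language_names if q in name.lower()]
    # Sort key: False (prefix match) sorts before True, then by name
    return sorted(matches, key=lambda name: (not name.lower().startswith(q), name.lower()))

def highlight(query, name):
    """Wrap the first occurrence of the query in brackets, wherever it appears."""
    pos = name.lower().find(query.lower())
    if pos < 0:
        return name
    end = pos + len(query)
    return name[:pos] + "[" + name[pos:end] + "]" + name[end:]

names = ["Belarusian", "French", "Russian", "Rusyn"]
for name in suggest("rus", names):
    print(highlight("rus", name))
# [Rus]sian
# [Rus]yn
# Bela[rus]ian
```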

I can type anything that is not a language name and press the search button. The result is that whatever wasn’t a language name is treated as "any language". This is quite misleading. I think the form shouldn’t allow clicking the search button without a properly selected value as language.

On the search bar, the keyword field, the language dropdowns and the search button used to have a consistent height. Now, the dropdowns are taller than the keyword field and the button.
2018-09-10 15:36
Not that I want to argue about whether we should implement this feature or not, but I’m curious about the way you proofread sentences. I am not a corpus maintainer, so I don’t know what it takes to proofread many many sentences.

As a native speaker of French, I almost only add French sentences, but it doesn’t mean they are free of errors. I regularly get comments about mistakes here and there. It’s mostly about spelling rather than naturalness, but still. This makes me think that the amount of trust I’d put in a sentence has more to do with the number and quality of proofreads than the nativeness of the author.

So my point is: shouldn’t sentences be equally checked whether they are from native speakers or not?
2018-09-10 15:21
As a general rule, as long as you can listen to something, it can be downloaded. It’s just a matter of whether the website makes it user-friendly or not. On Tatoeba, it isn’t user-friendly (yet), and the reasons include what Guybrush88 and deniko said.
2018-09-06 18:27
I totally agree with what deniko said.

I think that formality is just one of the many aspects of a language that can be confusing for learners the first time they see it. But once you get it, it’s not a problem any more. Correct me if I’m wrong, but what you said can apply to, say, future tense. It’s confusing for beginners who only know about the present tense to be shown sentences in future tense, so let’s separate sentences by tense (actually, some people are doing this already, using tags like

For more information about how to add tags, see

Personally, I wouldn’t make too many assumptions about how my sentences are going to be used and by whom. I don’t like the idea of restraining or changing the way I write sentences just because a non-native speaker might not understand. Quite the contrary, I think Tatoeba is a good place to add colloquial sentences, because there are certainly enough textbooks out there full of formal sentences. Consider the following guidelines, from

• We don't want the awkward, unnatural-sounding translations seen in textbooks to help students understand how another language is constructed.
• We want sentences that a native speaker would actually use.
• Remember that others will be using the translation that you make into your own language to study your language.

If you’re still unsure, you can also ask @Silja’s opinion since she’s the corpus maintainer of Finnish.
2018-09-03 04:09
On WhatsApp, you let the whole group know about your personal phone number by just joining it. I believe some people are not okay with that.