gillux's messages on the Wall (total 442)

gillux
2016-03-09 21:29
We have a few lists that do not belong to anybody because the account of their creator was deleted. If anyone wants to “adopt” them, please tell me. Otherwise, I think I’ll delete them.

https://tatoeba.org/sentences_lists/show/497
https://tatoeba.org/sentences_lists/show/517
https://tatoeba.org/sentences_lists/show/650
https://tatoeba.org/sentences_lists/show/1300
https://tatoeba.org/sentences_lists/show/2188
gillux
2016-03-09 13:41 - 2016-03-09 13:44
> But, then again, renaming ISO 639-3 EGL ("Emilian") as "Bolognese/Ferrarese/Mantovano/Modenese/Parmigiano/Piacentino/Reggiano/Vogherese" is pushing it, isn't it?

So ultimately, the mere naming of a language can be a political issue too.

It’s not a perfect solution, but what if we had something like “language presentation” pages for each language (or one for all the languages) that would explain what our current language icons and names are unable to express? I’m thinking about a short page that could be reached by clicking on the flag or the language name (or even a large tooltip). Its goal would be to:
• define the language more clearly when its name is ambiguous or the language encompasses several varieties (which would also solve what I said here [1])
• explain that we’re using flags and single names only for their ability to efficiently convey information

It’s just about making what is currently obscure clearer: that we’re using language names and language icons that are sometimes incorrect, but that this choice shouldn’t be interpreted as political and that all the language varieties are welcome.

[1] https://github.com/Tatoeba/tatoeba2/issues/936
gillux
2016-03-08 19:16
I finally solved the problem. The fix should take effect with the next site update. The bug was indeed very unpredictable because it depended on the indexing state of sentences for search, which is changed by lots of factors (the reindexing that runs every quarter of an hour, an edit to the sentence, to its direct and indirect links, to another directly or indirectly linked sentence…).
gillux
2016-03-08 18:29 - 2016-03-08 18:30
The problem should be fixed next time we update the website.
gillux
2016-03-08 18:21
Thank you, I recorded the issue: https://github.com/Tatoeba/tatoeba2/issues/1050.
gillux
2016-03-08 18:13 - 2016-03-08 18:16
To make things clear: I was only talking about adding a classification on top of the existing ISO one, not about reclassifying all the languages. My goal is to offer a way to highlight, within existing languages, the variations that do not by themselves warrant a language of their own (notably for Valencian, and while we’re at it, other similar cases). It doesn’t claim to solve every case, such as that of Portuñol.

Like alexmarcelo, I think that the fact that the problem of language classification in general is unsolvable is no reason to give up from the outset. The current situation (classification according to the ISO standard) is shaky anyway, so why not try to make it less bad? The flag problem, for its part, remains untouched, with or without a classification of variations. Tatoeba clearly presents itself as a platform in favor of language diversity, so I don’t think we’d expose ourselves to “torrents of violence and hatred” by trying to promote it.
gillux
2016-03-08 13:59
> Other sites have done it without any problem.

Which ones?
gillux
2016-03-08 13:53
> That we would have lots of near duplicates is no trouble. "Banana" is "banana" in dozens of languages and yet doesn't make them the same.

I do think it’s a problem when it comes to sentences that are equal in writing and meaning in different language varieties. Let’s say I want to search for sentences containing the word “banana” in English. (English is NOT a good example, but it’s just an example.) If I need to look it up in British English, then American English, and then Australian English etc. to find different sentences, it’s not usable. Equally, if I need to enter the sentence “I like bananas.” in all the English varieties, it’s very cumbersome.

A possible way to deal with this problem could be a hierarchical classification of languages and varieties, to allow some sentences to belong to the global “English” category, and others to one or more specific variants like “British English” or “American English”. This way, one could add or search for sentences in either English or one of its variants. At the moment, we’re using tags for this, but they are not systematically used, not usable by regular users and not visible enough. We could have a sort of subflag or something to indicate the language along with its possible variety. Do you think the Valencian problem could be solved by something like this?

Note that the ISO 639-6 standard was meant to classify language varieties, but it’s dead. We may as well establish our own classification scheme.
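
To make the hierarchical idea above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the variety codes (“eng-GB”, “cat-valencia”, …), the PARENT table and the search() helper are all made up for the example and are not Tatoeba’s actual data model or code.

```python
import re
from dataclasses import dataclass

# Hypothetical variety codes and parent links (assumption, not Tatoeba's schema).
PARENT = {
    "eng-GB": "eng",
    "eng-US": "eng",
    "eng-AU": "eng",
    "cat-valencia": "cat",
}


@dataclass(frozen=True)
class Sentence:
    text: str
    lang: str  # either a language ("eng") or one of its varieties ("eng-GB")


def ancestors(code):
    """Return the code followed by its parents, most specific first."""
    chain = [code]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def search(sentences, word, lang):
    """Find sentences containing `word` that are usable in `lang`.

    A sentence matches when its code and the requested code lie on the same
    branch of the hierarchy: searching "eng" also returns all its varieties,
    while searching "eng-GB" returns British sentences plus generic English
    ones, but not American ones.
    """
    wanted = ancestors(lang)
    hits = []
    for s in sentences:
        same_branch = s.lang in wanted or lang in ancestors(s.lang)
        words = re.findall(r"\w+", s.text.lower())
        if same_branch and word.lower() in words:
            hits.append(s)
    return hits


corpus = [
    Sentence("I like bananas.", "eng"),
    Sentence("The lorry carries bananas.", "eng-GB"),
    Sentence("The truck carries bananas.", "eng-US"),
]
british = search(corpus, "bananas", "eng-GB")
```

With this layout, “I like bananas.” only needs to be entered once, under the global “eng” code, and still shows up in a search restricted to any single variety.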
gillux
2016-03-08 01:46
I’m not really fond of this idea. Transifex has this exact feature and I find it annoying because I’m already using email notifications to keep track of things. Let me explain how I’m dealing with my emails. Whenever I finish reading a new email, if I need to do something about it (reply to it, fix a sentence, whatever) either I do it immediately, or if I decide to do it later, I mark the email as unread to remind me I have something to do about it. When it’s done, I set the email as read.

In Transifex, the notifications on the site are useless because I already read them as email, so they are just stacking up. At first, I constantly marked them as read, but I eventually stopped caring and the top-right number is just growing indefinitely. Either way, they are getting in my way.
gillux
2016-03-05 18:41
I think I understand. When there is no “next sentence”, because the one being viewed is the most recent in the corpus, or the most recent in the corpus in the selected language, the “next sentence” button behaves like the “previous sentence” button.

However, sentence #4922015, which marafon mentioned, was created two weeks ago, so it shouldn’t have caused a problem, because it was no longer the most recent one today.

GrizaLeono, can you tell me whether my interpretation likely matches the cases where you ran into the problem?
gillux
2016-03-05 18:17
Hello Quielin, welcome to Tatoeba. As you guessed, the Pinyin converter currently used on Tatoeba is sinoparserd. I don’t understand Chinese, but it seems the generated Pinyin is tokenized: Pinyin “words” are separated by spaces.
gillux
2016-03-05 18:10
It’s possible to implement sorting by number of “characters” (that is to say: all the letters, punctuation, numbers, spaces etc. of a sentence). It probably depends on the language, but I think sorting by number of words is usually more relevant than number of characters. Can you elaborate on your need for sorting by number of letters?
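
To illustrate the difference between the two orderings, here is a small sketch in Python. The helper names by_chars() and by_words() are made up for the example, and the word count simply splits on whitespace, which is an assumption that does not hold for languages written without spaces.

```python
def by_chars(sentences):
    """Sort by number of characters: letters, punctuation, digits and spaces."""
    return sorted(sentences, key=len)


def by_words(sentences):
    """Sort by number of whitespace-separated words (assumption: the language
    separates words with spaces)."""
    return sorted(sentences, key=lambda s: len(s.split()))


sample = ["aa bb cc", "abcdefghij", "a b"]
shortest_by_chars = by_chars(sample)[0]
shortest_by_words = by_words(sample)[0]
```

The same three sentences come out in a different order depending on the key, which is why the intended use matters when choosing between the two.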
gillux
2016-03-04 22:22
> How long should the event last? (1 week? 1 month? 3 months?)

Why not let the participants decide?

> What tasks will the participants work on? Do they make their own proposals? Or do we vote for a certain number of features, bugs, improvements from which the participants have to choose?

I think we should let the participants work on whatever they want, provided they know what that is. Maybe I’m wrong, but I don’t believe new contributors will come up from our community without knowing what to do, like during GSoC.

> What rewards would the participants get? (Goodies? Money? Or eternal gratitude?)

That’s a basic yet interesting question. I really can’t see how money could be a good incentive. For any major Tatoeba contributor, it never has been. Here is a 10-minute video exploring that matter, which I mostly agree with: https://www.youtube.com/watch?v=u6XAPnuFjJc

Satisfaction with my work, learning things, and the feeling of contributing to something useful (especially for language learners) are my main incentives to develop Tatoeba. I think we should rather use that kind of thing as a driving force. In other words, I think we should convince any potential contributor, even a beginner, that he or she will receive our best guidance, learn a bunch of things, and work on something super useful. Maybe we could also display code contributions on people’s profiles, the same way sentence contributions etc. are displayed.

It’s a bit off-topic, but another possible approach to attract developers could be to make Tatoeba’s data more developer-friendly and better known, so that more developers use our data in their applications. The more of our users are developers, the more likely we are to see spontaneous code contributions.

> What would be the requirements to be eligible to participate?

I don’t think we need to set any particular requirement.
gillux
2016-03-02 17:28 - 2016-03-02 17:30
It’s not possible to specifically search for a hyphen, or for pretty much any other punctuation character. They are simply ignored in searches (actually, if preceded by a space and immediately followed by a word, the hyphen stands for “not including this word”). The reason is that the search feature works on a word basis, so you may look up words, but not characters. (And it has nothing to do with stemming.)
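
As a rough illustration of what “works on a word basis” means, here is a toy model in Python. It is not the site’s actual search engine: parse_query() and matches() are hypothetical helpers, and treating a hyphen at the very start of the query like one preceded by a space is an assumption.

```python
import re


def parse_query(query):
    """Split a query into words to include and words to exclude.

    Punctuation is dropped. A hyphen at the start of a whitespace-separated
    token, immediately followed by a word, marks that word as excluded.
    """
    include, exclude = [], []
    for token in query.lower().split():
        m = re.match(r"-(\w+)", token)
        if m:
            exclude.append(m.group(1))
            token = token[m.end():]  # anything left is searched normally
        include.extend(re.findall(r"\w+", token))
    return include, exclude


def matches(sentence, query):
    """True if the sentence has every included word and no excluded word."""
    words = set(re.findall(r"\w+", sentence.lower()))
    include, exclude = parse_query(query)
    return all(w in words for w in include) and not any(w in words for w in exclude)


found = matches("A well-known fact.", "well-known")
```

In this model, a query like “well-known” is just the two words “well” and “known”, so the hyphen itself can never be what is searched for.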
gillux
2016-03-02 17:19
I confirm the user option “Languages” is not working any more. Thank you for letting us know. I recorded the issue: https://github.com/Tatoeba/tatoeba2/issues/1037 We’ll fix it when we have the time to do it.
gillux
2016-02-28 19:46
How easy do you think it is to generate stressed versions of sentences?
gillux
2016-02-27 22:02
Tatoeba has been updated. Highlights:

• Keywords are now highlighted in search results
• New language: Morisyen

See the complete list of changes: https://github.com/Tatoeba/tato...e%3A2016-02-27
gillux
2016-02-21 14:47
Yes. We basically need more of these pairs. See the complete guide about requesting new transcriptions: http://en.wiki.tatoeba.org/arti...iption-request
gillux
2016-02-20 20:27 - 2016-02-21 01:00
*** Tatoeba has been updated ***

• New language added: Baybayanon
• Orphan and unapproved sentences will no longer be displayed as random sentences, either on the home page or on the “several random sentences” page
• Audio icons now displayed in lists too
• Minor fixes regarding furigana
gillux
2016-02-16 11:17
To me, the Github issues tracker is just a tool for developers to not forget about issues and enhancement requests, and discuss implementation details.

However, since developers are currently taking final decisions, yes, the Github issues tracker somewhat matches what pullnoseman said. But as I said, I don’t like that as a developer because I can’t and won’t take responsibility for big decisions. I would like things to be decided on tatoeba.org (you said the fact it’s not integrated into Tatoeba is technically a minor problem, and I agree, but to me it’s a major symbolic problem) while Github would just be the place where developers execute what has been decided by the community. I’d like powers to be separated on Tatoeba (https://en.wikipedia.org/wiki/S...on_of_powers).