gillux's messages on the Wall (total 401)

gillux
2016-02-16 10:26
Looks like a limitation of Sphinx. I reported the problem to them.
gillux
2016-02-15 06:33 - 2016-02-15 06:35
I agree with you, but I feel like you kinda missed an important point. Pullnoseman was talking about making decisions. I think it’s really important. Often, discussions end up with no clear result because we have no clear way of deciding what to do. For example, we never vote after discussing. At the moment, in practice only you and I have the actual right to decide, by starting to implement things (or not doing it), and I have to admit I’m not very comfortable with this. When everyone agrees on a solution, it’s easy to start implementing it. But when there are several proposals, or people disagree, or the decision has important implications for the future of the project, or I just don’t feel confident with a particular topic, then I won’t do anything. You certainly can, as the author of Tatoeba, but I can’t. To sum up, at the moment you’re the only one who actually decides on important changes. Maybe you’re fine with this, but it makes Tatoeba grow really slowly regarding key issues like corpora quality etc. Just like pullnosman said in the previous thread:

> it appears to me like it's simply TRANG's site, and everyone else is just getting involved where they please, some more and some less, but generally unable to really get together to get something moving on a larger scale, and often times even against one another instead of together. A lot of energy simply evaporates, people shout their thoughts into the prairie and then mostly go on doing their own thing again.

If we had an official way to decide on what to do, it would free me from the responsibility of making decisions. I think it would also force people to focus on finding clear solutions rather than “shouting their thoughts into the prairie”.
gillux
2016-02-14 21:06
I set it already.
gillux
2016-02-14 16:42
Thanks for this interesting analysis.

> If we want to display better quality, I therefore suggest that on the home page, which is the first contact newcomers have with the site, we not only hide unadopted sentences but also hide sentences from newcomers, which are more likely not to be in their native language and therefore carry a higher risk of error. A sort of quarantine.

It’s a good idea, but by what criterion do we determine whether a contributor is or is not a “newcomer”?
gillux
2016-02-14 02:57
I think it’s a bit too small. It fits messages like those on the Wall, but not the profile page.
gillux
2016-02-10 15:29
I think your analysis is right and I have to say it is the same on the development side. Even though there are very few active developers (only me and Trang at the moment), we’re unable to coordinate and decide upon what to do. I remember I had trouble with this in the past [1], but I eventually stopped caring and kept focusing on what matters to me (furigana for Japanese, advanced search…), while I sometimes fix bugs or add features at people’s request when I’m in the mood.

> To make better use of the people's ideas, I think we need to have some kind of interface where a discussion can be started with the fixed goal that at the end of the discussion, all suggestions are taken into account and a decision is made.

I agree. This reminds me of the forum idea: https://tatoeba.org/wall/show_m...#message_19996

However, I think the tool is not the problem. If we’re unable to decide upon what to do after discussing a topic on the Wall, what difference would a different tool make? And like Trang said [1], how do we prioritize tasks? How do we gather people’s opinions in an efficient and relevant way? Since everyone has their own personal interests, I think it’s rather a political issue.

[1] https://tatoeba.org/wall/show_message/22454
gillux
2016-02-09 12:04
*** New feature on http://dev.tatoeba.org ***

When performing a search, matched words are now highlighted in the results. Note that due to technical limitations, searching on dev.tatoeba.org will only return old sentences.

Example: http://dev.tatoeba.org/sentence...h?query=puzzle
gillux
2016-02-06 21:23
Lately, I’ve been working on improving the readings of Japanese sentences (furigana), with the very helpful collaboration of tommy_san. Each reading is now placed independently above each character, instead of being systematically grouped over each word, and a new syntax makes it possible to distribute each part of the reading over each character. This image will explain better what I mean: https://i.imgur.com/1qGof66.png (you need to look closely at the furigana to catch the differences).

Actually, this feature has been around for some time but was not announced. It may seem like a very picky change, but having furigana properly aligned is very helpful for learners of the Japanese language. It’s also the only proper way of displaying furigana, as can be seen in any Japanese book, newspaper, placard… which hopefully makes Tatoeba look a little bit more serious among Japanese people.
gillux
2016-02-02 05:21
I gave it a try and ended up with this: http://downloads.tatoeba.org/not_in_tatoeba/

It’s a hack, but it’s usable. I used the hunspell French dictionary to generate the “all” list. Since it contains every feminine/masculine and singular/plural form, conjugation, etc., it is very long. So I tried to sort and filter it using word frequency lists I could find on the net. That gave me three lists, all subsets of “all”:

• 10k_lexique, combined with the lexique.org list and limited to the first ten thousand words. It’s by far the best.
• 10k_opensubtitles, combined with a frequency list based on subtitles from opensubtitles.org. The result is a compilation of words straight out of American TV series. It’s rather mediocre, but I laughed a lot reading it, so I’m keeping it.
• wortschatz, combined with the Wortschatz frequency list. This list seems to have flaws and it’s short, so the result is limited but usable.

Each list is available as plain text and as HTML. The content is the same. In the HTML version, a link lets you search for the word on Tatoeba. That way you can see whether words of the same family are already in the corpus.
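
For anyone curious, here is a rough sketch in Python of the kind of filtering described above. It is not the actual script; the function name and the sample words are made up for illustration, and I am assuming the limit applies to the filtered result rather than to the frequency list itself.

```python
def missing_words_by_frequency(all_missing, frequency_ranked, limit=None):
    """Keep the words of frequency_ranked that also appear in all_missing.

    all_missing: words from the dictionary that are absent from the corpus
                 (hypothetical input, standing in for the "all" list).
    frequency_ranked: words ordered from most to least frequent.
    limit: optional maximum number of words to return.
    """
    missing = set(all_missing)
    seen = set()
    result = []
    for word in frequency_ranked:
        # Skip words that are not missing, and duplicates in the frequency list.
        if word in missing and word not in seen:
            seen.add(word)
            result.append(word)
            if limit is not None and len(result) >= limit:
                break
    return result


# Made-up sample data, only to show the ordering and the limit.
all_missing = ["zèbre", "chat", "ornithorynque"]
frequency_ranked = ["le", "chat", "zèbre", "chat", "ornithorynque"]
print(missing_words_by_frequency(all_missing, frequency_ranked, limit=2))
```

Words that are in the dictionary list but absent from the frequency list are dropped, which would explain why a short frequency list gives a short result.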
gillux
2016-02-01 21:20
Indeed, it’s very odd, and your problem gave me a hard time. But I found it. In your message, you only use non-breaking spaces (Unicode character 202f), so what you type is treated as a single long word. Only the hyphen in your “ci-dessus” allows a line break on my end.
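
A small Python sketch to check a message for this kind of character. The function names and the sample text are invented for the example, and the break-point helper is a deliberate simplification (ordinary spaces and hyphens only), not what a browser actually does.

```python
import unicodedata

# Characters that look like a space but forbid a line break.
NO_BREAK_SPACES = {"\u00a0", "\u202f"}


def find_no_break_spaces(text):
    """Return (index, Unicode name) for each no-break space in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in NO_BREAK_SPACES
    ]


def simple_break_points(text):
    """Return indices of characters after which a line may wrap.

    Simplification: only ordinary spaces and hyphens are considered.
    """
    return [i for i, ch in enumerate(text) if ch in (" ", "-")]


# Sample text written with U+202F instead of ordinary spaces.
message = "voir\u202fci-dessus\u202fmerci"
print(find_no_break_spaces(message))
print(simple_break_points(message))
```

With the sample text, the only break point found is the hyphen, which matches what I see on my end.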
gillux
2016-01-24 20:49
That’s nice! However, there are sentences in which “I” is a female while the voice is male. To name just a few: #3225955 #3273178 #3472275 #3534821 #3534825 #3217702 #3217699

I personally think these should be recorded by a female speaker instead.
gillux
2016-01-22 10:50 - 2016-01-22 10:52
I removed the IPA for Shanghainese while revamping the transcriptions system on Tatoeba. I’m sorry you miss it. I removed it because it wasn’t trustworthy and I assumed very few people understand IPA. I can restore it if you want.
gillux
2016-01-21 18:39
This behaviour was actually intended, but now I understand it was probably a mistake. I’ll revert it. In the meantime, I think you can use Chrome to select the text and you’ll get the furigana copied as well.
gillux
2016-01-15 16:37
And, well, I think Tatoeba is not a book, so, yes, it won’t work if you’re trying to use it like a book.
gillux
2016-01-15 16:34
Note that the order of one’s sentences used to be the way you want: chronological. At some point, it was changed to the current order (reverse chronological), likely because someone requested it.

I understand your point, but I think it also makes sense the way it is now, displaying the newest sentences first. It makes that page dynamic, and I think the newest sentences are what many contributors are primarily looking for. “What new sentences has X contributed? Anything new to translate from him/her?” is a question I often ask myself. I remember that before the order changed, I had to open the first page and then click on the last page to see the newest sentences. Not only was it a bother, but I also started to learn the first sentences by heart just because I was always going through the first page first, which is not what I wanted. That being said, there was no advanced search at that time. We can now use it to get someone’s sentences in whatever order.
gillux
2015-12-17 09:19
Problem solved, thank you for reporting. Sentences are reindexed every 20 minutes or so.
gillux
2015-12-14 17:42
I see. I’d like to generate furi with empty brackets after English words, as is the case at the moment, so that the user needs to either fill them or remove them. This should work for languages other than English. However, it’s a bit harder than it sounds.

Currently, the validation rule is of the form “require furigana on anything except kana and punctuation”. Since I can’t think of all the classes of characters that should have furigana, I wanted to start with a strict rule so that we can soften it as we find exceptions. But if we allow foreign words, we instead need to explicitly list all the characters that require furigana, and the rule becomes “require furigana on kanji, numbers” and what else? I can’t think of everything. What about percent and other math symbols?
gillux
2015-12-14 12:39
Because there is no such thing.
gillux
2015-12-12 10:41 - 2015-12-12 19:32
> since people often search for sentences to make sure they don't exist before contributing new ones

I know you’re always doing this, but I don’t think it is true for the majority of people using Tatoeba (including people without an account).
gillux
2015-12-12 10:38
DostKaplan has a point though. I think people who type a question mark in the search function more often mean the character itself than a wildcard. However, it’s not possible to search for any punctuation character in the first place (for instance, you can’t search for commas). I personally disagree with the previous suggestions (I find them too intrusive). It’s not an easy problem, but maybe we could at least provide a more verbose message than “No results found for: <query>”, such as hints about common pitfalls that prevent results from being returned: check the selected language, wildcards, diacritics…