
gillux's messages on the Wall (total 401)

2015-12-08 02:24 - 2015-12-08 02:26
Users of the DuckDuckGo search engine can now use the !tato bang to perform a search on Tatoeba. For instance, looking for “!tato question” on DuckDuckGo will search the word “question” on Tatoeba (in all languages). It’s just a simple redirection.
2015-12-08 02:12
Thank you for your feedback.

> Since this behavior doesn't happen with the Japanese comma, full stop, and quotation brackets (、 and 。 and 「」), I assume they are in one way or another categorized as exceptions. I would in that case add several other reading signs, such as ( ) ? ! to the same list.

Yes, you guessed right. I expected some bugs like furigana expanding leftward since the implementation isn’t great. The root of this problem is that bracket syntax. It’s easy to edit for humans, but hard to parse for computers, because it’s not clear which characters the furigana belongs to. Furiganas are actually internally stored using the computer-friendly syntax [漢字|かんじ], which is why autogenerated furiganas do not expand oddly. You can directly input furigana using that syntax if you want to work around expansion bugs, but of course the goal is not to have to use it.
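
To illustrate why the internal syntax is easier for a program to handle, here is a minimal Python sketch. It assumes only the simple form shown above, [base|reading]; the function names parse_internal and to_brace_syntax are made up for this example and are not taken from the Tatoeba code.

```python
import re

# Assumption: a furigana group looks like [base|reading], e.g. [漢字|かんじ].
# Only this simple form (one base, one reading) is handled here.
GROUP = re.compile(r"\[([^\[\]|]+)\|([^\[\]|]+)\]")

def parse_internal(text):
    """Split text into (base, reading) pairs; reading is None for plain text."""
    parts = []
    pos = 0
    for m in GROUP.finditer(text):
        if m.start() > pos:
            # Plain text between two furigana groups carries no reading.
            parts.append((text[pos:m.start()], None))
        parts.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):
        parts.append((text[pos:], None))
    return parts

def to_brace_syntax(parts):
    """Render pairs in the human-friendly form base{reading}."""
    return "".join(
        base if reading is None else f"{base}{{{reading}}}"
        for base, reading in parts
    )

parts = parse_internal("[漢字|かんじ]を[書|か]く")
print(parts)
print(to_brace_syntax(parts))
```

In the internal form, the square brackets mark exactly where each base starts and ends. In the brace form, a reading is only attached to "whatever comes before it", so a parser has to guess how far left it extends, which is where the expansion bugs come from.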

I don’t know if we should enforce furigana over every word that is not Japanese. I’m tempted to say that we should, but I’d like to have the opinion of Japanese contributors.

About 8才, as you said cases like 10{}分{ふん} may be misleading. Since さい expanded over the whole 8才 is easier to spot, I think I’ll keep things the way they are now.
2015-12-03 06:49
Thank you for reporting this problem, it’s now solved.
2015-11-29 03:49
Yes, this is a known bug.
2015-11-28 18:05
> Why cannot the automatic transcription of Japanese be edited when adding translation?

It’s not available on yet, but we’re about to make it available soon. For the moment, you need to use to test that feature.

> By the way, is it legal to add a nonstandard dialect translation?

Yes. You may leave a comment on the sentence page to give more details about it.
2015-11-28 16:00
As Wezel said, only Japanese and Chinese have editable transcriptions so far. Of course, it’s just a starting point, and the whole point is to add more transcriptions in the future. See also this wiki article about adding new transcriptions:
2015-11-28 01:03
Voilà. You can see the result on the dev website:
2015-11-26 16:44
*** Editable transcription update ***

I think the editable transcriptions feature is more or less ready to be published on I’d like people to test it as thoroughly as possible to detect any remaining bugs.

Main changes since the last post¹:
• New “transcription of X” page (e.g.
• Permission system is now as follows: regular users can only edit transcriptions of their own sentences, advanced users can review machine-generated transcriptions of others’ sentences, corpus maintainers can edit any transcription.
• Warning icon moved to the left; clicking it reviews the transcription.
• Warning message moved to the tooltip of the warning icon.
• Prettier pinyin, autogenerated furiganas removed on numerals.

2015-11-23 19:26
2015-11-23 14:50 - 2015-11-23 14:51
I’m still not satisfied with the permission system but I don’t know how to improve it.

The current system is as follows:
• New transcriptions can be added by anyone.
• When a transcription was last edited by someone other than the sentence owner, only the transcription author, the sentence owner, corpus maintainers and admins may edit it.
• When a transcription was last edited by the owner of the sentence, only the sentence owner, corpus maintainers and admins may edit it.

So the current system is rather open, in order to allow transcriptions to be added on sentences of inactive contributors, or of contributors who don’t want to bother transcribing their sentences or just can’t because they don’t know the transcription script. But it also allows vandalism, and we don’t have any countermeasure yet. I also feel it’s a bit too complex for newcomers to understand.
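
The three rules above can be sketched as a single check. This is only an illustration in Python, not the actual implementation: the Transcription class, the role names and can_edit are hypothetical, and I assume “transcription author” means whoever last edited the transcription.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transcription:
    sentence_owner: str
    # None means nobody has added a transcription yet (hypothetical convention).
    last_editor: Optional[str] = None

# Hypothetical role names for corpus maintainers and admins.
PRIVILEGED_ROLES = {"corpus_maintainer", "admin"}

def can_edit(user: str, role: str, t: Transcription) -> bool:
    if role in PRIVILEGED_ROLES:
        return True   # corpus maintainers and admins may always edit
    if t.last_editor is None:
        return True   # new transcriptions can be added by anyone
    if user == t.sentence_owner:
        return True   # the sentence owner may edit in both remaining cases
    if t.last_editor != t.sentence_owner:
        # Last edited by someone other than the owner: that author may edit too.
        return user == t.last_editor
    return False      # last edited by the owner: locked for everyone else

print(can_edit("alice", "regular", Transcription("bob")))
print(can_edit("carol", "regular", Transcription("bob", "alice")))
```

Written this way, the vandalism problem is easy to see: the second rule lets any regular user through as long as no transcription exists yet.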
2015-11-23 14:08
I created an article on the wiki for new transcription requests:
2015-11-22 17:01
It could. Actually, the way I implemented it makes it very easy to do it for other languages as well. However, whether adding romanization for “any other language that's not "romanized" (that don't use the Roman script)” is a good thing or not is something members seem to have different opinions about. See for example this thread:
2015-11-22 16:52
> 1. It would be nice if there was a button to verify a transcription with a single click.

I moved the warning icon to the left and made it a click-to-review button.

> 2. Since the machine-generated readings of Japanese numerals are almost always wrong, you may as well remove them. It's quite troublesome to turn "3{さん}0{ぜろ}分{ふん}" into "30分{さんじゅっぷん}".


> 3. Chinese automatic transcriptions could be improved a little.

I know. This is yet to be done.
2015-11-22 06:02 - 2015-11-22 17:07
*** Editable transcriptions progress ***

Lately, I’ve been working on the editable transcriptions feature and set it up on (don’t be alarmed by the big blue announcement text, it’s another work in progress by Trang). I tried to address the comments of the previous thread¹. You’re welcome to check it out and comment.

Notable changes:
• Design update, new icons, (hopefully) better usability.
• Added a button to edit (instead of clicking on the text).
• Added a button to verify transcription (instead of having to click edit → OK).
• Replaced the toggle transcription button with per-transcription toggle buttons.
• Helpful errors displayed on invalid transcriptions (only furigana has strict validation for the moment).
• Various bug fixes.

2015-10-03 10:26
> the line extends beyond the vowel and looks bad
> These vowel+unicode combinations look good in other webpages

Could you define “good” and “bad” by uploading some screenshots? What you see on your screen is likely different from what others see.
2015-08-03 10:36
So you actually have a rough idea of scoring sentences by weighting votes based on trust, which requires asking users to vote, which is why you’re implementing a voting mechanism. But voting may not be the only way to tackle the quality problem. It all depends on how you’re going about it.

Ignoring the question of “what is correct and incorrect (and unsure)” will lead to major confusion and to approximate answers. The answers will depend on the language level of users and what they are using Tatoeba’s sentences for, what is “good enough” for them. “Correct” may be interpreted as “learners may still learn something from it”, “free of typos”, “grammatically correct”, “is used among people I know”, “is used in context X (academic, conversational, internet…)”, “is used nowadays”… Negate these assertions and you get as many interpretations for “incorrect”. I’m not even speaking of “unsure”.

Each of these assertions is a characteristic we all partly use to judge “correctness” in our own subjective way. By mixing these opinions, you’re comparing apples and oranges. To me, when a sentence is believed to be incorrect, what matters is not the number of people who think so or who they are, it’s why. X people saying it’s incorrect (weighted or not) has no value compared to an individual comment demonstrating what’s wrong.

On the other hand, claiming a sentence is correct is way more difficult to defend. A sentence is generally considered correct as long as nobody has a reason to say it’s not. That’s all. I sometimes correct OK-tagged sentences that were considered correct only because nobody had spotted any error so far. And maybe someone else will in turn contradict me by having yet another thing to say.

Besides, as others mentioned, I fear that dialect minorities on Tatoeba are likely to be threatened by their majority counterparts in such a system, because they are minorities on Tatoeba too. Weighting won’t help distinguish between a valid minority and an irrelevant one in the score.

Because of all this, I’m thinking about the following way to improve the quality of sentences. It’s just an idea, but what if members could only say either (1) “I was unable to find anything wrong with this sentence (proofread?)” or (2) “It’s wrong to me because [insert explanation]”. The more (1), the more likely the sentence is good, but it doesn’t mean it’s correct. Any (2) is to be solved by sentence modification, deletion, tag addition (like regional, slang…), mutual agreement, corpus maintainer decision… Instead of using bare comments, we could use a minimal issue tracker to easily keep track of all the (2).
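
To make the idea a bit more concrete, here is a rough Python sketch of the data such a minimal tracker could keep per sentence. Everything in it (the Issue and SentenceReview classes, the field names, the status strings) is hypothetical and only meant to illustrate the two kinds of answers.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Issue:
    """Answer (2): 'it's wrong to me because ...'. An explanation is mandatory."""
    reporter: str
    explanation: str
    resolved: bool = False

@dataclass
class SentenceReview:
    proofread_by: Set[str] = field(default_factory=set)  # answers of kind (1)
    issues: List[Issue] = field(default_factory=list)    # answers of kind (2)

    def mark_proofread(self, user: str) -> None:
        self.proofread_by.add(user)

    def report(self, user: str, explanation: str) -> Issue:
        if not explanation.strip():
            raise ValueError("an issue needs an explanation")
        issue = Issue(user, explanation)
        self.issues.append(issue)
        return issue

    def open_issues(self) -> List[Issue]:
        return [i for i in self.issues if not i.resolved]

    def status(self) -> str:
        # A single open issue outweighs any number of proofreads.
        if self.open_issues():
            return "has open issues"
        if self.proofread_by:
            return f"proofread by {len(self.proofread_by)}"
        return "not reviewed"

review = SentenceReview()
review.mark_proofread("alice")
issue = review.report("bob", "This word is regional; needs a tag.")
print(review.status())
issue.resolved = True  # e.g. after a tag was added
print(review.status())
```

Note that there is no score anywhere: the number of (1) answers is shown as is, and a sentence with an open (2) stays flagged until that issue is resolved.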
2015-08-02 18:26
I find that question quite central and I think it should be dealt with *beforehand*. This data will be used as a base for an evaluation system whose mechanism is yet to be clearly defined, so we don’t know what actual data we want to gather. By data, I mean what questions users are answering, precisely, when clicking on these buttons. I think the approach “let’s gather data and we’ll see later what to do” is wrong; we should go with “let’s define a good evaluation system so that we know what data we need to gather”.
2015-07-27 09:57
I totally agree that the way sentences are displayed is not very intuitive for newcomers. If you, or anyone else, can suggest a better way, I’m all ears. I think the double arrow may be a good idea.
2015-07-27 09:53
For now, you can always ask on the Wall. I added a few example sentences for the words you mentioned.
2015-07-17 09:13
This is due to a technical limitation of the search. It’s not possible to filter by natives when the “from” language is set to “any”.