gillux's messages on the Wall (total 595)

gillux, February 9, 2016 at 12:04:24 PM UTC

*** New feature on http://dev.tatoeba.org ***

When performing a search, matched words are now highlighted in the results. Note that due to technical limitations, searching on dev.tatoeba.org will only return old sentences.

Example: http://dev.tatoeba.org/sentence...h?query=puzzle
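To illustrate what highlighting matched words means, here is a minimal sketch in Python. It is not how Tatoeba implements the feature (the post does not say); the function name `highlight` and the choice of `<em>` tags are assumptions made for the example.

```python
import html
import re


def highlight(sentence, query_words):
    """Wrap whole-word, case-insensitive matches of the query words in <em> tags.

    Hypothetical helper: the rest of the sentence is HTML-escaped so the
    result can be placed in a page as-is.
    """
    words = [w for w in query_words if w]
    if not words:
        return html.escape(sentence)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    parts = []
    last = 0
    for match in pattern.finditer(sentence):
        # Escape the text between matches, then wrap the match itself.
        parts.append(html.escape(sentence[last:match.start()]))
        parts.append("<em>" + html.escape(match.group(0)) + "</em>")
        last = match.end()
    parts.append(html.escape(sentence[last:]))
    return "".join(parts)
```

For instance, `highlight("This puzzle is hard.", ["puzzle"])` wraps the word "puzzle" and leaves the rest of the sentence untouched.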

gillux, February 6, 2016 at 9:23:38 PM UTC

Lately, I’ve been working on improving the readings of Japanese sentences (furigana), with the very helpful collaboration of tommy_san. Each reading is now placed independently above each character, instead of being systematically grouped over each word, and a new syntax makes it possible to distribute each part of the reading over each character. This image explains what I mean better: https://i.imgur.com/1qGof66.png (you need to look closely at the furigana to catch the differences).

Actually, this feature has been around for some time but was not announced. It may seem like a very picky change, but having furigana properly aligned is very helpful for learners of Japanese. It’s also the only proper way of displaying furigana, as can be observed in any Japanese book, newspaper, placard… which hopefully makes Tatoeba look a little more serious to Japanese people.

gillux, February 2, 2016 at 5:21:38 AM UTC

I gave it a try and ended up with this: http://downloads.tatoeba.org/not_in_tatoeba/

It’s a hack, but it’s usable. I used the hunspell French dictionary to generate the “all” list. Since it contains all the feminine/masculine forms, plural forms, conjugations, etc., it is very long. So I tried to sort and filter it using word frequency lists I could find online. That gave me three lists, which are subsets of “all”:

• 10k_lexique, combined with the lexique.org list and limited to the first ten thousand words. It is by far the best.
• 10k_opensubtitles, combined with a frequency list based on subtitles from opensubtitles.org. It yields a compilation of words straight out of American TV series. It’s rather mediocre, but I laughed a lot reading it, so I’m keeping it.
• wortschatz, combined with the Wortschatz frequency list. That list seems to have flaws and is short, so the result is limited but usable.

Each list is available as plain text and as HTML. The content is the same. In the HTML version, a link lets you search for the word on Tatoeba, so you can see whether words from the same family are already in the corpus.
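The general idea behind these lists can be sketched as follows. This is only an illustration under assumptions, not the script that was actually used: the function name `missing_words`, its parameters, and the choice to apply the 10,000-word limit after filtering are all hypothetical.

```python
def missing_words(dictionary_words, frequency_ranked, corpus_words, limit=10_000):
    """Return dictionary words absent from the corpus, most frequent first.

    dictionary_words: every word form from the spelling dictionary ("all").
    frequency_ranked: words from a frequency list, most frequent first.
    corpus_words:     words already present in the sentence corpus.
    """
    dictionary = set(dictionary_words)
    corpus = set(corpus_words)
    result = []
    seen = set()
    for word in frequency_ranked:
        # Keep only real dictionary words that the corpus lacks, once each.
        if word in dictionary and word not in corpus and word not in seen:
            result.append(word)
            seen.add(word)
            if len(result) >= limit:
                break
    return result
```

Using the frequency list to drive the loop gives the ordering for free: words that never appear in the frequency list are simply left out of the subset.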

gillux, February 1, 2016 at 9:20:28 PM UTC

Indeed, it’s very odd, and your problem gave me a hard time. But I found the cause. In your message, you only use non-breaking spaces (Unicode character 202f), so what you type is treated as one long word. Only the hyphen in your « ci-dessus » allows a line break on my side.
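A quick way to check a message for this kind of character is sketched below. The helper name `find_no_break_spaces` is made up for the example, and checking U+00A0 in addition to U+202F is an assumption that goes beyond the case described above.

```python
import unicodedata

# Characters that look like spaces but forbid a line break at their position.
NO_BREAK_SPACES = {"\u00a0", "\u202f"}


def find_no_break_spaces(text):
    """Return (index, Unicode name) for every no-break space found in text."""
    return [
        (index, unicodedata.name(char))
        for index, char in enumerate(text)
        if char in NO_BREAK_SPACES
    ]
```

If every space in a message shows up in the result, a browser has no place to wrap the line except at characters such as hyphens.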

gillux, January 24, 2016 at 8:49:22 PM UTC

That’s nice! However, there are sentences in which “I” is a female while the voice is male. To name just a few: #3225955 #3273178 #3472275 #3534821 #3534825 #3217702 #3217699

I personally think these should be recorded by a female speaker instead.

gillux, January 22, 2016 at 10:50:33 AM UTC, edited January 22, 2016 at 10:52:40 AM UTC

I removed the IPA for Shanghainese while revamping the transcriptions system on Tatoeba. I’m sorry you miss it. I removed it because it wasn’t trustworthy and I assumed very few people understand IPA. I can restore it if you want.

gillux, January 21, 2016 at 6:39:27 PM UTC

This behaviour was actually intended, but now I understand it was probably a mistake. I’ll revert it. In the meantime, I think you can use Chrome to select the text and you’ll get the furigana copied as well.

gillux, January 15, 2016 at 4:37:52 PM UTC

And, well, I think Tatoeba is not a book, so, yes, it won’t work if you’re trying to use it like a book.

gillux, January 15, 2016 at 4:34:59 PM UTC

Note that the order of one’s sentences used to be the way you want: chronological. At some point, it was changed to the current order (reverse chronological), likely because someone requested it.

I understand your point, but I think it also makes sense the way it is now, displaying the newest sentences first. It makes that page dynamic, and I think the newest sentences are what many contributors are primarily looking for. “What new sentences has X contributed? Anything new to translate from him/her?” is a question I often ask myself. I remember that before the order changed, I had to go to the first page and then click on the last page to see the newest sentences. Not only was it bothersome, but I also started to learn the first sentences by heart just because I was always going through the first page first, which is not what I wanted. That being said, there was no advanced search at that time. We can now use it to get someone’s sentences in whatever order.

gillux, December 17, 2015 at 9:19:51 AM UTC

Problem solved, thank you for reporting. Sentences are reindexed every 20 minutes or so.

gillux, December 14, 2015 at 5:42:45 PM UTC

I see. I’d like to generate furigana with empty brackets after English words, as is the case at the moment, so that the user needs to either fill them in or remove them. This should work for languages other than English too. However, it’s a bit harder than it sounds.

Currently, the validation rule is of the form “require furigana on anything except kana and punctuation”. Since I can’t think of all the classes of characters that should have furigana, I wanted to start with a strict rule so that we can relax it as we find exceptions. But if we allow foreign words, we instead need to explicitly list all the characters that require furigana, and the rule becomes “require furigana on kanji, numbers” and what else? I can’t think of everything. What about percent and other math symbols?
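The difference between the two kinds of rule can be sketched like this. Neither function is the actual validation code; the names, the Unicode ranges used for kana and kanji, and the use of Unicode general categories to detect punctuation are assumptions made for illustration.

```python
import unicodedata


def is_kana(char):
    """True for characters in the Hiragana and Katakana blocks (assumed ranges)."""
    return "\u3040" <= char <= "\u30ff"


def needs_furigana_strict(char):
    """Strict rule: everything needs furigana except kana and punctuation.

    Whitespace is also exempted here, which is an extra assumption.
    """
    if char.isspace() or is_kana(char):
        return False
    # Unicode general categories starting with "P" are punctuation.
    return not unicodedata.category(char).startswith("P")


def needs_furigana_explicit(char):
    """Explicit rule: only listed classes (here kanji and digits) need furigana."""
    is_kanji = "\u4e00" <= char <= "\u9fff"
    return is_kanji or char.isdigit()
```

With the strict rule a Latin letter such as "A" requires furigana, while the explicit rule lets it through. The sketch also shows how quickly the edge cases pile up: Unicode classifies "%" as punctuation but "+" as a math symbol, so the two are treated differently by the strict rule.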

gillux, December 14, 2015 at 12:39:47 PM UTC

Because there is no such thing.

gillux, December 12, 2015 at 10:41:23 AM UTC, edited December 12, 2015 at 7:32:28 PM UTC

> since people often search for sentences to make sure they don't exist before contributing new ones

I know you’re always doing this, but I don’t think it is true for the majority of people using Tatoeba (including people without an account).

gillux, December 12, 2015 at 10:38:35 AM UTC

DostKaplan has a point though. I think people who type question marks in the search function more often mean that character than a wildcard. However, it’s not possible to search for any punctuation character in the first place (for instance, you can’t search for commas). I personally disagree with the previous suggestions (I find them too intrusive). It’s not an easy problem, but maybe we could at least provide a more verbose message than “No results found for: <query>”, such as hints about common pitfalls that prevent results from being returned: check the selected language, wildcards, diacritics…
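A more verbose message could be assembled along these lines. This is only a sketch of the idea: the function `no_results_hints`, its parameters, and the wording of the hints are hypothetical, and the only wildcard it checks for is the "?" discussed in this thread.

```python
import unicodedata


def no_results_hints(query, language=None):
    """Return hints to show under "No results found for: <query>"."""
    hints = []
    if language:
        hints.append(
            f"The search is restricted to '{language}'; try searching in all languages."
        )
    if "?" in query:
        hints.append("'?' is treated as a wildcard, not as a question mark.")
    # Decompose accented letters so that diacritics show up as combining marks.
    decomposed = unicodedata.normalize("NFD", query)
    if any(unicodedata.combining(char) for char in decomposed):
        hints.append("The query contains diacritics; check how they are spelled.")
    return hints
```

A query with none of these traits, such as `no_results_hints("question")`, yields an empty list, so the short message would stay as it is.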

gillux, December 8, 2015 at 2:24:33 AM UTC, edited December 8, 2015 at 2:26:34 AM UTC

Users of the DuckDuckGo search engine can now use the !tato bang to perform a search on Tatoeba. For instance, typing “!tato question” on DuckDuckGo will search for the word “question” on Tatoeba (in all languages). It’s just a simple redirection.

gillux, December 8, 2015 at 2:12:37 AM UTC

Thank you for your feedback.

> Since this behavior doesn't happen with the Japanese comma, full stop, and quotation brackets (、 and 。 and 「」), I assume they are in one way or another categorized as exceptions. I would in that case add several other reading signs, such as ( ) ? ! to the same list.

Yes, you guessed right. I expected some bugs like furigana expanding leftward since the implementation isn’t great. The root of this problem is the bracket syntax. It’s easy for humans to edit, but hard for computers to parse, because it’s not clear which characters the furigana belongs to. Furigana is actually stored internally using the computer-friendly syntax [漢字|かんじ], which is why autogenerated furigana does not expand oddly. You can directly input furigana using that syntax if you want to work around expansion bugs, but of course the goal is not to have to use it.
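To show why the [漢字|かんじ] form is easy for a program to handle, here is a small sketch that turns it into HTML ruby markup. It is not Tatoeba’s code: the function name `to_ruby` and the output markup are assumptions, and only the two-field form shown above is handled.

```python
import html
import re

# One group: [base text|reading], with no nested brackets or extra bars.
FURIGANA = re.compile(r"\[([^\[\]|]+)\|([^\[\]|]+)\]")


def to_ruby(text):
    """Convert every [base|reading] group in text to a <ruby> element."""
    parts = []
    last = 0
    for match in FURIGANA.finditer(text):
        # Text outside the groups is escaped and copied through unchanged.
        parts.append(html.escape(text[last:match.start()]))
        base, reading = match.group(1), match.group(2)
        parts.append(
            f"<ruby>{html.escape(base)}<rt>{html.escape(reading)}</rt></ruby>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Because the square brackets delimit exactly which characters a reading belongs to, a single regular expression is enough; no guessing about where the base text starts is needed.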

I don’t know if we should enforce furigana over every word that is not Japanese. I’m tempted to say that we should, but I’d like to have the opinion of Japanese contributors.

About 8才: as you said, cases like 10{}分{ふん} may be misleading. Since さい expanded over the whole 8才 is easier to spot, I think I’ll keep things the way they are now.

gillux, December 3, 2015 at 6:49:15 AM UTC

Thank you for reporting this problem. It’s now solved.

gillux, November 29, 2015 at 3:49:06 AM UTC

Yes, this is a known bug.

gillux, November 28, 2015 at 6:05:13 PM UTC

> Why cannot the automatic transcription of Japanese be edited when adding translation?

It’s not available on tatoeba.org yet, but we’re about to make it available soon. For the moment, you need to use http://dev.tatoeba.org/ to test that feature.

> By the way, is it legal to add a nonstandard dialect translation?

Yes. You may leave a comment on the sentence page to give more details about that.

gillux, November 28, 2015 at 4:00:41 PM UTC

As Wezel said, only Japanese and Chinese have editable transcriptions so far. Of course, it’s just a starting point, and the whole point is to add more transcriptions in the future. See also this wiki article about adding new transcriptions: http://en.wiki.tatoeba.org/arti...ption-request.