tommy_san's messages on the Wall (total 320)

tommy_san, February 7, 2016 at 1:39:42 AM UTC

Thanks for the hard work, gillux! Let me make some additional remarks.

First, you need to check "Always show transcriptions and alternative scripts" on the settings page to see machine-generated furigana. Note that transcriptions shown with a warning sign are machine-generated and not always correct. Transcriptions without a warning sign have been added by human contributors and are much more likely to be correct. These manual transcriptions are also available on the downloads page. I plan to provide furigana for all the sentences I've written and proofread.

As gillux says, almost all furigana are now associated individually with each kanji, but some of the kanji compounds called 熟字訓 are exceptions. For example, there's a word 明日 (ashita, あした) at the beginning of gillux's example above. It's not that 明 reads "ashi" and 日 reads "ta", or 明 "a" and 日 "shita", so the three hiragana are placed evenly above two kanji. On the other hand, when one or more of the kanji are read the normal way, the furigana is divided as in normal compounds. For example, the reading of the word 時計 (tokei, とけい) is special because 時 doesn't have the reading "to". However, since 計 does normally read "kei", the furigana と is placed on top of 時 and けい on top of 計.

I think this new system will be most useful when you're looking for sentences where a specific kanji is read in a specific way. If you're interested in doing this kind of search on the website, reply to this message to let developers know.
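
To illustrate the two cases above and the kind of search they make possible, here is a minimal sketch. The (base, reading) pair representation and the names `to_ruby` and `has_reading` are assumptions made for illustration only; they are not Tatoeba's actual storage format or API.

```python
from html import escape

# Hypothetical representation: a word is a list of (base, reading) segments.
ASHITA = [("明日", "あした")]            # jukujikun: one reading spans both kanji
TOKEI = [("時", "と"), ("計", "けい")]    # 計 has its usual reading, so the word is split

def to_ruby(segments):
    """Render (base, reading) segments as HTML ruby markup."""
    parts = []
    for base, reading in segments:
        if reading is None:
            # Plain text (e.g. kana) needs no furigana.
            parts.append(escape(base))
        else:
            parts.append(f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>")
    return "".join(parts)

def has_reading(segments, kanji, reading):
    """True if some segment pairs exactly this base text with this reading."""
    return any(base == kanji and r == reading for base, r in segments)
```

With this representation, `has_reading(TOKEI, "計", "けい")` is true, while `has_reading(ASHITA, "明", "あ")` is false because 明 has no reading of its own inside 明日.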

tommy_san, February 2, 2016 at 2:12:38 AM UTC

> I had discussed, a while ago, the case of the unadopted Japanese sentences and asked what if we simply delete them? The answer I got was basically that they are not harmful to the point that they should be deleted. Therefore in the case of Japanese, we will keep them.

There are actually some (though not many) sentences that I find plainly wrong or clearly unnatural, and thus harmful. I rate them "not OK" and "unsure" (even though I'm not unsure about anything) respectively to warn other users. However, most people don't see my ratings, so these sentences keep getting translated, especially often by new members.

If the community thinks it's better for me to delete these sentences, I can do so. In that case, you'd need to excuse me for accidentally deleting sentences that are correct in some variety of Japanese I'm not familiar with, or even ones that are correct in standard Japanese that include a word or phrase I don't know. You'd also need to excuse me for deleting sentences that could be turned into good example sentences with some changes. I don't have the time or ability to improve them and make sure the new sentences match all the translations, and there are many sentences that, in my opinion, wouldn't make good standalone example sentences anyway.

By the way, I think it's really important for us to tell new members what to translate and what not to translate. Since we have both good and bad sentences, they should translate only when they're sure it's a good sentence. If they cannot judge the quality of sentences themselves (which is the case for many non-native speakers), it's better to choose sentences owned or tagged/marked OK by a self-identified native speaker. Whenever I notice members translating bad sentences, I point this out to them, but it's something every contributor should keep in mind.

I also wonder if we could develop a set of good sentences that contributors of every language could consider translating. The set would surely include sentences like "Hello" and "Thank you", but it doesn't have to be phrasebook-like. It could include any sentence that you find good and real and that makes good sense out of context (such as "You made the mistake on purpose, didn't you?" and "Does this dress make me look fat?"). It's not that all contributors should translate from this set, but if they don't have a particular preference, it might be better for them (especially contributors of languages with few sentences) to translate sentences from such a set than to simply translate recent or random sentences, which are often not very good.

tommy_san, January 19, 2016 at 3:43:35 AM UTC

Something seems to be wrong with Horus here. He deleted #4843034 and #4843035, but forgot to unlink them from #4843029, so the log of #4843029 looks strange.

tommy_san, January 11, 2016 at 4:03:35 AM UTC

> At the moment I don't know if anyone really needs to be able to see others lists on the sentence's page

Lists could be really useful to share users' opinions on sentences if we could have non-collaborative lists shown to other users on sentence pages. For example, if you aren't satisfied with the current "collections" feature with three rating options, you could make lists to rate sentences according to your rules. If you want to show some characteristics of specific sentences but hesitate to use tags because it's not very objective, you could make lists to show your opinions to other users. Using lists, I think we could achieve something similar to your plan of more flexible "collections" (https://tatoeba.org/wall/show_m...essage_23892).

> A user who creates a list that is "unlisted" surely does not want their list to be found easily by others.

You may be right if you stop displaying other members' lists on sentence pages. What I meant is that I'd like to have the possibility to make a list listed while not shown to everyone on sentence pages.

tommy_san, January 8, 2016 at 1:23:47 AM UTC

I'd suggest not displaying the existing "collaborative" lists on sentence pages by default, for the same reason as CK (http://tatoeba.org/wall/show_me...essage_25185).

I also wonder if it's necessary to remove all the non-"public" lists from the index of lists and the page with the lists of each user. I sometimes take a look at lists of Japanese sentences that people make for their personal use to find out what kind of sentences they are interested in.

tommy_san, January 5, 2016 at 2:53:00 AM UTC

The website should be as simple as possible so that new members can easily get used to it, so if there's not really a need for the option, I'd prefer not to implement it. If someone wanted to make such a confidential list of sentences, they could simply work offline.

In my opinion, we need three checkboxes that let the creator of a list choose whether to make the list visible to others on sentence pages, whether to let others add sentences to it, and whether to let anyone other than the creator and the member who added a sentence remove that sentence from the list.

It would also be nice if we could know who added each sentence to (and who removed each sentence from) a list.
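
The three options could be modeled roughly as follows. This is only a sketch of the proposal as I read it; the class, field, and function names are hypothetical and do not correspond to anything in the Tatoeba code base.

```python
from dataclasses import dataclass

@dataclass
class ListSettings:
    """Three independent options chosen by the list's creator (hypothetical names)."""
    shown_on_sentence_pages: bool = False
    others_can_add: bool = False
    others_can_remove: bool = False

def can_remove(settings, creator, added_by, user):
    """Decide whether `user` may remove a sentence from the list.

    Assumption: the creator and the member who added the sentence can always
    remove it; anyone else needs the third option to be enabled.
    """
    if user == creator or user == added_by:
        return True
    return settings.others_can_remove
```

Recording `added_by` for each sentence is also what would make it possible to show who added it to the list.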

tommy_san, January 4, 2016 at 3:08:51 PM UTC

Has there been a request for the option to make a list completely inaccessible to others?

tommy_san, January 4, 2016 at 9:32:57 AM UTC

Does it mean that if I want to make a list of sentences I want to show someone, I need to make it visible to everyone on each sentence page?

tommy_san, December 31, 2015 at 3:47:17 AM UTC, edited December 31, 2015 at 7:28:38 AM UTC

Hello. Thank you for using our data.

> I'm not sure how I should properly attribute the work done by the Tatoeba Project.

Take a look at our terms of use: https://tatoeba.org/terms_of_use.
I think the best way is to include links to each sentence page (for example https://tatoeba.org/sentences/show/4851).

> Also, I'm considering to include audio in the sentences that have it. What's the proper way to do it?

I'd suggest linking to the profile page of the member who contributed each audio file. So far, all the Japanese audio files have been contributed by Yomi-san (https://tatoeba.org/user/profile/yomi). If you find in the future Japanese sentences with audio that are not in the list https://tatoeba.org/sentences_lists/show/4023/, that means they have been recorded by someone else. In that case, take a look at this page: http://bit.ly/tatoebavoices.

In addition, you should ideally state somewhere on the website that all our (written) sentences are licensed under CC BY 2.0 FR and Yomi-san's recordings are licensed under CC BY-NC 4.0.

I'm not perfectly sure, but I guess you can link directly to audio files on the Tatoeba server. (Someone correct me if I'm wrong.)

Since there are tens of thousands of bad sentences here, I'd advise you to use only the sentences written or proofread by native speakers. See http://bit.ly/nativespeakers for the list of self-identified native speakers together with the "user skill" data on the downloads page. (Note, however, that many Japanese sentences owned by native speakers are actually stilted, if not wrong.) The data of sentence ratings that you find at the bottom of the downloads page would also be helpful. You'd better avoid using any sentence with the rating 0 or -1.
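
A rough sketch of that filtering is shown below. The record layout (dicts with `id`, `lang`, `owner`), the set of (username, language) pairs, and the ratings mapping are all hypothetical; the real files on the downloads page have their own formats, which you would need to parse first. The sketch covers ownership and ratings only, not proofreading.

```python
def usable_sentences(sentences, native_speakers, ratings):
    """Keep sentences owned by a self-identified native speaker of the
    sentence's language, and drop any that have a rating of 0 or -1.

    sentences:       iterable of dicts with "id", "lang", "owner" (assumed layout)
    native_speakers: set of (username, language code) pairs
    ratings:         dict mapping sentence id -> list of ratings
    """
    kept = []
    for s in sentences:
        if (s["owner"], s["lang"]) not in native_speakers:
            continue  # not owned by a native speaker of this language
        if any(r in (0, -1) for r in ratings.get(s["id"], [])):
            continue  # someone rated it 0 or -1
        kept.append(s)
    return kept
```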

tommy_san, December 15, 2015 at 12:35:31 AM UTC

Do you need to require anything at all? Wouldn't it be enough to show an alert message like "Are you sure you don't want furigana on the characters ...?" and just leave these characters without furigana if the user wants it that way? If you make a special page for corpus maintainers that lists such sentences, we'll be able to easily correct the sentences that actually lack necessary furigana.

tommy_san, December 8, 2015 at 4:06:47 AM UTC, edited December 8, 2015 at 2:39:57 PM UTC

> I don’t know if we should enforce furigana over every word that is not Japanese.

I think that depends on how we pronounce it. When I see the word "Twitter" in a Japanese text, I pronounce it as ツイッター, and you should do so too, otherwise you might not be understood, so we definitely need furigana here. On the other hand, I'd rather pronounce the English words in #236357 as in English, so I don't feel like adding furigana to them.

Something seems to be wrong with this sentence, by the way. I get the error "The provided sentence differs from the original one near “と”." when I try to edit the furigana. I guess it's because of the space before "heir". See also #4720072.

tommy_san, November 22, 2015 at 11:09:37 PM UTC, edited November 22, 2015 at 11:51:20 PM UTC

When I visited the site yesterday, I didn't notice the X mark next to the warning message that auto-generated transcriptions might be wrong. A text like "Don't show again" might be more user-friendly.

tommy_san, November 22, 2015 at 1:01:32 PM UTC

1. It would be nice if there was a button to verify a transcription with a single click.

2. Since the machine-generated readings of Japanese numerals are almost always wrong, you may as well remove them. It's quite troublesome to turn "3{さん}0{ぜろ}分{ふん}" into "30分{さんじゅっぷん}".

3. Chinese automatic transcriptions could be improved a little.
For example, "fǎyǔ zhōng ,“soleil” shì tàiyáng de yìsi 。" should be turned into "Fǎyǔ zhōng, "soleil" shì tàiyáng de yìsi."
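
The cleanup in point 3 could look something like the sketch below. The function name `tidy_pinyin` and the exact set of rules are my assumptions, chosen to reproduce the single example above; this is not how the site's transcription code actually works.

```python
import re

# Full-width punctuation and curly quotes mapped to ASCII equivalents (assumed rule set).
FULLWIDTH = {"，": ",", "。": ".", "！": "!", "？": "?", "；": ";", "：": ":",
             "“": '"', "”": '"'}

def tidy_pinyin(text):
    """Normalize punctuation and capitalization in an auto-generated pinyin line."""
    for src, dst in FULLWIDTH.items():
        text = text.replace(src, dst)
    # Remove whitespace before punctuation marks.
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)
    # Make sure a comma, semicolon, or colon is followed by a space.
    text = re.sub(r"([,;:])(?=\S)", r"\1 ", text)
    text = text.strip()
    # Capitalize the first letter of the sentence.
    return text[:1].upper() + text[1:]
```

Applied to the example, `tidy_pinyin('fǎyǔ zhōng ,“soleil” shì tàiyáng de yìsi 。')` gives `Fǎyǔ zhōng, "soleil" shì tàiyáng de yìsi.`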

tommy_san, November 22, 2015 at 11:28:08 AM UTC

When someone writes my username in a comment to my sentence or a sentence where I also posted a comment, I get two email notifications. When someone mentions me there twice, I even get one more notification. This needs to be changed.
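
One way to avoid the duplicates would be to collect all recipients in a set before sending anything, so that each user gets at most one email per comment. The function and parameter names below are hypothetical; this is a sketch of the idea, not the site's notification code.

```python
def notification_recipients(comment_author, sentence_owner, previous_commenters, mentioned):
    """Return each user to notify exactly once, however many reasons apply."""
    recipients = set()
    if sentence_owner:
        recipients.add(sentence_owner)
    recipients.update(previous_commenters)
    recipients.update(mentioned)         # repeated mentions collapse in the set
    recipients.discard(comment_author)   # no notification for one's own comment
    return recipients
```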

tommy_san, October 28, 2015 at 9:15:14 AM UTC

+1

If thése dìacrítics aren't úsed when matúre nátive spéakers wríte the lánguage to commúnicàte with each óther, you shóuldn't úse them hére.

tommy_san, October 2, 2015 at 2:30:03 PM UTC

They aren't a professional voice actor, but they seem to be someone who does voice work as a hobby.

tommy_san, September 27, 2015 at 12:30:52 AM UTC

I thought CK was talking about cases where a sentence is neither incorrect nor obviously unnatural, yet not really good either. In such cases, it is often not very easy to come up with a better alternative, at least for me.

This is actually related to the discussion about the rating system.
https://tatoeba.org/wall/show_m...#message_24217
I think it's not enough to have only three options of "OK", "not OK" and "unsure" (which in fact means for many users "I'm sure this sounds unnatural"), so I've been suggesting adding at least one more option between "OK" and the current "unsure".

tommy_san, September 26, 2015 at 2:53:36 AM UTC

When I see that someone has linked two sentences, I tend to assume s/he considers both sentences to be good, so if one or both of the sentences aren't good enough, I'd rather you just left them unlinked. (To tell the truth, I don't really like it that you often link less unnatural Japanese sentences to English ones.)

tommy_san, September 26, 2015 at 2:41:15 AM UTC

> I think, we should distinguish two types of learners

You're right. I was thinking of this comment by Pfirsichbaeumchen that we play the role of teachers rather than students on Tatoeba (https://tatoeba.org/sentences/s...mment-428296). I hope the primary motivation of most contributors is to share their knowledge with the rest of the world. If their primary interest were to learn (to get taught and corrected), they usually wouldn't get what they expect.

> As for the latter, I don't really think that lack of those tools and info would stop much of them.

You're probably right here, too, considering for example Korean sentences on Tatoeba, which are said to be of horrible quality even though there are no transcriptions. Actually, I guess most of the bad Japanese sentences and bad translations of Japanese sentences are added by those who know some kanji and overestimate themselves.


(Let me write everything here because I don't want to mess up the Wall by posting too many replies.)

I think the classification I wrote here (https://tatoeba.org/wall/show_m...message_21480) would be useful for the current discussion.

(Original) Он очень похож на своего отца.
(1) On ochen' pohozh na svoego otca.
(2) ohn OH-cheen' pah-KHOZH nuh svuh-ee-VOH aht-TSAH.
(3) Он о́чень похо́ж на своего́ отца́.
(4) [on ˈot͡ɕɪnʲ pɐˈxoʐ nə svəjɪˈvo ɐtˈt͡sa]

I agree with oyd11 that when more than one script is used (more or less) officially, we should display transliterations of the type (1). Members should be allowed to contribute using the script they prefer.

I'm not sure if it's a good idea to display Romanization like this for any language that doesn't use the Latin alphabet. For example, I feel this transliteration for the Russian sentence isn't very useful to anyone, since no one would be able to read it properly unless they know Russian. If we decide to display transliterations for some languages and not for some others, what would be the criterion to divide them?

(2) is actually much more useful, but it's not suitable for an international project.

As sharptoothed says, I quite like pronunciation aids like (3). They cannot be generated by a machine alone, which means it's worth providing them manually. I believe they belong to the kind of data we want to collect on Tatoeba. When the system for editable furigana (https://tatoeba.org/wall/show_m...message_22870) is completed, we could consider applying it to other languages as well.


I admit it's rather hard for me to understand that a competent speaker is not necessarily literate. We do want to welcome those who can speak and translate well enough even if they're not literate in the language, but it's not that easy since our project is based on written sentences (of spoken and written language). I'm not really sure to what extent transliterations would help them. Would you suggest, for example, that we should let Amharic speakers who don't know the Ge'ez script add sentences using the Latin alphabet?

tommy_san, September 24, 2015 at 10:57:01 PM UTC, edited September 25, 2015 at 1:44:30 AM UTC

** Should we have transcriptions on tatoeba.org? **

There's an ongoing discussion on GitHub about transcriptions[1]. I'm opening this thread to continue it because I think it deserves wider attention.

As I said before[2], I see tatoeba.org as a place to contribute and I think it shouldn't be too nice to language learners. It's competent translators (and example sentence writers) that we want to attract, not language learners. Of course, any translator is a learner at the same time, but not all learners are good translators.

We sometimes get bad translations of Japanese sentences by users who don't even know hiragana. This is obviously because we provide Romanization of Japanese sentences (which we've decided to do away with[3]). So what would happen if there were transcriptions in many more languages?

[1] https://github.com/Tatoeba/tatoeba2/issues/280
[2] https://tatoeba.org/wall/show_m...#message_21488
[3] https://tatoeba.org/wall/show_m...#message_22899