Wall messages by tommy_san (320 total)

tommy_san, 7 February 2016, 01:39:42 UTC

Thanks for the hard work, gillux! Let me make some additional remarks.

First, you need to check "Always show transcriptions and alternative scripts" on the settings page to see machine-generated furigana. Note that these transcriptions with a warning sign are not always correct. Transcriptions without a warning sign have been added by human contributors and are much more likely to be correct. These manual transcriptions are also found on the downloads page. I plan to provide furigana for all the sentences I've written and proofread.

As gillux says, almost all furigana are now associated individually with each kanji, but some of the kanji compounds called 熟字訓 are exceptions. For example, there's a word 明日 (ashita, あした) at the beginning of gillux's example above. It's not that 明 reads "ashi" and 日 reads "ta", or 明 "a" and 日 "shita", so the three hiragana are placed evenly above two kanji. On the other hand, when one or more of the kanji are read the normal way, the furigana is divided as in normal compounds. For example, the reading of the word 時計 (tokei, とけい) is special because 時 doesn't have the reading "to". However, since 計 does normally read "kei", the furigana と is placed on top of 時 and けい on top of 計.
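The two placements described above can be written down with the brace notation that appears elsewhere in this thread (for example 30分{さんじゅっぷん}). The following is only a minimal sketch, not Tatoeba's actual parser: `parse_furigana` is a hypothetical helper, and it assumes the input consists solely of base{reading} groups.

```python
import re

# Assumed notation: each group is written as base{reading}.
# This pattern is an illustration, not the format Tatoeba uses internally.
_GROUP = re.compile(r"([^{}]+)\{([^{}]+)\}")


def parse_furigana(annotated: str) -> list:
    """Split 'base{reading}' groups into (base, reading) pairs."""
    return _GROUP.findall(annotated)


def spans_several_characters(pair: tuple) -> bool:
    """True when one reading covers more than one base character,
    as with a compound whose reading cannot be divided per kanji."""
    base, _reading = pair
    return len(base) > 1


# 明日: the reading belongs to the compound as a whole.
whole = parse_furigana("明日{あした}")
# 時計: the reading is divided between the two kanji.
divided = parse_furigana("時{と}計{けい}")
```

Here `whole` holds a single pair covering both kanji, while `divided` holds one pair per kanji, which mirrors the difference between あした placed evenly over 明日 and と/けい placed over 時 and 計 separately.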

I think this new system will be most useful when you're looking for sentences where a specific kanji is read in a specific way. If you're interested in doing this kind of search on the website, reply to this message to let developers know.

tommy_san, 2 February 2016, 02:12:38 UTC

> I had discussed, a while ago, the case of the unadopted Japanese sentences and asked what if we simply delete them? The answer I got was basically that they are not harmful to the point that they should be deleted. Therefore in the case of Japanese, we will keep them.

There are actually some (though not many) sentences that I find plainly wrong or clearly unnatural, and thus harmful. I rate them "not OK" and "unsure" (even though I'm not unsure about anything) respectively to warn other users. However, most people don't see my ratings, so these sentences keep getting translated, especially often by new members.

If the community thinks it's better for me to delete these sentences, I can do so. In that case, you'd need to excuse me for accidentally deleting sentences that are correct in some variety of Japanese I'm not familiar with, or even ones that are correct in standard Japanese that include a word or phrase I don't know. You'd also need to excuse me for deleting sentences that could be turned into good example sentences with some changes. I don't have the time or ability to improve them and make sure the new sentences match all the translations, and there are many sentences that, in my opinion, wouldn't make good standalone example sentences anyway.

By the way, I think it's really important for us to tell new members what to translate and what not to translate. Since we have both good and bad sentences, they should translate only when they're sure it's a good sentence. If they cannot judge the quality of sentences themselves (which is the case for many non-native speakers), it's better to choose sentences owned or tagged/marked OK by a self-identified native speaker. Whenever I notice, I tell this to members who translate bad sentences, but it's something every contributor should keep in mind.

I also wonder if we could develop a set of good sentences that contributors of every language could consider translating. The set would surely include sentences like "Hello" and "Thank you", but it doesn't have to be phrasebook-like. It could include any sentence that you find good and real and that makes good sense out of context (such as "You made the mistake on purpose, didn't you?" and "Does this dress make me look fat?"). It's not that all contributors should translate from this set, but if they don't have a particular preference, it might be better for them (especially contributors of languages with few sentences) to translate sentences from such a set than to simply translate recent or random sentences, which are often not very good.

tommy_san, 19 January 2016, 03:43:35 UTC

Something seems to be wrong with Horus here. He deleted #4843034 and #4843035, but forgot to unlink them from #4843029, so the log of #4843029 looks strange.

tommy_san, 11 January 2016, 04:03:35 UTC

> At the moment I don't know if anyone really needs to be able to see others lists on the sentence's page

Lists could be really useful to share users' opinions on sentences if we could have non-collaborative lists shown to other users on sentence pages. For example, if you aren't satisfied with the current "collections" feature with three rating options, you could make lists to rate sentences according to your rules. If you want to show some characteristics of specific sentences but hesitate to use tags because it's not very objective, you could make lists to show your opinions to other users. Using lists, I think we could achieve something similar to your plan of more flexible "collections" (https://tatoeba.org/wall/show_m...essage_23892).

> A user who creates a list that is "unlisted" surely does not want their list to be found easily by others.

You may be right if you stop displaying other members' lists on sentence pages. What I meant is that I'd like to have the possibility to make a list listed while not shown to everyone on sentence pages.

tommy_san, 8 January 2016, 01:23:47 UTC

I'd suggest not displaying the existing "collaborative" lists on sentence pages by default, for the same reason as CK (http://tatoeba.org/wall/show_me...essage_25185).

I also wonder if it's necessary to remove all the non-"public" lists from the index of lists and the page with the lists of each user. I sometimes take a look at lists of Japanese sentences that people make for their personal use to find out what kind of sentences they are interested in.

tommy_san, 5 January 2016, 02:53:00 UTC

The website should be as simple as possible so that new members can easily get used to it, so if there's not really a need for the option, I'd prefer not to implement it. If someone wanted to make such a confidential list of sentences, they could simply work offline.

In my opinion, we need three checkboxes that let the creator of a list choose whether the list is visible to others on sentence pages, whether others may add sentences to it, and whether anyone other than the creator and the member who added a sentence may remove that sentence from the list.

It would also be nice if we could know who added each sentence to (and who removed each sentence from) a list.
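To make the three checkboxes concrete, here is a rough sketch of how they might be modelled. Everything in it is hypothetical (the class name, the field names, the usernames); it is not how Tatoeba stores lists, only one way the rules proposed above could fit together, including a record of who added each sentence.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SentenceList:
    """Hypothetical model of a list with the three proposed options."""
    creator: str
    visible_on_sentence_pages: bool = False  # checkbox 1
    others_may_add: bool = False             # checkbox 2
    others_may_remove: bool = False          # checkbox 3
    # sentence id -> username of the member who added it
    added_by: Dict[int, str] = field(default_factory=dict)

    def can_add(self, user: str) -> bool:
        return user == self.creator or self.others_may_add

    def add(self, user: str, sentence_id: int) -> None:
        if not self.can_add(user):
            raise PermissionError(f"{user} may not add to this list")
        self.added_by[sentence_id] = user

    def can_remove(self, user: str, sentence_id: int) -> bool:
        if sentence_id not in self.added_by:
            return False
        # The creator and the member who added the sentence may always remove it.
        if user in (self.creator, self.added_by[sentence_id]):
            return True
        return self.others_may_remove


# Hypothetical usernames, for illustration only.
shared = SentenceList(creator="alice", others_may_add=True)
shared.add("bob", 4851)
```

With `others_may_remove` left unchecked, only "alice" (the creator) and "bob" (who added sentence 4851) can remove that sentence.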

tommy_san, 4 January 2016, 15:08:51 UTC

Has there been a request for the option to make a list completely inaccessible to others?

tommy_san, 4 January 2016, 09:32:57 UTC

Does it mean that if I want to make a list of sentences I want to show someone, I need to make it visible to everyone on each sentence page?

tommy_san, 31 December 2015, 03:47:17 UTC (edited 31 December 2015, 07:28:38 UTC)

Hello. Thank you for using our data.

> I'm not sure how I should properly attribute the work done by the Tatoeba Project.

Take a look at our terms of use: https://tatoeba.org/terms_of_use.
I think the best way is to include links to each sentence page (for example https://tatoeba.org/sentences/show/4851).

> Also, I'm considering to include audio in the sentences that have it. What's the proper way to do it?

I'd suggest linking to the profile page of the member who contributed each audio file. So far, all the Japanese audio files have been contributed by Yomi-san (https://tatoeba.org/user/profile/yomi). If you find in the future Japanese sentences with audio that are not in the list https://tatoeba.org/sentences_lists/show/4023/, that means they have been recorded by someone else. In that case, take a look at this page: http://bit.ly/tatoebavoices.

In addition, you should ideally state somewhere on the website that all our (written) sentences are licensed under CC BY 2.0 FR and Yomi-san's recordings are licensed under CC BY-NC 4.0.

I'm not perfectly sure, but I guess you can link directly to audio files on the Tatoeba server. (Someone correct me if I'm wrong.)

Since there are tens of thousands of bad sentences here, I'd advise you to use only the sentences written or proofread by native speakers. See http://bit.ly/nativespeakers for the list of self-identified native speakers together with the "user skill" data on the downloads page. (Note, however, that many Japanese sentences owned by native speakers are actually stilted, if not wrong.) The data of sentence ratings that you find at the bottom of the downloads page would also be helpful. You'd better avoid using any sentence with the rating 0 or -1.
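The filtering advice above could look roughly like this in code. It is only a sketch under several assumptions: the `Sentence` record and its fields are hypothetical and do not reflect the layout of the download files, ownership stands in for "written or proofread", and the only rating values relied on are the 0 and -1 mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Sentence:
    """Hypothetical in-memory record, not the real export format."""
    sentence_id: int
    owner: str
    ratings: List[int]  # ratings given by reviewers; may be empty


def usable_sentences(sentences: Iterable[Sentence],
                     native_speakers: Set[str]) -> List[Sentence]:
    """Keep sentences owned by a self-identified native speaker
    that nobody has rated 0 or -1."""
    return [
        s for s in sentences
        if s.owner in native_speakers
        and not any(r in (0, -1) for r in s.ratings)
    ]


# Made-up sample data, for illustration only.
sample = [
    Sentence(1, "native_a", [1]),
    Sentence(2, "native_a", [0]),
    Sentence(3, "learner_b", [1]),
    Sentence(4, "native_a", []),
]
kept = usable_sentences(sample, {"native_a"})
```

In this sketch an unrated sentence owned by a native speaker is kept; a stricter filter could instead require at least one positive rating.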

tommy_san, 15 December 2015, 00:35:31 UTC

Do you need to require anything at all? Wouldn't it be enough to show an alert message like "Are you sure you don't want furigana on the characters ...?" and just leave these characters without furigana if the user wants it that way? If you make a special page for corpus maintainers that lists such sentences, we'll be able to easily correct the sentences that actually lack necessary furigana.

tommy_san, 8 December 2015, 04:06:47 UTC (edited 8 December 2015, 14:39:57 UTC)

> I don’t know if we should enforce furigana over every word that is not Japanese.

I think that depends on how we pronounce it. When I see the word "Twitter" in a Japanese text, I pronounce it as ツイッター, and you should do so, too, otherwise you might not be understood, so we definitely need furigana here. On the other hand, I'd rather pronounce the English words in #236357 as in English, so I don't feel like adding furigana to them.

Something seems to be wrong with this sentence, by the way. I get the error "The provided sentence differs from the original one near “と”." when I try to edit the furigana. I guess it's because of the space before "heir". See also #4720072.

tommy_san, 22 November 2015, 23:09:37 UTC (edited 22 November 2015, 23:51:20 UTC)

When I visited the site yesterday, I didn't notice the X mark next to the warning message that auto-generated transcriptions might be wrong. A text like "Don't show again" might be more user-friendly.

tommy_san, 22 November 2015, 13:01:32 UTC

1. It would be nice if there was a button to verify a transcription with a single click.

2. Since the machine-generated readings of Japanese numerals are almost always wrong, you may as well remove them. It's quite troublesome to turn "3{さん}0{ぜろ}分{ふん}" into "30分{さんじゅっぷん}".

3. Chinese automatic transcriptions could be improved a little.
For example, "fǎyǔ zhōng ,“soleil” shì tàiyáng de yìsi 。" should be turned into "Fǎyǔ zhōng, "soleil" shì tàiyáng de yìsi."
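The clean-up asked for in point 3 can be sketched as a small post-processing step. This is an assumption-laden illustration, not the transcription code used on the site: `tidy_pinyin` is a hypothetical function, and the rules are inferred only from the single example above.

```python
import re

# Full-width and curly punctuation mapped to plain ASCII (assumed rule set).
_PUNCT = str.maketrans({
    "\u201c": '"',   # left curly quote
    "\u201d": '"',   # right curly quote
    "\u3002": ".",   # ideographic full stop
    "\uff0c": ",",   # full-width comma
})


def tidy_pinyin(raw: str) -> str:
    text = raw.translate(_PUNCT)
    # Remove whitespace before a comma or full stop.
    text = re.sub(r"\s+([,.])", r"\1", text)
    # Make sure a comma is followed by a space.
    text = re.sub(r",(?=\S)", ", ", text)
    # Capitalize the first letter of the sentence.
    return text[:1].upper() + text[1:]


before = "fǎyǔ zhōng ,“soleil” shì tàiyáng de yìsi 。"
after = tidy_pinyin(before)
```

For the example sentence, `after` equals the hand-corrected form given in point 3. Other sentences would very likely need more rules (question marks, proper nouns inside the sentence, and so on).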

tommy_san, 22 November 2015, 11:28:08 UTC

When someone writes my username in a comment to my sentence or a sentence where I also posted a comment, I get two email notifications. When someone mentions me there twice, I even get one more notification. This needs to be changed.
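One way to avoid the duplicates would be to collect every reason for notifying a member into a set before sending anything. The sketch below is purely illustrative: the function name, its parameters, and the "@username" mention syntax are all assumptions, not a description of how the site's notification code works.

```python
import re
from typing import Iterable, Set

# Assumed mention syntax: "@username".
_MENTION = re.compile(r"@(\w+)")


def notification_recipients(comment_text: str,
                            comment_author: str,
                            sentence_owner: str,
                            previous_commenters: Iterable[str]) -> Set[str]:
    """Return each member to notify exactly once, whatever the number
    of reasons (owner, earlier commenter, one or more mentions)."""
    recipients = set(_MENTION.findall(comment_text))
    recipients.add(sentence_owner)
    recipients.update(previous_commenters)
    # Nobody needs a notification about their own comment.
    recipients.discard(comment_author)
    return recipients


# Hypothetical comment that mentions the sentence owner twice.
to_notify = notification_recipients(
    "@tommy_san I agree. @tommy_san, could you check this?",
    comment_author="someone",
    sentence_owner="tommy_san",
    previous_commenters=["tommy_san"],
)
```

Because the recipients are a set, the owner who also commented and was mentioned twice ends up with a single notification.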

tommy_san, 28 October 2015, 09:15:14 UTC

+1

If thése dìacrítics aren't úsed when matúre nátive spéakers wríte the lánguage to commúnicàte with each óther, you shóuldn't úse them hére.

tommy_san, 2 October 2015, 14:30:03 UTC

This person isn't a professional voice actor, but appears to do voice work as a hobby.

tommy_san, 27 September 2015, 00:30:52 UTC

I thought CK was talking about cases where a sentence is neither incorrect nor obviously unnatural, but yet not really good either. In such cases, it is often not very easy to come up with a better alternative, at least for me.

This is actually related to the discussion about the rating system.
https://tatoeba.org/wall/show_m...#message_24217
I think it's not enough to have only three options of "OK", "not OK" and "unsure" (which in fact means for many users "I'm sure this sounds unnatural"), so I've been suggesting adding at least one more option between "OK" and the current "unsure".

tommy_san, 26 September 2015, 02:53:36 UTC

When I see that someone has linked two sentences, I tend to assume s/he considers both sentences to be good, so if one or both of the sentences aren't good enough, I'd rather you just left them unlinked. (To tell the truth, I don't really like it that you often link less unnatural Japanese sentences to English ones.)

tommy_san, 26 September 2015, 02:41:15 UTC

> I think, we should distinguish two types of learners

You're right. I was thinking of this comment by Pfirsichbaeumchen that we play the role of teachers rather than students on Tatoeba (https://tatoeba.org/sentences/s...mment-428296). I hope the primary motivation of most contributors is to share their knowledge with the rest of the world. If the primary interest was to learn (to get taught and corrected), they usually wouldn't get what they expect.

> As for the latter, I don't really think that lack of those tools and info would stop much of them.

You're probably right here, too, considering for example Korean sentences on Tatoeba, which are said to be of horrible quality even though there are no transcriptions. Actually, I guess most of the bad Japanese sentences and bad translations of Japanese sentences are added by those who know some kanji and overestimate themselves.


(Let me write everything here because I don't want to mess up the Wall by posting too many replies.)

I think the classification I wrote here (https://tatoeba.org/wall/show_m...message_21480) would be useful for the current discussion.

(Original) Он очень похож на своего отца.
(1) On ochen' pohozh na svoego otca.
(2) ohn OH-cheen' pah-KHOZH nuh svuh-ee-VOH aht-TSAH.
(3) Он о́чень похо́ж на своего́ отца́.
(4) [on ˈot͡ɕɪnʲ pɐˈxoʐ nə svəjɪˈvo ɐtˈt͡sa]

I agree with oyd11 that when more than one script is used (more or less) officially, we should display transliterations of the type (1). Members should be allowed to contribute using the script they prefer.

I'm not sure if it's a good idea to display Romanization like this for any language that doesn't use the Latin alphabet. For example, I feel this transliteration for the Russian sentence isn't very useful to anyone, since no one would be able to read it properly unless they know Russian. If we decide to display transliterations for some languages and not for some others, what would be the criterion to divide them?
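For reference, a transliteration of type (1) is a plain character-by-character substitution, which is why a machine can produce it but a reader who doesn't know Russian gains little from it. The sketch below covers only the letters attested in the example sentence; the table is read off that example and is not a complete or official romanization scheme, and `transliterate` is a hypothetical name.

```python
# Mapping inferred from example (1) only; letters outside the example
# sentence are deliberately left out and pass through unchanged.
_TABLE = {
    "а": "a", "в": "v", "г": "g", "е": "e", "ж": "zh",
    "н": "n", "о": "o", "п": "p", "с": "s", "т": "t",
    "х": "h", "ц": "c", "ч": "ch", "ь": "'",
}


def transliterate(text: str) -> str:
    out = []
    for ch in text:
        latin = _TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)                  # spaces, punctuation, unknown letters
        elif ch.isupper():
            out.append(latin.capitalize())  # keep sentence-initial capitals
        else:
            out.append(latin)
    return "".join(out)


romanized = transliterate("Он очень похож на своего отца.")
```

Note that nothing in the table carries stress or vowel reduction, so the output says nothing about how the sentence actually sounds.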

(2) is actually much more useful, but it's not suitable for an international project.

As sharptoothed says, I pretty much like pronunciation aids like (3). They cannot be generated by a machine alone, which means it's worth providing them manually. I believe they belong to the kind of data we want to collect on Tatoeba. When the system for editable furigana (https://tatoeba.org/wall/show_m...message_22870) is completed, we could consider applying it to other languages as well.
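The relationship between (3) and the original is one-directional, and a short sketch makes that visible: removing the stress marks is mechanical, whereas adding them requires knowing the language. The code assumes the stress marks are encoded as the combining acute accent U+0301; `strip_stress` is a hypothetical helper.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"


def strip_stress(text: str) -> str:
    """Remove combining acute accents while leaving other marks
    (such as the breve of й) intact."""
    decomposed = unicodedata.normalize("NFD", text)
    without_stress = decomposed.replace(COMBINING_ACUTE, "")
    return unicodedata.normalize("NFC", without_stress)


# Sentence (3), written with explicit escapes for the stress marks.
stressed = "Он о\u0301чень похо\u0301ж на своего\u0301 отца\u0301."
plain = strip_stress(stressed)
```

Going the other way would need a dictionary or a human contributor, which is the reason stress-marked text is worth collecting manually.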


I admit it's rather hard for me to understand that a competent speaker is not necessarily literate. We do want to welcome those who can speak and translate well enough even if they're not literate in the language, but it's not that easy since our project is based on written sentences (of spoken and written language). I'm not really sure to what extent transliterations would help them. Would you suggest, for example, that we should let Amharic speakers who don't know the Ge'ez script add sentences using the Latin alphabet?

tommy_san, 24 September 2015, 22:57:01 UTC (edited 25 September 2015, 01:44:30 UTC)

** Should we have transcriptions on tatoeba.org? **

There's a discussion going on on GitHub about transcriptions[1]. I'm opening this thread to continue it because I think it deserves wider attention.

As I said before[2], I see tatoeba.org as a place to contribute and I think it shouldn't be too nice to language learners. It's competent translators (and example sentence writers) that we want to attract, not language learners. Of course, any translator is a learner at the same time, but not all learners are good translators.

We sometimes get bad translations of Japanese sentences by users who don't even know hiragana. This is obviously because we provide Romanization of Japanese sentences (which we've decided to do away with[3]). So what would happen if there were transcriptions in many more languages?

[1] https://github.com/Tatoeba/tatoeba2/issues/280
[2] https://tatoeba.org/wall/show_m...#message_21488
[3] https://tatoeba.org/wall/show_m...#message_22899