Wall (6,753 threads)

Scott, May 15, 2010 at 11:43:59 PM UTC

Is there anything that indicates whether a sentence is a source sentence or a translation?

I'm a bit confused as I'm never sure which sentence is the original and which is the translation.

blay_paul, May 16, 2010 at 7:02:50 AM UTC

To answer the original question, there are very many English translations of Japanese and there are lots of Japanese translations of English. The only thing you can say for certain of the Tanaka Corpus is that a Japanese student originally entered both sentences (although the English may well have been corrected since). So, there is the general hope that the Japanese is quite likely to be accurate.

Scott, May 16, 2010 at 1:39:42 PM UTC

Thank you, everybody, for your answers. What I take from them is that sentences are currently not flagged as source or translation, and I understand that's due to the origin of the Tanaka Corpus.

I think it would be a (major) improvement to be able to flag a sentence as a translation of another one. With the current system, I see potential edit wars happening where one sentence is translated and then the translation is used as a source to modify the original, etc.

It would clarify which sentence is authoritative and which sentence may need adjusting. I see it as especially crucial when dealing with idioms.

Take this sentence pair from Eijiro:

She'd be blinging [bling-blinging] tonight. : 彼女は今夜、派手な宝飾品を身に着けるつもりだろう。

You could very well translate the sentence back to English with something like: Tonight, she intends on wearing flashy jewellery. So you have to know (though of course you can guess) that the English is the original and that the Japanese is just an approximation. That information is also useful for the users of the sentences who might otherwise be confused.

blay_paul, May 16, 2010 at 1:51:21 PM UTC

> I think it would be a (major) improvement to be able
> to flag a sentence as a translation of another one.

I 100% agree with that. In fact an earlier variant of Tatoeba did include the ability to denote one sentence as 'the original', but that seems to have got lost in the midst of other changes.

JimBreen, May 16, 2010 at 2:11:18 AM UTC

I have found that assuming the English is a translation of the Japanese is a good starting place. On occasion, the Japanese will be a translation of an English idiom.

Then you have to allow for typos, 変換ミス (kana-to-kanji conversion mistakes), etc. in the Japanese.

CK, May 16, 2010 at 2:17:14 AM UTC (edited October 25, 2019 at 8:05:48 AM UTC)

[not needed anymore- removed by CK]

blay_paul, May 16, 2010 at 2:27:09 AM UTC

> Since the source of many, if not all, of the
> original Tanaka Corpus sentences was from students
> studying English, it would probably be better to
> assume that the English was the original and the
> Japanese is the translation.

Er, because Japanese students studying English are never set the task of writing English sentences? I'm not quite sure of your logic there.

CK, May 16, 2010 at 3:03:22 AM UTC (edited October 25, 2019 at 8:04:24 AM UTC)

[not needed anymore- removed by CK]

blay_paul, May 16, 2010 at 3:22:33 AM UTC

Yeah, well there are just under 150,000 of the original Tanaka Corpus sentence pairs left. Picking out one, or one hundred, or ten thousand, sentences that are E->J (or J->E) doesn't really prove much.

JimBreen, May 16, 2010 at 2:38:42 AM UTC

I agree with Paul. The overuse of 彼 and 彼女 comes, I think, from a mindset that says: "gaijin use pronouns, so they have to be used in the Japanese too." I have challenged native-speaker Japanese (NSJ) teachers of Japanese on why they keep using 私/彼/あなた/etc. in examples when the result is quite unnatural Japanese. The reply is usually: "The students are uncomfortable without them, and it's not actually incorrect, so ..." The result, of course, is 外人日本語 (foreigner's Japanese).

So, IMNSO, the Japanese part of the Tanaka set is more likely to be specially constructed 外人日本語 than actual translations from English, although the distinction can be a bit fine.

JimBreen, May 16, 2010 at 2:46:47 AM UTC

I meant IMNSHO.

I'm working through a long list of broken English sentences. The current one is:

この会社はテレビを製造しています。
This company products the television.

Certainly not an EJ translation.

sysko, May 16, 2010 at 3:00:08 AM UTC

For the others, the creation date helps (but yes, one day we will properly handle the distinction between an original sentence and its translation).

TRANG, May 16, 2010 at 10:32:55 PM UTC

@Scott, since you are new here but seem to want to participate, I recommend reading this:
http://blog.tatoeba.org/2010/02...n-tatoeba.html

It should clear some things up about the way Tatoeba works :) I know it can be a bit confusing at first, but once you understand the concept behind it, it all makes sense.

blay_paul, May 16, 2010 at 1:00:11 PM UTC

Export glitches.

See http://tatoeba.org/eng/sentences/show/237436

The English is ...

"Clean up in front of the shop first." "OK!" "Sprinkle some water out there too."

However by the time it arrives in my spreadsheet it seems to have lost the first pair of quotes.

Clean up in front of the shop first. "OK!" "Sprinkle some water out there too."

I think the csv format should be

"""Clean up in front of the shop first."" ""OK!"" ""Sprinkle some water out there too."""

Could you check it?
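For reference, the encoding proposed above is the standard CSV rule (RFC 4180): a field that contains double quotes is wrapped in quotes, and every embedded quote is doubled. A minimal sketch with Python's standard `csv` module, which applies that rule by default; the actual Tatoeba export code is not shown in this thread, so this only illustrates the expected field encoding, not the fix to the exporter itself.

```python
import csv
import io

# The English sentence from http://tatoeba.org/eng/sentences/show/237436
sentence = '"Clean up in front of the shop first." "OK!" "Sprinkle some water out there too."'

# Writing: csv.writer (default "excel" dialect) wraps the field in quotes
# because it contains a quote character, and doubles every embedded quote.
buf = io.StringIO()
csv.writer(buf).writerow([sentence])
encoded = buf.getvalue().rstrip("\r\n")
print(encoded)
# """Clean up in front of the shop first."" ""OK!"" ""Sprinkle some water out there too."""

# Reading it back restores the original text, including the leading quote.
decoded = next(csv.reader(io.StringIO(encoded)))[0]
assert decoded == sentence
```

The printed line matches the form suggested in the post, so a spreadsheet that follows the same rule should keep the first pair of quotes.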

blay_paul, May 16, 2010 at 10:40:30 AM UTC

World English Bible and 公協訳聖書

See http://c11n.net/ and http://ebible.org/

I think that special handling would be required to import them in full, but it would probably be no problem from a copyright point of view.

Because it is important to have the text in the right order and to be able to search through it by verse, a special interface should probably be implemented. It would probably also benefit from a reserved number space.

blay_paul, May 16, 2010 at 12:44:54 PM UTC

> I think that special handling would be required to import
> them in full

Also Tatoeba is limited to (IIRC) a maximum of 500 characters for an example - some verses might exceed that.

sysko, May 16, 2010 at 2:07:51 PM UTC

For longer texts there will be a special section, so that people can post longer paragraphs, short novels, speeches, etc.

Scott, May 16, 2010 at 4:42:54 AM UTC

I have another question. Hopefully it will prove to be less controversial than the last one.

Are there any Japanese sentences that need to be translated into English? I might translate some if I have some time but all the sentences that I find with the serial translation tool already have an English translation. Thanks.

sysko, May 16, 2010 at 10:53:27 AM UTC

Trang can create a list of Japanese sentences that have no English translation.

Scott, May 16, 2010 at 1:47:45 PM UTC

Thanks for the offer. I don't know how much time I can spare really so no need to make that list just for me.

It might be a good idea to have a feature in the serial translation tool to find sentences in source language X with no translation in destination language Y. That would enable users to do mass translations of sentences.

sysko, May 16, 2010 at 2:10:11 PM UTC

Yep, that's something we plan to do... when we have time (because it's actually a bit tricky to implement this efficiently in SQL).
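One way to express the "source language X, no translation in language Y" lookup is a `NOT EXISTS` subquery over the translation links. A minimal sketch using Python's built-in `sqlite3`; the table and column names (`sentences`, `links`, `sentence_id`, `translation_id`) and the sample IDs are hypothetical, and the real Tatoeba schema may well differ. Only direct links are checked, and the index on `links.sentence_id` is meant to keep the subquery from scanning the whole link table for every sentence.

```python
import sqlite3

# Hypothetical schema: one row per sentence, plus a link table that stores
# each translation pair in both directions.
SCHEMA = """
CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT NOT NULL, text TEXT NOT NULL);
CREATE TABLE links (sentence_id INTEGER NOT NULL, translation_id INTEGER NOT NULL);
CREATE INDEX links_by_sentence ON links (sentence_id);
"""

# Sentences in the source language with no directly linked sentence in the target language.
UNTRANSLATED = """
SELECT s.id, s.text
FROM sentences AS s
WHERE s.lang = ?
  AND NOT EXISTS (
    SELECT 1
    FROM links AS l
    JOIN sentences AS t ON t.id = l.translation_id
    WHERE l.sentence_id = s.id AND t.lang = ?
  )
ORDER BY s.id
"""


def untranslated(conn: sqlite3.Connection, source_lang: str, target_lang: str) -> list[tuple[int, str]]:
    return conn.execute(UNTRANSLATED, (source_lang, target_lang)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO sentences (id, lang, text) VALUES (?, ?, ?)",
    [
        (1, "eng", "She'd be blinging [bling-blinging] tonight."),
        (2, "jpn", "彼女は今夜、派手な宝飾品を身に着けるつもりだろう。"),
        (3, "jpn", "この会社はテレビを製造しています。"),
    ],
)
# Sentences 1 and 2 are translations of each other; sentence 3 has no links.
conn.executemany("INSERT INTO links (sentence_id, translation_id) VALUES (?, ?)", [(1, 2), (2, 1)])

print(untranslated(conn, "jpn", "eng"))  # [(3, 'この会社はテレビを製造しています。')]
```

Indirect translations (for example Japanese to French to English) are deliberately ignored here; whether they should count is a design decision for the real feature.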

saeb, May 16, 2010 at 5:06:04 AM UTC

How about this list http://tatoeba.org/eng/sentences_lists/show/24

not really sure how up to date it is...

blay_paul, May 16, 2010 at 7:31:06 AM UTC

WWWJDIC is looking in particular for sentences that cover words or grammatical phrases that are poorly represented at present. (Tatoeba is not so restricted)

What I find works best is to read something and check interesting words in WWWJDIC. If you find one with no, or few, examples, then that would be a good one to add. Note that where words have several senses in WWWJDIC, poorly represented senses are just as important as poorly represented words.

Scott, May 15, 2010 at 11:40:35 PM UTC

Nice job on improving the website. It looks much prettier and more functional than the last time I visited.

blay_paul, May 15, 2010 at 10:16:20 PM UTC

Quick fix idea.

This is a simple idea that should make things a little simpler. Each Japanese sentence should have zero, or one, set of index data. Spurious extra sets can be left over when duplicates are merged.

If you could generate the equivalent of the wwwjdic.csv file, including only records with more than one set of index data, the day _before_ you export the whole file (e.g. on Friday of each week), then Jim and I would have a chance to fix things before the weekly update.
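The check requested above amounts to grouping index-data records by sentence ID and keeping only the IDs that have more than one set. A minimal sketch, assuming each record can be read as a hypothetical `(sentence_id, index_data)` pair; the real layout of wwwjdic.csv is not described in this thread, so parsing is left out and the sample strings are placeholders, not real index data.

```python
from collections import defaultdict
from typing import Iterable


def extra_index_sets(records: Iterable[tuple[int, str]]) -> dict[int, list[str]]:
    """Return {sentence_id: [index_data, ...]} for sentences with more than one set.

    `records` is a hypothetical stream of (sentence_id, index_data) pairs; a
    Japanese sentence should normally appear zero or one times.
    """
    sets_by_sentence: dict[int, list[str]] = defaultdict(list)
    for sentence_id, index_data in records:
        sets_by_sentence[sentence_id].append(index_data)
    # Keep only the suspicious ones, e.g. leftovers from merged duplicates.
    return {sid: sets for sid, sets in sets_by_sentence.items() if len(sets) > 1}


# Placeholder data: sentence 101 carries two index sets, 102 carries one.
records = [(101, "index set A"), (102, "index set B"), (101, "index set C")]
print(extra_index_sets(records))  # {101: ['index set A', 'index set C']}
```

Since this only needs one pass and a dictionary, it could run as a cheap pre-export report without touching the full export.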

blay_paul, May 12, 2010 at 10:10:01 AM UTC

Quick request with regard to sentence edit behaviour.

There's one thing that bothers me about how sentence editing works: if I switch to another tab (in Firefox), the edit in progress disappears.

So what happens is that I'm halfway through translating a sentence when I decide to check something, and when I go back it's all gone and I have to redo it from memory.

Demetrius, May 12, 2010 at 2:39:48 PM UTC

Use Opera. ;)

saeb, May 12, 2010 at 7:07:09 PM UTC

I didn't notice it until you mentioned it now :D

CK, May 12, 2010 at 4:10:31 PM UTC (edited October 25, 2019 at 8:06:11 AM UTC)

[not needed anymore- removed by CK]

blay_paul, May 12, 2010 at 5:32:38 PM UTC

I'm not sure they need correcting as such. It's just a sign that a typical English bible uses 'him' more and a typical Japanese bible uses 'イエス' more. What you can do is note "[Bible, Psalms 166:68]" (or whatever) if it is an actual quote and you can work out where it's from.

CK, May 12, 2010 at 6:20:36 PM UTC (edited October 25, 2019 at 8:06:04 AM UTC)

[not needed anymore- removed by CK]

blay_paul, May 12, 2010 at 6:31:12 PM UTC

> If you want to do something like "[Bible, Psalms
> 166:68]", then it would probably be faster to delete
> all the existing Bible sentences and find public domain
> version of the Bible in various languages and dump lots
> of those sentences into the database.

Using PD Bibles as sources sounds like a good idea to me, but the existing sentences have the advantage of already having index data so I wouldn't just delete them.

CK, May 12, 2010 at 2:37:42 AM UTC (edited October 25, 2019 at 8:06:28 AM UTC)

[not needed anymore- removed by CK]

blay_paul, May 12, 2010 at 10:00:31 AM UTC

> This Google search only gets "Tanaka Corpus" results.

Yeah, that's because it should be "いうところによれば"

> Doing similar searches might be an interesting
> approach to check the accuracy and/or naturalness
> of sentences in this database.

I think that's pretty much a standard approach here. Google has quite a few quirks that you need to know to get the best results, but overall it's pretty good.

blay_paul, May 11, 2010 at 11:04:43 AM UTC

Could we have an official decision on using -1 in the meaning field to mean 'not for WWWJDIC' ?

*bump*

TRANG, May 11, 2010 at 11:39:44 PM UTC

As I said in my email, I'm okay with it. It all depends on Jim :)