saeb, 2010-03-13 2:38

*shock* just figured out TRANG is a she, he he.

TRANG, 2010-03-13 12:15

Ah, who told on me, that was supposed to be a secret.

saeb, 2010-03-13 12:45

lol, I never said this before but I actually stalked this website for quite a while before I finally decided to join, and I always imagined that you'd be like these programmers who like anime and have studied japanese for 5 yrs in their know..with a cool blog about every obsessive detail of their life...and know the whole shabang... :D

P.S. guys like that do really exist :D

saeb, 2010-03-13 1:30

There's a lot of english sentences that are grammatically correct but I don't think anyone will ever say them, use them, or even see them in any english media. They're just "out of this world". What do you think we should do with these? Should we just ignore them for the moment, and focus on those that are totally wrong?

my take is, I'm gonna stay away from translating these and stop reporting them as wrong. I'm just hoping arabic natives can use sentences I'm translating to learn english.

what do you guys think? trang? sysko?

blay_paul, 2010-03-13 8:53

> There's a lot of english sentences that are grammatically
> correct but I don't think anyone will ever say them, use
> them, or even see them in any english media.

I think that it's more correct to say "any _current_ English media". The Tanaka corpus is old, and it used even older sources of sentences. Quite a few of them would not be out of place in books published before 1940, but are rather confusing to those of us in 2010.

I think those that are old-fashioned or highly idiomatic should be kept as demonstrating historical usage but should not be used as guides to writing English (or for translating into English). I think they are good candidates for an [Old-fashioned] tag or something. ;-)

Another problem is those that are written like dictionary entries (lots of 'one' usage) and those that are not really whole sentences. I think these are worth improving, as time permits, but are probably not a high priority for translation into other languages.

sysko, 2010-03-13 16:27

I agree we should tag them in the future as "old-fashioned", "40's English", "book-style", etc., rather than just "modernize"/"oralize" them

xtofu80, 2010-03-12 12:22

I am not sure how promising this is, but there is a Japanese-German sentence database hosted by the University of Hiroshima (Katsumi Iwasaki). It seems to have been created in 2004, without major updates since then. Maybe there could be a collaboration with Tatoeba, thus increasing the number of sentence pairs. Of course, I am not sure whether they want to publish the corpus, and I am especially unfamiliar with data policies in Japan.

Here are the links to the search engine, the data description and the researcher's website:

saeb, 2010-03-12 23:07

And there's one also for spanish, I think it's free:

Maybe there could be a collaboration with tatoeba with that too :)

sysko, 2010-03-13 16:11

For the Spanish one, yep, I've been meaning to contact the guy for a long time, but hmmm, I never found the motivation to write an email :blush: I will try to do so, I promise

saeb, 2010-03-13 17:10

you can do it sysko! :)=)

blay_paul, 2010-03-11 18:38

Some more suggestions.

I know that time is limited, so I shall try to keep a high ratio of usefulness to time required to implement. ;-)

1. Add a prominent link from the Tatoeba Project home page to the Tatoeba Project Blog. Actually I think it's worth adding a "Links" item to the list of headings on the top of the page. Useful links would include popular dictionaries, language sites, and sites that host collections of sentences.

2. Wish list. Maybe best as a blog article? I think it would be nice to have an idea of what features are planned, how likely they are to be implemented and how soon. Users could comment on possible features and suggest new ones.

3. Active dictionary links. This would be a long term and high effort suggestion but I think it would be useful to have active linking available from words in example sentences. Some languages (Japanese, Chinese) would require more effort than others, but I think it would be well worth it in the long run.

TRANG, 2010-03-12 18:49

1. Like sysko said, the top menu has reached its limits. Someone with a 20-character username (which I think is the maximum length) and using the French interface... might actually not even "fit" up there if (s)he's a Linux user... Something we'll have to check.
What I can do for now though, is to have a link to the blog from the "What's new" (along with the Twitter link). And in the blog, there's a "Links" section which only has Tatoeba, but I can add other things.

2. You will have more hints on what we are working on in the next blog post. I can't be writing about everything we have in mind because there are just so many things. But I can at least mention what's planned for the next few weeks :)

3. This is actually not very easy to implement because each sentence itself is already a link, and clicking on it leads to the page where you can see the comments and the logs. But I agree that it would be definitely useful.

blay_paul, 2010-03-12 20:18

> This is actually not very easy to implement because each
> sentence itself is already a link, and clicking on it
> leads to the page where you can see the comments and the
> logs.

Yeah, I thought about that. What you could do, though, is implement the links in 'tooltip' style windows. For Japanese it could look rather like ...


click on 成る(する) to get the full dictionary entry in a separate window / tab.

sysko, 2010-03-11 18:55

1 - In the French version, and also from an ergonomics point of view, 7 items is already the maximum. At the same time I agree it would be better to have the links in a more visible place, but what if we add a wiki, "dialogs" and so on? So I think we need to review what is needed and where, and make it as practical as possible. I don't really want the top menu to be bloated (but who does ^^)

2 - For now, yep, it can be a temporary solution while waiting for a wiki (after finishing all the "small issues" I'll make it my 1st priority)

3 - For Chinese, adso (which is definitely my Swiss army knife for Chinese) gives the possibility to segment a Chinese sentence into "words", or at least consistent n-grams, so it would not be "so" difficult, and I'm sure such a tool exists also for Japanese

blay_paul, 2010-03-11 20:06

> I'm sure such tool exist also for japanese

It does, but it's not 100% accurate. In any case this is the primary reason for the existence of the 'index' data for Japanese sentences.

saeb, 2010-03-11 20:32

I think that commenting on English sentences that do not sound right is very inefficient since there's a load of them. so...I created a list where everyone can dump them into for native speakers to correct.

TRANG, 2010-03-12 18:11

Yes, I think once a sentence has been corrected, it should be removed from the list. Otherwise you would end up with a very long list...

Anyway, this is a good initiative! The only problem is that this list won't have a lot of visibility (for now), contrary to posting a comment, because the latest comments are displayed on the homepage.

So until we have time to do something about it, what you can do, I guess, is to contact other members who are native English speakers, and ask them if they could help with those sentences.

saeb, 2010-03-13 1:37

true, will do both then...

I just hope it doesn't look like I'm hijacking the homepage :D

saeb, 2010-03-11 21:11

Any suggestions on what to do with sentences that have been corrected, commented on, etc.. shall I just remove them from the list to keep it clean? (I think everyone should just correct them directly and then remove them from the list...)

saeb, 2010-03-12 0:48

There is a formatting issue with Arabic.

Whenever I type a full stop at the end of a sentence it appears at the beginning.

Now, should I just let it appear at the beginning even though it would appear at the end if copied to a word processor?

or Should I put the full stop at the beginning on purpose so it appears at the end when viewed on tatoeba?

sysko, 2010-03-12 9:06

We've had this issue before, but I still don't know why it happens. Unfortunately it's independent of Tatoeba :( but the workaround given in the message will fix the problem when it happens

saeb, 2010-03-12 9:24

will I need to add &#8207 every time I add a sentence?

sysko, 2010-03-12 9:26

&#8207; (with the ; ) for sentences which have problems
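For anyone who would rather not type the mark by hand, here is a minimal Python sketch of the same workaround. It appends U+200F (RIGHT-TO-LEFT MARK, the character that &#8207; stands for) to a sentence that contains right-to-left letters and ends in punctuation or a symbol. The function name `fix_trailing_punctuation` is made up for illustration; it is not part of Tatoeba, and this is only one possible way to apply the workaround.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, written as &#8207; in HTML


def fix_trailing_punctuation(sentence: str) -> str:
    """Append an RLM when a right-to-left sentence ends in punctuation or a symbol."""
    text = sentence.rstrip()
    if not text or text.endswith(RLM):
        return text
    # "R" and "AL" are the bidirectional classes of Hebrew and Arabic letters.
    has_rtl = any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)
    # General categories starting with P are punctuation, with S are symbols.
    ends_neutral = unicodedata.category(text[-1])[0] in ("P", "S")
    return text + RLM if has_rtl and ends_neutral else text


# Arabic letters written as escapes so the source itself is not reordered.
arabic = "\u0633\u0644\u0627\u0645."
print(repr(fix_trailing_punctuation(arabic)))
```

Sentences without right-to-left letters, or that already end with the mark, are returned unchanged, so it should be safe to run on every sentence before submitting.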

saeb, 2010-03-13 16:58

sysko I give up. It seems I'll have to do that for every sentence I enter...

sysko, 2010-03-13 19:45

no problem, I will add "handle problems with right-to-left languages" to my looong todo list

saeb, 2010-03-12 12:09

merci bcp sysko :)

saeb, 2010-03-12 0:51

or better yet not bother at all with the full stop...

saeb, 2010-03-12 0:53

This also happens to all other symbols not followed by text :O

saeb, 2010-03-11 18:41

I know this might be a stupid question, but what's the percentage that appears to the top right corner of a lot of sentences? err..I mean what is it supposed to indicate?

blay_paul, 2010-03-11 20:08

I believe it is an indication of how good a match the sentence is for the search used to find it. I have no idea how it is calculated (particularly for searches on Japanese text ;-)

saeb, 2010-03-11 20:20

thanks :)

xtofu80, 2010-03-09 13:43

I think we should remove requests for error fixes after the error is fixed, because the comments rather confuse people who read the corrected sentences and the outdated comments.

sysko, 2010-03-09 14:05

Yep, we totally agree with you, that's why we've included the possibility to remove comments on sentences. Maybe in the future we will let people view all their own comments so they can delete them more easily

contour, 2010-03-21 23:10

I don't like just deleting comments like that, it removes history and obscures the workings of the site.
You're left with changes that were directly caused by comments that are no longer there.

I'd rather have the ability to archive the comments by marking them as no longer relevant, and a message like "This sentence has 3 archived comments, click here to show them". Then they wouldn't clutter up the page, but you could still unhide them if you wanted to.

sysko, 2010-03-21 23:14

maybe we can imagine a system like wikipedia, comments which are about correction or so will be in a "discussion" page or something like that, and only comments which bring further information about a sentence (for example if there's some important grammar point or if this is a famous quote etc...) will be directly visible from the sentence page

JimBreen, 2010-03-22 23:57

That's a very good idea. We are planning something like that for the online edit system for JMdict/EDICT.

MUIRIEL, 2010-03-10 15:31

don't you think it's more confusing when there are comments without context, because someone deleted his comment and someone else didn't delete his comment referring to the first one?!

xtofu80, 2010-03-10 16:14

I would agree with Muiriel. I suppose the cleanest way to do this is to allow the sentence owner to delete comments. Thus, after correcting a mistake, he could remove all the outdated comments.

TRANG, 2010-03-10 23:16

Same as Muiriel, I wouldn't feel comfortable letting the sentence owner delete the comments. Having a moderator or admin delete your comment can be tolerable because they are people who (are supposed to) know the rules, who know how things work, and wouldn't be deleting things that should have been kept. But a simple user, even without bad intentions, can end up deleting important things.

Anyway, one of the things we have thought of is to have some sort of "public notes" associated to each sentence. These notes can be edited by everyone and would only contain essential information for the learners.

People can then say whatever they want in the comments. And whenever there's something worth noting, someone can write it in the notes.
When you browse a sentence, the comments will not appear below it anymore. Instead, the public notes will be displayed.

That's the basic idea but don't expect to have that implemented anytime soon though. I think for at least six months, the members will have to organize themselves as they can with the comments...

As far as I'm concerned, I'm fine with deleting comments like "There's an 's' missing" when the mistake has been corrected. But well, I'm certainly not going to hunt for those... It wouldn't be a very productive way to spend my time ^^; I think everyone can be self-responsible and take care of deleting their own comments when it is appropriate to do so. You can always try to write private messages to people, to ask them to delete a certain comment they posted.

xtofu80, 2010-03-11 12:34

Comments on grammar and expression might be helpful, but only if the original sentence was visible at that time.
Comments about a sentence which has already been changed are confusing. So either there should be a "sentence stamp" showing the sentence at the time the comment was made, or we have to include the sentence in our comment if we want to write something "noteworthy" which should be kept for a longer time.

MUIRIEL, 2010-03-10 16:59

that would allow censorship :S.

lilygilder, 2010-03-10 21:04

Most comments about wrong spelling do only take up space, so it would look "cleaner" if they were removed after the sentence is corrected. But I think comments about expressions or grammar can still be helpful to users even after the sentence is corrected.

TRANG, 2010-03-09 17:29

If you have time and patience, you can always browse through the whole list of comments and send me a private message to indicate the comments that you feel should be deleted...

Or at least, you can search for your own outdated comments and delete them.

blay_paul, 2010-03-11 12:59

I don't think the comment stuff is such a big deal. The sentence log shows previous versions anyway and I don't think it's that difficult to work out that old comments may no longer apply.

saeb, 2010-03-10 18:21

Is there a way to extract sentences of a certain language? any plans in the works? How about exporting to anki or iKnow...or importing?

saeb, 2010-03-10 19:04

How about deleting sentences or translations (at least your own sentences)?

I've got this problem where I add a translation and three copies of the very same translation get added, happened twice. Don't get it. What should I do now?

TRANG, 2010-03-10 22:23

Don't worry about duplicate sentences. They will get deleted, eventually, by a script that cleans duplicates out of the database.

Also, if no one has translated your duplicate sentences, you can always replace them by another sentence. But only if no one has translated them (otherwise you will make their translations "wrong").
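To give a rough idea of what such a cleanup can involve, here is a small Python sketch. It is not the actual Tatoeba script: the function name `merge_duplicates` and the data shapes (a dict of id to (language, text), and a set of translation links as id pairs) are assumptions made for this example. The point it illustrates is that duplicates can be merged without making anyone's translations "wrong", as long as the links are repointed to the sentence that is kept.

```python
from collections import defaultdict


def merge_duplicates(sentences, links):
    """Keep the lowest id per (language, text) and repoint translation links.

    sentences: dict of id -> (lang, text); links: set of (id, id) pairs.
    Both shapes are assumptions for this sketch.
    """
    groups = defaultdict(list)
    for sid, key in sentences.items():
        groups[key].append(sid)

    # Map every sentence id to the id that survives in its group.
    keep = {}
    for ids in groups.values():
        survivor = min(ids)
        for sid in ids:
            keep[sid] = survivor

    merged_sentences = {sid: key for sid, key in sentences.items() if keep[sid] == sid}
    # Repoint each link; drop links that would join a sentence to itself.
    merged_links = {(keep[a], keep[b]) for a, b in links if keep[a] != keep[b]}
    return merged_sentences, merged_links


# One English sentence whose Arabic translation was accidentally added three times.
example_sentences = {
    1: ("eng", "Hello."),
    2: ("ara", "\u0633\u0644\u0627\u0645"),
    3: ("ara", "\u0633\u0644\u0627\u0645"),
    4: ("ara", "\u0633\u0644\u0627\u0645"),
}
example_links = {(1, 2), (1, 3), (1, 4)}
print(merge_duplicates(example_sentences, example_links))
```

Keeping the lowest id is an arbitrary choice for the sketch; any consistent rule for picking the surviving sentence would work the same way.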

I know it may be a bit frustrating not to be able to delete your sentences, but like sysko said, we have a lot of things to do but not a lot of time, and deleting sentences is not really the most urgent feature ^^' You'll have to bear with us.

saeb, 2010-03-10 23:50

You guys are doing a great job. I just hope to see this project grow into something much bigger that everyone can benefit from.

sysko, 2010-03-10 20:45

except the files in the download section, no you don't have a way to generate yourself a list
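In the meantime, a downloaded file can be filtered locally. The Python sketch below assumes the file is tab-separated with one sentence per line in the order id, language code, text; that layout, the file name `sentences.csv` and the function name `sentences_in_language` are all assumptions for illustration, so check them against the file you actually download.

```python
import csv


def sentences_in_language(path, lang):
    """Yield (id, text) for every row whose language column equals lang.

    Assumes a tab-separated file with columns: id, language code, text.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences untouched.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) >= 3 and row[1] == lang:
                yield row[0], row[2]


# Hypothetical usage, with an assumed file name and language code:
# for sentence_id, text in sentences_in_language("sentences.csv", "ara"):
#     print(sentence_id, text)
```

Because the function is a generator, it reads the file one line at a time and should cope with large exports without loading everything into memory.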

As for Anki, since I also use it daily, I really want to have a way to exchange data between Anki and Tatoeba. The problem here is time rather than "we don't want to": we're a very small team, and we all do this in our spare time, so it's hard to find time to do everything.

btw, if we were to create something like that, how would you like it?

* a plugin in Anki to search for example sentences for particular words? (I mean, in a field you have your word, and when you validate it, it pops up some Tatoeba sentences containing this word)
* a plugin to sync your Anki sentence deck with Tatoeba? (for example 2 fields, each with a sentence in one language that is the translation of the other, and being able to access them from Tatoeba, and from Anki if you correct them here?)

other?

If you are, or know someone who is, willing to help us create such a plugin (either by proposing ideas or with programming skills), don't hesitate :)

Wolf, 2010-03-10 22:05

Funny thing... some days ago I _have_ written a plugin for anki that shows example sentences and their translations for my japanese vocabulary. I am currently in the final testing/optimizing stage and hope to release it somewhere around next week.

It uses the index lines to identify example sentences, so at the moment it works with Japanese only. Handling other languages would be a bit more difficult without some kind of index data (e.g. to find example sentences that contain an inflected form of the word you are studying)

sysko, 2010-03-10 22:17

The Lucene project already handles this problem, and it seems it can be embedded quite easily (but it requires a JVM)

by the way, are you interested in helping us with a "tatoeba for anki" plugin? We can provide an API for Tatoeba that you can easily use with Anki: you tell us what kind of data you send and what kind of data you want to receive, and we will see what we can do

Wolf, 2010-03-11 8:45

Sounds interesting, I will contact you later with some questions :)

sysko, 2010-03-11 9:22

ok no problem, my email allan.simon at supinfo dot com

MUIRIEL, 2010-03-07 9:44

Great updates :)!!!

cburgmer, 2010-03-07 10:07

+1 :)

TRANG, 2010-03-09 21:19

+1 :D

sysko, 2010-03-09 22:27

hope you will love the next ones too ^^