*shock* just figured out TRANG is a she, he he.
Ah, who told on me, that was supposed to be a secret.
lol, I never said this before but I actually stalked this website for quite a while before I finally decided to join, and I always imagined that you'd be like these programmers who like anime and have studied Japanese for 5 yrs in their university..you know..with a cool blog about every obsessive detail of their life...and eyeglasses...you know the whole shebang... :D
P.S. guys like that do really exist :D
There's a lot of English sentences that are grammatically correct but I don't think anyone will ever say them, use them, or even see them in any English media...you know, they're just "out of this world". What do you think we should do with these? Should we just ignore them for the moment, and focus on those that are totally wrong?
My take is, I'm gonna stay away from translating these and stop reporting them as wrong. I'm just hoping Arabic natives can use the sentences I'm translating to learn English.
What do you guys think? trang? sysko?
> There's a lot of English sentences that are grammatically
> correct but I don't think anyone will ever say them, use
> them, or even see them in any English media.
I think that it's more correct to say "any _current_ English media". The Tanaka corpus is old, and it used even older sources of sentences. Quite a few of them would not be out of place in books published before 1940, but are rather confusing to those of us in 2010.
I think those that are old-fashioned or highly idiomatic should be kept as demonstrating historical usage but should not be used as guides to writing English (or for translating into English). I think they are good candidates for an [Old-fashioned] tag or something. ;-)
Another problem is those that are written like dictionary entries (lots of 'one' usage) and those that are not really whole sentences. I think these are worth improving, as time permits, but are probably not a high priority for translation into other languages.
I agree with tagging them in the future as "old-fashioned", "40s English", "book-style", etc., rather than just "modernizing"/"oralizing" them.
I am not sure how promising this is, but there is a Japanese-German sentence database hosted by the University of Hiroshima (Katsumi Iwasaki). It seems to have been created in 2004, without major updates since then. Maybe there could be a collaboration with Tatoeba, thus increasing the number of sentence pairs. Of course, I am not sure whether they want to publish the corpus; I am especially unfamiliar with data policies in Japan.
Here are the links to the search engine, the data description and the researcher's website:
And there's also one for Spanish, I think it's free:
Maybe there could be a collaboration with Tatoeba on that too :)
As for the Spanish one, yep, I've been meaning to contact the guy for a long time, but hmmm, I never find the motivation to write an email :blush: I will try to do so, I promise
you can do it sysko! :)=)
Some more suggestions.
I know that time is limited, so I shall try to keep a high ratio of usefulness to time required to implement. ;-)
1. Add a prominent link from the Tatoeba Project home page to the Tatoeba Project Blog. Actually I think it's worth adding a "Links" item to the list of headings on the top of the page. Useful links would include popular dictionaries, language sites, and sites that host collections of sentences.
2. Wish list. Maybe best as a blog article? I think it would be nice to have an idea of what features are planned, how likely they are to be implemented and how soon. Users could comment on possible features and suggest new ones.
3. Active dictionary links. This would be a long term and high effort suggestion but I think it would be useful to have active linking available from words in example sentences. Some languages (Japanese, Chinese) would require more effort than others, but I think it would be well worth it in the long run.
1. Like sysko said, the top menu has reached its limits. Someone with a 20-character username (which I think is the maximum length) and using the French interface... might actually not even "fit" up there if (s)he's a Linux user... Something we'll have to check.
What I can do for now though, is to have a link to the blog from the "What's new" (along with the Twitter link). And in the blog, there's a "Links" section which only has Tatoeba, but I can add other things.
2. You will have more hints on what we are working on in the next blog post. I can't be writing about everything we have in mind because there are just so many things. But I can at least mention what's planned for the next few weeks :)
3. This is actually not very easy to implement because each sentence itself is already a link, and clicking on it leads to the page where you can see the comments and the logs. But I agree that it would be definitely useful.
> This is actually not very easy to implement because each
> sentence itself is already a link, and clicking on it
> leads to the page where you can see the comments and the
> logs.
Yeah, I thought about that. What you could do, though, is implement the links in 'tooltip' style windows. For Japanese it could look rather like ...
click on 成る（する） to get the full dictionary entry in a separate window / tab.
1 - In the French version, and also for ergonomic reasons, 7 items is already the maximum. At the same time I agree it would be better to have the links in a more visible place, but what if we add a wiki, "dialogs" and so on? So I think we need to review what is needed and where, and make it as practical as possible. I don't really want the top menu to be bloated (but who does ^^)
2 - For now, yep, it can be a temporary solution while waiting for a wiki (after finishing all the "small issues" I'll make it my 1st priority)
3 - For Chinese, adso (which is definitely my Swiss army knife for Chinese) gives the possibility to segment a Chinese sentence into "words", or at least consistent n-grams, so it would not be "so" difficult, and I'm sure such a tool also exists for Japanese
> I'm sure such a tool also exists for Japanese
It does, but it's not 100% accurate. In any case this is the primary reason for the existence of the 'index' data for Japanese sentences.
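To give an idea of what "segmenting a sentence into words" could look like, here is a minimal sketch of greedy longest-match segmentation. It is not how adso or any Japanese tool actually works; the `segment` function and the tiny `LEXICON` are made up for illustration, and a real tool would need a full dictionary. It also shows why the result is never 100% accurate: anything missing from the lexicon falls back to single characters.

```python
def segment(sentence, lexicon, max_len=4):
    """Greedy longest-match segmentation.

    At each position, take the longest chunk found in the lexicon;
    characters not covered by the lexicon become one-character tokens.
    """
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            chunk = sentence[i:i + size]
            if size == 1 or chunk in lexicon:
                tokens.append(chunk)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, only for this example.
LEXICON = {"我", "喜欢", "学习", "中文"}

print(segment("我喜欢学习中文", LEXICON))
```

Each token could then be turned into a dictionary link or a tooltip, instead of linking the whole sentence.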
I think that commenting on English sentences that do not sound right is very inefficient since there's a load of them. So...I created a list that everyone can dump them into for native speakers to correct.
Yes, I think once a sentence has been corrected, it should be removed from the list. Otherwise you would end up with a very long list...
Anyway, this is a good initiative! The only problem is that this list won't have a lot of visibility (for now), contrary to posting a comment, because the latest comments are displayed on the homepage.
So until we have time to do something about it, what you can do, I guess, is to contact other members who are native English speakers, and ask them if they could help with those sentences.
true, will do both then...
I just hope it doesn't look like I'm hijacking the homepage :D
Any suggestions on what to do with sentences that have been corrected, commented on, etc.. shall I just remove them from the list to keep it clean? (I think everyone should just correct them directly and then remove them from the list...)
There is a formatting issue with Arabic.
Whenever I type a full stop at the end of a sentence it appears at the beginning.
Now, should I just let it appear at the beginning, even though it would appear at the end if copied to a word processor?
Or should I put the full stop at the beginning on purpose so it appears at the end when viewed on Tatoeba?
We used to have this issue but I still don't know why; unfortunately it's independent of Tatoeba :( But the workaround given in the message will fix the problem when it happens.
Will I need to add &rlm; every time I add a sentence?
&rlm; (with the ; ) for sentences which have problems
sysko I give up. It seems I'll have to do that for every sentence I enter...
no problem, I will add to my looong todo list "handle problems with right-to-left languages"
thanks a lot sysko :)
or better yet not bother at all with the full stop...
This also happens to all other symbols not followed by text :O
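For what it's worth, the workaround could be automated along these lines. This is only a sketch, assuming the problem is the usual bidirectional-text one: a trailing symbol has no direction of its own, so in a left-to-right page it gets placed on the wrong side of the right-to-left text. Appending a RIGHT-TO-LEFT MARK (U+200F) after it gives it right-to-left neighbours on both sides. The function names are hypothetical and nothing here is Tatoeba's actual code.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def is_rtl_letter(ch):
    """True for characters with a strong right-to-left direction (Arabic, Hebrew...)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def fix_trailing_symbol(sentence):
    """Append an RLM when a right-to-left sentence ends with punctuation or a symbol."""
    stripped = sentence.rstrip()
    if not stripped or stripped.endswith(RLM):
        return sentence  # nothing to do, or already fixed
    has_rtl = any(is_rtl_letter(ch) for ch in stripped)
    # Unicode general categories starting with P (punctuation) or S (symbol).
    ends_with_symbol = unicodedata.category(stripped[-1])[0] in ("P", "S")
    if has_rtl and ends_with_symbol:
        return stripped + RLM
    return sentence


fixed = fix_trailing_symbol("مرحبا.")
print(repr(fixed[-1]))
```

Left-to-right sentences and sentences that already end with the mark are returned unchanged, so running it twice is harmless.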
I know this might be a stupid question, but what's the percentage that appears in the top right corner of a lot of sentences? err..I mean what is it supposed to indicate?
I believe it is an indication of how good a match the sentence is for the search used to find it. I have no idea how it is calculated (particularly for searches on Japanese text ;-)
I think we should remove requests for error fixes after the error is fixed, because the comments rather confuse people who read the corrected sentences and the outdated comments.
Yep, we totally agree with you, that's why we've included the possibility to remove comments on sentences. Maybe in the future we will let people view all their own comments so they can delete them more easily.
I don't like just deleting comments like that, it removes history and obscures the workings of the site.
You're left with changes that were directly caused by comments that are no longer there.
I'd rather have the ability to archive the comments by marking them as no longer relevant, and a message like "This sentence has 3 archived comments, click here to show them". Then they wouldn't clutter up the page, but you could still unhide them if you wanted to.
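Just to make the archiving idea concrete, here is a rough sketch. The `Comment` class, its `archived` flag and `render_comments` are all hypothetical names for illustration, not anything that exists on the site.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str
    archived: bool = False  # True once the comment is no longer relevant


def render_comments(comments, show_archived=False):
    """Return the lines to display under a sentence."""
    lines = [
        f"{c.author}: {c.text}"
        for c in comments
        if show_archived or not c.archived
    ]
    hidden = sum(1 for c in comments if c.archived)
    if hidden and not show_archived:
        plural = "s" if hidden != 1 else ""
        lines.append(
            f"This sentence has {hidden} archived comment{plural}, click here to show them"
        )
    return lines


comments = [
    Comment("user_a", "There's an 's' missing", archived=True),
    Comment("user_b", "This is a famous quote."),
]
print(render_comments(comments))
print(render_comments(comments, show_archived=True))
```

Nothing is deleted, so the history stays intact, but the page only shows what is still relevant.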
maybe we can imagine a system like wikipedia, comments which are about correction or so will be in a "discussion" page or something like that, and only comments which bring further information about a sentence (for example if there's some important grammar point or if this is a famous quote etc...) will be directly visible from the sentence page
That's a very good idea. We are planning something like that for the online edit system for JMdict/EDICT.
Don't you think it's more confusing when there are comments without context, because someone deleted their comment and someone else didn't delete the comment referring to it?!
I agree with Muiriel. I suppose the cleanest way to do this is to allow the sentence owner to delete comments. Thus, after correcting a mistake, he could remove all the outdated comments.
Same as Muiriel, I wouldn't feel comfortable letting the sentence owner delete the comments. Having a moderator or admin delete your comment can be tolerable because they are people who (are supposed to) know the rules, who know how things work, and wouldn't be deleting things that should have been kept. But a simple user, even without bad intentions, can end up deleting important things.
Anyway, one of the things we have thought of is to have some sort of "public notes" associated to each sentence. These notes can be edited by everyone and would only contain essential information for the learners.
People can then say whatever they want in the comments. And whenever there's something worth noting, then someone can write it in the notes.
When you browse a sentence, the comments will not appear below it anymore. Instead, the public notes will be displayed.
That's the basic idea but don't expect to have that implemented anytime soon though. I think for at least six months, the members will have to organize themselves as they can with the comments...
As far as I'm concerned, I'm fine with deleting comments like "There's an 's' missing" when the mistake has been corrected. But well, I'm certainly not going to hunt for those... It wouldn't be a very productive way to spend my time ^^; I think everyone can be self-responsible and take care of deleting their own comments when it is appropriate to do so. You can always try to write private messages to people, to ask them to delete a certain comment they posted.
Comments on grammar and expression might be helpful, but only if the original sentence was visible at that time.
Comments about a sentence which has already been changed are confusing. So either there should be a "sentence stamp" showing the sentence at the time the comment was made, or we have to include the sentence in our comment if we want to write something "noteworthy" which should be kept for a longer time.
that would allow censorship :S.
Most comments about wrong spelling only take up space, so it would look "cleaner" if they were removed after the sentence is corrected. But I think comments about expressions or grammar can still be helpful to users even after the sentence is corrected.
If you have time and patience, you can always browse through the whole list of comments and send me a private message to indicate the comments that you feel should be deleted...
Or at least, you can search for your own outdated comments and delete them.
I don't think the comment stuff is such a big deal. The sentence log shows previous versions anyway and I don't think it's that difficult to work out that old comments may no longer apply.
Is there a way to extract sentences of a certain language? Any plans in the works? How about exporting to Anki or iKnow...or importing?
How about deleting sentences or translations (at least your own sentences)?
I've got this problem where I add a translation and three copies of the very same translation get added, happened twice. Don't get it. What should I do now?
Don't worry about duplicate sentences. They will get deleted, eventually, by a script that cleans duplicates out of the database.
Also, if no one has translated your duplicate sentences, you can always replace them by another sentence. But only if no one has translated them (otherwise you will make their translations "wrong").
I know it may be a bit frustrating not to be able to delete your sentences, but like sysko said, we have a lot of things to do but not a lot of time, and deleting sentences is not really the most urgent feature ^^' You'll have to bear with us.
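For the curious, a duplicate clean-up could work roughly like the sketch below. This is not the actual script; the data layout (a dict of sentences and a set of translation links) and the name `merge_duplicates` are assumptions made for the example. The idea is to keep the oldest copy of each (language, text) pair and point the translation links of the other copies at it, so no translation is lost.

```python
def merge_duplicates(sentences, links):
    """Merge sentences that share the same language and text.

    sentences: dict mapping sentence id -> (lang, text)
    links: set of (id_a, id_b) translation pairs
    Returns the surviving sentences and the links rewritten to point at them.
    """
    canonical = {}  # (lang, text) -> id of the copy that is kept
    remap = {}      # every id -> id that survives
    for sid in sorted(sentences):  # lowest id wins
        kept = canonical.setdefault(sentences[sid], sid)
        remap[sid] = kept
    kept_sentences = {sid: sentences[sid] for sid in set(remap.values())}
    # Drop links that would connect a sentence to itself after merging.
    kept_links = {(remap[a], remap[b]) for a, b in links if remap[a] != remap[b]}
    return kept_sentences, kept_links


sentences = {
    1: ("eng", "Hello."),
    2: ("fra", "Bonjour."),
    3: ("fra", "Bonjour."),  # accidental duplicate
    4: ("fra", "Bonjour."),  # accidental duplicate
}
links = {(1, 2), (1, 3), (1, 4)}
print(merge_duplicates(sentences, links))
```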
You guys are doing a great job. I just hope to see this project grow into something much bigger that everyone can benefit from.
Except for the files in the download section, no, you don't have a way to generate a list yourself.
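In the meantime, filtering a downloaded file by language is not much work. Here is a small sketch, assuming the file is tab-separated with one sentence per line in the order id, language code, text. That layout, the codes "eng" and "ara", and the function name are assumptions for the example, so check the real file before relying on it.

```python
import csv
import io


def sentences_in_language(lines, lang):
    """Yield (id, text) for every row whose language column equals `lang`."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 3 and row[1] == lang:
            yield int(row[0]), row[2]


# Stand-in for an opened download file (assumed layout: id, lang, text).
SAMPLE = "1\teng\tHello.\n2\tara\tمرحبا.\n3\teng\tThanks.\n"

english = list(sentences_in_language(io.StringIO(SAMPLE), "eng"))
print(english)
```

With a real file you would pass `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` sample.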
As for Anki, since I also use it daily, I really want to have a way to exchange data between Anki and Tatoeba. The problem here is time rather than "we don't want to": we're a small, small team, and we all do this in our spare time, so it's hard to find time to do everything.
btw, if we were to create something like that, how would you like it?
* a plugin in Anki to search for example sentences for particular words? (I mean, in one field you have your word, and when you validate it, it pops up some Tatoeba sentences containing this word)
* a plugin to sync your Anki sentence deck with Tatoeba? (for example 2 fields, each with a sentence in one language that is the translation of the other, and to be able to access them from Tatoeba, and from Anki if you correct them here?)
If you, or someone you know, are willing to help us create such a plugin (either by proposing ideas or with programming skills), don't hesitate :)
Funny thing... some days ago I _have_ written a plugin for Anki that shows example sentences and their translations for my Japanese vocabulary. I am currently in the final testing/optimizing stage and hope to release it somewhere around next week.
It uses the index lines to identify example sentences, so at the moment it works with Japanese only. Handling other languages would be a bit more difficult without some kind of index data (e.g. to find example sentences that contain an inflected form of the word you are studying).
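To illustrate why index data matters, here is a toy sketch in English rather than Japanese. It is not the plugin's code and it does not use the real index line format; `index_data` is a hypothetical mapping from sentence id to the dictionary forms of the words in that sentence. A plain text search for "go" misses "She went home.", while a lookup through the index finds it.

```python
from collections import defaultdict


def build_index(index_data):
    """Invert {sentence id: [headwords]} into {headword: {sentence ids}}."""
    by_headword = defaultdict(set)
    for sid, headwords in index_data.items():
        for headword in headwords:
            by_headword[headword].add(sid)
    return by_headword


def examples_for(word, by_headword, sentences):
    """Return the example sentences indexed under the dictionary form `word`."""
    return [sentences[sid] for sid in sorted(by_headword.get(word, ()))]


# Hypothetical data for illustration only.
sentences = {1: "She went home.", 2: "I go to school.", 3: "He is tall."}
index_data = {
    1: ["she", "go", "home"],
    2: ["i", "go", "to", "school"],
    3: ["he", "be", "tall"],
}

by_headword = build_index(index_data)
print(examples_for("go", by_headword, sentences))
# A naive substring search only finds the sentence with the uninflected form.
print([s for s in sentences.values() if "go" in s])
```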
The Lucene project already handles this problem, and it seems it can be embedded quite easily (but it requires a JVM).
By the way, are you interested in helping us with a "Tatoeba for Anki" plugin? We can provide an API for Tatoeba that you can easily use with Anki: you tell us what kind of data you send and what kind of data you want to receive, and we will see what we can do.
Great updates :)!!!