Wall
-
I think we should remove requests for error fixes once the error is fixed, because outdated comments confuse people who read them next to the corrected sentences.
-
Yep, we totally agree with you; that's why we've included the possibility to remove comments on sentences. Maybe in the future we will let people view all their own comments so they can delete them more easily.
-
If you have time and patience, you can always browse through the whole list of comments and send me a private message to indicate the comments that you feel should be deleted...
Or at least, you can search for your own outdated comments and delete them.
-
-
A general question about Chinese:
Is there a policy regarding simplified and traditional Hanzi?
I just found a sentence (nº346168) with traditional Hanzi posted by someone from Hong Kong. So far the "policy" seems to be that both scripts are put under the same category. This might be a burden for people learning the language who do not recognize the differences.
-
Advice for those leaving corrections in comments.
I strongly suggest you 'favorite' any sentences that you leave corrections for so you can check whether they are actually corrected or not. -
Does anyone else have problem with the updated version as well?
I use Ubuntu 9.10 with Firefox 3.5.8, and since the update today, I cannot add new sentences. For each sentence I see this turning circle on the left (before, I saw it only when I edited the sentence), and pressing "submit translation" does not seem to work.-
-
- lilygilder
- 3 day(s) ago
Yeah, me too. I work with Vista and Firefox 3.5.7 and can't submit any translations either. -
I suspect some sort of database glitch. I'd give TRANG and sysko a while to fix it before worrying too much.
-
Ah, well try doing CTRL+F5 on the homepage (or any page where you systematically see the loading animation when it shouldn't be there).
The problem is that your browser is still displaying from an old CSS file, and so the layout will look strange. You have to force Firefox to update its cache.
-
-
I found some incorrect sentences linked together, but don't know how to resolve the problem:
Sentence nº313285 "no-smoking area" is correctly translated in
Sentence nº90428 as "禁煙区域" but incorrectly in
Sentence nº90427 as "禁猟区", which means no-fishing zone, or no-hunting zone.
The best solution would be to cut the link between nº313285 and nº90427. Or should I delete nº90427 and add it again as the correct translation of my German sentence about the hunting zone?
Greetings -
WWWJDIC index line.
I suggest adding links from words in the Japanese sentence to WWWJDIC entries using the information in the index line. That would be a useful 'first step' towards adding furigana to the sentence.
The basic set-up is relatively straightforward, but there is one complication - namely 'deliberately non-indexed text'. Punctuation, English words, place names and other proper nouns are not generally included in EDICT and so do not have entries in the Index line. Jim Breen should have a 'no index' field that includes all non-indexed text (although it may not be up to date). In order to parse a sentence properly you need both the index line and the non-indexed text.
Adding furigana to place names etc. should probably be left for later. -
I've looked through a lot of contributions, and I've come to the realization that there are a LOT of contributions in English made by non-native speakers. I assume the same is the case for other languages, especially Japanese. There needs to be some sort of indicator for each sentence on whether or not the last editor was a native speaker. I've seen a lot of English sentences that are perfectly grammatical, with no errors at all, that I have never in my entire life heard someone utter -- correct or not, a native speaker would never say them.
-
As the major part of both the Japanese and the English comes from the Tanaka Corpus, I can understand that the English is not really reliable, but for most other languages (Spanish, German, Polish, Chinese), I can say that 99% has been added by natives.
I agree we need a way to specify whether a sentence has been added or reviewed by a native. We're currently thinking about a nice way to do that, maybe something to tag some sentences as "trusted".
Anyway, for the moment one can assume that sentences which belong to someone are much more reliable than orphans.-
> As the major part of both japanese and english come from
> the tanaka corpus, I can understand that the english is
> not really reliable.
The Tanaka Corpus was, initially, generated by students submitting pairs of sentences with the intent that the Japanese and English meant the same thing. So the Japanese is marginally more reliable than the English because the person entering it was Japanese.
However you cannot assume that the Japanese is correct and the English unreliable all the time. It's more complicated than that.-
Being a native German speaker, I came across both Japanese and English sentences which I felt were not correct; however, I was not 100% sure.
It would be a cool feature if non-natives could mark a sentence as "questionable", and then this sentence could be checked and corrected or verified by a native speaker. I suppose this would be rather easy to implement using the word list feature. So a non-native speaker would not correct a sentence which he is not 100% sure about, but put it into this list, and native speakers could occasionally go through the list and check for grammatical errors. This would drastically improve the quality of the sentences, if the feature is known and used by most users.
-
-
-
-
Favourite'd sentences not working.
I set a number of sentences as 'favourite', but when I use the link in my account page it says "This user does not have any favorites." -
-
yep it's from the kakasi project
http://kakasi.namazu.org/
as said before, the project seems more than dead :(-
Oooh yes. I remember this now.
The bad news is that kakasi probably isn't really fixable. I think you'd need to rewrite the code in a major way, not just add a few lines to the dictionary, to fix it.
The good news? Removing the line
ぜつ 絶
from the file 'kakasidict' may correct one romaji error in generated romaji.-
I think we can also try to find out whether there are people motivated to start a project for automatic romanization of Japanese, or look for an embryo of such a project and see how we can help.
-
For now, if there was the possibility to enter the romaji explicitly, and if manually entered and automatically generated romaji could be separated, that should make for a good test set for evaluating different methods for automatic generation.
I think that ideally one would start with a mature project, and automatically add corrections to the training set.-
> I think that ideally one would start with a mature
> project, and automatically add corrections to the
> training set.
There are six main approaches that could be taken.
1. Drop romaji support.
2. Allow manual correction of romaji.
3. Develop romaji generation code that uses the WWWJDIC index line.
4. Further develop kakasi
5. Look for alternative romaji conversion software.
6. Develop romaji conversion software from scratch.
I would recommend 1, 2, or 3.
4. Could be done, but I think you would soon reach limits on what is achievable.-
(I don't speak Japanese at all, so excuse me if I'm talking nonsense)
Nemo talked about JUMAN as a replacement for kakasi, and it can output kana.
Isn't kana better? By restricting the "reading" part to kana characters, we're sure that people who can't write Japanese will not "accidentally" mess up the "romanization", and we're also sure people use the same convention, as there's only one kana per "sound"
(Trang always talks about the different ways to write romaji).
What do you think? Trang?-
If Juman's kana/categorization output is accurate, it can produce 100% perfect romaji output. Kana give a representation of how something is said, along with its syntactical representation. There are ambiguities in kana, but JUMAN gives enough information that the pronunciation and syntax can be reconciled to provide a perfect, phonetic, romanization.
-
I should give a little more information than I have in my past posts, I think, because there seems to have been little progress. I don't really want to come off as being harsh, but the reality is that Kakasi is a lost cause. Whoever coded the program did so in a very naive way, and to use sed to correct its errors would take an inordinate amount of both human and CPU time, and even in the best case scenario, it would put so much load on the server as to make Tatoeba unusable. I've gotten the impression that Kakasi was chosen with little to no consideration of other options (cf. below), despite the fact that there exist ways to accurately dissect Japanese text into parseable units, which could be further changed into romaji. The reality is, Kakasi is nowhere near mature enough to produce accurate results, and as an abandoned project there is little hope of it reaching that maturity -- its output will never get any better than it is. In contrast, Juman seems to be near-perfect, though I will admit that I have not tried the other romanizers suggested in the blog post, nor have I done extensive testing of Juman. Regardless, Juman seems to be acceptable, even optimal. Kakasi falls so far short of the mark that I'm not sure why it is even in use. I would even go so far as to state that if Kakasi remains the method of conversion, then by the time Tatoeba becomes popular, Greasemonkey scripts will be produced which correct romaji via some other means, if that's even feasible. (Here's the blog post I referenced: http://blog.tatoeba.org/2009/02/tools-for-japanese-romanization.html )
-
Yes, to be honest, KAKASI was chosen with no consideration of other options. It was the first one I found when I searched for a romaji converter, so I picked it.
And only later did I write this blog post, where I actually searched and listed other solutions. Solutions that I should have explored but never had the time to =/
I completely agree with you that KAKASI is not the long term solution.
Anyway, considering you have been taking the time to write all these posts, I will take a look at Juman ;). But if you can just tell me quickly what command to use to get a Japanese sentence parsed and converted into kana, that would save me some time going through the documentation. Ah, but does JUMAN support UTF-8...?-
From a quick look, it looks like you have to convert to and from EUC-JP. Piping a sentence through "juman -b -c" gives one line per word, with readings in the second of the space-separated columns.
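To make that concrete, here is a rough sketch of the pipeline in Python (the site itself is PHP, so this is only an illustration). It assumes a `juman` binary on the PATH, and it assumes the output ends each sentence with an "EOS" line and marks alternative analyses with a leading "@"; both assumptions should be checked against the real output before relying on this.

```python
import subprocess


def parse_juman_readings(output: str) -> list[tuple[str, str]]:
    """Extract (surface, reading) pairs from JUMAN-style output text."""
    pairs = []
    for line in output.splitlines():
        # Assumption: "EOS" closes a sentence, "@" lines are alternative analyses.
        if not line or line == "EOS" or line.startswith("@"):
            continue
        columns = line.split(" ")
        if len(columns) >= 2:
            # The reading is the second space-separated column.
            pairs.append((columns[0], columns[1]))
    return pairs


def juman_readings(sentence: str) -> list[tuple[str, str]]:
    """Pipe one sentence through `juman -b -c`, converting to and from EUC-JP."""
    result = subprocess.run(
        ["juman", "-b", "-c"],
        input=(sentence + "\n").encode("euc_jp"),
        capture_output=True,
        check=True,
    )
    return parse_juman_readings(result.stdout.decode("euc_jp"))
```

Joining the second element of each pair with spaces would give the spaced kana sentence; characters that cannot be encoded in EUC-JP would raise an error and need separate handling.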
-
There's a powerpoint tutorial, I'll look at it when I have time and translate it. The translated user guide focuses on the whole idea behind the system, and why it was/how it was developed, and then when it comes to the syntax, it's just a bunch of "I don't know this word" and "If you break this down, it would mean something like..."
-
-
-
-
-
My ideal approach would be using WWWJDIC indices, combined with a better software for conversion into romaji or kana.
As for making romaji editable, if we were to make anything editable, I'd rather it be kana, like what sysko suggested.
If the purpose is to provide something useful for learning, then it's obviously better to have a sentence in kana, with spaces so that the learner knows how the sentence is composed. And of course we can use the sentence in kana to generate correct romaji.
-
-
-
-
-
-
-
That took me forever to write but hopefully it will prevent us from explaining certain things over and over again: http://blog.tatoeba.org/2010/02/how-to-be-good-contributor-in-tatoeba.html
-
WOooOW, I haven't read it entirely yet, but you've done a really damn good job, many thanks to Trang :)
-
-
- lilygilder
- 13 day(s) ago
Thanks Trang, this clears up a lot of problems. Very helpful. =) And kudos for writing all of this.
-
-
Autolinking broken (See latest comment in http://tatoeba.org/eng/sentences/show/181800#comments )
link to
http://mitleid.cool.ne.jp/tonegawa.htm
ends up pointing to
http://tatoeba.org/eng/sentences/show/%5C%27http://mitleid.cool.ne.jp/tonegawa.htm%5C%27 -
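For what it's worth, %5C%27 is the percent-encoding of a backslash followed by an apostrophe, so it looks as if the URL ended up wrapped in escaped quotes and was then resolved as a relative path. I don't know how the site's autolinker is written (and it is PHP, not Python), but a minimal sketch of the behaviour I'd expect, with made-up function names, is:

```python
import html
import re

# Matches http(s) URLs up to the next whitespace, angle bracket or quote.
URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")


def autolink(comment: str) -> str:
    """Escape a plain-text comment and wrap each URL in an anchor tag."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(comment):
        # Escape the text between URLs so user input cannot inject markup.
        parts.append(html.escape(comment[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        # The href holds the bare URL: no extra quoting, no backslashes.
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    parts.append(html.escape(comment[last:]))
    return "".join(parts)
```

This sketch does not strip trailing punctuation (a URL followed directly by a full stop would swallow it), which a real autolinker would want to handle.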
Simple suggestion.
In a break from the difficult and / or controversial suggestions I have one simple one to offer.
I suggest that the sentence list pages (e.g.
http://tatoeba.org/eng/sentences_lists/edit/24
) should use their description as their page title (e.g.
Sentence lists: jpn->eng translations needed
instead of just
Sentence lists
) -
Seriously - romaji editing now. ;-)
I don't think there's any point in waiting for "a serious Japanese contributor". Most of the romaji errors are very obvious and either I, or half a dozen or so regulars here, would be well able to correct them if they had the chance.
I would go as far as to say that it would be better not to have romaji AT ALL rather than leave them in the current state.-
Then it may be no romaji at all... But I want opinion from more users first. Is it better to have no romaji at all, or is it better to have something even if it's not 100% correct?
I know Nemo is against romaji as well, but if I have added it, it was because more than two people had requested it in the past.
Regarding editable romaji, I'd rather avoid having people waste time on correcting romaji, which is why I don't want to make it editable.
Most of the time it's a systematic error that can be found in more than 100 other sentences. If I made romaji editable, you'd have to edit them one by one.
You'd also have to make sure everyone agrees on the romanization rules and follows them, which is again more work.
I think it's better to improve the software (not necessarily KAKASI) to the point where it can't get any better. It would save time for so many other people in the world...
Perhaps there is someone out there who is actively developing an open source Japanese parser and furigana-romaji converter. I haven't had time to search, but if you do find one (and by "you" I mean anyone who is reading this), by all means, let me know.-
I'm for having all the romaji on the site be accurate. If the best way to do that is deleting all of the romaji, then I'd say do that. If you really want an accurate romaji representation, it will probably need to be written ad hoc. I don't think this should be too hard though, so long as it is written for this project specifically, and it is done soon. This site is currently comprised mostly of the Tanaka Corpus, so far as I am aware, so almost every word in the Japanese examples should be also present in EDICT, which has the reading of every word in it in kana. If there are multiple readings, I would just make the output something like:
僕は市場へ行った
*** boku wa (shijyou | ichiba) e itta
So that the edge cases could be fixed. It's still a lot of work, but it's doable. (In this case, the difference is irrelevant, but in many it could be relevant). You could then dump the database into a text file of all the lines beginning with ***. I believe EDICT even has the readings listed in order of frequency, so if you wanted to you could have it just guess the first one every time, and fixing the few that got put in incorrectly would not be a huge ordeal. I would recommend keeping some automatic conversion in place, and storing things in the database as:
僕は市場へ行った
ぼくはしじょうへいった
and having the conversion take place from the kana to romaji on-the-fly. Also, force those editing the romaji to use kana. Basically introduce a learning curve that will discourage those who don't know better from thinking they do. Also, changes in romanization could be implemented very easily. I personally use wapuro romaji whenever I do, which is rare still, but I know this is less than ideal for learning.-
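A minimal sketch of the "flag the ambiguous readings" idea, in Python for illustration only. The input format (one list of candidate readings per word, most frequent first) and both function names are assumptions of mine; looking the candidates up in EDICT is left out entirely.

```python
def format_candidates(words: list[list[str]]) -> str:
    """Build a review line, flagging words that have more than one reading."""
    parts = []
    for readings in words:
        if len(readings) == 1:
            parts.append(readings[0])
        else:
            # Ambiguous word: list every candidate so a human can pick one.
            parts.append("(" + " | ".join(readings) + ")")
    return "*** " + " ".join(parts)


def first_guess(words: list[list[str]]) -> str:
    """Pick the first listed reading of every word (assumed most frequent)."""
    return " ".join(readings[0] for readings in words)


words = [["boku"], ["wa"], ["shijyou", "ichiba"], ["e"], ["itta"]]
print(format_candidates(words))
print(first_guess(words))
```

The first function gives the line to put in the text dump for manual fixing; the second is the "just guess the first one every time" fallback.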
My whole post is a waste of time, lol. The software you are using has an output to kana mode, which would not be subject to the pitfalls that romaji is. I suggest we use that. Kana is not that difficult to learn, and there's no sense in learning grammar/sentences before kana anyway.
-
We need post editing, haha. JUMAN does exactly what you need. It converts from kanji to hiragana, and labels each word with what it is. So, if it says は is a 助詞 (particle), you can output wa, and the same for all of the others. I'm not sure that it outputs romaji (The sample set-up does not), but with kana and part of speech, romaji is just a lookup table away.
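To show what I mean by "a lookup table away", here is a small Python sketch. It assumes the input is already a list of (kana reading, part of speech) pairs, it only covers plain hiragana (no digraphs like きゃ, no long vowels, no katakana), and it doubles the consonant after っ in the wapuro style, so treat it as an illustration rather than a full romanizer.

```python
VOWELS = "aiueo"
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "s": "さしすせそ", "t": "たちつてと",
    "n": "なにぬねの", "h": "はひふへほ", "m": "まみむめも", "r": "らりるれろ",
    "g": "がぎぐげご", "z": "ざじずぜぞ", "d": "だぢづでど", "b": "ばびぶべぼ",
    "p": "ぱぴぷぺぽ",
}
# Regular syllables: consonant of the row plus the vowel of the column.
KANA = {kana: consonant + vowel
        for consonant, row in ROWS.items()
        for kana, vowel in zip(row, VOWELS)}
# Irregular syllables and the kana that do not fit the five-column rows.
KANA.update({"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu", "じ": "ji",
             "ぢ": "ji", "づ": "zu", "や": "ya", "ゆ": "yu", "よ": "yo",
             "わ": "wa", "を": "o", "ん": "n"})
# Kana that are read differently when the word is a particle (助詞).
PARTICLE_READINGS = {"は": "wa", "へ": "e", "を": "o"}


def kana_to_romaji(kana: str) -> str:
    """Romanize plain hiragana; unknown characters are passed through."""
    out = []
    geminate = False
    for ch in kana:
        if ch == "っ":
            geminate = True  # double the next consonant
            continue
        syllable = KANA.get(ch, ch)
        if geminate:
            syllable = syllable[0] + syllable
            geminate = False
        out.append(syllable)
    return "".join(out)


def romanize(tokens: list[tuple[str, str]]) -> str:
    """Romanize (reading, part-of-speech) tokens, one word per token."""
    words = []
    for reading, pos in tokens:
        if pos == "助詞" and reading in PARTICLE_READINGS:
            words.append(PARTICLE_READINGS[reading])
        else:
            words.append(kana_to_romaji(reading))
    return " ".join(words)
```

For example, `romanize([("ぼく", "名詞"), ("は", "助詞"), ("いちば", "名詞"), ("へ", "助詞"), ("いった", "動詞")])` gives "boku wa ichiba e itta", while は inside an ordinary word such as はな stays "ha".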
-
-
-
> I'd rather avoid having people to waste time on
> correcting romaji which is why I don't want to make
> it editable.
That's basically another way of saying that the romaji isn't important. If the romaji isn't important I'd rather it wasn't there than be there and often incorrect. ;-)
Having a kana version or furigana would be a nice alternative. kana would get rid of the
o / wo
e / he
wa / ha
confusion. Note that a combination of Edict and the Index information could be used to generate pretty-much-correct furigana or kana. (Not that easy, but doable)-
-
- JeroenHoek
- 8 day(s) ago
I agree with Paul that furigana might be preferable to broken rōmaji. Learning the basics of kana shouldn't take you more than a month or two; after that, kanji readings become the hard part. Furigana should, in my opinion, eliminate the need for rōmaji for learners of Japanese.
Rōmaji is mostly useful for transcribing Japanese for a public that cannot read any Japanese at all. Also, the rōmaji generated by Kakasi is wāpuro-rōmaji.
-
-
-
-
May I disturb the silence once again? Consider the situation where somebody translates sentence A into B, then somebody later comes along to translate B into C. It then turns out that B is wrong and it is subsequently changed. This invalidates C. Any ideas/plans for that?
-
Part of the answer is in the comment I wrote here:
http://tatoeba.org/eng/sentences/show/126
And in my todo list for the weekend: write some guidelines so that users know how to contribute correctly. -
This has already been discussed ^^
Soon we will add an unlink feature, so you will be able to say "these sentences are no longer translations of each other".
So what to do in your case?
In fact you're not supposed to change sentence B as long as it is correct by itself because, as you've said, that would make the translations of B erroneous too.
So you just add a sentence B2 and leave a note that sentence B needs to be unlinked from sentence A.
-
-
+1 feature request:
Tag sentences for a special form. Say your language has several possible translations for a given sentence, tag these sentences for the given feature. Example: http://tatoeba.org/eng/sentences/show/366466 I would add one for informal you (french 'tu') and one for formal you (french 'vous').
A wiki would be nice to list those feature requests; I think this "Wall" might get a bit too muddled.
-
Finding contributors (to the code): Why don't you guys (& girls) share (i.e. open source) your site's code on say github? First that would make this a truly "open source" project, and secondly people could help add features. Think about it.
Some guys here seem pretty eager to get their features implemented ;)-
In fact the code is already available on a public svn repository, with read-only access and an AGPL licence (I'm a bit of a FOSS fanatic). I don't think we're against code contributors, so write access can be granted to motivated ones.
As for why it's not explicitly written anywhere on the website, hmm, I think (Trang may have more relevant reasons than me) it's because the project lacks documentation and is in a rewriting / cleaning phase, so we'd prefer to show nicely reviewed code ^^
But if you want to take a look, I can give you the repo in a private message -
Like sysko said, we are actually open source. The reason why it's not promoted anywhere is because:
1) The code hasn't met my standards of elegance yet... Still too many parts that make me cringe when I look at them.
2) We still don't have a sound methodology and organization in our way of working and I really don't have time to manage more people ^^'
(we're in PHP though, sorry :P)-
-
Something this thread brings up, the user interface should be stored in a cookie, not as part of the URL. French is easy enough, but what if I click http://tatoeba.org/chi/sentences/show/366507/ and I'm new and/or don't read Chinese? I'm stuck and have to go back, close the window, or start over at the main page. Quick and Dirty fix would be change all of the languages to something like "English - English; Francais - French; zhongwen - Chinese; nihongo - Japanese" etc. (Not suggesting they be romanized, I just don't feel like dealing with my IME.) Not trying to be Anglocentric, but most people can decipher language names in English.
-
It is stored in the session.
Moreover, at the top of each page you have a menu to change the interface language ;-) with the name of each language written in its own way (français, English, 中文 ...). When you change it, the page reloads in the new language and the language of your session changes,
and all the pages you open afterwards will be displayed in this language.-
My point is, not everyone knows Chinese/Japanese. In fact the vast majority cannot read a single character. So when they arrive at a page, seeing "中文" at the top is of little or no help, nor is "日本語". It's not a problem for you, not a problem for me, but sit some random people down at the page and tell them to navigate it.
-
-
-
-
-
-
Change Request
For the 'wish list' I would like to suggest a couple of features.
1. People who don't own a sentence can click an icon (check a checkbox) when posting a comment to make it an official request to change a sentence.
2. Add an extra line to the "Your Links" section of the profile.
* View all my sentences
* View sentences with change requests
* View my favorite sentences
* Sentences with undetected language
Reason:
It is too easy for sentence owners to miss comments. For example I suspect fcbond hasn't noticed the comment I made in the following link.
http://tatoeba.org/eng/sentences/show/79465-
Maybe that's a bit too complicated. At least there are several other feature requests in the pipe, that might have more impact.
As you indicated in the 2nd note already though, adding a "news feed" for requests on one's sentences is surely needed.
I would propose to change your request into something with a broadened scope: a "disown request" to disown somebody by taking over their sentence. This could also be used on inactive users. Some time without reaction *zing* you go ahead.-
I also was considering recommending a way of taking over neglected sentences, but I considered the 'change request' to be a less controversial idea. After all not everybody who's away for a month or two has actually given up.
-
Maybe, in a more general way, we can make something "à la Twitter": when you want someone to be notified of a comment, you just write @userNickName, and maybe we add a "feed" section to the profile.
Anyway, in a future release we plan to review the architecture a bit, add some categories to the profile, and make your profile a more central page.
what do you think ?
PS: for disowning, I will see with Trang how she planned to handle it (we used to talk about that, but i've a weak memory :( )
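Just to illustrate the @userNickName idea, here is a tiny Python sketch (not our actual code, which is PHP). The function name and the rule that a nickname is made of letters, digits and underscores are assumptions.

```python
import re

# "@" followed by a nickname, but not when "@" sits inside a word (e.g. an e-mail).
MENTION = re.compile(r"(?<!\w)@(\w+)")


def mentioned_users(comment: str) -> list[str]:
    """Return the nicknames mentioned in a comment, in order, without repeats."""
    seen = []
    for name in MENTION.findall(comment):
        if name not in seen:
            seen.append(name)
    return seen
```

Each returned nickname would then get one entry in its "feed", provided the user actually exists.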
-
-
The way I envision it is not really "disowning", it's more about finding a better owner if the current owner doesn't do his job (kind of like, if you're a bad parent, your kids are taken away).
But I still don't know what would be the best solution because we don't really have a real need for that yet. I mean, it's not very frequent that you'd want to disown someone from his/her sentences. Which means it's not an urgent feature either.
-
-
Normally fcbond should have received a notification when you commented on his sentence, but the notification system was broken :'( ...about 100 notifications that should have been sent but weren't, *sigh*.
Anyway, having a link to a page that lists the comments posted on your sentences is something I should have done a long time ago... When I decided to integrate an "ownership/adoption" system in replacement of the moderation system we used to have, it was obvious that users should be able to quickly access comments on their sentences.
But I think that having a checkbox would add unnecessary complexity. People will usually read all the comments on their sentences, so there's no need to filter out specifically those that require a correction.
-
-
Even though it's written in the terms of use, which everyone who contributes is supposed to have accepted:
it is FORBIDDEN to add sentences which come from books / dictionaries, for the simple reason that they do not belong to you, and therefore you can't release them under a CC-BY licence.
Thanks for taking care.-
-
No problem if they're in the public domain, even if sometimes it's hard to tell, especially for books which enter the public domain some time after their author's death, as the period varies from one country to another.
But yep, no problem for copyright-free books, for books you've written yourself, or for those whose author gives you the right to use extracts in Tatoeba :)-
I take it, though, that you can't use "Fair use exemption" ?
I understand that whether fair use is included in copyright law depends on the country in question. (Japan doesn't, yet, but might soon http://www.cartoonleap.com/2009/03/25/japan-may-adopt-the-fair-use-principlefinally/ )
Whether fair use applies also depends on a number of other factors (US case http://www.masters.edu/DeptPageNew.asp?PageID=1744&minimal=true )-
As for "fair use", I will not say an absolute "NO, we can't". France, for example, even without the notion of "fair use", allows short quotations from books as long as the quotation is justified by the scientific or pedagogical information it brings to the work it is incorporated in. But we have no lawyer in the team and, to be honest, we have other work we find more important than checking whether we can safely rely on "fair use".
Moreover I think it's better at first to say "no, no quotes from copyrighted books", to avoid quotes which cannot be considered "fair use", rather than playing with fire.
But one day, when Tatoeba is nearly feature-complete (i.e. when we no longer have dozens of feature requests to code and bugs to fix ^^), I will try to see whether it's possible or not, and then give you a clear and absolute answer.
So my answer is: "for the moment we will consider that we can't".
-
-
-
-
-
Profile suggestion:
I suggest that a selection box for [Native Language] be added to the profile page. It might also be a good idea if English (UK) and English (US) had separate entries.-
Well, the question of language is not always simple...
You've pointed out how it would be a good idea to separate British English from American English. Same thing would apply for Portuguese, probably for Spanish too, and many other languages.
Then there will be people who will be native speakers of more exotic languages (or dialects).
And then there will be native speakers in two languages (or more?) because they have an international background.
So for now the best thing is just to use your "description" field to indicate your native language (and in general, the languages you speak), just like you did.
We won't have any search feature on users before a long time, and this is not the type of information we'll force users to provide, so having it in a special "Native language" field, or having it in the "description" field won't change much... besides perhaps the fact that we may get more people to tell their native language if we actually do have a "native language" field.-
> besides perhaps the fact that we may get more people to
> tell their native language if we actually do have a
> "native language" field.
I think that would be one of the most important advantages.
On your other points I would suggest the following:
A good source of a quite complete list of languages with major variants would be the list used for the System Locale used by Windows. (See 'sublanguage' list at http://msdn.microsoft.com/en-us/library/dd318693%28VS.85%29.aspx )
You could allow either selection from a list or a custom entry.
Multiple native languages and/or notation of secondary languages could either be noted in free text in the profile, or have additional optional fields.
The main reason for wanting this information (if not necessarily specifically in special fields) is that it is useful to know how much the person entering sentences / adding comments knows about the language used before arguing with them. ;-)
If somebody says "は is ungrammatical in this sentence" it makes a big difference to my response if I know whether: a) the writer is Japanese, b) he has good second-language skills in Japanese, or c) he is a learner. -
TRANG, I proposed tagging of sentences above (and maybe others have done so before). While my usecase was tagging specifics of a language in a particular language this could be extended to express nearly anything and everything. A sentence could be tagged as "AE" or "BE" for American and British English or even from which century an example was.
-
-
-
How do I set the language I translate into?
If I click on "死ね" and try to add a translation, it comes up as German, while I mean it to be English, ...-
For the moment (but we will change it soon) you have no way to directly specify the language.
But if Tatoeba misdetects the language, you simply click on the flag next to the sentence in question and set it to the right one. It can be done whenever you want, as long as the sentence belongs to you (like editing).
-
-
Which languages to add, and which not to add.
Just a short note to say that Wikipedia adopted the policy to only create Wikipedias for languages that have an ISO 639-3 Code. There might be exceptions. I think this decision helped them pretty much ease the process for new languages.-
Yep, we had a discussion about that, and we don't plan to add only languages which have an ISO 639 alpha-3 code. Wikipedia has to do this because they have tons of contributors, and an encyclopedia needs much more data than a database of example sentences, so I can understand why they prefer to have a lot of articles rather than a lot of dialects.
But for us, I think as soon as a language is different enough to not be totally intelligible with another, we can add it as a specific language. That's the case for Shanghainese, for example: the closest ISO 639 code is for the Wu language, but Wu, of which Shanghainese is a "dialect", is divided into other "dialects" (even if I don't really like the word "dialect") which are not mutually intelligible with Shanghainese.
And as I think Tatoeba can be used to keep a record of languages, especially endangered ones, I would find it a real pity not to add a small language only because it has no ISO 639-3 code (moreover I've heard an ISO 639-4 code will be released).
-
-
A->B or B->A
There doesn't seem to be an easy way to tell whether a pair of sentences are
A(Japanese) translated to B(English)
or
B(English) translated to A(Japanese)
I would like to make sure this feature is firmly placed in the wish list for future development.-
Is there a specific case where you would need this information?
One not too difficult way is to look at the creation date of each sentence (which you can see in the first entry in the logs). If A was created before B, then it must have been A->B.-
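As a sketch of that rule (Python for illustration; the function name and the idea of passing in two creation timestamps taken from the logs are assumptions, not the site's real code):

```python
from datetime import datetime


def translation_direction(created_a: datetime, created_b: datetime) -> str:
    """Guess which sentence of a linked pair is the original, by creation date."""
    if created_a < created_b:
        return "A->B"
    if created_b < created_a:
        return "B->A"
    # Identical timestamps: the dates alone cannot tell.
    return "unknown"


print(translation_direction(datetime(2010, 2, 1), datetime(2010, 2, 3)))
```

The "unknown" case matters for pairs that entered the database at the same moment, where the logs give no ordering.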
> Is there a specific case where you would need this information?
Not as such, but I can explain _why_ I want this information to be recorded / displayed.
One common use of the translated example sentences is to explain what the original sentence means. So it is quite normal to have, for example, obscure English translated (explained) into normal Japanese.
Imagine you have this:
http://tatoeba.org/eng/sentences/show/270663
A. Many a mickle makes a muckle. [Proverb]
B. 塵も積もれば山となる。
B is the Japanese equivalent (and 'translation') of A.
Some well meaning person might decide that hardly anybody knows what "Many a mickle makes a muckle." means and 'correct' it into a different English sentence.
A more common example would be the
http://tatoeba.org/eng/sentences/show/71044
A. I'll make you a present of a doll.
B. あなたに人形をお贈りします。
If B. is the original and A. the translation then you could well say that the English in A is a little odd and should be changed. If A is the original and B is the translation then you could say that A demonstrates a somewhat old-fashioned phrasing and B explains it.-
In those cases it would be desirable to have both phrasings, with uncommon ones marked somehow, so users should be encouraged to add alternate translations rather than 'correcting' existing ones when the translation is not erroneous.
I think this could be handled by attaching more metadata to sentences. Properties like masculine/feminine, proverb, quotation, polite, slang, dated, etc. could be tagged onto a single sentence, so you could have e.g. both a polite and a colloquial translation, and mark them as such.
That would also address the problem where sentences from the Tanaka Corpus are currently marked with tags like [F] on the English sentence, even though the property belongs to the Japanese sentence. That doesn't work that well when the sentences aren't restricted to pairs.
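As a rough illustration of the metadata idea, here is a sketch in which each sentence carries its own set of tags, so a property such as "proverb" or "feminine" stays with the sentence it describes rather than with its translation. The `Sentence` class, the tag names and the `with_tag` helper are all assumptions made for the example, not Tatoeba's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    lang: str
    text: str
    tags: set = field(default_factory=set)  # hypothetical free-form tags


def with_tag(sentences, tag):
    """Return the sentences that carry the given tag."""
    return [s for s in sentences if tag in s.tags]


corpus = [
    Sentence("eng", "Many a mickle makes a muckle.", {"proverb"}),
    Sentence("jpn", "塵も積もれば山となる。", {"proverb"}),
    Sentence("eng", "I'll make you a present of a doll.", {"dated"}),
    Sentence("jpn", "あなたに人形をお贈りします。", {"polite"}),
]

for s in with_tag(corpus, "proverb"):
    print(s.lang, s.text)
```

Because the tags live on the individual sentence, the scheme keeps working when a sentence has more than one translation.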
-
-
-
-
Procedure when replacing (not correcting) sentences.
See http://tatoeba.org/eng/sentences/show/218263
If a sentence is pretty much completely replaced, its position in the "Sentence X is translation of Sentence Y" system may need to change.
Example:
BEFORE
Sentence A (Japanese) was (allegedly) translated into Sentence B (English) which was in turn translated into Sentence C (German)
AFTER
Japanese is noted to be unrelated to English and so completely replaced with a new Japanese sentence.
So _now_
Sentence B (English) is translated into
Sentence C (German) and also translated into
Sentence D (Japanese).
So A -> B -> C
changes to
B -> C
B -> D -
- #

- JeroenHoek
- 27 day(s) ago
Here's a nice user-test for the system:
Sentence nº164914 was a Japanese sentence I wanted to edit. I edited its English translation, which went well, but because tatoeba.org responded too slowly I accidentally added a new Japanese sentence (nº361150) instead of amending nº164914. I can't delete translations, so I just changed that old Japanese one to a Dutch translation of the sentence. Meanwhile, someone else added a German translation too! :)
Questions:
* What happens to the indices for the former Japanese sentence?
* Should users be able to do this?
As a developer I suspect that changing languages on an existing (Japanese) sentence is bound to cause issues, but as a user it makes perfect sense to solve the issue I ran into. Your thoughts?-
> What happens to the indices for the former Japanese sentence?
It gets left behind. I manually copied it to the new sentence in this case, but Trang will need to clear up things. I'm not sure if I can delete index entries.
> Should users be able to do this?
It's probably a bad idea.
Suppose you have
A(English) translates to B(Japanese) translates to C(German)
If you change the Japanese to a new, Dutch, sentence you get
A(English) translates to B(Dutch) (doesn't really) translates to C(German)
Because C is really the translation of the (vanished) Japanese sentence not the (new) Dutch sentence.
I think it would have been best to have left the Japanese duplicate and added a "Please delete me" comment.-
- #
-
- JeroenHoek
- 26 day(s) ago
> I think it would have been best to have left the Japanese duplicate and added a "Please delete me" comment.
Agreed. (Since Tatoeba is in beta, I try to actively break things by using it from a novice user's perspective.)
Trang:
Perhaps "nominate for deletion" could be added as explicit functionality? A way for users to flag a sentence as undesirable (with optional comment). In time the comment system will become hard to monitor for "delete me" type messages.
-
-
> What happens to the indices for the former Japanese sentence?
As Paul said, it gets left behind. We don't have (yet) strong mechanisms that would help keep the database consistent.
> Should users be able to do this?
No, they shouldn't. Ideally, there should be guidelines (which I'm hoping to be able to write by the end of the month) to help users understand better how things work and how they can contribute in a way that doesn't give us (developers) more work than we already have ^^'
I wrote down some of the ideas in my comment here : http://tatoeba.org/eng/sentences/show/126
> Perhaps "nominate for deletion" could be added as explicit functionality?
Yes, in general, we could have various statuses for a sentence. Actually in the previous version of Tatoeba we used to have that, but I haven't re-implemented it. A sentence could be marked as "to delete", "checked" or "locked" (and perhaps other things, I don't remember). When a sentence was checked, it meant you could rely on it for not having mistakes. When it was locked, no one could edit it anymore.
But this is not urgent compared to other things we have to do. My priority at the moment is to make sure that people understand clearly that when they translate, they have to translate from the sentence written in big letters. I'm pretty sure that very often, people are adding translations to a Japanese sentence when they were actually translating from the English sentence.
We also have to enable people to link and unlink sentences. There are many sentences that are linked to each other without being translations of each other, and there are many sentences that could be translations of each other but are not linked to each other.
Once all of that is settled, and people understand that they have to view the corpus as a GRAPH and not a table, it will be less likely that they behave in a way that we don't want them to behave, like what you did. And perhaps "nominate for deletion" will not be *that* useful because instead of deleting, you could just edit your sentence into whatever you want and unlink it from any sentence it was linked to.-
- #
-
- JeroenHoek
- 26 day(s) ago
Educating users is desirable of course, but opportunistic contributors will make mistakes. In the case of incidental contributions, not being able to delete an entry that should not have been created, nor nominate it for deletion is likely to frustrate the user. An alternative may be to offer a grace period for sentences you created yourself, being able to delete them within a certain period as long as they are not linked to by other new sentences.
On the topic of linking translations: is it possible to link a sentence to multiple sentences? There are many cases where the translated sentences actually do function as proper translations of each other, as well as of the sentence they are linked to.
Visualizing the graph is challenging within the confines of HTML/CSS, good luck there. Further indenting of the non-direct translations might help.
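The grace-period idea from this post could look roughly like the sketch below. The 24-hour window, the function name and its parameters are assumptions chosen for illustration; nothing here reflects how Tatoeba actually handles deletion.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=24)  # hypothetical length of the window


def can_delete(owner, requester, created_at, now, linked_by_others,
               grace=GRACE_PERIOD):
    """Allow deletion only by the creator, within the grace period,
    and only while no sentence added by someone else links to it."""
    is_own = owner == requester
    within_grace = now - created_at <= grace
    return is_own and within_grace and not linked_by_others


created = datetime(2010, 2, 1, 12, 0)
print(can_delete("JeroenHoek", "JeroenHoek", created,
                 created + timedelta(hours=2), linked_by_others=False))
```

In the case described earlier in the thread, the German translation added by someone else would already block deletion, which is the situation where a "nominate for deletion" flag would still be needed.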
-
-
We could do with an option to show sentences in language X that do not have translations in language Y. (E.g. Japanese sentences with no English translation).
-
Yep, we plan to do so. In fact we plan to add an "advanced search panel" with this kind of search option.
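Until such a search option exists, the request can be sketched against a local copy of the data. The example assumes two hypothetical structures, a dict of sentence id to (language, text) and a list of undirected links between ids; the ids and the German sentence are made up for illustration, and only direct links are counted as translations.

```python
def untranslated(sentences, links, source_lang, target_lang):
    """Ids of sentences in source_lang with no direct translation in target_lang."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    missing = []
    for sid, (lang, _text) in sentences.items():
        if lang != source_lang:
            continue
        linked_langs = {sentences[n][0] for n in neighbours.get(sid, ())}
        if target_lang not in linked_langs:
            missing.append(sid)
    return sorted(missing)


# Hypothetical ids; language codes are assumed to be three-letter codes.
sentences = {
    1: ("jpn", "塵も積もれば山となる。"),
    2: ("eng", "Many a mickle makes a muckle."),
    3: ("jpn", "あなたに人形をお贈りします。"),
    4: ("deu", "Ich schenke dir eine Puppe."),
}
links = [(1, 2), (3, 4)]

print(untranslated(sentences, links, "jpn", "eng"))
```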
-
As a stop-gap measure, how about creating a public list of Japanese sentences that need translating into English? That would let me get started contributing.
-
Yep, sure you can. Anyway, even with the ability to search for sentences untranslated in language X, it's always better to be able to see directly which sentences are requested by others.
(By the way, some of our users have posted comments like "if someone could translate it into Japanese it would be great", so the need is already there.)
-
Someone with admin access has heard you and here is the list:
http://tatoeba.org/eng/sentences_lists/show/24
(And I notice that some sentences in this list should be deleted)
-
-
-
-
-
For now you don't :
http://tatoeba.org/eng/wall/index#message_24
As long as there aren't at least 2 or 3 "hardcore contributors" in Japanese, I will not implement the possibility to edit the romaji.-
That's your decision. But do note that there are some very wrong romaji entries.
-
Don't worry, I'm very aware of this. I just think that people can wait to have 99.99% reliable romaji.
And there used to be a warning tooltip, when you moved your mouse over the romaji, that said it was not reliable... but you made me notice that it's not there anymore. I guess we took it out by mistake while cleaning our code.-
You say that as if we are now at 99.98% accuracy. If you intend to have this project be a serious attempt for people to learn, please hide the Romaji if it's not going to be fixable. Currently, much of it is an abomination of on/kun mixups, and idiosyncratic spacing. Any beginners will just be confused by it, and anyone who knows how wrong it is will be frustrated by it. Also, what if a Japanese sentence is wrong? The romaji will be completely different, with added, subtracted, or moved words. If nothing else, add Romaji as a "Language". Long-term however, we're going to need some solution. I assume Chinese has the same issue, and Shanghainese likely does or will, too (I haven't seen any Shanghainese with ruby text yet). Eventually we'll be swimming in meta-languages. I'd say just nuke all romanization until it's done right. Also, doesn't wwwjdict already have every example sentence broken down into kana? I've used it in the past, but it was far too long ago for me to remember.
-
"Also, what if a Japanese sentence is wrong? The romaji will be completely different, with added, subtracted, or moved words."
I said this before I discovered where the romaji was coming from, forgot to go back and remove it before I posted, oops. -
In fact, Chinese doesn't have the same issue.
The problem with Japanese romanization in Tatoeba is that it comes from Kakasi, which is no longer a "living project", so we have no way to report errors to anyone, as nobody maintains it anymore.
For Chinese it comes from Adso, developed by the guy behind popupchinese.com. I'm in regular contact with him, and as he has a strong background in both linguistics and programming, his tool is really reliable and supported by a reliable community.
Moreover, when a bad romanization is found, a new version is released the same day most of the time.
So for Chinese there's no real problem.
For Shanghainese, yep, we will need to make it editable, for the simple reason that there's currently just no tool to do this automatically :p (I'm working with the Adso guy to support Shanghainese, but as I don't expect reliable results for a long time, I will make it editable).
For Japanese, I'll let Trang answer, as I don't speak Japanese at all.-
Well, by "problem" I meant that the ruby can't be edited. So it does then? I understand that the mitigation of the problem is better for Chinese, but it's still read-only, correct? I also know that it's easier for Chinese, since the reading variations are more predictable, less common, and just plain easier to get right.
-
No, it's not, and to be honest I don't know if it's needed for Chinese (for Shanghainese it absolutely will be).
I also use Adso for work other than Tatoeba, and even in Tatoeba my Chinese friends have not yet reported a single sentence with a bad romanization, so I think letting people edit Chinese would bring more drawbacks than advantages:
* if an error is reported, I will notify the Adso guy, and while waiting for the patch, we can add a hardcoded pinyin for the sentence
* if a Chinese learner sees a sentence where a Chinese character has a very rare pronunciation, he may "correct" it, thinking it's wrong, which is not intended
* the Chinese romanization uses characters with tone marks, and most of them cannot be typed directly, not even with an IME
But things are never settled forever, and if we see that the number of errors from Adso is too large to be covered by our current way of working, then we will make it editable.
What do you think?
(I'm only talking about Shanghainese and Chinese.)
-
-
-
I've replied to Paul regarding the issue of romaji:
http://tatoeba.org/wall/index#message_223
> Also, doesn't wwwjdict already have every example sentence broken down into kana
Not exactly broken down into kana. It looks more like this: 誰も[01] が 私|1(わたし)[01] は|1 間違う{間違っている} と|1 言う{いった}
For this sentence: 誰もが私は間違っているといった。
But I know I could use this to make the romaji more reliable.
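To give an idea of how the index line quoted above could be put to use, here is a small sketch that splits it into tokens. The pattern is inferred only from this one example (headword, optional `|n` marker, optional reading in parentheses, optional sense number in square brackets, optional surface form in braces), so the field names and the grammar are assumptions, not a specification of the WWWJDIC format.

```python
import re

# Pattern guessed from the single example line; field names are hypothetical.
TOKEN = re.compile(
    r"(?P<head>[^|(\[{]+)"            # dictionary headword
    r"(?:\|(?P<marker>\d+))?"         # numeric marker after "|"
    r"(?:\((?P<reading>[^)]*)\))?"    # explicit reading in kana
    r"(?:\[(?P<sense>\d+)\])?"        # sense number
    r"(?:\{(?P<surface>[^}]*)\})?"    # form as it appears in the sentence
)


def parse_index(line):
    """Split an index line into a list of dicts, one per token."""
    tokens = []
    for raw in line.split():
        match = TOKEN.fullmatch(raw)
        if match is None:
            raise ValueError(f"unrecognised token: {raw!r}")
        tokens.append(match.groupdict())
    return tokens


line = "誰も[01] が 私|1(わたし)[01] は|1 間違う{間違っている} と|1 言う{いった}"
tokens = parse_index(line)

# Rebuild the sentence text from surface forms, falling back to the headword.
rebuilt = "".join(t["surface"] or t["head"] for t in tokens)
print(rebuilt)
```

Tokens with an explicit reading (such as 私 here) could override the automatic romanization, while the rest would still need a dictionary lookup.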
-
-
-
-
-
-
How are the tags used by WWWJDIC handled?
For example the following sentence should have a sense 1 tag on 誰も, but I don't see it in the edit sentence options.
http://tatoeba.org/eng/sentences/show/136692-
If by "tags" you mean the indices, or "B line", then they are accessible from a special page. I'll send you the link in a private message, you can also ask Jim for more details.
If you are referring to the feature from the old version, that was using the indices to link words from the Japanese sentences to WWWJDIC, then I haven't had time to re-implement it yet.
-
-
I do not appear to be able to change my password. Are passwords limited to a certain number of letters or something?
-
Paul, you're back! :O (Welcome back ^^)
I just tried to change my password and it worked... Passwords shouldn't be limited to a certain number of characters, although I haven't tried anything above 20ish characters.
All I can tell you is to try again... Just be careful that there isn't any leading or trailing space if you're copy-pasting your "old" password.-
Hmm, still doesn't seem to work.
One of the passwords I tried was 11 characters with 8 letters + 4 numbers. The other was 8 characters with six letters and two symbols.-
Just tried with
testing123
(not a real password) and it didn't work.
I get the error message "An error occured while saving." each time.-
I tried with testing123 too, but didn't get any error.
Well, perhaps you can try to log out and log in again. This will make sure that the password you were entering in "old password" is the right password. I mean, perhaps you were entering a password that you thought was your current password but actually isn't.
If it's not that, then for now I don't know what it is... -
Actually, sysko made me realize that if the message is "An error occured while saving." and not "Password error. Please try again." then it means Tatoeba won't save your new password. I'm looking into this.
-
-
-
@Trang maybe it's related to biptaste's bug; maybe it's not really closed
-
-
-
Since how translation relations work in Tatoeba is not so obvious, and I've explained it many times, I'm posting it here until there's a better place.
Some of you have been puzzled by sentences that disappear, etc.
That comes from the fact that a sentence has "translations" and "translations of translations",
and you always have a "main sentence" (at the top and bigger).
So, to be a bit theoretical:
AA is a sentence, and I add a translation "BB".
Now we have AA <---> BB
If I translate BB as CC,
now we have
AA <---> BB <---> CC
So BB is a translation of AA,
and CC is a translation of a translation of AA (it will be displayed a bit more gray when AA is the main sentence).
So now, if I translate CC as DD,
we will have
AA <---> BB <---> CC <---> DD
and when clicking on sentence AA, DD will not appear, but when clicking on CC or DD it will.
Why?
In fact, you know that when you translate, however good you are at translation, language and cultural limitations and differences mean we can't have a 100% exact meaning. For example, in English you have only "they", but in French you can translate it as "elles" (female) or "ils" (male).
So
ils <---> they <---> elles <---> etc.
As you can see, "ils" and "elles" are both correct translations of "they", but they do not have the same meaning. That's why, when "ils" is the main sentence, we need to say "hey, 'elles' is an indirect translation; its meaning is likely to be a bit different, but still close enough to help you".
So now you can imagine that with complex sentences, translations of translations of translations, etc. are really likely to be totally different, so that's why we don't display them.
That's why it's important to only translate sentences in a language you know, and not to translate a sentence just because it has a translation whose meaning you know.
Anyway, in the next update of Tatoeba we will review the way it's displayed a bit, to make it more obvious.
So that's broadly why some translations disappear depending on which sentence is the main one, and why it's important to have the right main sentence.
(I hope I wasn't too long.)
Feel free to ask questions; I know I'm not really good at explaining :$ -
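The display rule explained above can be sketched as a breadth-first walk over the link graph that stops after two hops: distance 1 gives the direct translations, distance 2 the "translations of translations", and anything further away is hidden. The function name and the data layout are assumptions for illustration, not Tatoeba's actual code.

```python
from collections import deque


def visible_translations(links, main, max_depth=2):
    """Return (direct, indirect) translations of the main sentence."""
    graph = {}
    for a, b in links:  # links are undirected
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    depth = {main: 0}
    queue = deque([main])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not walk further than max_depth hops
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    direct = sorted(n for n, d in depth.items() if d == 1)
    indirect = sorted(n for n, d in depth.items() if d == 2)
    return direct, indirect


links = [("AA", "BB"), ("BB", "CC"), ("CC", "DD")]
print(visible_translations(links, "AA"))  # DD is three hops away, so it is hidden
print(visible_translations(links, "CC"))
```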
Please add support for deleting and editing comments. I just keep on posting translations as comments by mistake ... I've seen others do this too.
-
It's in our todo list.
Like I said here : http://tatoeba.org/eng/wall/index#message_142
"We're currently in a phase where we're basically cleaning up the code, optimizing the system and fixing bugs. And then we have a bunch of small improvements to integrate."
The possibility to delete comments is part of those small improvements. You can expect to have that sometime by the end of the month. -
That's great. Good luck with that!
This site has the potential to be the #1 site for anyone learning a language! I can't wait to see it improved.
-
-
Hi, I'm new here, so this is tentative, but I have 2 gripes:
1. Some entries are just phrases, not "sentences", where by
sentence I just mean complete thought, informal though it
may be. It seems to me that we should sell whole loaves.
2. Each sentence should make sense on its own, shouldn't it?
Sorry if these were already discussed and decided; please
point me to such information. Perhaps I'm just too rigid.
The idea, it seems to me, is that if one goes to Tatoeba
or WWWJDIC to get a sample sentence, then one should GET a
comprehensible complete sample sentence.
-Dave
-
Yes, ideally each sentence should make sense on its own, even though everyone can pretty much submit anything...
But entries that are not "sentences" are likely to be deleted in the future. They will remain here for now because no one can delete them (except me) and we haven't really decided yet on what should be deleted and what shouldn't.-
Hmm, how about making them into sentences? I mean, the phrases are here because they are significant, right? If the principle is agreed upon that entries should be context-free full sentences, then I and others would feel comfortable changing them into sentences (it would seem pretty easy in most cases).
-
-
-
Are the moderators planning to add sound support for sentences in the future? I am very curious. I don't think it's necessary, but it would definitely kick things up a notch!
-
-
It's planned (like a lot of things :D), but for the moment I can't give you a precise date. But yeah, we would also really enjoy giving people the possibility to hear what they read. And we're eager to hear our dear contributors' voices.
-
Wow sounds great! This site is going to be a great hit! Keep up the good work!
-
-
Right now only the full database download is available.
I wonder if there will be a tool (some kind of checkbox) that lets you check some sentences and then download them?
E.g. I check Spanish, English, and Japanese and then I download a generated CSV file with TAB- or semicolon-separated sentences.-
Yes, someday there will be something like that (that is, more customizable downloads). But it's not in our priorities at the moment (sorry).
We're currently in a phase where we're basically cleaning up the code, optimizing the system and fixing bugs. And then we have a bunch of small improvements to integrate. So I'd say you can't expect to have such a tool before April or May...
But you know, if you have time and some programming knowledge, feel free to code such a tool yourself ;)
And in case you actually do decide to code your own tool and put it online, let us know, so we can add a link to it.
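For anyone tempted to write such a tool, a starting point might look like the sketch below. It assumes the dump is tab-separated with three columns (id, language code, text) and that languages are identified by three-letter codes; both are assumptions about the file layout, and the sample rows and ids are invented for the example.

```python
import io


def filter_dump(lines, languages):
    """Yield [id, lang, text] rows whose language is in the wanted set."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[1] in languages:
            yield fields


# Stand-in for the downloaded file; a real run would use open(path, encoding="utf-8").
dump = io.StringIO(
    "1\tspa\tHola.\n"
    "2\teng\tHello.\n"
    "3\tfra\tBonjour.\n"
    "4\tjpn\tこんにちは。\n"
)

selected = list(filter_dump(dump, {"spa", "eng", "jpn"}))
for row in selected:
    print("\t".join(row))
```

Splitting on tabs by hand avoids quoting surprises when a sentence itself contains quotation marks.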
-
-
Hi, I spotted a small bug in the list which you get when searching for "Example sentences with the words:".
When giving a new version of the original language, this sentence appears "Are you sure you want to translate this sentence into a sentence in the same language?", but I cannot click on the OK-button. In other views it works fine.
I use the latest Firefox on Mac OS X.
Cheers. -
Hi, I'm new at this, and I wonder if there's an easier way to do simple edits. In using WWWJDIC, I spotted some obvious errors and Trang told me I could fix them, so I did. Just now I tried the "Contribute" link, then 15 random Japanese sentences. But how do I get back to the sentence page to see if others need to be fixed? Page back gave a lot of Resends.
Also, I'd really like to go through them systematically, say 100 at a time, not at random, so I wouldn't redo stuff. I'm not talking rocket science - my Japanese isn't good enough for complicated stuff - just things like my last few, "grand mother" -> "grandmother" and "he seems to very happy" -> "he seems to be very happy". Not sure if these are that useful, but I'm a finicky type and would be glad to do them.
-Dave
-
hmm, sorry, wasn't thinking - of course you can get many sentences compactly arranged by going directly to WWWJDIC.
-
Hi Dave,
I admit it's extremely annoying to do simple edits at the moment and believe me, if I had more time it would not be like that. It will be much easier someday, I cannot tell you when, but I can promise you it will be.
In the meantime, you can of course use WWWJDIC as you figured. But I also want to point out that if you are using the "serial translators" page, the best way to proceed is to open a new tab whenever you adopt (that is, instead of left-clicking on the "adopt" icon, right-click on it and choose "open in a new tab"). This way you keep the list of sentences in one tab while editing the incorrect sentence in the other tab. And when you're done editing, you can just close the tab.
Also, don't be afraid to adopt as many sentences as possible, and remain their "owner". Of course it's better to understand the translations of the sentences you own, but it's NOT a requirement.
The thing is, by adopting a sentence, you will prevent others from changing them into something that might be incorrect. So you can also adopt sentences even if you're not going to edit them. I'd even encourage you to do that.
Ideally, each sentence should belong to someone. Even more ideally, each sentence should belong to someone who understands it perfectly and will respond quickly if there's a comment on their sentence.
And don't think that you're not helpful, all the corrections you are doing are actually very useful. Translating is not the only way to help in Tatoeba. We want to provide quality content but we know there are still many mistakes, and it's a huge task to correct them.-
Hi Trang,
Thanks! I tried opening a new window and that helps, tho by now I'm in the habit of just left-clicking, so I'll have to unlearn it. I have to disagree in a good way with "extremely annoying" - this site is so addictive! It's fun. On the adoption point, I read a book a while back on the early wikipedia that tried to have people controlling subjects, and it was an enormous drag on the project and slowed everything down. So they opened it up and things took off. That's why I release all my sentences. I could be wrong. But you know, I feel that I may be correcting valid Briticisms, for example, so open may be best.
いつもありがとう。
Dave
-
Yes, I agree that if there is too much control, it slows everything down. It was actually what I experienced as well, back in the old version of Tatoeba where there were moderators and each contribution had to be validated by a moderator. I had written a little bit about that here : http://blog.tatoeba.org/2009/01/new-validation-system.html
Anyway, adoption was not intended to increase control; it was more intended to increase the involvement of contributors, as well as their responsibilities. And as stated in that blog post, it's also part of the validation process. The fact that a sentence belongs to someone doesn't mean it's error-free, but at least it indicates the sentence is less likely to have a mistake and you can trust it more.
But imagine: I want to translate English sentences into French, and I see this sentence which sounds strange to me, but I'm not sure if it's actually incorrect or if it's just that I still lack vocabulary. If I post a comment to ask about it, but there's no owner, the comment will just appear for a short time on the homepage and if no one answers quickly enough, it will basically go unseen. But if there's an owner, (s)he will receive a notification email and I will have more of a chance to get an answer to my question.
Also, if someone disagrees with your correction and posts a comment on it, then you may learn something that maybe you wouldn't have seen if you weren't the owner of the sentence.
If you are correcting valid Briticisms, it doesn't really matter. If someone wants to have the British version back, instead of reverting your correction, they can just add it as a new sentence.
PS: Glad to know you're addicted :P
-
-
-
-
I was just wondering. The corpus is under CC:BY licence but when you're downloading the database dump, there is no way to re-associate a sentence with its original contributors.
Anyway, I guess it could be interesting to know who's contributing on which languages for collaborative translation research purposes.-
Well, the fact that the corpus is under CC-BY means you have to mention Tatoeba if you are going to reuse it, not that we have to indicate the original contributor in the dump.
It works like this : people provide their work under CC-BY, so the attribution of the work of each user is mentioned in Tatoeba itself, through the logs. But then we're reusing the work of everyone to make something else: the corpus, which we also redistribute under CC-BY.
But now, if you also need to have the username of the original contributor, or any other information that is not provided by default, you can just ask and I'll see what I can do =)
-
-
Hey guys. Finnish in the house.
Love your interface by the way. Great job.-
Hey, hey! I'm bourdu, collaborating with the parent poster to build a Finnish-French-English-Japanese corpus. Keep up the good job!
-
Wow seems we have found the Batman and Robin of translations :D
Yep, nice to see Finnish active again; it's been a long time without people contributing in this language. So congrats on having already added so many sentences, and don't hesitate to report things you find strange, improvements, or behaviour that bothers you. We're eager to make Tatoeba a better place :)
-
-
-
- #

- lilygilder
- Jan 28th 2010, 20:45
To my fellow German translators:
+5,000 sentences! (And about 1,000 sentences in 10 days, wow!) German is now #5! :D Let's have some cake. -
Hi, new guy here,
Have you ever considered adding Brazilian Portuguese as a language?
Brazilians and Portuguese can understand each other pretty well, but what sounds "natural" differs completely between them.-
Hi vbkun, and welcome :)
We haven't considered adding Brazilian Portuguese, for one thing : because no one requested it. But also, it's not exactly a new language.
I feel it is a problem similar to the distinction between British English and American English. Perhaps the difference is more significant in the case of Portuguese "Brazil vs. Portugal" though, but technically, it is still considered as the same language...
So right now, the best solution I can suggest is to add a tag [Brazil] at the end of sentences that are in Brazilian Portuguese.-
I also don't think it's really a big problem, I was mainly just worried about people learning the language VS natural sounding ;).
Because I guess in most cases (maybe 90% of them, or even a bit more) the sentences are understandable for both sides.
Guess I'll then add those tags in my future contributions for pt-br ;)
-
-
-
Typo (in the French interface):
Il n'y a pas (encore) de résultats pour cette recheche mais vous pouvez nous aider à alimenter (...)
=>Il n'y a pas (encore) de résultats pour cette recheRche mais vous pouvez nous aider à alimenter (...) -
There are a number of sentences where some words are tagged with weird bracket syntax, like this one;
"There are days where I feel like my {brain}{1} wants to abandon me."
Do these have any meaning to the system, and should they be preserved when adding new translations?-
No, you don't have to preserve them when adding new translations. They used to be used, and they may be used again someday, but right now they don't mean anything anymore to the system.
You can find a short explanation at the very bottom of this page:
http://tatoeba.org/eng/pages/download-tatoeba-example-sentences
-
-
Hi, how about adding Latin as a separate language? It appears to be a core language for almost all of Europe's languages (from a cultural point of view, of course, not the linguistic one). Latin proverbs and sayings are actually the original versions of corresponding idioms in many languages.
-
As with all other languages, we're open to it; we just need people to add some sample sentences in the language. As soon as you can provide around 5 sentences, we can consider adding it.
-
Great, how about:
http://tatoeba.org/eng/sentences/show/349115
http://tatoeba.org/eng/sentences/show/349096
http://tatoeba.org/eng/sentences/show/349347
http://tatoeba.org/eng/sentences/show/349457
http://tatoeba.org/eng/sentences/show/349463
As for flag, consider 'SPQR' symbol :)
-
-
-
Looks like there’s something wrong with the Chinese pinyin transliteration... the pinyin of some traditional Chinese characters is not properly displayed :S
-
- #

- JeroenHoek
- Jan 12th 2010, 15:02
Why are the flags made lighter than they are in reality? The Dutch flag now looks exactly like the Luxembourg flag.-
That's because back at the time when I introduced the flags as the indicator of the language of a sentence, I felt that the real colors were too strong and needed to be lightened a little bit to fit more to the overall colors of the site.
But you're right, it leads to issues such as having a flag of a country looking like the one of another country. -
But anyway, if the Luxembourgish language is added someday, its flag will be lightened too, so it will still look lighter than the Dutch flag :) (and the contrast between red and blue is kept, as the red is also lighter)
-
-
In fact, as written in the download section, the last CSV was generated a month ago, when Estonian was not yet present in Tatoeba, so at the next generation Estonian will be in it (at the same time we will review the way we generate the CSV to make it more "automatic").
-
In case you are interested, I have updated the files. You should find Estonian now.
-
-
Is there a way to delete a sentence? And a comment?
There also seems to be a sync issue, as submitted changes are not there. I think I have entered 2 sentences, but the stats show 5... I have corrected a sentence several times, but the old version is still there... And each edit seems to add a new sentence rather than modify the existing one. Well, I guess I don't follow the UI logic...-
Hi,
There is currently no way for users to delete a sentence, nor a comment. Of course someday it will be possible, but we don't have much time for now and it is not a "vital" feature.
Instead of deleting a sentence, you can always replace it by another one.
As for deleting comments, well it doesn't really hurt anyone if you have posted a comment unintentionally :)
Now regarding your sync issue... I'm not sure I understand how you proceeded. But you can try to read the help and see if it helps: http://tatoeba.org/eng/pages/help
There is one thing that is not mentioned though: if you want to modify a sentence, you have to click on it. Then a form will appear and you will be able to edit it.
For instance here, this is one of your sentences: http://tatoeba.org/eng/sentences/show/344674
Try clicking on it, then change the text, then press OK.
Generally speaking, we are aware that the UI is not intuitive for everyone. We're trying our best but keep in mind that this is a project we're working on in our free time, so sometimes you have to bear with us.
Anyway you can always help out by telling us what we should display and how we should things so that you would understand how it works :) -
Oh by the way, you have to be careful while replying to messages on the Wall. You had replied by mistake to Swift's message (http://tatoeba.org/eng/wall/index#message_88). But I moved your message to the top.
It's also better to post a new message (instead of replying) if you are talking about another topic. We check the Wall everyday so you don't have to worry about your message going unnoticed.
-
-
Hi,
Could not find Estonian in the language list. Please add it, if possible.-
No problem, but could you please first add a few sentences/translations, even if they are not detected by the system? That will give us some samples to test the detection for Estonian and integrate it well :)
-
I've just seen that you added five sentences in Estonian yesterday :)
If you know how to say them in English, you can translate them: other people are more likely to know how to translate from English, and it will be far more useful for people who want to learn sentences in Estonian.
Anyway, thanks for making Tatoeba more complete :D -
-
-
The readings for <arabic numeral>日 are incorrect, but <kanji numeral>日 are correct. Figured I'd leave a note here rather than on some particular entry, as it applies to many.
-
-
Great! Furthermore, <numeral>日間 should also have the same reading as <numeral>日+かん. When there is no numeral, it is read ひあい.
Further <numeral>日 exceptions:
*一日の長(いちにちのちょう)
*一日一日(いちにちいちにち)
*一日一夜(いちにちいちや)
*一日増しに(いちにちましに)
*一日置き(いちにちおき)
*一日片時(いちにちへんじ)
*一日路(いちにちじ)
*一日千秋(いちじつせんしゅう/いちにちせんしゅう)
*一日を過ごす(いちにちをすごす)
*一日三秋(いちじつさんしゅう/いちにちさんしゅう)- anthy only gives the kanji when inputting the former reading.
*一日一善(いちにちいちぜん)
*一日二日(いちにちふつか)
*三日月(みかづき)
*七日鮫(なのかざめ)- anthy cannot produce this.
I didn't dig any deeper into this, but I imagine there are loads of special cases to be found.
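A minimal sketch of how the exceptions listed above could be stored as a lookup table and checked before any generic <numeral>日 rule is applied. The table is copied from this message; the function name and the return shape are only assumptions for illustration, not how Tatoeba actually generates readings.

```python
# Exception readings copied from the list above. Where the list gives two
# readings, both are kept in the order they were written.
EXCEPTION_READINGS = {
    "一日の長": ["いちにちのちょう"],
    "一日一日": ["いちにちいちにち"],
    "一日一夜": ["いちにちいちや"],
    "一日増しに": ["いちにちましに"],
    "一日置き": ["いちにちおき"],
    "一日片時": ["いちにちへんじ"],
    "一日路": ["いちにちじ"],
    "一日千秋": ["いちじつせんしゅう", "いちにちせんしゅう"],
    "一日を過ごす": ["いちにちをすごす"],
    "一日三秋": ["いちじつさんしゅう", "いちにちさんしゅう"],
    "一日一善": ["いちにちいちぜん"],
    "一日二日": ["いちにちふつか"],
    "三日月": ["みかづき"],
    "七日鮫": ["なのかざめ"],
}


def find_exceptions(sentence):
    """Return (position, expression, readings) for each listed exception found."""
    hits = []
    for expression, readings in EXCEPTION_READINGS.items():
        start = sentence.find(expression)
        while start != -1:
            hits.append((start, expression, readings))
            start = sentence.find(expression, start + 1)
    return sorted(hits)
```

Anything the function reports would be read from the table; everything else would fall through to the regular numeral rule.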
-
-
-
The search results currently have a line with a couple of icons, one with a link to the item and the other to the comments.
The second line is the sentence itself, and the following lines have translations of it. The translations are links to those items, but the sentence is not a link.
It seems that things could be simplified and made a bit more intuitive if those icons were dropped and the search result sentence made into a link to that item. We'd lose the comments link, but it points to the same page anyway. -
- #

- lilygilder
- Dec 23rd 2009, 17:21
About the community building mentioned in the Tatoeba blog:
Are you planning on creating a forum? I think that would help the community grow and having different threads for introductions, bug reports, feedback, etc would make it easier for users to follow the discussions. I'm not sure if it is necessary yet - there are only a bit under 300 members right now - but it would be cool nevertheless.
What do other users think?-
Yes it's in our todo list (like sooooo many other things). But as you noticed, right now there aren't that many members.
That's why we set up this "Wall": it's simpler, and for now it is more than enough for people to report bugs, give feedback, introduce themselves or write about whatever :)
When the Wall starts reaching its limits in terms of usability, we'll start considering the forum solution.
-
- #

- lilygilder
- Dec 23rd 2009, 11:57
Hi there,
What can I do with repeated sentences? Is there a way to link one entry to the other or maybe even merge them?
Anyways, thank you for this wonderful project.-
You don't have to worry about them. We take care of merging them :) We actually already launched a loooong cleaning process a few weeks ago, it removed about 10,000 exact duplicate sentences.
We're going to launch it again sometime, after we've cleaned the sentences from typos or extra spaces where there shouldn't be or things like that.
Anyways, thank you for your contributions. I'm happy to see German getting popular again :D It used to be the 4th language in Tatoeba, until extremely motivated contributors in Chinese and Spanish came along...-
- #
-
- lilygilder
- Dec 23rd 2009, 12:42
Does this cleaning program also remove nearly identical sentences? I found a pair where the only difference is the punctuation mark... I'm glad you don't have to do that manually...
I'd be happy if German took the fourth place again. I'll see what I can do and show some competitive spirit. =) This is a fun way to pass time and help other language learners. :)-
No, it doesn't remove nearly identical sentences. I've seen sentences which differ only in punctuation, but... Well this is a bit tricky.
If you take Japanese, there is supposedly no question mark or exclamation mark (although I suppose it's changing). Instead you have particles to express a question or an exclamation.
The fact that you write "I'm cold." or "I'm cold!" can change something in the Japanese sentence (samui desu / samui desu yo).
So to be safe, I wouldn't delete a sentence that has a nearly identical twin, with only a difference of punctuation.
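To make the distinction concrete, here is a rough sketch of an exact-duplicate check that first cleans extra spaces but deliberately leaves punctuation alone. The sentence ids and function names are hypothetical; this is not the actual cleaning script.

```python
import re
from collections import defaultdict


def clean(text):
    """Trim the sentence and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def exact_duplicates(sentences):
    """Group ids whose cleaned text is identical.

    `sentences` maps a (hypothetical) sentence id to its text. Punctuation is
    left untouched, so "I'm cold." and "I'm cold!" stay separate sentences.
    """
    groups = defaultdict(list)
    for sentence_id, text in sentences.items():
        groups[clean(text)].append(sentence_id)
    return {text: ids for text, ids in groups.items() if len(ids) > 1}


sample = {1: "I'm cold.", 2: "I'm  cold. ", 3: "I'm cold!"}
duplicates = exact_duplicates(sample)  # only ids 1 and 2 are grouped
```

Sentences 1 and 2 differ only by stray spaces and would be merged, while sentence 3 is kept because its punctuation may matter for the translations.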
-
-
Hi,
1. Can I translate more than 1 sentence at a time? I want to translate at a rate of 50-100 words/day, but my connection isn't fast and I don't want to wait for each sentence.
2. Is there any way to download the sentences for languages other than Japanese/English, e.g. Indonesian or Hebrew, then translate them and send them back to you so you can import the translated sentences into your database?
Thank you,
-
I've answered your email but I'm posting my reply here too in case other people wonder.
1. For now, the only way you can translate more than one sentence at a time is from the search. Type a word in the search bar, it will display results (provided you entered something in English or Japanese that is quite common). You can translate from the results. It displays 10 sentences per page but I can make it display more sentences per page if you feel this is a good solution for you to translate several sentences. Just let me know.
2. It's not possible yet. I mean, I probably could do it by hand, but I have very little time (especially these days), and I'd rather set up some automated system for that.
Besides, we did consider giving the possibility to users to download a file with sentences formatted to be opened with a program called PoEdit. You will then have a page somewhere on Tatoeba where you can upload this file, and it will import your sentences. It's not likely to be implemented before end of January though. -
Alright, we've added a "show more..." link in the random sentence. You can click on it to display more than one random sentence :)
You will be able to specify the language and the number of sentences displayed.
-
-
Could we translate sentences into a language not listed on the site?
I can translate them into Malay.
P.S. Great site!!-
The answer is just below:
http://tatoeba.fr/eng/wall/index#message_58
Spoiler: yes you can ;-)
Thanks for your contributions in Chinese :) -
As for Malay, Wikipedia tells me there are a lot of different varieties. Since we distinguish between dialects, is yours a specific form of Malay or "standard Malay"?
http://en.wikipedia.org/wiki/File:Flag_of_Malaysia.svg
Is this flag suitable? -
Malay should now be added as a supported language :)
You'll still have to add a few sentences to check that the language detection does work properly though.
-
-
- #

- tinacalysto
- Dec 18th 2009, 15:08
Hey guys, any chance of having Norwegian language added?
Thx and congrats for the great site (which has become my new hobby)!
P.S.: Portuguese GET!-
Yes, actually someone else has also asked us to add Norwegian. He actually asked for both Norwegian Nynorsk and Norwegian Bokmål.
But for that, we're waiting until we have either:
1) Our own language detection system (because for now we're relying on Google's detection, which is reaching its limit...)
2) Or a feature that enables people to indicate the language of the sentence (instead of having it systematically auto-detected).
Now you have to know that it is not forbidden to add Norwegian sentences, even if it's not "officially" supported. You will not be able to set the language as Norwegian (yet), but you can do that later, when we actually add Norwegian as a supported language.
Besides, it will actually give us some pressure to add it as soon as possible :P
Anyway thanks for your support! We're always glad to see motivated people like you joining the project :D
And congratulations on bringing Portuguese to the 6th position in terms of number of sentences!-
- #
-
- tinacalysto
- Dec 18th 2009, 18:37
>>>> You will not be able to set the language as Norwegian (yet), but you can do that later, [...]
Great, I'll do that. I don'
Regarding the Bokmål/Nynorsk differentiation, I guess something similar happens with Portuguese... In most cases one can manage to write a phrase that sounds right both in Brazilian Portuguese and in the Portuguese spoken in Portugal, but sometimes that's just impossible. Same thing for African Portuguese, which sounds almost like a different language to me. In such cases I indicate 'Portugal'/'Brazil' in the phrase.
-
- #
-
- tinacalysto
- Dec 18th 2009, 18:40
(Crap, I sent the message by mistake without having finished it)
Well, I don't think the Bokmål/Nynorsk differentiation would be a problem. Most online dictionaries I've seen so far treat Bokmål as the standard.
(btw, thanks for your kind comment)-
The thing is, we want to be as accurate as possible.
I have no idea how different Bokmål and Nynorsk are, but I believe they are more different than Portuguese in Portugal and in Brazil. There must be a reason why each of them has its own language code in ISO 639-3 (http://en.wikipedia.org/wiki/Norwegian_language).
Besides, it could offend some people if we don't make the distinction ^^'
-
-
And congratulations on having helped make today the second-best day (and with a little bit of effort) in terms of contributions
http://tatoeba.fr/eng/contributions/activity_timeline
-
Norwegian Bokmål has been added as a supported language :)
Please, when you have time, add a few sentences in this language to check if the language detection works properly.-
- #
-
- tinacalysto
- Jan 5th 2010, 19:28
thank you. The language detection is working fine.
-
-
- #

- grantortino
- Dec 18th 2009, 13:58
Why can't I find my sentences in your search engine?
example:
Sentence nº340251
浅草寺にはずいぶんたくさんの人がいるんですね。-
I'm not the one who made the search engine part, but it seems that the index is not updated in real time, certainly for performance reasons, so your sentences will be available in a little while :)
-
Yes, we're not indexing on the fly. The main reason is that I didn't (and still don't) have time to figure out how to do that ^^'
Usually I launch the indexing process once a month but considering the increase of contributions, I think it'll be more once a week now...-
If you don't mind me asking, what kind of database engine is behind tatoeba.org? SQL Server/MySQL or other?
-
It's MySQL :) But for the search feature we're using Lucene (http://lucene.apache.org/java/docs/).
-
I regularly use SQL Server, so I'm not much help with MySQL, but maybe this link might help http://wiki.apache.org/lucene-java/UpdatingAnIndex
-
Thanks :)
Right now though, I must say it doesn't mean much to me... Also, MySQL is not really the issue here (because I know MySQL and it doesn't help me :P).
The issue is to know how to use Lucene (which is written in Java). I just have to take the time to read the documentation.
The search engine part of Tatoeba was coded as a school project, at a time when I didn't have much knowledge in programming but had a good partner who knew Java and so he pretty much did all the coding.
Someday I'll have to look into his code. I'll probably have to upgrade to the latest version of Lucene as well because our code is from like, 2 years ago. Someday... When I have time.
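For readers wondering why a freshly added sentence cannot be found, here is a toy sketch of batch indexing. It is plain Python, not the Lucene API, and the class and method names are made up for the example; splitting on whitespace is also an assumption that would not work for Japanese text.

```python
class BatchIndexedSearch:
    """Toy model: sentences are stored at once, but searched through an index
    that is only rebuilt when reindex() is launched."""

    def __init__(self):
        self.sentences = {}  # stands in for the database table
        self.index = {}      # word -> set of sentence ids, built by reindex()

    def add_sentence(self, sentence_id, text):
        # Stored immediately, but not searchable until the next reindex().
        self.sentences[sentence_id] = text

    def reindex(self):
        # Rebuild the whole index from scratch (the "once a week" job).
        index = {}
        for sentence_id, text in self.sentences.items():
            for word in text.lower().split():
                index.setdefault(word, set()).add(sentence_id)
        self.index = index

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


engine = BatchIndexedSearch()
engine.add_sentence(1, "Hello world")
before = engine.search("hello")  # empty: the index has not been rebuilt yet
engine.reindex()
after = engine.search("hello")   # now contains sentence 1
```

Indexing on the fly would amount to updating `index` inside `add_sentence` instead of waiting for the next full rebuild.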
-
-
-
-
-
What do I do if Tatoeba doesn't recognize the correct language of a new translation?
("Bill war in Japan." is German and isn't about a war in Japan :D!)
-
With regards to Chinese entries, can we have some way of distinguishing between Traditional and Simplified entries?
-
In fact I was thinking of adding an option to convert sentences from Simplified Chinese to Traditional Chinese and vice versa. Wouldn't it be better that way?
-
Yeah that's a good idea. Means that all the existing entries, in either Traditional or Simplified will be preserved.
-
The other thing regarding Chinese translations that probably needs consideration, is that there are 3 or 4 major regions where Chinese is spoken (Taiwan, Hong Kong, PRC, Singapore), but each region often has a slightly varied vocabulary set to represent the same meanings in another language. I'm no expert on this, but I'm pretty sure a Taiwanese person would translate the English word "Potato" to "馬鈴薯" whereas in the PRC (Mainland) they more commonly translate it to "土豆". Maybe we need the ability to choose the "Region" of our Chinese translations?
-
Yep, we have recently migrated the language codes from ISO 639 alpha-2 (language names coded on 2 letters) to alpha-3,
http://en.wikipedia.org/wiki/ISO_639
which allows us to make more precise distinctions between languages (as you can see, there's already Shanghainese).
But for the moment the problem is not really technical, it's mostly ergonomic: "how do we present it in a nice way, without overloading a sentence with a billion buttons?"
Moreover, the same problem exists with French, Canadian French, etc., so I agree, it's something we will need to handle one day or another.
Then again, we need to keep in mind that a beginner may not want to see these regional variations and may want to focus only on the "standard" version, so here comes the ergonomic problem again.
In fact, for the moment, if you plan to add "regional" sentences, just add which region it is in (), so that people will be aware it's not standard Mandarin.
I will notify you when we start handling this :)
By the way, thanks for your contributions :)
(French?)-
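As a small illustration of what the move from 2-letter to 3-letter codes buys, here is a sketch of such a mapping. The codes themselves are real ISO 639 codes, but the table contents, the function name and the choice of "und" (undetermined) as fallback are assumptions for the example, not Tatoeba's actual migration code.

```python
# A few ISO 639-1 (2-letter) codes and their ISO 639-3 (3-letter) equivalents.
ALPHA2_TO_ALPHA3 = {
    "en": "eng",
    "fr": "fra",
    "de": "deu",
    "ja": "jpn",
    "es": "spa",
    "pt": "por",
    "nl": "nld",
    "et": "est",
    "zh": "zho",  # "Chinese" as a whole; alpha-3 also has finer codes, see below
}

# With 3 letters, varieties that all shared "zh" get their own code.
CHINESE_VARIETIES = {
    "cmn": "Mandarin Chinese",
    "wuu": "Wu Chinese (includes Shanghainese)",
    "yue": "Yue Chinese (Cantonese)",
}


def to_alpha3(code, default="und"):
    """Translate a 2-letter code, falling back to "und" (undetermined)."""
    return ALPHA2_TO_ALPHA3.get(code.lower(), default)
```

Note that alpha-3 still does not separate regional vocabulary within one language (Taiwan vs. Mainland Mandarin, French vs. Canadian French), which is the presentation problem described above.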
Yeah I understand.
(When you get round to it, you could possibly make the flag icon a drop-down list of regions for that language, so that if we want to we can mark the translation as region specific.)
By the way I really like your site :).
I'm an Australian studying in Mainland China.
-
-
-
-
-
-
Found a workaround for those adding sentences in right-to-left languages (such as Arabic)
and who get a strange character order (see http://tatoeba.fr/eng/sentences/show/340400 for an example):
just edit your sentence and add this ‏ at the end. It's the XML entity that indicates a switch of writing direction :) For some strange reason, independent of Tatoeba, I got the same problem in different text editors while trying to reproduce this bug: this control character is sometimes missing.
I will try to quickly find an automatic way to get it to work properly.
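One possible shape for the automatic fix, as a sketch only. It assumes the invisible character in question is the Unicode right-to-left mark (U+200F), which is not spelled out in this message, and the function names are hypothetical.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK (assumed to be the character meant above)


def looks_rtl(sentence):
    """True if the sentence contains a strongly right-to-left character."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in sentence)


def fix_direction(sentence):
    """Append a right-to-left mark to RTL sentences that do not end with one."""
    if looks_rtl(sentence) and not sentence.endswith(RLM):
        return sentence + RLM
    return sentence


fixed = fix_direction("مرحبا.")      # gets the mark appended
unchanged = fix_direction("Hello.")  # left-to-right text is left alone
```

Running it on save would spare contributors from pasting the character by hand, and applying it twice does not add a second mark.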
-
- #

- Luai_lashire
- Dec 15th 2009, 02:00
I've only just joined a few minutes ago.... I have favorited several sentences, but my profile still says I have 0 favorite sentences. Does it just take a while for them to show up, or is there some problem?
Also, what does it mean to "adopt" a sentence?
Sorry for newbish questions, but your site lacks a good "about" page that introduces all this to newcomers. :/-
Adopt means the sentence now belongs to you: you will be the only one allowed to change it, and you will receive an email notification (if set in your profile) when someone comments on it.
That way we're sure there will be no "edit wars" or people editing too many sentences.
As for favorites, you will see them soon :)
Have you checked http://tatoeba.fr/eng/pages/help ? (at the bottom right; maybe not very visible)
-
Tatoeba should use a license without "by" like CC0:
http://creativecommons.org/about/cc0
Attribution is unnecessary and impractical.-
In fact it's purely a legal problem: European law says an author can't give up his moral right over a text until 50 years after his death (70 years in France), so CC0 can't be chosen.
Anyway, we're checking whether there's any problem with moving to a less restrictive licence such as CC-BY. We will know for sure by the end of the week.-
I like CC-SA (it's almost Public Domain!) http://creativecommons.org/licenses/sa/1.0/ sadly "retired"
-
Unfortunately, as explained in my last message, attribution is mandatory under European/French author's rights, and it is still not clear whether CC0 works in France or not, so we prefer to be safe, given that suing for copyright violation is "in fashion" in France...
So the most "free" we can do is CC-BY (so far my research hasn't shown anything against it, but I prefer to check the law in the main countries), at least until CC0 becomes clearer regarding countries that have the notion of moral rights (basically all European countries). For further information, you can read the CC discussion pages; there you can find a more precise technical explanation :)
-
-
-
*his moral right
Overall, this means that we must attribute the works of contributors, as we're based in Europe and so are a major part of the contributions (except the original Tanaka Corpus sentences). After some internal discussion we've realized that maybe CC-BY can be used, as Tatoeba MUST attribute works. After that, if people want to reuse the contributions without attributing them to the original contributors, that will be their problem (in fact it's no problem as long as they don't reuse, without attribution, sentences or corrections from European contributors or from other countries where public domain differs from the US definition).
So the licence is only there to make things clear.
By the way, we wouldn't have taken so long to choose a licence if there were no threats or possible legal problems. I far prefer coding to looking into law books. -
The content will now be licensed under CC-BY 2.0 FR, which is, for the moment, the least restrictive we can do under European law.
-
-
- #

- tatoerique
- Dec 7th 2009, 21:37
Is this a bug? When I do something like this:
-Add a new translation in a sentence for a language which was not present (e.g. Spanish).
-Press "show another" or go to another sentence to edit.
-Do a search for the sentence I edited in the first place, because I want to modify the Spanish translation.
The Spanish translation does not appear, and actually the sentence number of the sentence found does not match. If I add a Spanish translation to this, the sentence becomes duplicated (all languages). It occurs, for instance, in sentences 339047 and 339048.
Is this normal? Thanks in advance.-
Hmm, well, if I understood what you did, it's not a bug.
There is one thing in my todo list that I really should do (if only I had more time), and that is: hide all the translations when someone clicks on "Translate". Then people will probably understand better that they are not translating a group of sentences, but only one particular sentence.
So in your case, the sentence 339047 was a translation of an English sentence:
Can you deliver this? <-> Le importaría repartir esto?
When you added your translation, you also INDIRECTLY translated the Japanese sentence. Because initially you had:
配達してもらえませんか。<-> Can you deliver this?
And when you added your translation, here's what happened:
配達してもらえませんか。<-> Can you deliver this? <-> Le importaría repartir esto?
(See? Indirect translation.)
But when you did your search, you probably searched the Japanese sentence. And the search results only display sentences and their DIRECT translations.
So, you added a translation to the Japanese sentence, and the whole thing became linked this way:
Le importaría repartir esto? <-> 配達してもらえませんか。<-> Can you deliver this? <-> Le importaría repartir esto?
And now you have to know that when you BROWSE a sentence, we display both the direct AND the indirect translations. Which is why you will see two Spanish translations for http://tatoeba.org/eng/sentences/show/121527.
One is the direct translation. The other is the indirect translation.
Hopefully I understood your problem properly and my explanation was somewhat clear...-
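The same explanation as a tiny sketch: translations are links between sentences, direct translations are the neighbours, and indirect translations are the neighbours of neighbours. The class and the short labels standing in for sentence ids are hypothetical, not the real data model.

```python
from collections import defaultdict


class TranslationGraph:
    def __init__(self):
        self.links = defaultdict(set)  # sentence -> directly linked sentences

    def link(self, a, b):
        # A translation link works in both directions.
        self.links[a].add(b)
        self.links[b].add(a)

    def direct(self, sentence):
        return set(self.links.get(sentence, set()))

    def indirect(self, sentence):
        # Translations of the direct translations, minus what is already direct.
        direct = self.direct(sentence)
        reached = set()
        for neighbour in direct:
            reached |= self.links.get(neighbour, set())
        return reached - direct - {sentence}


graph = TranslationGraph()
graph.link("ja", "en")            # 配達してもらえませんか。 <-> Can you deliver this?
graph.link("en", "es_via_en")     # Spanish added as a translation of the English
search_view = graph.direct("ja")  # search shows direct links only: no Spanish
graph.link("ja", "es_direct")     # Spanish added again, on the Japanese sentence
browse_view = (graph.direct("ja"), graph.indirect("ja"))
```

After the last link, browsing the Japanese sentence shows one Spanish sentence as direct and the other as indirect, which is the "duplicate" described in the question.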
- #
-
- tatoerique
- Dec 10th 2009, 20:13
Ok, I'm sorry if I caused trouble. I didn't know exactly how the direct & indirect translations work. I'll be more careful from now on. Thanks for the explanation.
-
-
Which will come first: French reaching 26,000 sentences or Chinese reaching 3,000?
(Congratulations to the Spanish and Esperanto contributors, they have reached 2,000 and 100 sentences! :D) -
Is there any way to correct the romaji in Japanese sentences? I suppose it is computer generated, and sometimes it is only weird, but on some occasions it is painfully wrong.
-
Sorry, I added a new translation of a sentence in Czech, but I don't know how to set the Czech Republic flag for it ^^! Please help me.
-
Normally, the main page should tell you that you've added a sentence whose language we couldn't determine:
http://tatoeba.fr/eng/sentences/unknown_language
There you will be able to set your language to Czech :)
PS: I've added a sentence in Czech (the 1st article of the human rights declaration) found on a website, I hope there's no mistake in it-
OK! Thanks a lot. But can you put that link somewhere easy to find? Like in the user profile :)
-
In fact it should be displayed on the main page :p
Anyway, as said in the post below, we're going to code a lot this weekend, and we'll show the manual language selection right after a non-detected sentence is added :)
-
Done, now you can access the "unknown language sentences" from your profile :)
-
-
-
-
Hello all,
Here is a new bug I just found: when I'm asked for a sentence's language (because it was not detected automatically) and I confirm the language, the main page doesn't update and keeps displaying an unknown language.-
It's not really a bug, although I understand it can be confusing... I guess we'll have to rethink the display of the latest contributions.
The thing is, what we display in the "latest contributions" is actually the logs. When you add a sentence, it also adds an entry in the logs, with the text of the sentence and the language that has been detected at the moment you have added it.
And since log entries are not supposed to be modified, well, they are not updated if you update the language of the sentence. (We don't log language modifications)
It's a bit annoying now because Google has changed their language detection algorithm and a lot of sentences end up with an unknown language :'(
Anyway this weekend the whole team is going to gather and work on Tatoeba, and we're going to introduce the possibility to set the language of the sentence manually (instead of always having it auto-detected).
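To picture why the main page keeps showing the old language, here is a rough sketch of the behaviour described above: the sentence itself can be updated, but "latest contributions" reads from an append-only log. All names are made up for the example, and "ces" is simply the ISO 639-3 code for Czech.

```python
class Contributions:
    def __init__(self):
        self.sentences = {}  # current state: id -> {"text": ..., "lang": ...}
        self.log = []        # append-only history shown as "latest contributions"

    def add_sentence(self, sentence_id, text, detected_lang=None):
        self.sentences[sentence_id] = {"text": text, "lang": detected_lang}
        # The log entry freezes the language detected at insertion time.
        self.log.append({"id": sentence_id, "text": text, "lang": detected_lang})

    def set_language(self, sentence_id, lang):
        # Updates the sentence only; language changes are not logged.
        self.sentences[sentence_id]["lang"] = lang

    def latest_contributions(self, count=10):
        return self.log[-count:][::-1]  # newest first


site = Contributions()
site.add_sentence(1, "Ahoj.", detected_lang=None)  # detection failed
site.set_language(1, "ces")                        # user sets it manually
shown_on_main_page = site.latest_contributions()[0]["lang"]  # still None
```

The sentence page would show the corrected language, while the main page still shows the unknown one until the display is rethought.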
-
-
Can you add the Czech language?
I would like to contribute some sentences, and I'll also ask my friend to help us-
Yes, we can. I can't guarantee it will be done before this weekend though (as far as I'm concerned I have a busy week).
In the meantime however, you can always add sentences in Czech, even if Czech is not yet "officially" supported.
Your sentences will be categorized as "unknown language", but later (once we add Czech) you will be able to specify that they are Czech. -
Thanks for taking the time to add your languages :D
But could you please respect one convention:
begin sentences with an upper-case letter and finish them with a period:
"I love you." instead of "i love you". It's just to make the detection of duplicates easier, thanks :)
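A quick sketch of how that convention could be checked automatically. Accepting "!" and "?" as endings besides the period is an assumption on top of what is asked above, the function name is hypothetical, and the check only makes sense for scripts that have upper-case letters.

```python
def follows_convention(sentence):
    """True if the sentence starts with an upper-case letter and ends with
    terminal punctuation (".", "!" or "?")."""
    text = sentence.strip()
    return bool(text) and text[0].isupper() and text[-1] in ".!?"


good = follows_convention("I love you.")  # True
bad = follows_convention("i love you")    # False
```

Sentences that fail the check could be flagged for their owner, which would leave fewer near-duplicates for the cleaning process to miss.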
-
-
Another bug report :
- Mailbox screen in the French version: when "folders" is highlighted, the "mailbox" box appears only partly (the lower part is cut off).
- After sending a message via the mailbox and then visiting the "sent" screen, I noticed that my mail is shown as "sent 12 hours ago" (instead of about an hour or two). -
Hey all, here's a bug report :
- can't set my birth date;
- can't create paragraphs in the "something about you" field;
- the owned-sentences counter seems to have a glitch -> it displays 10 instead of 215 on my profile. It seems to count only the last ten contributions, since I contributed far more than 10 times during my last session...-
Thanks, we'll look into this :)
It probably won't be fixed before Tuesday though. As far as I'm concerned I have a midterm exam on Monday. -
-
-
You can set your birthdate now. There's just the problem that anyone who hasn't set their birthdate is born on November 30, 1999... :D
-
-
Congratulations on your brand new web version. It looks really nice and well thought out! Seems I'm the first to post here ^_^