Wall (6,960 threads)


boracasli, January 21, 2011 at 8:23:35 PM UTC

Google is a mistake,
nothing but a mistake.

Why? (Don't warn me, I'm right.)
Because its language detection gets the language "WRONG!"

It detects Esperanto sentences as Italian, German, French, Turkish, Spanish, English...

Turkish sentences as Swedish, Uzbek, English...

Persian sentences as Urdu, Pashto...

papabear, January 21, 2011 at 9:40:59 PM UTC

Maybe in the future Tatoeba's work will help them improve their detection algorithms.

sysko, January 21, 2011 at 9:46:06 PM UTC

Actually, from what I know, Google already uses a corpus of billions of words to train its detection system, so on a far larger scale than ours. It's just that they don't have access to some data we have. For example, I'm really unlikely to add a sentence in Spanish, although a very, very short French sentence can sometimes look like a Spanish one; and we know that an input will be between 1 and, let's say, 100 words, etc. Google can't do that.

sysko, January 21, 2011 at 9:47:21 PM UTC

But for sure, since we support some languages Google doesn't (Shanghainese, Lojban, etc.), our data can help them with these languages, as right now they simply can't recognize them.

sysko, January 21, 2011 at 9:39:58 PM UTC

Yep, but it works in 90% of cases, so it's a "better than nothing" solution. Don't worry: as soon as we have time, we will replace it with home-brewed code. Google's algorithm is made for big pieces of text, not for inputs of "10 words or less".

sysko, January 21, 2011 at 9:41:35 PM UTC

So I think that with an algorithm optimized for single sentences, plus some heuristics (for example, putting a higher weight on the languages you've already contributed in), we should be able to get a lower false-positive rate.
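A minimal sketch of the kind of single-sentence detector described above, assuming character trigram profiles with add-one smoothing and a fixed bonus for languages the user already contributes in. The function names, the `boost` parameter, and the toy training data are hypothetical; this is not Tatoeba's actual code, and real profiles would be built from far more sentences.

```python
import math
from collections import Counter

def trigrams(text):
    """Character trigrams of a sentence, padded with spaces at both ends."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def build_profiles(samples):
    """samples maps a language code to a list of example sentences."""
    profiles = {}
    for lang, sentences in samples.items():
        profile = Counter()
        for s in sentences:
            profile.update(trigrams(s))
        profiles[lang] = profile
    return profiles

def detect(sentence, profiles, user_langs=(), boost=2.0):
    """Return the most likely language, favouring the user's own languages."""
    grams = trigrams(sentence)
    best_lang, best_score = None, -math.inf
    for lang, profile in profiles.items():
        total = sum(profile.values())
        vocab = len(profile) + 1
        # Add-one smoothed log-likelihood of the sentence's trigrams.
        score = sum(n * math.log((profile[g] + 1) / (total + vocab))
                    for g, n in grams.items())
        # Prior: languages the user already contributes in get extra weight.
        if lang in user_langs:
            score += math.log(boost)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang

# Toy training data (hypothetical), just enough to exercise the code.
profiles = build_profiles({
    "en": ["the cat is on the table", "I like the house"],
    "es": ["el gato está en la mesa", "me gusta la casa"],
})
```

With this toy data, "la" is detected as Spanish, and a small bonus for English does not change that; only a very large `boost` does. The bonus therefore acts as a tiebreaker for short, ambiguous inputs rather than an override, unless it is tuned aggressively.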

boracasli, January 21, 2011 at 8:27:20 PM UTC

There are already 700,000 sentences in Tatoeba!

Dejo, January 18, 2011 at 6:05:03 PM UTC

Join the Facebook cause "Amikaro de Tatoeba" ("Friends of Tatoeba"):
http://www.causes.com/causes/56...ers?m=96078092

TRANG, January 21, 2011 at 3:23:54 PM UTC

Interesting, I like the initiative :) ...even though I don't really understand Esperanto.

What does "japanan frazokolekton" mean? Japanese-French dictionary?

Dejo, January 21, 2011 at 4:09:38 PM UTC

It means "Japanese sentence collection": "frazo" (sentence) + "kolekto" (collection).

GrizaLeono, January 19, 2011 at 12:23:48 PM UTC

May I make a small suggestion: that the number of each language version of a sentence be visible? That would make it easier to reconstruct the history of the translation, I think.

jakov, January 21, 2011 at 2:11:04 PM UTC

Although you may be right about that, one should add that it isn't always necessary to know the history. If it's a bad translation, just unlink the sentences. It doesn't matter which sentence came first, does it?

Still, it would also be nice information for researchers, to find out how translations evolve, i.e. which sentences get translated and how.

I suggest adding this information in a hidden form, so that it doesn't clutter the visible space.

ludoviko, January 17, 2011 at 12:45:12 AM UTC

* English sentences without an owner *

I just tried 20 random English sentences. 10 of them had an owner, 10 didn't. No one can be sure about the quality of the 10 sentences without an owner, which means perhaps half of the English sentences.

What are the current ideas for solving this problem and securing the quality of tens of thousands of English sentences? (What about Japanese, by the way?)

pandark, January 19, 2011 at 9:06:16 PM UTC

Maybe I should wait until Wednesday to say this, but I'm rather going against the rules, as (most of) my (few) sentences are free.
I'm still not convinced that an owned sentence is more reliable than a free one (all the more so since new sentences are owned by default), while it is less easily and quickly corrected.
Still, it would be nice to check the ones with "- date unknown" as the first entry in their history, comment on them, or even adopt them if you really want to :o)

sysko, January 19, 2011 at 10:43:33 PM UTC

In the near future we will have millions of sentences, and most of them will be owned, so let's set the "reliability" aspect aside for the rest of my post.
A sentence on Tatoeba is not an article on Wikipedia: it's too small to be considered something made "by the crowd". It will always be the product of ONE person, though it may be revised or approved by other people later. So, unlike a Wikipedia article, at any given moment a sentence always has someone who can explain WHY they wrote it this way, or why they considered it correct (by adopting it).

So there are 2 big cases on Tatoeba:

1 - The person who added the sentence is a native speaker.

I'm also a native speaker and I think there's a mistake. Yes, without ownership it would be faster: I could correct the sentence directly.
BUT:
a - I can also introduce a mistake when correcting it (while rephrasing it).
b - It can be a rule I'm not aware of myself. For example, with the new French spelling, "please" can be written either "s'il te plait" or "s'il te plaît", with or without the ^ on the i; without ownership, a lot of users would have "corrected" it even though it was correct.
c - It can be a regional usage different from mine (American vs. British English, the same for French, etc.).

In all these cases, ownership obliges you to "request" your change. If it was a typo, it will be corrected soon (well, maybe in a few days, which can be considered long, but remember that Tatoeba is 4 years old and we hope it will get much older, so having a mistake for a few days doesn't matter to me, as long as it gets corrected).
In the other cases, the owner will be able to explain why they disagree with you, and then we're in a truly "collaborative" project (we discuss, even argue, but that's part of the game). Otherwise, to me it's only a "parallel" contributive project (i.e. I add and I leave, I correct and I leave, etc.).

If I'm a learner, having an owner means I know I'll be able to ask someone for an explanation, which once again makes Tatoeba much more valuable than a phrasebook, since you can contact a "real human".


2 - The person who added it is not a native speaker.

Then maybe I added it hoping it would be corrected. So yes, once again, it would be faster if someone could correct it directly, but that wouldn't help me: I'd like to know why I was wrong. Moreover, maybe the native speaker who proposes the correction doesn't know the language I was translating from, so their correction, though more "natural" than mine, may no longer be "in sync" with the other language.


I know that for some of these points we could imagine a "notify me when this sentence changes" feature to replace ownership, and yes, that feature would be great. But Tatoeba is not finished, ownership was easier to code and is already there, and such a feature would not address another case where ownership is useful:

3 - Edit wars

Say I add a "controversial sentence", for example "the guy who is coding the new version of tatoeba is a retard" (replace this with any political/religious/football sentence you want). There's no reason to edit the sentence: it is correct and not against French law (which is the only reason we would delete correct sentences). But some not-so-open-minded people (hey guys, Tatoeba is an "example sentences" database, it's only there to illustrate words and grammar rules, so don't take it too seriously) may be tempted to edit it, and others to revert it, etc. etc. etc.

The only 2 reasons I see against ownership are:

1 - What if the person keeps adding mistakes without ever correcting them?
2 - What if the person is no longer active on Tatoeba?
But for these two cases we have moderators, so we can correct the sentences directly (we have that power) even if they're owned. And if it's really "huge" (the person owns thousands of sentences), we can simply "free" all their sentences.

pandark, January 22, 2011 at 11:52:14 PM UTC

>> A sentence on tatoeba is not an article on wikipedia, it's too small to be considered as something made "by the crowd"
The size doesn't matter (I just HAD TO say that ^^).

>> you always have someone who can explain you WHY did he wrote it this way or did he considered the sentences as correct (by adopting it).
I have (at least one) counterexample(s).
But let's come back to the first idea. A sentence is neither simple nor monolithic. I have seen many sentences with more than two people contributing (even though most of the time only one "pushed" the modifications). Usually the first "contributor", the one who owns the sentence, has nothing to do with the subsequent modifications: he just makes the changes or doesn't. I don't see how he is more legitimate to make the decision than anyone else.

>> I know for some of thes points, we can imagine a "being warned when changed occurs on this sentence" to replace this, but for this, in addition to the fact that tatoeba is not finished, so yep this feature will be great, but ownership was easier to code and is already present, it will not present an other case where ownership can be usefull.
For now, that means only one person at a time can follow a sentence. And the only other way to be notified (if someone is nice enough to mention the change in the comments) is to leave a comment on the sentence, and then you receive every message whether you want to or not (and end up disabling notifications, I guess). It might have been easier at some point, but it is far from perfect.

>> who added it or at least the guys who says "I take care of this sentence" so if it's a well known user (CK for english, Sacredceltic for French etc.) then you already know it's a quite reliable sentence.
Should I come up with a counterexample here? (I have some.)
Furthermore, isn't it quite unfair? Aren't other people's sentences as trustworthy as the ones from these "famous" users? And will mistakes in the latter be left uncorrected?

I'm sorry, but I don't get the whole Google fight thing...

>> I know that then "you can look on the historic" can be also an answer BUT, going on the historic supposed you to click on the sentence, I know it sounds stupid and lazy, but well when I'm at the pause and seeing some random chinese sentences to learn some stuff, I really don't want to have to click to know that, 80% of the chinese sentences are owned by 3 users which I consider as reliable, so it saves me hundreds of clicks, and when I see a sentence which is from a "unknown user, then and only then I will click on his profile (and once again clicking directly on his profile rather than the sentence and then his profile trough the history save me a click, and a page loading, which is an eternity here)
This is a practical (accessibility) issue, not a technical one. Consequently, I don't think it is a very good reason, and it could be fixed quite easily.
Anyway, the idea I suggested on IRC would solve the problem. In that case there would be several versions of the sentence instead of one sentence with a log of modifications, and the owner of the latest version would be the one most strongly linked to it (the one you would display in a list of sentences).

Here is my idea again:

There are sentence versions instead of a single sentence with a log.
Users add comments with or without a vote (agreeing or disagreeing with the new version).
The new version is accepted or rejected after a number of days, depending on the number of comments and the +/- ratio.

User1 adds a sentence
User2 suggests a modification with a comment explaining why
User3 leaves a comment asking for more information
User2 answers
User3 says that she now understands and agrees with User2, and votes +
User4 leaves a comment about User2's version and votes -
User1 leaves a comment about User2's version and votes +
User5 does the same
Some time passes
The new version is accepted and becomes the new sentence

So every proposal and every vote has to be justified with a comment. And in case the majority is still wrong, there are moderators (or you can suggest the opposite modification with such good explanations and examples that everyone agrees).
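A minimal sketch of the voting rule in this proposal, assuming concrete values the post leaves open: a version is decided only after at least 7 days and 3 votes, each user's latest vote counts once, and the version is accepted when at least two thirds of the votes are positive. The class names and thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Comment:
    author: str
    text: str
    vote: int = 0  # +1 agrees, -1 disagrees, 0 is a plain comment

@dataclass
class Proposal:
    author: str
    text: str
    created: datetime
    comments: list = field(default_factory=list)

    def status(self, now, min_days=7, min_votes=3, min_ratio=2 / 3):
        """Return "pending", "accepted" or "rejected" for this version."""
        # Keep only each user's latest vote; plain comments are ignored.
        votes = {}
        for c in self.comments:
            if c.vote != 0:
                votes[c.author] = c.vote
        too_early = now - self.created < timedelta(days=min_days)
        if too_early or len(votes) < min_votes:
            return "pending"
        positive = sum(1 for v in votes.values() if v > 0)
        return "accepted" if positive / len(votes) >= min_ratio else "rejected"
```

In the scenario above (User3, User1 and User5 vote +, User4 votes -), the new version would be accepted once the waiting period is over, since 3 of 4 votes are positive. A version that never reaches the minimum number of votes stays pending, which is where a moderator would step in.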

PS: your sentence should have a capital T at the beginning and a period at the end (then you can add it without getting into any trouble :oP).

sysko, January 19, 2011 at 10:11:43 PM UTC

I'd say that in the far future, when the sentences from the original Tanaka Corpus represent only an insignificant part of the corpus (right now they are less than half, and a big part of the English sentences have been proofread), owned or unowned will no longer indicate whether a sentence is reliable.

But anyway, even now, it's like doing a "Google fight": the goal is not to say "if it's owned then it's reliable, math rule, guys", but rather that it lets you know who added it, or at least who says "I take care of this sentence". So if it's a well-known user (CK for English, Sacredceltic for French, etc.), you already know it's a fairly reliable sentence. Otherwise you can still go to the owner's profile and weigh "how much" you can trust this user's contributions in this language, just as you would when checking whether the information on a given website is reliable after a Google fight.
Whether or not the Times wrote the piece of text displayed by Google is not important. What matters to you is that if the Times decided to publish it, it's because they think it is reliable, so since the Times is "reliable", what they relay can be considered reliable. And if it's another website, then you're supposed to go to that website and evaluate it yourself. It's the same with sentences: a sentence appearing as owned is not DIRECT proof that it is reliable, but it lets you decide whether you can trust it by judging the person who owns it.

I know that "you can look at the history" can also be an answer, BUT going to the history requires you to click on the sentence. I know it sounds stupid and lazy, but when I'm on a break looking at random Chinese sentences to learn some stuff, I really don't want to have to click to find that out. 80% of the Chinese sentences are owned by 3 users whom I consider reliable, so it saves me hundreds of clicks, and when I see a sentence from an "unknown" user, then and only then will I click on their profile (and once again, clicking directly on their profile rather than on the sentence and then on their profile through the history saves me a click and a page load, which takes an eternity here).

Then you could answer: "Well, instead of showing an owner, we could use the space saved to show an "ok" / "not ok" / "dunno" icon" to indicate a somewhat "objective" reliability. But that doesn't solve the problem of WHO sets these icons. A vote system? As anyone can own, anyone would be able to vote. And as previous discussions here have shown, how such a system should work is not something everyone agrees on.

So much for the case where you consider that owned/not owned is linked to reliable/not reliable.

But for me it's not only about that.

sysko, January 19, 2011 at 10:12:28 PM UTC

I'm writing the second part

Zifre, January 19, 2011 at 9:39:15 PM UTC

I would have to agree. In practice, owned sentences are more reliable, but that's only because most of the unowned English sentences were written by non-native speakers as part of the Tanaka corpus. I suspect that the unowned Japanese and French sentences should be more reliable.

For new sentences submitted by users, I don't think it really makes a difference whether they are owned or not. The only thing to be aware of is that if a sentence is unowned, any random person could change it to something else. (But they could already submit an incorrect translation anyway, so it's not that big of a deal.)

Zifre, January 17, 2011 at 1:19:43 AM UTC

The person who would probably know the most about this issue would be CK.

I sometimes go through the random English sentences and adopt all the ones that seem okay to me and add the appropriate tags.

The problem is there are just too many sentences. If we assume that half are without an owner, that's ~80,000 sentences. If we wanted to get all of these corrected and adopted by the end of the year, that would be ~220 sentences per day. Right now I don't think there are very many people doing this. The ones I know of are CK, Nero, and me. I can probably do ~20 per day. CK does a lot, but even so, it's going to take a long time.
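The arithmetic behind this estimate, as a quick check. It assumes a steady pace, a full 365-day year, and the post's figure of ~80,000 unowned sentences; the helper names are only for illustration.

```python
import math

def daily_rate(backlog, days):
    """Sentences that must be reviewed per day to clear the backlog in time."""
    return math.ceil(backlog / days)

def reviewers_needed(backlog, days, per_person_per_day):
    """People needed if each reviews `per_person_per_day` sentences a day."""
    return math.ceil(backlog / days / per_person_per_day)

print(daily_rate(80_000, 365))            # 220
print(reviewers_needed(80_000, 365, 20))  # 11
```

So at ~20 sentences per person per day, about 11 regular reviewers would be needed. Counting only the 348 days left after January 17 instead of a full year pushes the rate to about 230 per day.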

With Japanese, the problem is even more severe. We don't have many native Japanese speakers here (although there are quite a few who know it as a second language). I don't think any of them have started a large scale adoption process like CK has for English. I suspect that it should be theoretically easier, as most of the Japanese sentences were at least written by native speakers, unlike English.

I think a page showing random unadopted sentences in a given language would be really useful and could save a lot of time! We should also try to advertise this campaign more, as I think there are plenty of native English speakers here who aren't even really aware of the issue or of what they can do.
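A sketch of what such a page could do behind the scenes, assuming each sentence record has a language code and an optional owner. The `Sentence` type and the `random_unadopted` function are hypothetical and not part of Tatoeba.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sentence:
    id: int
    lang: str                     # language code, e.g. "eng"
    text: str
    owner: Optional[str] = None   # None means nobody has adopted it yet

def random_unadopted(sentences, lang, k=10, rng=random):
    """Pick up to k random sentences in `lang` that have no owner."""
    pool = [s for s in sentences if s.lang == lang and s.owner is None]
    return rng.sample(pool, min(k, len(pool)))
```

Passing a seeded `random.Random` as `rng` makes the selection reproducible, which is handy for testing. A real site would more likely do this filtering in a database query than in application code.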

papabear, January 17, 2011 at 1:48:35 AM UTC

How would we advertise these kinds of campaigns to our readers? Could we, say, put a Twitter feed on the front page?

Zifre, January 17, 2011 at 1:57:13 AM UTC

One week from now (depending on your time zone) is the next Tatoeba day. Adoption and corrections are the main focus for this event.

My hope is that enough people will learn about this process through that day that even afterward, we will have more people involved and everything should go quicker.

Unfortunately I'm probably going to be busy all that day. :-(

jakov, January 17, 2011 at 9:28:03 PM UTC

I find it a little annoying that when I'm typing a tag, the autocomplete search is case-sensitive. It would be great (and probably easy) to make this search case-insensitive.
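A minimal sketch of case-insensitive tag suggestions, assuming the candidate tags are already available as a list of strings. `suggest_tags` is a hypothetical helper, not the site's actual code. If the tags live in a database, the same effect can be had in the query itself, for example by comparing `LOWER(name)` or by using a case-insensitive collation (MySQL collations ending in `_ci`).

```python
def suggest_tags(prefix, tags, limit=10):
    """Return up to `limit` tags starting with `prefix`, ignoring case."""
    key = prefix.casefold()  # casefold() is a stricter lower() for matching
    return [tag for tag in tags if tag.casefold().startswith(key)][:limit]
```

Typing "Id" then matches both "idiom" and "Identity", and the original capitalization of each tag is preserved in the suggestions.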

Swift, January 18, 2011 at 1:49:51 AM UTC

Sysko is on it. :-)

Zifre, January 17, 2011 at 1:35:52 PM UTC

I think I've found a small bug. On the new page that shows all sentences in a given language with audio, changing the page works incorrectly. Instead of going to the next page for that language, it goes to the page for all languages.

For example, on this page:

http://tatoeba.org/sentences/with_audio/cmn

Clicking on the next page goes to French sentences:

http://tatoeba.org/sentences/with_audio/page:2

When it should go to:

http://tatoeba.org/sentences/with_audio/cmn/page:2
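The fix amounts to carrying the language segment into every pagination link. A minimal sketch, assuming the URL layout shown in the examples above; `with_audio_url` is a hypothetical helper, not the site's actual code.

```python
BASE = "http://tatoeba.org/sentences/with_audio"

def with_audio_url(page=1, lang=None):
    """Build a pagination link, keeping the language filter if one is set."""
    parts = [BASE]
    if lang:
        parts.append(lang)  # e.g. "cmn"; dropping this segment causes the bug
    if page > 1:
        parts.append(f"page:{page}")
    return "/".join(parts)
```

For example, `with_audio_url(2, "cmn")` gives the expected `http://tatoeba.org/sentences/with_audio/cmn/page:2`, while `with_audio_url(2)` gives the all-languages URL that the broken link pointed to.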

TRANG, January 17, 2011 at 6:47:39 PM UTC

Thanks for the report. It should be fixed :)

Dejo, January 15, 2011 at 10:08:00 PM UTC

Welcome, Ronaldini!
Please check all your sentences to make sure they have the right flag, and please respect Esperanto spelling conventions. The letters X and W don't exist in Esperanto.

arcticmonkey, January 15, 2011 at 4:32:50 PM UTC

Is it just me or is Martin Swift's website down?

debian2007, January 15, 2011 at 4:47:29 PM UTC

Down. I cannot connect.

Swift, January 15, 2011 at 7:34:08 PM UTC

Ah, sorry! Power outage took it out. Should be up in a couple of minutes.

boracasli, January 4, 2011 at 2:08:45 PM UTC

I joined Launchpad before Tatoeba. I've translated Tatoeba into Turkish using Launchpad, and a long time has passed since I began translating it.

TRANG, January 15, 2011 at 7:08:58 PM UTC

Done. Sorry for the delay.

I've updated all the other languages too.

papabear, January 16, 2011 at 9:29:40 AM UTC

A couple of ideas:

1. I haven't tried this, but Trang, it should be very simple for you to add the translated transcripts as subtitles in the original video. I know there are some tutorials for it. While you're at it, you probably want to translate your tags (and maybe add some more) as well.
2. Would anyone like to do an audio reading of the Tatoeba presentation in their native language and post it to YouTube as a video response?

Demetrius, January 4, 2011 at 2:34:00 PM UTC

I think Trang should add the language to the list.