Tatoeba

tatoebix, March 29, 2013 at 2:08:11 PM UTC

I like the idea of the native speaker initiative to enhance the quality of the sentences. Maybe it could be expanded to have an actual staging area, where sentences are made ready for inclusion in the main database. A criterion for inclusion could be 3 OK tags from native speakers or 10 OK tags from non-native speakers.
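
The criterion proposed above could be sketched roughly as follows. This is only an illustration and not anything Tatoeba implements: the function name and its parameters are hypothetical, and the default thresholds are simply the 3 and 10 suggested in the post.

```python
def ready_for_inclusion(native_oks: int, non_native_oks: int,
                        native_needed: int = 3,
                        non_native_needed: int = 10) -> bool:
    """Return True once a staged sentence meets either proposed criterion.

    native_oks / non_native_oks are the numbers of OK tags the sentence has
    received from native and non-native speakers (hypothetical inputs).
    """
    return native_oks >= native_needed or non_native_oks >= non_native_needed
```

A sentence with 2 native OKs and 9 non-native OKs would stay in the staging area under this reading, since neither threshold is met on its own.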

sacredceltic, March 29, 2013 at 5:05:04 PM UTC

Oh, not the stupidity of the crowds again!

Why not stage an arena with gladiators to please them instead?

tatoebix, March 30, 2013 at 7:35:24 AM UTC

Ha, ha, I forgot: we already have a contributor who is working towards the completion of the Sisypheum all by himself.

Obviously you want quality in translations; I think that you would balk at entries like this:

You are nice.
Du bist nicht nett.

This would need a corrective contribution, which may or may not happen, before a hapless language enthusiast stumbles upon it and unknowingly incorporates it into his own knowledge base.

If Tatoeba is not taking care of translation quality, then once it reaches 10 million or more entries it will be no different from sentence generator sites feeding data into the Google Translate API and filling some database.

The content of the sentence does not matter; translation quality, however, does. How to achieve this needs to be at the forefront of discussions.

sacredceltic, March 30, 2013 at 10:40:54 AM UTC

>You are nice.
>Du bist nicht nett.

I don't know what you're talking about. Please provide links to supposedly defective translations.
Since Tatoeba is an open service, there is no way that errors can be avoided. They will be corrected in due time.
But even if there is a proportion of errors, it is still better than robotic translation.
I spot translation errors all the time in official documents from international institutions that employ extremely well-paid armies of professional translators...
At least, with Tatoeba, there is no risk of war or diplomatic row...

>If tatoeba is not taking care of translation quality

Tatoeba does very much. Alas, the number of lesson-givers who give us advice is far greater than the number of people who actually WORK to improve the quality...
If every person who taught us lessons about what we should do had actually suggested just 100 corrections, Tatoeba would probably be almost error-free by now...

>once it reaches 10 Million or more entries it will not be different than sentence generator sites feeding data into google translate api and filling some database.

And so?
Feeding google translate with natural-sounding sentences rather than the present robotic nonsense would be perfect, in my view...
That is precisely why I invest so much time here actually.

alexmarcelo, March 29, 2013 at 6:08:50 PM UTC

-1
Quality rating is subjective and will vary from user to user.

FeuDRenais, March 29, 2013 at 10:46:02 PM UTC

Everything in the real world is subjective and prone to error, which is why there exists...

http://en.wikipedia.org/wiki/Statistics

The sooner there is some systematic approach to judging quality, the sooner this website will become useful/reliable for more than its 20/30 active contributors.

alexmarcelo, March 29, 2013 at 11:15:15 PM UTC

You love Statistics, don't you? Actually, we have some interesting data to consider:
http://tatoeba.org/eng/activities/adopt_sentences/
http://tatoeba.org/eng/tags/sho...Check/page:482
http://tatoeba.org/eng/tags/sho...change/page:73
http://tatoeba.org/eng/tags/sho...@check/page:13

> Everything in the real world is subjective and prone to error
But the degree of subjectivity varies. Before you want to rate sentences, you need to correct them. If you want to improve Tatoeba's quality, these links would be a good starting point.

FeuDRenais, March 30, 2013 at 5:56:51 PM UTC

>> Before you want to rate sentences, you need to correct them.

I disagree. If all sentences could be corrected, you wouldn't need to rate them - you'd just say "all of our sentences have been corrected" (whatever that means, since we don't know who corrected them or why you should trust the people who corrected them). Maybe I didn't understand what you were trying to say.

No, I don't particularly "love" statistics. But if you want something done (i.e. quality of content), then you should use standard tools (i.e. ratings) as a first attempt before you resort to innovative methods that have no empirical backing and may or may not work. To the best of my knowledge, this is standard professional practice - the common expression being "don't reinvent the wheel".

Managing quality in a system that averages 200,000 sentences per active user cannot be done with tags and comments. It's inefficient and won't work. Even if you personally OK all of the sentences in your native language (even if there are 100,000 of them and you dedicate a full week or two to this), it still won't mean anything because it's just *one* person's OK, which, as you said, is subjective. It's only once you get multiple OKs that the thing begins to have some sort of statistical backing.

Anyway, I don't want to get into these arguments, so... prove me wrong and correct the sentences one by one. I hope, I honestly do hope, that you can succeed (because I like this site and would like for it to prosper). But I really, really doubt it.

sacredceltic, March 30, 2013 at 6:33:17 PM UTC

And so when 3 trolls have OKed a wrong sentence just because they love destroying people's work, what do we gain?

FeuDRenais, March 30, 2013 at 7:17:15 PM UTC

You can propose a doomsday scenario to denounce any suggestion, or you could try something and see if it works (and then be pleasantly surprised).

Of course, if you assume that the majority of users are trolls, then no system in the world will save you. Thankfully, that's not the case (personally, if I count the number of "trolls" that I have encountered on this site over the years, I cannot reach 3, try as I might).

sacredceltic, March 30, 2013 at 7:30:47 PM UTC

My experience of majorities is that they love reality TV, junk food and gladiators...
I do love sentences and languages.

FeuDRenais, March 31, 2013 at 3:51:21 AM UTC

I'm going to risk saying something very blunt here, and say that it's elitist to assume that the majority is somehow dumb and inept and shouldn't be entrusted with anything.

I would be curious to hear your explanation for why people who love reality TV, junk food, and gladiators would not, at the same time, love sentences and languages. I would also be curious to understand why people who loved reality TV, junk food, and gladiators, but didn't care about sentences and languages, would come on TTB to leave OK tags on incorrect sentences.

alexmarcelo, March 30, 2013 at 7:33:25 PM UTC

But one troll alone can create thousands of accounts daily...

FeuDRenais, March 31, 2013 at 3:39:46 AM UTC

Yes, all of which have Advanced User status with tagging privileges.

That's not a troll, alexmarcelo - that's a professional hacker.

alexmarcelo, March 31, 2013 at 7:52:55 AM UTC

If only Advanced Contributors were to rate sentences, then you're probably right -- that's a professional hacker.

In _my_ opinion, a rating system at this stage of the game would be as useless as Facebook's "Like" button... If we are to vote FOR or AGAINST a sentence, we should avoid subjective judgment. For example, you shouldn't vote AGAINST a sentence just because it contains the word "crap" and you don't like it. Grammar and orthography mistakes are more important.

Also, a translation may vary from dialect to dialect. It would be a mess if Argentinians were to rate sentences written by a Spaniard, for example, and vice versa... it's natural -- yet wrong -- to think that our native dialect is more correct than other varieties. Some people just don't know about the existence of other dialects...

I'm just saying that a sentence tagged "change", "needs native check" or "check" needs more attention than the rest of the corpus... and we currently have 173,384 sentences to be checked, corrected and adopted.

FeuDRenais, March 31, 2013 at 5:02:51 PM UTC

First, any rating system could easily be limited to Advanced Users (and probably should be), so that's not an issue.

Second, you cannot "avoid subjective judgment". It's by averaging a lot of subjective judgments that you get something quasi-objective. That's what statistics is.

Third, you cannot compare TTB to FB. The latter caters to everyone while the former caters to a small niche.

Fourth, of course there are difficulties in implementing any system, which is why you implement something, see how it's working, and then modify it. Regional varieties could present a partial challenge, but that's not an argument for scrapping the whole idea without even trying it.

Fifth, 173,384 sentences is already a lot, but what would be more relevant would be the derivative. Is the ratio of bad sentences to good sentences increasing or decreasing? If it's decreasing, then we don't have to change anything, as eventually, at some point in finite time, the bad sentences will be such a minority that it won't matter. If it's increasing, then the current approach, whatever it may be, may not be working so well. Of course, since we have no objective means to judge if a sentence is "good" or "bad", we can't even answer these questions.

Sixth, you also need to remember the quality of the links/translations, and not just the quality of the standalone sentences. How many links do you think need checking? I would bet that it's at least the number of bad sentences... squared.

Finally, all of this discussing is probably close to useless anyway, since 99% of the ideas that appear on this wall don't really get implemented. There was a time when Tatoeba would get updated every weekend, but that time is long gone.
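
The fifth point above, whether the share of bad sentences is rising or falling, could be checked with something like the sketch below. It rests on a loud assumption: that the count of sentences tagged for checking or correction is a usable stand-in for "bad" sentences, which the post itself doubts. The function name and the snapshot format are hypothetical.

```python
from typing import Sequence, Tuple


def flagged_ratio_trend(snapshots: Sequence[Tuple[int, int]]) -> float:
    """Change in the flagged/total ratio between the first and last snapshot.

    Each snapshot is a (flagged_sentences, total_sentences) pair taken at some
    point in time, oldest first. A positive result means the share of flagged
    sentences grew; a negative result means it shrank.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to see a trend")
    first_flagged, first_total = snapshots[0]
    last_flagged, last_total = snapshots[-1]
    return last_flagged / last_total - first_flagged / first_total
```

Note that the absolute number of flagged sentences can rise while this ratio falls, which is why the post asks about the ratio rather than the raw count.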

sacredceltic, March 31, 2013 at 9:02:00 AM UTC

The majority doesn't rule languages. Wrong sentences and their authors far outnumber correct ones. That's why we have education. That's not elitist, that's reality.
And don't tell me for the umpteenth time that if the majority thinks a mistake is right, it should become the rule, because that works only in English...

FeuDRenais, March 31, 2013 at 5:18:22 PM UTC

Again, I'm not talking about your majority - I'm talking about the Advanced Users of TTB. The original post by tatoebix talked about OK tags. Your gladiator-loving majority cannot place those - only Advanced Users can.

Also, please stop putting words into my mouth.

sacredceltic, March 31, 2013 at 5:33:04 PM UTC

If your "rating system" involves only advanced users, then there are fewer of those than there are languages on Tatoeba...

FeuDRenais, April 1, 2013 at 5:53:08 AM UTC

...which is why I proposed, in my last (long) post on this, to rate users based on the quality of their translations/sentences (instead of rating sentences individually).

Your arguments are inconsistent. Just now, you argued that there are too few advanced users to make a rating system work, and yet you expect these same advanced users to correct all of the currently flawed (or potentially flawed) sentences/translations (of which there are millions).

Let me turn the tables and ask *you* to propose a solution to the Tatoeba quality problem, instead of trying to poke holes in this one.

Or, to simplify, let me ask you a much simpler question:

Why should a TTB user/visitor believe that your 80,000+ French sentences are correct and trustworthy?

sacredceltic, April 1, 2013 at 9:34:48 AM UTC

I don't know what TTB means. Using acronyms in a multi-language community is a bad idea.

I know you're obsessed with the number of my translations (yes, the vast majority of my sentences are not mere sentences that I inserted; they're also translations from English, German, Esperanto and sometimes Spanish and Dutch...).
They are not perfect and I get corrected all the time, like everybody else. But they have the merit of existing, because when I arrived here, 90% of the French corpus had been written by non-natives and anglophile geeks, and French sounded here like a 100% literal translation from low-level Globish.
The merit of my numerous contributions is to rebalance that.
My solution to the quality issue, which I put into action daily, is WORK. Lots of work.
Like many here, you spend more time criticising and advising than working in your NATIVE language. That's where the quality problem lies.

We have a nice saying in French for that: les conseilleurs ne sont pas les payeurs (those who give advice are not the ones who pay).

FeuDRenais, April 1, 2013 at 3:59:36 PM UTC

Okay, so you give vague answers instead of actually proposing concrete solutions, and you resort to digressions, insults, stereotypes, and false accusations instead of trying to have a productive discussion.

Fine, SC, fine. I'll remember, for the umpteenth time, that beating my head against the wall is more productive than trying to discuss anything with you.

On a personal note, I suspect that the reason that you want to keep things as they are is because it suits you just fine, and that quality on Tatoeba is not an issue that you actually care about solving. In fact, I would extend this accusation to a lot of other people, as it doesn't take a lot of thinking to realize that the current way of doing things is going to go absolutely nowhere.

My thanks to tatoebix for having proposed what he/she proposed, though.

sacredceltic, April 1, 2013 at 4:10:19 PM UTC

>and you resort to digressions, insults, stereotypes, and false accusations

You're being delirious...

Balamax, April 1, 2013 at 4:59:07 PM UTC

I'd like to add that a "+1/-1" system, as was in use on www.websters-online-dictionary.org for some time, isn't a convenient solution at all. The greater part of the translations (single words or word combinations) seemed to have been generated by Google Translate, so the quality of the voting was, predictably, very low. Many translations and even some text fonts were wrong initially.

FeuDRenais, April 1, 2013 at 11:23:44 PM UTC

Was the system open to registered users or to everyone?

Balamax, April 3, 2013 at 5:35:13 PM UTC

I could vote for free. But the participation of registered members didn't guarantee quality either, and constant improvement was needed: www.wiktionary.org, http://glosbe.com, http://www.logosdictionary.org

FeuDRenais, April 4, 2013 at 6:07:13 AM UTC

This would be another argument for limiting any such system to Advanced Users.

I don't know what others think, but personally I have faith in Tatoeba's Advanced Users.

AlanF_US, April 1, 2013 at 5:58:46 PM UTC

And my thanks to both tatoebix and you for sane, well-reasoned points.

Remember that when you're talking to the Wall, you're not just talking to a wall (even if certain people give you that impression). Some of us here are nodding our heads and possibly sending private messages to sysko saying "I think X is right."

Shishir, March 31, 2013 at 5:47:22 PM UTC

If it's only the trustworthy users, and we drop the part about the 10 OKs from non-natives (since non-native speakers usually don't know what does or doesn't sound natural, and nothing is specified about how proficient in the language they should be), it doesn't sound so bad.

About a specific case, such as my language, Spanish: it would be quite hard to know which sentences are correct, since, as I said the last time this subject came up, it depends on the dialect and the country. There are differences that would keep some correct sentences from getting those 3 OKs, because they are not used in other countries and there are fewer than 3 trusted users who speak each Spanish dialect. And in some languages it's even worse, since there is no trusted user, or just one. So as long as Tatoeba is not big enough for this, I'd agree with Sacredceltic and Alexmarcelo and focus on correcting the mistakes and adopting the orphan sentences.

FeuDRenais, April 1, 2013 at 6:01:49 AM UTC

I'm sorry, Shishir, but I have to disagree with you. Non-natives' opinions on what's good and what isn't don't amount to 0.

The basic nature of tatoebix's proposal is quite wise and elegant - it's a weighted system. It gives more priority to those who are expected to know more and less to those who are expected to know less. This is much more flexible than a Boolean system of 0's and 1's, where something either is good or it isn't, and a user either is qualified or is not.

Regarding this dialects argument, I would be curious to have an estimate on what proportion of correct sentences from one region would be judged as incorrect in another. If it's 50%, then you have a problem. If it's less than 10%, then I'd say take the hit.

Shishir, April 1, 2013 at 3:57:48 PM UTC

Don't be sorry for disagreeing with me, I left the comment mainly to hear your opinion about these points.

I'd say that I wouldn't use around 30% of hayastan's sentences (maybe a few more, maybe a few less) despite their being completely right and natural in Argentina, and maybe 5-10% of Marcelostockle's.
According to that voting system, some accepted usages that occur only in one particular country or a few countries wouldn't be accepted in Tatoeba as correct, or wouldn't get enough "OK" tags; the same goes for vocabulary used in only one country, which someone who trusted your system might think is wrong. That's why I think we need a larger community before putting this system into practice.

About the possibility of accepting the opinion of non-natives: in Spanish it would be quite chaotic. Spanish native speakers, not knowing all the dialects of our own language, might think "it may be correct somewhere, but it's not said here", when in fact a non-native added that sentence and other non-native speakers voted it correct, even though it isn't actually used anywhere; since the verb is well conjugated and the prepositions are well chosen, they thought it was correct. How can we stop this from happening?

FeuDRenais, April 1, 2013 at 4:20:51 PM UTC

For your first point, note that there are multiple other languages, without dialects, that have very few representatives. They wouldn't get many OK tags, either. It doesn't mean that you can't try the system anyway - it just won't progress as quickly for those as it would for languages like French, English, and German. Sentences would be accepted more slowly, they would be judged as correct more slowly, but so what? Right now nothing is being judged as corrected/accepted at all.

For non-natives, I repeat what I already wrote - the opinions are averaged and their opinions are worth "less". Now, whether it's right to do this by natives/non-natives is another question (I personally do not agree that it is). Also, since the discussion is about tagging, and therefore about Advanced Users, it's a bit too much to assume that so many Advanced Users will decide to tag as correct sentences that are wrong. Most are a bit more prudent than that, I would wager, and know their limitations when it comes to a language.

In case that fails, you also have feedback. For example, the numbers that tatoebix proposed could be tuned. You could start with 3 OKs for natives and 10 for non-natives, and if you, a trusted member of TTB and "a trusted expert" on Spanish, see that a lot of bad sentences are getting through, you simply increase the number for the non-natives from 10 to 20 (for example). You live and learn. My honest feeling is that you'd be pleasantly surprised by what you can achieve with 10, though.

And no, no system is foolproof. Of course you'll have a bad sentence get through now and then, just like once in a rare while you get spam that gets through your spam filter and a good e-mail that goes in your spam (by the way, spam filtering, to the best of my knowledge, is a great example of a weighted system that works pretty well).
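
A weighted scheme with tunable numbers, along the lines discussed in this post, might look like the sketch below. It is an assumption-laden illustration: combining native and non-native OKs into one score goes slightly beyond the original "3 or 10" proposal, and the function name and parameters are hypothetical.

```python
def is_accepted(native_oks: int, non_native_oks: int,
                native_needed: int = 3,
                non_native_needed: int = 10) -> bool:
    """Weighted acceptance: each OK counts as a fraction of its own threshold.

    A native OK is worth 1/native_needed and a non-native OK is worth
    1/non_native_needed; the sentence is accepted once the fractions add up
    to at least 1. The comparison is done with integers (both sides are
    multiplied by native_needed * non_native_needed) to avoid rounding issues.
    """
    weighted = native_oks * non_native_needed + non_native_oks * native_needed
    return weighted >= native_needed * non_native_needed
```

Tuning then amounts to changing one argument. With the defaults, 2 native OKs plus 4 non-native OKs are enough; if too many bad sentences get through and the non-native number is raised from 10 to 20, the same votes no longer suffice.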

liori, April 4, 2013 at 3:17:44 AM UTC

If a language has multiple dialects which have significant differences between them, then Tatoeba should take note of which specific dialect the sentence is in. If you, as a native speaker, cannot easily judge a sentence, would a non-native learner be able to do so?

If I remember correctly, sysko's new Tatoeba code will deal with dialects, not languages, as a basic categorization unit. Would tatoebix's idea be acceptable for you then?

Also, when you say that there are not enough advanced users of certain dialects: well, that's the same case as with any other language that doesn't have many contributors. These languages will simply not benefit from the scheme, but there will be no harm either… or will there be?

Shishir, April 4, 2013 at 4:00:43 AM UTC

I thought sysko had said that Tatoeba was based on the ISO language codes. And since, despite some forms not being used everywhere, we can understand each other perfectly (it's not like the Chinese dialects, for example), I don't think they should be split either; a tag would be more than enough to indicate whether or not something is used in a particular country.

About the second point, I also wonder how many languages would benefit from this, and whether it would really be useful to do it if it will cover just 10% of the languages (or maybe fewer). That's why I suggested waiting until Tatoeba becomes bigger or we have more trusted users.

liori, April 4, 2013 at 4:26:44 AM UTC

Was and is. Sysko is working (slowly) on new code that will allow dialects to be distinguished, and this was what I was referring to.

The problem with tags is that most users cannot add them, so even if they contribute in a dialect, this information will be lost. Leaving a comment under each single sentence is not a solution either; it's burdensome to users. Users should be able to set the dialect explicitly, the way they can set the language.

Also, I think it will be beneficial even if the scheme covers only a few of the biggest Tatoeba languages; they already have problems with correctness. If you think Spanish is not a good target for experimenting, maybe we should first limit it to some nice languages with little variance among users, like, I guess, Esperanto.

What do you think?

FeuDRenais, April 4, 2013 at 6:09:31 AM UTC

My 2 cents on this: dialects will always be an issue, but an approximate solution is better than no solution at all.

Shishir, April 4, 2013 at 4:24:18 PM UTC

>The problem with tags is that most users cannot add them, so even if they contribute in a dialect, this information will be lost.

I don't think so. If they write in their profile where they are from (which I think should be compulsory in a project like Tatoeba) and something doesn't sound like "we would say it in Spain", I'd simply tag it as "Spanish from ...", and if it is also said in another country, it would be enough to add another tag that indicates so.
Also, if languages were divided into dialects, what would happen with the sentences that are said in all the Spanish-speaking countries?

About covering just a few of the biggest languages, I admit it would be very useful in English, Japanese (I don't know if we've got enough Japanese speakers, though) and French, since they have many unadopted sentences that learners don't know whether to trust. It would also be possible to do it in German and, as you say, in Esperanto, even though I think these two languages can be trusted, since there are a lot of members who speak these languages and correct each other's mistakes.