User55521, 28 November 2013 at 22:12:38 UTC

DEMETRII THOUGHTS ON IMPROVING TATOEBA CORPUS

I’ve read CK’s document and found that I completely disagree with it... So, I’ve written my own one. This is how I see the ideal Tatoeba.

* If something can be done by computer, don’t impose it on people
 * Smart grouping instead of removing near-duplicates
 * Auto-adding full stops

* Contributing should be simple
 * Don't discourage contributions
 * Keep rules simple

* One man’s mistake is another man’s correct sentence
 * "More sentences + filter" is better than "fewer sentences"
 * We need a way to rate correctness and filter sentences
  * By people’s voting
   * Rated by anyone (wisdom of the crowds)
   * Rated by 'native speakers' (self-proclaimed)
   * Rated by people you trust (selectable)
  * By prescriptive standards
   * [In]correct usage according to XXX style guide

* Editing is not needed
 * Editing is a mess
  * Need to check other translations
  * Moderators need to wait
  * Sentence owner may refuse to correct the sentence
 * Never edit, just add another sentence
 * Incorrect sentences will get low rating and be filtered
 * Sentences with negative ratings will be deleted by script
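
The filtering and clean-up steps in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Sentence` class, the +1/-1 vote format, and the function names are all hypothetical and do not correspond to anything in Tatoeba's actual software. Passing a `trusted` set models "rated by people you trust"; leaving it out models "rated by anyone".

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """Hypothetical record: a sentence plus one vote per voter."""
    text: str
    votes: dict[str, int] = field(default_factory=dict)  # voter name -> +1 or -1 (assumed format)


def rating(sentence, trusted=None):
    """Sum the votes, optionally counting only voters the reader trusts."""
    return sum(
        vote
        for voter, vote in sentence.votes.items()
        if trusted is None or voter in trusted
    )


def filter_sentences(sentences, min_rating=0, trusted=None):
    """Keep sentences whose rating (as seen by this reader) meets the threshold."""
    return [s for s in sentences if rating(s, trusted) >= min_rating]


def purge_negative(sentences):
    """What a clean-up script might do: drop sentences with a negative overall rating."""
    return [s for s in sentences if rating(s) >= 0]
```

Note that the same sentence can be hidden for one reader and visible for another, depending on whose votes that reader chooses to count.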

AlanF_US, 29 November 2013 at 04:27:46 UTC

Although I find something appealing in both CK's ( http://tatoeba.org/eng/wall/sho...#message_18044 ) and Demetrius's vision of the direction in which Tatoeba should go, I don't see how either one will happen. CK's approach involves a change in contributors' behavior that flies in the face of what we've seen so far (despite frequent requests, especially from him), while Demetrius's involves a rating system that would require implementation effort that I just don't see happening at a time when Tatoeba's software is effectively dormant.

If Tatoeba's prime audience were perceived as consisting of people outside the community who use it for reference without knowing anything about the individual community members who contribute sentences, then obviously quality and reliability would be of prime importance. But I don't think the site ever served those people particularly well, especially given the decision to put the Tanaka corpus at the heart of the Tatoeba corpus.

Whether or not I like it, I have come to think that Tatoeba best suits people inside the community, and in particular, those who come for whatever particular purpose they want to meet. If I want to figure out how a particular word is used in Hebrew, I can search for it and see which sentences have been written by people I've come to trust. If I'm in an editing mood, and looking for a way to pass time, I might look at the English sentences with "@needs native check" tags and comment upon them, though I don't delude myself that I'll actually "drain the swamp" to any appreciable degree (especially since many of the sentences will fall into a gray area). If I want to learn how to translate an English sentence of my own, I can contribute it and then ask someone I trust to translate it. In this largely anarchic, do-what-you-want environment, I don't think you can do much other than contribute in whatever way you enjoy. I guess that's a more Demetrius-ist than CK-ist approach, but with less optimism regarding the usability of the outcome.

For the record, I don't contribute sentences in a language that's not my native one, but I know that there will always be plenty who will.

Scott, 29 November 2013 at 23:16:11 UTC

To those in favor of a rating system: I won't chime in with my opinion this time, but I just want to point out that there hasn't been any major change to Tatoeba in the last three years, so I'm not sure we should expect any in the next three.

FeuDRenais, 30 November 2013 at 01:27:58 UTC

Actually, it wouldn't be too difficult to set up a quick website intended as a tester for a rating system.

We would need to:

(1) collect a list of users who would want to participate
(2) create a second set of accounts for these users (since we wouldn't assume to have access to everyone's Tatoeba password)
(3) agree on a rating algorithm
(4) set up a basic page that calls different Tatoeba sentences and translations, and then lets the testers rate them

(1) and (2) are cake. (3) would need some discussion - I proposed an algorithm in my January post, but this could be changed/refined. (4) I see as one day of hard work for one intermediate web programmer, if things (particularly, interfacing with the Tatoeba sentence database) go smoothly.

At first, it could be a separate website dedicated solely to rating the Tatoeba corpus.

Any takers?

AlanF_US, 30 November 2013 at 20:53:07 UTC

I would sign up to participate, though I assume you're looking for a list that contains more than one person. You could always try recruiting people on the IRC channel or by private message.

FeuDRenais, 30 November 2013 at 21:07:20 UTC

I'm just polling for general interest.

If no one cares about taking the time to do this, then I won't either. But if people are interested, then why not. I think a team of ~10 to start things off could be sufficient.

FeuDRenais, 29 November 2013 at 04:44:19 UTC

I remember proposing something similar a while ago (in January of this year, I think). Basically, I agree that members are spending too much effort on low-payoff tasks, in particular on manually monitoring corpus quality when an automatic rating system would be quicker, more efficient, and more statistically robust.

However, I know that a lot of users here are conservative and hesitant with respect to the idea of ratings, which is an attitude that I personally find to be irrational. It'd be great if there were someone well-versed in karma systems who could contribute to this discussion.

User55521, 29 November 2013 at 13:11:22 UTC

Actually, sometimes I think that the rating system is not required to exist as a part of Tatoeba.

It can be included as a UserScript that queries a foreign server, or just a separate rating site that imports Tatoeba sentences.

This will also allow for some decentralisation, which is also a good thing.

FeuDRenais, 29 November 2013 at 20:16:49 UTC

Yea, I've also thought about this option. The biggest inconvenience that immediately strikes me is that completely new visitors to the site wouldn't be able to see sentence ratings without setting up a script or accessing an auxiliary site. In (my version of) the ideal world, a random internet user would stumble on a Tatoeba sentence and have a number next to it that somehow reflected its reliability. Then he/she could say, for example, "okay, 95% is pretty good, I'll go ahead and assume this is right and use it for whatever I need it for". That way this site could be *useful* to the masses with at least some sort of quality claims behind its data (sentences/translations).

That said, a pilot script/site that interfaced with Tatoeba and let the users here test out a rating system prototype without making it an integral part of Tatoeba itself could also be a useful first step.

User55521, 29 November 2013 at 21:26:51 UTC

>> new visitors to the site wouldn't
>> be able to see sentence ratings

That can in fact be avoided. The foreign site can set Tatoeba tags carrying its ratings (and, probably, comments).

FeuDRenais, 29 November 2013 at 23:06:41 UTC

But then that site would become the new Tatoeba, wouldn't it? :-)

Actually, how easy would it be to copy Tatoeba and to develop it independently? I've discussed this a little bit with Alan, since it seems like the people capable of developing the current site are too busy to do it...

Of course, I'm a hypocrite here since there's a lot I would love to improve about Tatoeba but probably don't have the time to do it, either, heh. But some people must have time... right?

User55521, 30 November 2013 at 16:02:26 UTC

>> But then that site would become the
>> new Tatoeba, wouldn't it? :-)

No, it wouldn't. It would just query Tatoeba’s URL for tag-setting.

I don't propose adding new sentences on that new site.

FeuDRenais, 30 November 2013 at 17:36:58 UTC

So a "Tatoeba Viewer" of sorts :-)

FeuDRenais, 30 November 2013 at 17:39:40 UTC

But seriously, if you or other people would like to dedicate maybe 15-30 minutes a day to discuss this and start coding something that could then be tested on a smaller scale before being proposed to the community, I would actually be willing. I think that there's enough interest and that this is a good chance to make some major improvements to the current system.

Let me know.

User55521, 1 December 2013 at 12:25:17 UTC

I would be interested, but I think we need to think everything over very thoroughly.

For example, it may be useful to replace links with a 'rating for sentence X as a translation of Y'.

FeuDRenais, 1 December 2013 at 18:01:22 UTC

We can have a good think, certainly. I should write to liori and summon him here, as he would probably be able to contribute a lot of good ideas.

al_ex_an_der, 29 November 2013 at 22:47:27 UTC

>>> In (my version of) the ideal world, a random internet user would stumble on a Tatoeba sentence and have a number next to it that somehow reflected its reliability. Then he/she could say, for example, "okay, 95% is pretty good <<<

Or, even simpler:

The search results are shown in order of their reliability (without showing a percentage or some other "verdict" that may provoke discussions about the precision, fairness and reliability of the reliability check procedure). At the moment the search results appear in chronological order. The oldest sentences are certainly not always the best. If we applied the same procedure as linguee*, the users' feedback would rearrange the order of the sentences bit by bit and the reliability of the examples shown first would grow, yet there would be no need to pretend that we apply or are trying to find a perfect evaluation system. We'd only say that there is a higher probability of finding good examples near the top of the list than further down. If you click the link (or visit http://www.linguee.com/ otherwise) and move the cursor over an example, you'll see a thumbs-down and a thumbs-up button.

* http://www.linguee.com/english-...source=english
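
The reordering idea described above might look something like this in code. It is a minimal sketch, assuming each search result is a hypothetical `(sentence, thumbs_up, thumbs_down)` tuple delivered in chronological order; neither the tuple layout nor the function name comes from Tatoeba or from linguee, and how linguee actually ranks its examples is not known here.

```python
def order_by_feedback(results):
    """Reorder chronological search results by net thumbs-up feedback.

    `results` is assumed to be a list of (sentence, thumbs_up, thumbs_down)
    tuples in chronological order. Python's sort is stable, so results with
    the same net score keep their chronological order, and no score needs to
    be displayed to the reader.
    """
    return sorted(results, key=lambda r: r[2] - r[1])  # lowest (down - up) first


chronological = [
    ("oldest sentence", 0, 2),
    ("middle sentence", 3, 0),
    ("newer sentence", 0, 0),
    ("newest sentence", 0, 0),
]
reordered = [sentence for sentence, _, _ in order_by_feedback(chronological)]
```

With no feedback at all, the list stays exactly as it is today, so the ordering only drifts away from chronological as votes accumulate.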

FeuDRenais, 29 November 2013 at 23:03:46 UTC

Yes.

You could also have a tab to switch between "sort by quality" and "sort by relevance" (like YouTube :-)

sacredceltic, 30 November 2013 at 19:19:23 UTC

To demonstrate the stupidity of such a system, I'll take this example :

http://tatoeba.org/fre/sentences/show/2876497

I would never utter that sentence myself. I had never heard it until a week ago. I actually asked the 12-year-old Belgian pupil who said it (a brilliant one with high marks in French) to repeat it and confirm the meaning I had supposed it had.
Then I asked others to confirm they were saying this as well, and investigated with parents to confirm the use of that sentence.
99% of francophones would probably not recognize the validity of this French sentence. But it is nevertheless valid. It's just that it is used in a limited area (Brussels, and probably part of francophone Belgium, although probably not all of it...and maybe also part of Northern France...I don't know...)
Probably 50% of francophones will be shocked when they see that sentence and protest that Tatoeba produces weird things, and I can already hear the preposterous self-proclaimed experts (usually know-it-all youngsters) who will comment that this sentence is not acceptable (I can't wait...I'll keep you posted...)

Well, again, it is valid French. 100% so.

So what about this sentence within a rating system of crowd stupidity that you are promoting ?
It will be kept in the dark bottom of the corpus because 99% of voters will vote it down (and probably 99.99%, because non-natives will also vote, and the vast majority of them will never hear this because they don't learn French from 12-year-olds in Brussels).
Why should it be so ? It is a valid sentence. An interesting one, actually. Even if I won't ever use it myself, I quite like it and that is why I added it to the corpus, like I usually do, every time I hear an interesting or rare - and valid - form, spoken by natives. (I actually use Tatoeba to collect them, among other uses I make of the service)

You could say that it is not what 99% of French pupils say in that context. But what if you happen to be in Brussels and hear it and can't understand it although you would need it ? Why is it less important to understand that one than to understand other variants anywhere else if you're not elsewhere ?
And if 12-year-olds in Brussels use specific sentences, it is highly probable that 12-year-olds anywhere else also use specific ways to say the same thing.
Congo now has more francophones than France, Belgium and Switzerland together...Do you know how they say this in the different parts of Congo ?
Which one is the valid example sentence that should be on top of the queries results and why ?

A valid sentence is a valid sentence and that's that. No rating system will change that. If you want to learn a language, you have to learn ANY of its valid sentences. I understood that sentence - even if I needed confirmation - because I'm a native and I understand that French is not limited to what I've already heard in my life (most people don't understand this one and those who don't are just pains in the neck on Tatoeba...)

Rating sentences will result in impoverishment. You will see illiterate people and non natives decide what is a valid sentence and what is not. I see this daily in comments on Tatoeba.
And what would happen to this precious sentence on Tatoeba ? It will be buried.
Go tell this 12-year-old that his sentence doesn't deserve to be an example !
I will certainly oppose that with all my energy.

FeuDRenais, 30 November 2013 at 21:03:38 UTC

You are making assumptions on how the rating algorithm will work without even knowing what the algorithm would be.

This is to be discussed, but as I already proposed, one should rate users, and not sentences. The validity of a sentence should be reflected by what ratings a user gets *on average*.

So, if you add 100 sentences, 99 of which are "standard" and get good ratings, then your 1 "nonstandard" sentence would only lower the collective rating to, say, 99. You can't even hope to rate 2+ million sentences and 100+ million translations one by one in a statistically rigorous manner (unless the users rate, I don't know, 10 sentences for every sentence they add, but even this would be a monumental time-consuming task).

So that answers that example of yours - the nonstandard sentence will inherit the rating of its owner. The owner could even comment on the sentence if they know it's nonstandard so that people won't blindly downrate it.

Do you have other examples? Please continue, because pointing out concrete potential drawbacks is much better than simply nay-saying things blindly, and so here your input is more than welcome.
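
The "rate users, not sentences" arithmetic above can be worked through in a few lines. This is a sketch only and not the algorithm from the January proposal, which is not reproduced in this thread: it assumes each rated sentence is simply judged good (1) or bad (0), and the function names are made up for the illustration.

```python
def user_rating(judgements):
    """Percentage of a user's rated sentences that were judged good.

    `judgements` is an assumed format: one entry per rated sentence,
    1 for "good" and 0 for "bad". Returns None if nothing has been rated.
    """
    if not judgements:
        return None
    return 100.0 * sum(judgements) / len(judgements)


def inherited_rating(owner, judgements_by_user):
    """A sentence that nobody rated inherits the average rating of its owner."""
    return user_rating(judgements_by_user.get(owner, []))


# The example from the post: 99 "standard" sentences rated good, 1 "nonstandard" one rated bad.
judgements_by_user = {"some_contributor": [1] * 99 + [0]}
score = inherited_rating("some_contributor", judgements_by_user)  # 99.0
```

Under these assumptions the single downrated sentence costs its owner one percentage point, and the sentence itself is still displayed with the owner's 99 rather than with a score of its own.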

alexmarcelo, 30 November 2013 at 22:54:58 UTC

These are the reasons I don't agree with a rating system:
- Nowhere is it said that Tatoeba is intended for beginners. Some members here have been using Tatoeba's sentences as a resource for language learners, and that's OK as long as it doesn't interfere with Tatoeba's main purpose: collecting example sentences in several languages. What I'm saying is that some people would only rate a sentence "OK" if they think that sentence might be useful for their own purposes.
- Before you want to implement a rating system on Tatoeba, you should really think (again) about regional differences. My Portuguese dialect is not the only one. It's used in several countries, and sometimes even grammar differs! Now think about English, French, and Spanish. Using ONE flag to represent a language is not only disrespectful, but also misleading and inaccurate. I don't need to say how messy it would be if an Argentinean user were to rate sentences (or whatever) belonging to a Spanish user, or vice-versa.
- Just as sacredceltic wrote, Tatoeba would lose rich information if a rating system were implemented. I've added several sentences myself that I wouldn't use, but that I know other people use and that are completely correct, so I think it's relevant to keep them here. In other words, this rating system would be completely biased, and minorities would not be represented.
- Members with a low rating would feel discouraged and probably leave Tatoeba, even if they've been unfairly low rated.

FeuDRenais, 30 November 2013 at 23:51:03 UTC

For your first point: this would just mean that they will rate a smaller portion of the sentences, right? But so what?

Second point: see my reply to sacredceltic above. These are challenges that can be overcome.

Third point: if you add a sentence that you wouldn't use yourself, add a comment on it. Or, like I said to sacredceltic above, add an option that disables rating that particular sentence. Again, these are surmountable challenges, not definitive arguments about why a rating system would fail, destroy Tatoeba, and shouldn't even be attempted.

Fourth point: another surmountable challenge. If they are being rated unfairly, then either (a) the community sucks, or (b) the algorithm sucks. (b) is a technical challenge, and you could propose several approaches to make ratings be less sensitive to quick changes (this is similar to filtering out noise in engineering/math applications, for which there are entire books and hundreds of papers). If the community is so bad that tons of people are being unfair and forcing good users to leave, then you've got a much more serious problem and it's not the rating system. Otherwise, low rated users would be those who truly contribute badly consistently.

Finally, none of this has to be mandatory, so I don't understand why there's such a resistance to even trying it out. You could allow users to opt out of having their sentences rated, in which case nothing is said about the credibility of their sentences. You could start by making this an auxiliary website, like Impersonator mentioned, that would only be there for people who care about having a credibility score on a given sentence. If it fails miserably, don't use it. If it succeeds, then join the effort and then talk about making it a standard for the main Tatoeba. That's how R&D works.
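
One simple way to make a rating "less sensitive to quick changes", as the fourth point above suggests, is an exponential moving average. The sketch below is only one possible filter among many, chosen here for illustration: the 0-to-1 vote scale, the neutral starting value, and the smoothing factor `alpha` are all assumptions, not part of any proposal made in this thread.

```python
def smoothed_rating(votes, alpha=0.1, start=0.5):
    """Exponentially smoothed rating over a chronological sequence of votes.

    Each vote is assumed to be a number between 0.0 (bad) and 1.0 (good).
    A small `alpha` means a single unfair vote moves the rating only slightly;
    `alpha=1.0` would make the rating equal to the most recent vote.
    """
    rating = start  # neutral starting point for a user with no history (assumption)
    for vote in votes:
        rating = (1 - alpha) * rating + alpha * vote
    return rating


steady = smoothed_rating([1.0] * 50)               # long run of good votes
after_one_bad = smoothed_rating([1.0] * 50 + [0.0])  # then one bad vote
```

The trade-off is the usual one for such filters: the smaller `alpha` is, the better an established contributor is protected from a stray downvote, but the longer it takes for a genuine change in quality to show up.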

sacredceltic, 30 November 2013 at 23:56:54 UTC

Your premise is just flawed : you suppose a language is defined by all its speakers. But it's just not the case. Your own native language, Russian, has been actually defined by a handful of writers and poets. Not even 0.1% of Russians...
Defining a language requires talent...hopefully !

FeuDRenais, 1 December 2013 at 00:03:33 UTC

This is called a difference of opinion, not "your premise is just flawed".

In today's world, a language is no longer defined by a handful of people due largely to the internet. The same is happening with education - it is now available to anyone with an internet connection, and is no longer the property of an educated few.

From my point of view, your stances are about 20-30 years out of date.

sacredceltic, 1 December 2013 at 11:01:20 UTC

> In today's world, a language is no longer defined by a handful of people

Yes it still is...
Language is not innate. It is learnt. It is a convention and must not be anarchic. It's like laws : people need experts to design them, otherwise the different street talks are not inter-comprehensible.
Ask people on the Internet to vote for laws and you will end up with a giant mess, exactly like voting for maths theorems or the laws of physics...It doesn't make sense.

Communication requires a protocol. The more complex and widespread the communication, the more complex the protocol. Otherwise, sentences just stop making sense, which is what's happening with Globish...but also to ghettos.

In certain areas of Paris suburbs, I don't understand some French people, although they were born in France, because they had little school education and designed their own street talk. I don't care, I don't need much to understand them. But they can't get jobs, because their specific communication makes them unable to function in a "standard-French" society...Sometimes they ask me things while in the metro. I know that their language is a kind of French but I just don't understand it in most cases...They must believe I'm a foreigner who speaks no French. It doesn't help them much getting what they want...

>The same is happening with education - it is now available to anyone with an internet

What about the millions of teachers ? Why haven't they been sacked, yet ?
Language is part of the learning toolbox. Internet doesn't make sense if you haven't learnt the language first...

FeuDRenais, 1 December 2013 at 17:59:49 UTC

When arguing with a troll who seems to have infinite time and nothing else to do, and who is willing to argue forever without making ANY concessions, I simply realize that I am outmatched and desist.

This is not to say that I can't argue for having a rating system - I just can't argue for it against someone like you. In any case, this will be a side project for people who care about bringing some sort of quality evaluation to Tatoeba. You don't have to partake if you don't want to.

sacredceltic, 1 December 2013 at 21:27:13 UTC

You have trolled this community with endless talk about the rating of sentences and we all know why you do that : because you want to prove that your sentences are as valid as those by natives, which is both false and preposterous...
You pretended to be a native English speaker, then a native French speaker, when you actually are Russian, and you coin sentences in Mandarin, Uighur, ... You're actually the worst enemy of quality on Tatoeba.

We should actually be busy designing a way to downgrade trolls like you.

FeuDRenais, 1 December 2013 at 22:06:24 UTC

I have "trolled" this community with "endless talk" about the rating of sentences because I would like to be a part of a community that somehow evolves and advances and benefits things on a bigger scale than simply benefiting its 10-20 active users.

Naturally, this involves proposing new things and pushing for them when you believe that they're extremely likely to succeed and succeed tremendously. You, on the other hand, cry "nay!" to any idea you don't like/understand without even considering it, and make abstract, circular arguments without proposing your own solutions.

Of course, I understand that it's much easier to sit in your armchair and criticize what others propose without proposing anything yourself.

sacredceltic, 1 December 2013 at 22:28:38 UTC

I perfectly understand your proposal and I oppose it because it is flawed from the very beginning.

I don't propose solutions for rating sentences by the public, because the very idea that the public can rate sentences is flawed, so I don't waste time designing silly things to serve silly purposes.

FeuDRenais, 1 December 2013 at 22:47:39 UTC

> I perfectly understand your proposal

No, you don't, because you keep bringing up counter points that have either been addressed already or can be addressed, and act as if they're irrefutable arguments for why a rating system cannot work.

Furthermore, you do not seem to come from technical fields or have the ability to think things through analytically, thereby making you unable to envision how a rating system would function *on a mathematical and algorithmic level*. I'm not saying that this is a flaw or even a drawback, as different people have different professions/backgrounds, but the proper thing to do when you don't have a certain type of knowledge is not to say "well, that's bound to fail because it's too complicated to rate sentences", but to try to learn and understand, bringing up your particular concerns and seeing if what is being proposed can address them or not.

So, no. You don't understand and you don't even try to understand.

When I wrote my long proposal last January, I gave numbers, estimates, and formulas. I actually proposed things in detail and justified why I proposed them that way. Liori *actually coded* what I proposed, ran a simple scenario, and pointed out a drawback, which led me to refine what I did and to improve it. *That* discussion was actually useful, and we need more like that. The discussions I'm having here with you are just garbage, unfortunately.

sacredceltic, 1 December 2013 at 23:00:32 UTC

> that's bound to fail because it's too complicated to rate sentences

I never said that. Having anything rated by the public on the Internet is just normal, simple stuff. It's just that in the case of sentences, the purpose of the rating is silly, because most people don't know their own language sufficiently well to be in that position.
There's nothing technical about that and I could design one myself. But it's just a silly idea, so I won't...
You may produce as many formulae as you wish, it will still be flawed from the start : the vast majority of people are not skilled enough to rate sentences, so suggesting a rating system for sentences that is open to the public is just plain silly. So you're right, you're talking garbage...

FeuDRenais, 1 December 2013 at 23:22:54 UTC

Then why do people even bother to use "OK" tags?

sacredceltic, 2 December 2013 at 05:50:43 UTC

Tags don't make a sentence "better" than another, which is the fundamental flaw of a rating system. My tags are no better than anybody else's.
A sentence is either valid or not, but is not more valid than another. That is what you fail to see and that makes you curiously very nervous ...

User55521, 2 December 2013 at 06:30:47 UTC

This view is contested by modern scholars. Please read about probabilistic linguistics.

sacredceltic, 2 December 2013 at 06:52:21 UTC

A few silly scholars. Probabilities have nothing to do with the grammaticality of a sentence or its use. It's either used or not, either grammatical or not. That is 1 or 0. There's no probability in there !

User55521, 2 December 2013 at 09:04:27 UTC

> A few silly
> scholars.

Please abstain from insults.

Also, I keep pointing this out over and over again: a rating system does *not* preclude anyone from using the old OK-tag-based system. As I’ve said, there needs to be a way to filter not only by rating, but also by prescriptive standard.

If you are not convinced by probabilistic approaches, just don’t use them. It’s so simple.

No one is forcing a rating system upon you. It's you who wants to force your denial of probabilistic methods and your black-and-white (valid-or-invalid) approach on everyone.

sacredceltic, 2 December 2013 at 10:31:58 UTC

>Please abstain from insults.

The day you do...

User55521, 2 December 2013 at 10:56:41 UTC

Please point out where I've resorted to insults in this discussion.

sacredceltic, 2 December 2013 at 06:46:21 UTC

And that's where you entirely missed the meaning of the example I gave: that sentence is valid. Just valid. Not more or less valid than any other. Nevertheless, 50% of francophones would not acknowledge that at first sight, and they would vote...
You OK tag sentences you know but you may either ignore or discuss sentences that you don't know. You don't rate them against the others...because it is silly...

User55521, 2 December 2013 at 09:21:07 UTC

> that sentence is valid. Just valid.

Under your definition of 'valid'...

However, you’ve explicitly stated that many people would consider it invalid. You think their definition of 'validness' is wrong. Fine, so just ignore them.

And I'm interested in other people’s definition of 'validness', not in yours, so why shouldn’t I be able to know it? Why can't these many people (who would consider your Belgian sentence wrong) browse the database under their definition of 'validness'?

Why do you want the right to use your definition of 'validness', but deny this right to others?

sacredceltic, 2 December 2013 at 10:30:47 UTC

>Why do you want the right to use your definition of 'validness', but deny this right to others?


I don't deny it to anybody and they may "NOTOK" tag it as they wish...but they will have to debate, and I will prove them wrong...so back to OK/NOT OK

User55521, 2 December 2013 at 11:03:28 UTC

> I don't deny it to anybody and they may "NOTOK" tag it as they wish...

My definition of 'valid' allows having 'less OK' and 'more OK' sentences.

For example, I think «найяскравая» is an allowed form but less OK than «найяскравейшая». And «самая яскравая» is between «найяскравая» and «найяскравейшая» for me.

How am I supposed to tag these?

sacredceltic, 2 December 2013 at 13:05:41 UTC

Producing facts, as you eventually do now, is much more productive than calling me a troll...
Alas, I don't know Russian or Belarussian, whatever that is.
But I doubt a sentence can be 85% correct in any language...it is either correct or not, based on grammaticality, orthography and usage.

User55521, 2 December 2013 at 14:02:36 UTC

The Belarusian superlative is usually formed with the -jejš- suffix and, optionally, a naj- prefix (najjaśniejšy, jaśniejšy).
The construction with "samy" is also possible, but may seem 'too Russian' to many speakers, so it is often avoided.

The construction with the naj- prefix and without the -jejš- suffix (najjasny) seems to have been popularised by an automatic Russian-Belarusian translator, "Belazar", in 2009. (Since most people speak Russian better than Belarusian, they just use a translator... And this is how this form got into the press.)

Most people dismissed this as an error, but it has been argued that this form had been in use in Orthodox Church books at least 10 years earlier[*] (and probably before that), as a loan translation of Old Church Slavonic words, and that these uses were intentional.

So, some people consider these forms mistakes (even in church books), while others think they are an acceptable alternative.

So, this way, 'najjaśniejšy' is more valid than 'samy jasny', and 'samy jasny' is more valid than 'najjasny'.

[*] This was discussed here (in Belarusian): http://by-mova.livejournal.com/589391.html

sacredceltic, 2 December 2013 at 19:28:38 UTC

Well, so this problem might be specific to languages that are dominated by another imperialist one.
It's not the case of my own language...yet...

I'm curious to hear what English natives have to say.
Probably we will end up with Indian English being subrated and US English overrated...I'm wondering how Brits and Aussies will swallow it...

FeuDRenais, 1 December 2013 at 23:36:48 UTC

"It's just that in the case of sentences, the purpose of the rating is silly, because most people don't know their own language sufficiently well to be in that position."

This is equivalent to saying that a rating scheme would fail because rating sentences is too complicated, so again you contradict yourself. And by "rating sentences", I of course mean "rating sentences well", which you didn't seem to catch.

By your logic, there should be no attempt to decide on the quality of a sentence or a translation. In other words, there should be no OK tags (which are just a very brute rating system, by the way), no @change tags, no comments saying things like "this is incorrect". Because we don't know our own language sufficiently well to make those judgments.

Or are you saying that about the masses and not about yourself? In that case, you have once more demonstrated your ignorance, since who could rate sentences and how is something that goes into the algorithm's design - it thus becomes technical. In fact, I would propose that only advanced users be allowed to rate. But didn't you already know that?

Shishir, 1 December 2013 at 20:23:56 UTC

>In today's world, a language is no longer defined by a handful of people due largely to the internet.

In Spanish there's still the RAE, a handful of people who make the rules and say what's right and what's wrong in the language. And I have to admit I'm really glad it's like this, because otherwise the Spanish language would be destroyed by its own speakers...

FeuDRenais, 1 December 2013 at 20:35:47 UTC

I think you're oversimplifying.

Shishir, 1 December 2013 at 21:32:08 UTC

oversimplifying how?

FeuDRenais, 1 December 2013 at 21:53:29 UTC

I don't believe that it works as simply as you say. A handful of people cannot control the media, the internet, and the press to the point where they decide how the language is used. When a new idiom appears, it appears, spreads, and people start to use it regardless of official rules. If a certain grammatical construction starts to get shortcutted for practical reasons, then no handful of people are going to enforce, in today's world, that this shortcutting stop. If the handful of people make rules that are followed, it's because the people accept those rules and because they're needed. Rules that aren't needed are likely to be ignored or amended informally until they're amended officially.

You're also oversimplifying when you say that the language would be destroyed by its own speakers. You have to define what "destroyed" means first. Of the languages I speak, most don't have regulative bodies and some are even being assailed by outside influences, but even then I would not say that they're being "destroyed". Certainly, they are being taken farther away from their roots, but so is every language to some extent and one must ask "how much?" to qualify destruction. For me personally, a language is "destroyed" when it no longer has any speakers.

sacredceltic, 1 December 2013 at 22:22:38 UTC

>When a new idiom appears

Idioms don't just "appear" like that. Words and patterns are first introduced by experts in their fields, often by snobs, or more often by illiterate people who distort sentences they don't know how to use, or sometimes even as a way to hide conversations from parents or cops (the case with French Verlan). Then they are validated by linguists, who determine whether they are grammatical or not, short-term trends or not. (Remember the famous "walkman" that was all the rage, to the point that other words were deemed ridiculous, and that subsequently just disappeared, while the "ridiculous" academic alternatives such as "baladeur" still live on? Haha... the Academy won... yet again!) Then writers use them... or not... and administrations and laws are written using the official words and... at the end of the day... what is official wins, because we all need to agree on something to communicate, and it's better if the language we use is also the one in which our laws and regulations are written.
What you have in mind is just a few street-talk expressions that will live an average of a few months or years and will then be forgotten and fall into disuse.

100% of the "cool" non-official words that my father used to say as a youngster are in complete disuse nowadays... If I were to utter them, my son would just be dismayed. And it's the same for every language and culture.

You just ignore 99% of the actual language that people read, write in, work with, ...and that form the actual common culture that a language represents. And that 99% is taught, by teachers, in schools, with official grammar rules, syntax...

sacredceltic, 30 November 2013 at 23:28:32 UTC

>So, if you add 100 sentences, 99 of which are "standard" and get good ratings, then your 1 "nonstandard" sentence would only lower the collective rating to, say, 99.

Then if I'm from Brussels (non-standard French) and add only Brussels sentences, I'll have it all wrong with a rating of zero, although my French is 100% valid ?!?

Where is the logic ?

FeuDRenais, 30 November 2013 at 23:37:48 UTC

We are going in circles. This is the same argument as for the different varieties of Spanish.

If the two types are so radically different that a significant majority of your sentences are different, then treat the two as different languages for rating purposes. If the minority is different, then simply add a comment to the non-standard ones and expect that the raters will be reasonable. Or simply add an option to disable ratings for certain sentences, if you fear that they're going to be downrated.

Your challenges are good, but none of this is beyond a good algorithm. Next challenge?
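
One way the two options above (separate varieties "for rating purposes", plus a switch to disable ratings) might look is sketched below. This is only an illustration under assumptions: the `Sentence` record, the variety labels such as "fr-BE", and the `can_rate` function are hypothetical and do not come from Tatoeba's code.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    variety: str                  # hypothetical label, e.g. "fr-BE" or "fr-FR"
    ratings_enabled: bool = True  # the author may switch ratings off (assumption)


def can_rate(sentence, rater_varieties):
    """A rater may rate a sentence only if ratings are enabled for it
    and the rater has declared the sentence's variety as one they know."""
    return sentence.ratings_enabled and sentence.variety in rater_varieties


# Illustrative Belgian French sentence ("septante" = seventy).
belgian = Sentence("Il a septante ans.", variety="fr-BE")
locked = Sentence("Il a septante ans.", variety="fr-BE", ratings_enabled=False)

print(can_rate(belgian, {"fr-BE"}))  # True
print(can_rate(belgian, {"fr-FR"}))  # False
print(can_rate(locked, {"fr-BE"}))   # False
```

Both sentences would still be stored and searched as plain French; the variety label would only decide who is allowed to rate them.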

sacredceltic, 30 November 2013 at 23:50:44 UTC

> then treat the two as different languages for rating purposes.

No. It's 100% French. You just don't understand diversity within a language. You're just a language fascist.

FeuDRenais, 30 November 2013 at 23:52:12 UTC

What part of "FOR RATING PURPOSES" did you not understand?

Shishir, 1 December 2013 at 20:30:18 UTC

The problem is the same as always: I personally only know what's said in Spain, not what's said in Argentina, Colombia, ... so how can I know whether a sentence is also used there or not?

FeuDRenais, 1 December 2013 at 20:36:53 UTC

You make them different languages for rating purposes.

sacredceltic, 1 December 2013 at 21:19:39 UTC

That's theory. But it is not that simple. Where do you put the limit ? Just in Belgium, there are many different ways to speak French : Liège, Bruxelles, Mons, Namur...all different with many local expressions. All French though and as valid as any...

FeuDRenais, 1 December 2013 at 21:58:36 UTC

Gee, I don't know... You sit down, you propose algorithms, you observe how they work, you make judgments, and then you refine them until they can do no better. And then, if it's still insufficient, you go to sacredceltic, you kneel down before him, and you say "Good sir, I should have listened to your wisdom! We were on a fool's errand, trying to somehow improve the quality of a site's contents via a rating algorithm, an approach that, though it is the standard for almost any professional site, is simply a hopeless endeavor for a language sentence site as divine as this one! Oh, how we have erred!"

Otherwise, you succeed and the site is better.

sacredceltic, 1 December 2013 at 22:03:15 UTC

No, the site won't be better the day when trolls like you, who overrate their language abilities, will control a silly rating system.

FeuDRenais, 1 December 2013 at 22:11:26 UTC

I highly recommend this for you:

https://www.coursera.org/course/thinkagain

Aren't you glad that learning is open to the masses now?

sacredceltic, 1 December 2013 at 23:08:43 UTC

> Aren't you glad that learning is open to the masses now?

In my country, it has been for over a century now...and for free !

User55521, 1 December 2013 at 11:47:40 UTC

Your example fails to demonstrate the stupidity of a rating system.

sacredceltic, 1 December 2013 at 11:53:34 UTC

Probably because you fail to understand it...

User55521, 1 December 2013 at 12:09:52 UTC

I've understood your point this way:
a) Different people have different ideas about what is 'a valid sentence'.
b) Most people’s idea about what is 'a valid sentence' is incorrect.

But this is exactly what a rating+filtering system is expected to fix!

Just choose the filter according to your idea of what is correct ('wisdom of the crowds', 'native speakers' rating', 'experts' rating'), and you’ll get only the sentences you need.

I really fail to understand how this example demonstrates the problems of the rating system. On the contrary, I think it is a good example of why a rating+filtering system is necessary in the first place.
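
The filter choice described above could be sketched roughly as follows. This is a minimal illustration, not an existing Tatoeba feature: the `Vote` record, its `native` and `expert` flags, the filter names, and the "positive score means shown" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    user: str
    value: int            # +1 = "valid", -1 = "not valid"
    native: bool = False  # self-proclaimed native speaker (assumption)
    expert: bool = False  # listed as an expert by someone (assumption)


# Each filter decides which votes count; all names are hypothetical.
FILTERS = {
    "crowd": lambda v: True,       # wisdom of the crowds
    "native": lambda v: v.native,  # native speakers' rating
    "expert": lambda v: v.expert,  # experts' rating
}


def score(votes, filter_name):
    """Sum of the votes that the chosen filter takes into account."""
    keep = FILTERS[filter_name]
    return sum(v.value for v in votes if keep(v))


def passes(votes, filter_name):
    """Assumed rule: show the sentence when its filtered score is positive."""
    return score(votes, filter_name) > 0


votes = [
    Vote("a", -1),
    Vote("b", -1),
    Vote("c", +1, native=True, expert=True),
]
print(passes(votes, "crowd"))   # False
print(passes(votes, "expert"))  # True
```

The same sentence is hidden for someone who filters by everyone's rating and shown for someone who filters by the experts' rating, which is the point of letting each reader pick the filter.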

sacredceltic, 30 November 2013 at 18:41:44 UTC

Oh, not the crowd stupidity thing again !

Why not vote for mathematical theorems as well ? I would not be surprised if most people would vote against 1+1=2...

User55521, 1 December 2013 at 11:54:49 UTC

Please read my thoughts again.

The ratings idea is flexible enough to accommodate your vision too: just choose 'filter by the experts’ rating' instead of 'filter by everyone’s rating' and you’ll end up with a rating you might like.

sacredceltic, 1 December 2013 at 15:14:41 UTC

how would you know who are the experts ?

al_ex_an_der, 1 December 2013 at 15:56:00 UTC

:D
A really good question! Jen vere bona demando! Eine wirklich gute Frage!

FeuDRenais, 1 December 2013 at 17:55:47 UTC

How would you know who the experts are now?

sacredceltic, 1 December 2013 at 18:09:29 UTC

We don't. We just surmise...and often we discover that self-proclaimed experts are just teenagers who know shit...

User55521, 1 December 2013 at 20:02:45 UTC

>> how would you know who are the experts ?

I originally thought that anyone could choose a 'reviewer group', which is just a list of people.

Then anyone can look at the description of a group and decide whether she trusts this group of reviewers or not. For example, if group #1 has the real names of its people and group #2 is anonymous, some would be more inclined to trust group #1, and so on. Each group may state what it means by 'valid sentence', and anyone can see whether they agree with it or not.

But, of course, we can come up with some other system. I don’t insist on this.
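
A rough sketch of the 'reviewer group' idea, under assumptions: the `ReviewerGroup` class, the member names, and the strict-majority rule in `group_verdict` are hypothetical choices made for illustration, since the proposal above does not fix how a group's ratings would be combined.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewerGroup:
    name: str
    description: str  # what the group means by 'valid sentence'
    members: set = field(default_factory=set)


def group_verdict(group, ratings):
    """ratings maps user -> True (valid) / False (not valid).
    Only the group's members are counted. Returns None when no member
    has rated the sentence, otherwise True on a strict majority of 'valid'."""
    votes = [ok for user, ok in ratings.items() if user in group.members]
    if not votes:
        return None
    return sum(votes) > len(votes) / 2


group1 = ReviewerGroup("group #1", "real names; standard usage only", {"anna", "boris"})
group2 = ReviewerGroup("group #2", "anonymous; anything attested", {"x1"})

ratings = {"anna": True, "boris": True, "x1": False, "zed": False}
print(group_verdict(group1, ratings))  # True
print(group_verdict(group2, ratings))  # False
```

A reader who trusts group #1 would see the sentence as valid, while a reader who trusts group #2 would not; the vote by "zed" counts for neither, because "zed" belongs to no selected group.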

sacredceltic, 3 December 2013 at 19:52:58 UTC

>Sentences with negative ratings will be deleted by script

Just chilling ! The Lukashenko's way to total language terror dictatorship...

User55521, 3 December 2013 at 21:25:37 UTC

Lukashenko doesn’t really need Tatoeba to get a language dictatorship. In fact, the orthographic reform of the Belarusian language came with a law banning the use of all other orthographies in printed works (and this includes not only the pre-reform official orthography, but also the alternative standard Taraškievica, which has a reputation of being 'less Russian').

__

As for deletion, having a low rating under all the possible filters means that everyone, even the sentence’s author, has rated it negatively. So, if there is at least one user who rates the sentence as OK, I think it should stay.

I.e. this is suggested as an alternative to deletion (which is unavailable to ordinary users): just rate your own sentence as incorrect, and it’ll be deleted (unless someone else happens to like it).
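
The deletion rule above might be sketched like this. It is only an illustration: `should_delete` is a hypothetical name, and keeping sentences that nobody has rated yet is an added assumption, not something stated in the proposal.

```python
def should_delete(ratings):
    """ratings maps user -> True (OK) / False (not OK).
    Delete only when at least one rating exists and nobody rated the
    sentence OK. Unrated sentences are kept (assumption)."""
    return bool(ratings) and not any(ratings.values())


print(should_delete({"author": False}))               # True
print(should_delete({"author": False, "fan": True}))  # False
print(should_delete({}))                              # False
```

So an author can get rid of a sentence by rating it as incorrect, but a single OK rating from anyone else is enough to keep it.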



Anyway, this part is not going to happen soon. If ratings are implemented any time soon, they are likely to be on a separate site and not in the main Tatoeba code.