---- Proposal for a Tatoeba Rating System ----
I apologize for the wall of text that is to come, but for some time now I’ve wanted to write something semi-formal concerning the issue of quality control on Tatoeba. In my opinion, quality control is present, but not to the extent necessary for a site that now has over 2 million sentences in over 100 languages. The basic idea is a rating system, which I will present by first arguing why Tatoeba needs one, then giving some concrete ideas about how one could be implemented, and finally listing the immediate benefits of such a system as well as its potential drawbacks.
So…
1) Does Tatoeba need ratings for its sentences?
I think since the inception of Tatoeba the answer has been “yes”, as I remember the original guide by Trang saying that “owning” sentences was a good way to ensure quality but that ultimately ratings would be needed, although this was never implemented. Ratings have been discussed several times in the past on the Tatoeba wall/sentences, although no definitive conclusion has ever been reached and, again, nothing implementable was proposed. Current practices for ensuring good sentences and translations include:
- OK tags (as originally started by CK for English)
- Other tags (e.g. needs native check, translation check, change) to point out potential mistakes
- Comments and discussions
- Encouraging users to translate into their native language only
- Encouraging users to state what languages they know and how well they know them in their profiles
Although all of these work on a small scale, they are not able to cover the full 2 million sentences and who-knows-how-many translation links. Furthermore, these are subjective – a tag is only as reliable as the user, and comments, though sometimes reaching consensus among several native speakers, sometimes do not. The second-to-last point is a bit of a debate, since many users like to practice translating into languages that they haven’t mastered (myself being one), and some new users simply aren’t aware that translating into their native language is encouraged. The last point is also subjective and not something that is done by all users.
My overall feeling is that Tatoeba remains unreliable. Or rather, it is reliable for *me*, because I’ve been here for long enough to know who the trustworthy users are (and they are not always Advanced Users and Corpus Maintainers), and will treat their sentences/translations as reliable and take the rest with a grain of salt. The new user who just joins Tatoeba has no idea, however. Someone who is not a user and simply wants to use Tatoeba for a quick reference will also have no idea regarding whom to trust. A number on a sentence that would convey some sort of statistical reliability would be worth a lot.
2) How would a good rating system function?
Not sentence-by-sentence. Back in the early days (2-3 years ago), this was the major criticism of implementing a rating system – there were simply too many sentences, and it was unrealistic to expect multiple users to rate each one of them for quality, as well as all of the translations. It’s even less realistic now, since Tatoeba has grown a bit in sentence quantity but not that much in the number of active users.
However, it is this last point that can be exploited, as Tatoeba follows the Pareto principle to no small degree, with 80% of the contributions being the work of 20% of the contributors. Actually, the former might be larger and the latter smaller. Some days it feels like there are just 10-20 users contributing 95-99% of the sentences. This makes rating all sentences and translations possible simply by rating the users, who are a lot fewer in number.
So, what’s a good scheme for rating users? Let’s just say that the ratings will be between 0 (unreliable) and 100 (full confidence). It’s probably not good to have one such rating, since a user could be really good at writing, e.g., sentences in French (100), but be horrible when it comes to Mandarin (0), and lumping those into a score of 50 is misleading. There should be a rating per user per language. Additionally, there needs to be a distinction between writing natural sentences and writing translations. As such, a user should have a rating for every possible language pair as well (at the moment this would be 119*118/2 pairs). As the number of languages grows, these numbers will increase, but storage should be feasible, especially since any rating set will be extremely sparse (i.e. most users only translate between, let’s say, 10 language pairs at most, and not 119*118/2).
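To make the sparsity argument concrete, here is a minimal Python sketch. The store layout and the names `naturalness`, `translation` and `pair_key` are assumptions made up for illustration, not anything Tatoeba has; only the figure of 119 languages comes from the paragraph above.

```python
from math import comb

# Number of unordered language pairs for 119 languages (figure from the post).
NUM_LANGUAGES = 119
print(comb(NUM_LANGUAGES, 2))  # 119*118/2 = 7021

# Hypothetical sparse stores: only ratings that actually exist take up space.
naturalness = {}   # (user, language) -> rating
translation = {}   # (user, unordered language pair) -> rating

def pair_key(lang_a, lang_b):
    """Unordered pair, so eng-fra and fra-eng share one rating."""
    return frozenset((lang_a, lang_b))

naturalness[("user_n", "eng")] = 33.3
translation[("user_n", pair_key("eng", "fra"))] = 0.0

# Anything never rated simply has no entry and falls back to 0,
# the rating a new user starts with.
print(naturalness.get(("user_n", "cmn"), 0.0))
print(len(naturalness) + len(translation))  # entries stored: 2, not thousands
```

A user who works with ten language pairs costs ten entries, however many languages the site supports.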
For the actual implementation, it would be sufficient to have a thumbs up/down for every sentence that has an owner (to rate how natural it is) and a thumbs up/down for every link (to rate the translation). I would recommend using a weighted rating algorithm that would work like this:
i) New User gets 0 positive points and 100 total points upon joining. Their rating is 100*(positive/total) = 0.
ii) If their sentence/translation gets rated by User A (who has a rating of X for the *same* sentence/translation type) and the rating is positive, then New User’s new rating for that specific type is now 100*(positive+X)/(total+X). If the rating is negative, the new score is 100*(positive)/(total+X).
Here’s an example for User N as rated for the naturalness of his English sentences:
User N joins (positive = 0, total = 100, rating = 0). User N writes an English sentence. User A (who has an English rating of 50) rates the sentence positively. User N is now: positive = 50, total = 150, rating = 33.3. User B, who has an English rating of 10, rates the sentence negatively. User N is now at: positive = 50, total = 160, rating = 31.25. If User N now adds a French translation and links the two, they have three different ratings (one for the naturalness of their English sentences, one for the naturalness of their French sentences, and one for the quality of their English-French translations).
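The update rule from i) and ii) and the User N example can be written out as a short Python sketch. The function names `new_user`, `rating` and `apply_vote` are hypothetical; the arithmetic is exactly the 100*(positive/total) rule described above.

```python
def new_user():
    """A new user starts with 0 positive points out of 100 total."""
    return {"positive": 0.0, "total": 100.0}

def rating(user):
    return 100.0 * user["positive"] / user["total"]

def apply_vote(user, voter_rating, is_positive):
    """A vote weighs as much as the voter's own rating X for the same type."""
    if is_positive:
        user["positive"] += voter_rating
    user["total"] += voter_rating
    return rating(user)

user_n = new_user()
print(round(apply_vote(user_n, 50, True), 2))    # User A (rating 50) votes up   -> 33.33
print(round(apply_vote(user_n, 10, False), 2))   # User B (rating 10) votes down -> 31.25
```

In a full system there would be one such positive/total pair per user per language and per language pair, so the French and English-French ratings of User N would be updated separately.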
Starting with 100 total points is a way to prevent a user from jumping up to a rating of 100 right away with a single good rating, and the weighting makes it possible to give more power to users who have already been recognized as trustworthy with respect to particular languages or language pairs. Of course, we couldn’t initialize everyone with a rating of 0 since we wouldn’t go anywhere. So, I would propose picking a single representative for each language / language pair that is trusted based on the community’s experience and giving them a starting rating of 50 for that language / language pair.
3) Benefits and potential drawbacks
i) Every sentence and translation that has an owner would simply inherit the owner’s score, thereby immediately providing a 0-100 number that would tell the naïve visitor if the sentence/translation was trustworthy. This would give some indicator of the reliability of the majority of sentences/translations. The drawback here is that one still needs to interpret the score somehow (i.e. a score of 86 doesn’t have any concrete meaning – it doesn’t mean that there is an 86% chance of the sentence/translation being “right”).
ii) Rogue contributors (who put up bad translations for fun) or new members who didn’t read or don’t care about the rules will be handled very efficiently. Instead of having to correct all of their added sentences or, in some rare cases, to even block such users’ access, it is sufficient for a few reliable users to downrate a few of their bad sentences to make it clear that these sentences are not trustworthy. In fact, not rating up is sufficient, since the rating of any new user would be 0 by default.
iii) Arguments would hopefully lessen in their quantity/intensity as well if there’s a “neutral” rating system in place that’s somehow based on collective opinion. Well, I hope. This may or may not happen.
iv) The debate over whether or not to translate only into your native language would disappear, since bad translations would just reflect badly on the user (either discouraging them from translating or simply accepting the fact that their translations in their non-native languages will not be viewed as reliable).
v) A certain drawback is that such a system would not really do much for languages or language pairs with very few contributors (due to the monopoly on those languages by the few contributors), but I don’t see how one could really solve this problem without getting more contributors…
vi) Storage and more database calls could be a drawback, but I don’t see this as being that bad. In the worst case, you can imagine Tatoeba with 1,000,000 users and 6,000 languages, which would result in 6,000,000,000 (naturalness) + (6,000*5,999/2)*1,000,000 (translation) total ratings to track. That’s a lot, but that’s the worst case and will never be achieved (as many users only work with a handful of languages, not 6,000).
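For anyone who wants to check the worst-case figure in vi), here is the arithmetic as a small Python sketch. The function name `worst_case_ratings` is made up for illustration; the inputs are the 1,000,000 users and 6,000 languages used above.

```python
from math import comb

def worst_case_ratings(users, languages):
    """Upper bound on stored ratings if every user were rated in everything."""
    naturalness = users * languages            # one per user per language
    translation = users * comb(languages, 2)   # one per user per unordered pair
    return naturalness, translation

nat, tra = worst_case_ratings(1_000_000, 6_000)
print(f"{nat:,} + {tra:,} = {nat + tra:,}")
# 6,000,000,000 + 17,997,000,000,000 = 18,003,000,000,000
```

The translation term dominates because it grows with the square of the number of languages, which is why the sparsity of real usage matters so much.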
Anyway, that’s all I’ve got. Just wanted to put it out there as an idea, since I think it could solve a number of issues that Tatoeba has with respect to the reliability of its sentences (as well as other things). Might be worth trying, and it would be easy to remove if it doesn’t work. I suggest a big discussion on this regardless, since this appears to be a major problem with Tatoeba at the moment.

The wisdom of crowds (la sagesse des foules) is collective idiocy.

I see another drawback that has already been mentioned in the previous discussions: some languages are spoken in different countries in different ways, so what is correct and natural in Mexico may not be so in Spain, or what is said in Spain might not sound natural in Argentina, and so on. Or would just the users of the same country be able to evaluate the quality of the sentences of that user? And if so, shouldn't it be compulsory to fill in the profile with the user's country of origin and the languages they speak? Because so far it's just encouraged, but we've got lots of members who don't indicate that, and more than once I've found myself wondering whether a sentence had been written by a non-native or by someone who simply spoke another dialect of Spanish that's not mine.
On the other hand, I agree that something should be done about this. I know which users I can trust, but someone who uses Tatoeba without being a member wouldn't.
I also think that in order to improve the reliability of Tatoeba something should be done about the tens of thousands of orphan sentences, especially in the case of English. I'd vote for adopting them and changing them if necessary to make them sound natural. If the previous version was also correct, hopefully some other native speaker will either point it out or add the original version of the sentence as another translation. This way at least we'd have more reliable sentences instead of a bunch of sentences that we don't really know are actually used.

The regional differences present a challenge, but my counterargument would be:
If the differences are so significant that half of the sentences of a Spanish speaker in Mexico are judged as unnatural by someone in Spain, then Tatoeba should split the two languages into Spanish (Mexico) and Spanish (Spain).
If, however, only 1 out of 20 sentences is judged unnatural, the user from Mexico will still receive enough good ratings for the other 19 to be deemed trustworthy. It should average out in the end - that's the power of the sagesse des foules (which, though I agree with SC that it is not always ideal, can still be quite useful when it comes to simple tasks).
The most difficult cases, I guess, would be the borderline ones - i.e. the ones where you don't know if a language needs splitting.

On that specific point, I don't think WE (Tatoeba / Tatoeba's users) should decide what is a language and what is not, where to split, etc. Either you don't speak the language, so you can't decide, or you're a native speaker and may be wrong about which varieties are dialects and which are different languages.
A typical example is China: most people would put, for example, all the Chinese languages in the group "dialects of Mandarin", even though the Shanghainese dialects, for example, do not belong to it.
At the other extreme, you would have people from two neighboring cities listing their dialects as two different ones even though they belong to the same language.
And even if you find the tiny percentage of users who are knowledgeable enough in both linguistics and that language, the question of "where to split" will IMHO lead to endless discussion. Even if at some point people find a compromise and split a certain way, what if two weeks later a new user arrives and proposes, with insightful reasons, a new split?
For that reason I think it's better to rely on a standard: ISO 639-3 (alpha-3 codes) for languages, and later ISO 639-6 (alpha-4 codes) for dialects (note that in that case dialects would simply be meta information; the dialect code would not replace the current language code but rather complement it).
Using a standard for the splitting also has the huge advantage of making Tatoeba's data easily usable by other projects.
When thinking about these problems, I had the following in mind.
First, have a table in the user information listing the languages they speak, with two fields for each language:
language1 | level | native? (I put native as a separate field and not as the highest level, since someone with very little education can be native and care less about grammar than a foreigner who has worked on that language for 30 years)
That way we would be able to do computations with it. (Of course, moderators could "force and lock" someone's level if that person overestimates himself.)
When your level is below a given threshold, your sentence goes into a kind of waiting room, visible only to registered users and not exported, so that it does not harm the quality of the corpus while still letting people work in pairs.
For example, I don't speak very good Shanghainese, but it's easier to attempt a translation and then ask a native Shanghainese speaker to help me improve it than to find a native Shanghainese speaker who speaks French. Right now this would harm the quality of the corpus because, while waiting for a correct translation, my approximate translation would be visible and exported.
Of course, advanced contributors who are native speakers or have a level higher than X would be able to do a first validation to move that sentence into the main database.
For everyone else, sentences would go directly into the main database (basically the one we have right now), which would be of medium reliability. That one would be exported, as it's reliable enough for most uses. If at any time a sentence seems weird, it could be moved to the "not reliable" database.
Then corpus maintainers would be able to take a sentence from the main database and, with one button, put it in a "certified" part. Of course, we can discuss whether it needs one corpus maintainer, or two or three, or more complex rules, to "validate" a sentence.
Don't worry about the "separate" databases: I already know how to implement that easily in the back office.
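The routing described above could look roughly like the following Python sketch. Everything concrete in it is an assumption for illustration: the `Tier` names, the numeric level scale, the `LEVEL_THRESHOLD` value and the function names are hypothetical, and the sketch deliberately leaves out how the "native?" flag would factor in, since that is not spelled out.

```python
from enum import Enum

class Tier(Enum):
    WAITING_ROOM = "not reliable"   # visible to registered users only, not exported
    MAIN = "medium"                 # today's database, exported
    CERTIFIED = "premium"           # filled by corpus maintainers

LEVEL_THRESHOLD = 3  # assumed cut-off; the post only says "a given level"

def initial_tier(author_level):
    """Where a new sentence lands, based on the author's level in that language."""
    return Tier.MAIN if author_level >= LEVEL_THRESHOLD else Tier.WAITING_ROOM

def is_exported(tier):
    return tier is not Tier.WAITING_ROOM

def promote(tier, by_corpus_maintainer=False):
    """One validation step up; the validator is assumed to be qualified.
    Only a corpus maintainer can move a sentence into the certified part."""
    if tier is Tier.WAITING_ROOM:
        return Tier.MAIN
    if tier is Tier.MAIN and by_corpus_maintainer:
        return Tier.CERTIFIED
    return tier

tier = initial_tier(author_level=1)
print(tier.value, is_exported(tier))                   # not reliable False
tier = promote(tier)
print(tier.value, is_exported(tier))                   # medium True
print(promote(tier, by_corpus_maintainer=True).value)  # premium
```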
I think this is a good compromise, as
1 - we will be able to distinguish between 3 levels of reliability
2 - we will still be able to keep the "rare and hard to prove" sentences
3 - it still permits people to collaborate while keeping the quality intact
4 - it avoids the problem of "idiocracy", as it would be qualified users who decide (and of course promotion to advanced contributor is given to people you trust enough not to be big liars, and even if they are not flawless, the middle database is not advertised as "100%" reliable anyway)
5 - corpus maintainers will be able to slowly create a very "pure/premium" database at their own pace
So we will have
1 - a low reliability database (huge amount of data, may be useful for statistical computation etc.)
2 - a middle reliability one (not that huge but reliable enough for "learners"; I mean, I don't care if I see a missing "s" in a French sentence as long as it lets me see the translation of a Chinese sentence; it may also hold "rare" sentences that are hard to find proof for)
3 - a high reliability one (small amount of data but at least you're sure about it; useful for teachers etc. creating material, for example, keeping in mind that you may not find "rare / rare dialectal" constructions in it)
Also, for the last "level of reliability", it's open to discussion whether you trust by default (i.e. a corpus maintainer can alone put a sentence in it and you need to prove it's not correct to remove it), which may be suitable as we're supposed to have high trust in corpus maintainers,
or go "guilty by default", where the corpus maintainer needs to prove the existence of the sentence (books, grammar references, etc.) for others to validate it.
What do you think?

I didn't see the other thread,
so to answer now about the "split Tatoeba in two" idea, which my proposal for "rating" uses:
In the interface you will see nearly no difference: same website, same form. It's just that your sentence would automatically fall into "not reliable" or "maybe reliable" depending on your "rank" (which you first set yourself and which can be forced by administrators in case of abuse).
1 - It's only afterwards that things differ: you will never see the not reliable ones appear in search results or random sentences without explicitly looking for them (we can still imagine an option to say "I want to see them in my random/search results even when not specified", and a little icon to indicate which category the sentence belongs to: not reliable/medium/premium)
2 - you will have a way to differentiate them in the export files too

sounds awesome

Sounds like a good idea. You've clearly thought this through.
I haven't run into many bad sentences myself, but I've only been looking at one tiny corner of the corpus, namely Hebrew sentences (which I've been translating into English). These are owned by a small number of people, all of whom know Hebrew well.
One comment: instead of using worst-case numbers (e.g., 1,000,000 users and 6,000 languages) and then saying that such figures are unlikely, I think your argument would be better bolstered by realistic figures throughout.

I like worst-case scenarios since then no one can blame you if what you propose crashes the site :-)
I would let people with real database management experience comment on how realistic it is to handle such a thing, but I guess a better way to put it would be that the data would scale linearly with the number of users and quadratically with the number of languages. So... O(mn^2)?

We've actually discussed similar ideas on IRC at least twice in the last year. There was no consensus, though; I think the matter is, sadly, even more complex than what you stated.
Firstly, we need to acknowledge that simple rating schemes are very easily cheated. Tatoeba strives not only for “I like it”-type ratings, but aims at making its data useful in scientific contexts too. When dealing with such a crowd of people, most of whom are not linguistically trained, you need to care not only about raw values but also about psychological factors. People on IRC told me that there also used to be significant opposition among contributors to introducing a rating system; I think it was exactly because of those issues. Therefore, a more scientific approach is necessary.
Some ideas that were floating around during the IRC discussions were:
* Asking specific questions about the sentences, instead of simply doing up/down-votes. Like: “Do you think this sentence is grammatically correct?”, “Do you think this sentence is a natural way of speaking?”, “Do you think this sentence uses region-specific language?” and so on… For possible answers, again, pick descriptive rating instead of numerical: “Yes”, “No, because [fill in the blank]”, “I don't know”. Also, allow users to state why they think so. This would reduce the “I like it”/“I don't like it”-type rating, as well as gather much more specific information, which can be later used in resolving the issues by corpus maintainers.
* Handling false negatives. Instead of trying to impose a strict relation between user ratings and perceived correctness, just flag suspicious sentences with a tag like “moderator attention needed”, so that corpus maintainers will know these sentences need special care. The corpus maintainer might then find out whether the sentence is actually correct (in any dialect) or not. Assuming that most of the sentences will actually be correct, the rating system would decrease the amount of attention needed from corpus maintainers, and provide them with useful information about the troublesome sentences.
* Introducing a blind voting scheme. Instead of allowing every user to vote on every sentence, pick a random set of sentences for every user. The sentences should be chosen based on the specific user's skills. Also, anonymize them (i.e. do not show their owner).
I would be *very* careful about giving users numerical scores without first gathering real data about the sentences. It would be possible to devise such a scheme using some nice statistical and data-mining methods, such as classifiers. Classifiers would allow us to estimate the real probability of a sentence being right, or being a dialectal variant, or wrong, given the user's earlier contributions, but only if we already had some data on the sentences. Making up different equations won't help unless it is shown that the scheme can't be easily cheated, and that it will actually be useful…
And now, the biggest problem: we can talk about introducing a rating system ad infinitum, but without programmers, this talk will be pointless. Yesterday I wrote a post calling for coder-contributors, actually in the hope of implementing the above. So far, nobody has replied… I'm going to wait for a week or two, though. I might be able to arouse some interest in the issue at my university, but given that it's not very probable, I don't want to talk about it now.

While I agree with you on almost everything, I will emphasize that rating *sentences* is probably a hopeless task, as there are too many, especially with specific questions that require thought.
The point of the rate-up/down is to "statistically" (I use the word very loosely) rate users. This way, someone browsing the site can see that a certain sentence was written by someone who has a rating of 90 (i.e. is fairly reliable), and that the translation of the sentence to another language was done by someone who has a rating of 20 (so, the translation may have issues). The idea is not to rate sentences one-by-one. And, depending on how elegant you make the statistics, it can certainly be made scientific.
I do agree that all rating systems can be cheated, but it's much harder to cheat a weighted system. Even if you were to create 100 accounts and to rate yourself up this way, your votes would count for very little compared to those of a reliable user (i.e. a single bad vote from someone with a 100 rating could undo your 100 good votes).
Blind votes... I was thinking about this, and although it's appealing "scientifically", I think it kills a large community aspect, which is, IMO, one of Tatoeba's good points.
And yes, without programmers, talking may be pointless, but at least it sets up a reference for programmers to go back to when they actually appear :-)

Ok, then let me consider a simple rogue scenario. I hope I understand your scheme well; please double-check my code. We've got a trusted user with rating=99 (which means that he had to get at least 9900 “positive” rating points, for a total of 10000). A rogue user creates two accounts, and somehow gets a single up-vote from the trusted user on one of those accounts. According to my calculations [1] he's able to get both of those accounts to rating=90 after introducing just 25 sentences and rating one account from the other; this can be done even manually, without scripting, in a few minutes…
[1] https://gist.github.com/d552cea2ea5b60f0b22f
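For readers who cannot open the gist, here is an independent reconstruction of the scenario in Python. It is not the gist's code: it assumes the weighted rule from the proposal (a vote adds the voter's current rating to the target's totals), that the two rogue accounts simply take turns up-voting each other's sentences, and the names `upvote`, `sock_a` and `sock_b` are made up.

```python
def rating(acct):
    return 100.0 * acct["positive"] / acct["total"]

def upvote(target, voter):
    """Positive vote weighted by the voter's current rating."""
    weight = rating(voter)
    target["positive"] += weight
    target["total"] += weight

trusted = {"positive": 9900.0, "total": 10000.0}   # rating 99
sock_a = {"positive": 0.0, "total": 100.0}         # rogue account 1
sock_b = {"positive": 0.0, "total": 100.0}         # rogue account 2

upvote(sock_a, trusted)   # the single genuine up-vote

votes = 0
while min(rating(sock_a), rating(sock_b)) < 90:
    # The two accounts alternate, each rating one sentence of the other.
    if votes % 2 == 0:
        upvote(sock_b, sock_a)
    else:
        upvote(sock_a, sock_b)
    votes += 1

print(votes, round(rating(sock_a), 1), round(rating(sock_b), 1))
```

Under these assumptions the loop ends after a couple of dozen mutual votes, in line with the figure of about 25 sentences quoted above. Since neither account is ever down-voted, each keeps total - positive = 100, so a rating of 90 is reached as soon as an account's total passes 1000.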

"Flaky answer": Yes, he could, but "just" 25 bad sentences would likely be noticed by good users and downvoted during that time.
"More rigorous answer": I would employ additional safeguards in the formula (e.g. penalizing for diversity of ratings). I would also put an upper limit on rating frequency (e.g. a vote per minute), thereby giving time for non-rogue users to undo the positive self-ratings of the rogue and to limit the scripting. Finally, I would assume that few people would go to such lengths to break a rating system on a language site like this one.

> 25 bad sentences
They don't have to be bad. They can simply be taken from other sources, which will make them correct without the rogue user even having to know the language well.
> additional safeguards […] (e.g. penalizing for diversity of ratings)
For diversity? Not sure how this could help…
> upper limit on rating frequency
Only means a rogue user would need more accounts. Not more sentences.
> few people would go to such lengths
Social issues make people do strange things, like… well, let's not go back to the recent discussion. And if you want a trusted rating system, you need to make it robust and resistant to these kinds of attacks. If it isn't robust, nobody will put trust in it, just like with the current ownership system.

Probably starting to bore people with these posts (let's just continue in PMs if you want to, as this is starting to go into details... but what the heck I'll post one last one here!)
"bad": I agree, that's tough. But then the sentence is natural, and everything is okay (in a strange way). So, you've achieved what you set out to achieve. It might get deleted later, but the quality of the language is at least there.
"diversity": I meant, if all of your good ratings come from the same 1 or 2 users.
"more accounts": Put a limit on account creation and make it more difficult to create an account. This is a problem already, if I'm not wrong. There's not even a confirmation e-mail for Tatoeba.
"robust system": I agree completely.

Your code is right, by the way.
Also interesting to note:
If the good user downrates a rogue's sentence once for every five positive ratings the rogue gives himself, the rogue's rating reaches a maximum at 80.

I like per-sentence (and per-translation) thumbs-up/down. If the voter is logged (to prevent repeat votes and allow mind-changes), the accumulated votes could be used for a range of derivative calculations and applications, including ratings of contributors (and raters!) and apparent divisions of the corpus.
To the degree that languages are shared conventions, bad ratings for good sentences may accurately indicate sentences that are outside the main stream. If one of my sentences collects bad ratings, I shall be motivated to provide a comment that justifies the sentence. Those looking for main-line sentences will be happy to be spared both that sentence and its defense.

>To the degree that languages are shared conventions
But they're not. English is, but that's an exception.
If languages were shared conventions, they wouldn't need to be taught.
For most languages outside English, sentences are either correct or not correct and it takes long years of education to make them out. The fact that millions of uneducated morons think that a correct sentence is wrong or that an incorrect one is correct is irrelevant.
That is why raw results from search engines, that are often conjured up to prove a point, are irrelevant for these languages.
In French, you can in many instances find more wrong sentences than correct ones on the Internet, just because French is a popular 2nd, 3rd or 4th language, and so non-natives far outnumber natives on the web, not even counting all the illiterate natives...
What you propose is the reign of mediocrity.

All languages are shared conventions: lexicons and grammars do not drop from heaven; and communication is successful just to the degree that speaker and hearer agree on the conventional meaning of the words and structures of the languages they share. This is a commonplace of linguistics and of language learning and teaching. It is for just this reason that languages, like all other conventions, must be learned.
Most languages have few speakers and no academies. When they are taught formally, it is by non-native speakers to non-native speakers, with occasionally a native informant in the wings. Correct is what the still surviving native speakers say to one another, understand when they hear, and accept as correct.
Prescription is effective only in the absence of substantial dissent: persistent deviation from academic prescriptions results in changed prescriptions. Numbers matter. Language existed before schools, government, or law; and solecisms have been around since the first language that was spoken by more than one group.
When languages--like French, English, Arabic, and, Malay--are officially established in more than one country, usage varies; and what is correct in one may not be correct in another.
What I propose is not the reign of mediocrity, but the rational, economical, and effective use of crowd sourcing to identify the middle of the road. The premise is that there is utility in identifying sentences whose correctness is broadly acknowledged. What is irrelevant is the existence of correct sentences of limited interest and utility--particularly to less than expert students.
The pairing of correctness and utility is suggestive. Reasonable people care about correctness precisely because it is useful. Maybe we would be wise to rate sentences on utility, with correctness just a part of the mix.

>What is irrelevant is the existence of correct sentences of limited interest and utility--particularly to less than expert students
I fully disagree. One of my uses of Tatoeba, and I'm not alone here in doing this, is to store rare sentences. I have even inserted correct sentences that are nowhere else to be found on the Internet, Tatoeba subsequently becoming the prime source for these. It is one of my objectives that Tatoeba becomes a reference of correct sentences, and not a reference of broadly misconstrued approximations by amateur learners.
In fact, Tatoeba also serves, in some cases, as storage for rare and dead languages so, in this case, it's usage that is irrelevant, because many of these are not used anymore...
You may decree that the sole purpose of Tatoeba is education of the masses, but it's merely your desire. Nowhere on Tatoeba is this purpose asserted and that is certainly not why I joined. Tatoeba is a reference for sentence translations. That's all that is of interest to me.
Utility is something subjective. Many people think Latin or Esperanto are useless, but I think otherwise. I don't care a fig about what the majority deems useful. The majority loves soccer and I hate it. So what?

+1

+1

>All languages are shared conventions: lexicons and grammars do not drop from heaven
Initially yes. But eventually, not always. Many times in history, both ancient and recent, languages and the way they're used and written have been decreed. In some instances, as in French, German, Spanish, Mandarin, Russian (spoken by just half the world population...), this decreeing is a permanent process.
Contrary to your belief, there are only 2 kinds of remaining languages that are not decreed: English, and all the languages that it is replacing and that are on the verge of extinction, at a rate of about a dozen each year...

Mandarin is so little a shared convention, that its name initially means "Language of the Ministers"...which reveals it was certainly not initially the language of the people on whom it was subsequently imposed, and very successfully so, since it is now the native language of 800+ million people.
Similarly, it was the French King François Premier who imposed the monarchial Loire patois on the rest of France through an edict in 1539 http://tatoeba.org/fre/sentences/show/1988591
A language is nothing democratic http://tatoeba.org/fre/sentences/show/734038

>> What is irrelevant is the existence of correct sentences of limited interest and utility--particularly to less than expert students
You are right to disagree with this assertion, since it is an intentional overstatement intended to reflect and to parody your own hyperbolic reference to relevance. My true view is that many sentences in our corpus (and many of my own contributions) are of primary interest--and therefore relevance--only to specialists.
One of the attractions of any sort of (raw, not weighted) rating system is that, as the number of ratings grows, it will eventually be possible to compute statistically meaningful estimates of the degree to which individual raters are in step with Tatoeba's thousand users. As you correctly point out, being out of step with the mass may be a good thing. The dispersion of this measure will reflect Tatoeba's success at simultaneously serving a variety of disparate purposes.
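To make the idea of being "in step" concrete, here is a minimal sketch of one way such a measure could be computed. Everything in it is an assumption for illustration: the `(rater, sentence_id, score)` tuple format, the function names, and the choice of mean absolute deviation from the other raters' average are mine, not anything Tatoeba actually stores or computes.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rater_deviation(ratings):
    """Return each rater's mean absolute deviation from the other raters.

    `ratings` is a list of (rater, sentence_id, score) tuples; this format
    is hypothetical and chosen only for the example.
    """
    by_sentence = defaultdict(list)
    for rater, sentence_id, score in ratings:
        by_sentence[sentence_id].append((rater, score))

    deviations = defaultdict(list)
    for votes in by_sentence.values():
        if len(votes) < 2:
            continue  # a lone rating has nothing to be compared with
        total = sum(score for _, score in votes)
        for rater, score in votes:
            # Leave-one-out mean: the average of everyone else's score.
            others_mean = (total - score) / (len(votes) - 1)
            deviations[rater].append(abs(score - others_mean))
    return {rater: mean(devs) for rater, devs in deviations.items()}


def dispersion(per_rater):
    """Population standard deviation of the per-rater deviations."""
    return pstdev(per_rater.values()) if len(per_rater) > 1 else 0.0


sample = [
    ("a", 1, 5), ("b", 1, 5), ("c", 1, 1),
    ("a", 2, 4), ("b", 2, 4),
]
per_rater = rater_deviation(sample)
print(per_rater, dispersion(per_rater))
```

A low figure for a rater means that rater usually agrees with the others; a high figure means the opposite, which, as noted, is not necessarily a bad thing. The estimates only become meaningful once each rater has rated many sentences that others have rated too.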
Your views on the efficacy of academic norms are interesting; but you must expect that to persons knowledgeable about language, linguistics, and sociolinguistics they will appear overblown, quaint, and--frankly--simplistic. In particular they pay insufficient attention to the disparity between the largeness of the varieties of speech in communities of speakers numbering in the tens of millions and the smallness of all such academies, which were more effective in a bygone era characterized by a greater reverence for authority and a feebler understanding of the nature and the power of the engines of language change.
> Contrary to your belief, there are only 2 kinds of remaining languages that are not decreed: English, and all the languages that it is replacing and that are on the verge of extinction, at a rate of about a dozen each year...
To mention only the first counterexample that comes to mind: Navajo. Do you suppose there is an academy that attempts to dictate the usage of, say, Comorian? What about Low Saxon? Malagasy? What academy provides decrees for Modern Standard Arabic? No matter, the single example of Navajo suffices to show that your assertion is an unsupportable exaggeration.
> Mandarin is so little a shared convention, that its name initially means "Language of the Ministers"...which reveals it was certainly not initially the language of the people on whom it was subsequently imposed
This assertion seems to indicate a lack of understanding of the phrase "shared convention." In the present context the phrase has nothing to do with voluntarism, but simply indicates a learned agreement regarding the (arbitrary) meanings of words and the structures that organize them.

>your views on the efficacy of academic norms are interesting; but you must expect that to persons knowledgeable about language, linguistics, and sociolinguistics they will appear overblown, quaint, and--frankly--simplistic.
In the Anglo-Saxon world. And only there...
I know native English speakers have great difficulty understanding this, because English has been imposed militarily on half of the world for some time now, and so it looks as if it came down from heaven. But languages, including English, are political instruments before they are shared conventions. Most of today's speakers of English have had this language and its form imposed on them, whether they realise it or not.
It's ironic that you mention Navajo, because it exemplifies this central point: many Navajo children were kidnapped from their parents by the US army, in order to educate them in an "official US English", in a very violent but almost successful attempt at eradicating their culture through the eradication of their language.
As for Malagasy, you're entirely wrong. Malagasy, like a MAJORITY of official languages, HAS an Academy, and everything about it is political. Malagasy is an entirely normalised dialect, and it is constantly being normalised by its academy as an arm of power, in order to unify the country and its government, in the same way as French, Mandarin...
And as for Low Saxon, although one courageous contributor worked very hard to make it the 18th language on Tatoeba (it used to be even 15th...), he is the only one able to correct himself here. Low Saxon will be dead by the time our children are our age. Maybe an Academy would have helped save it... although it was the successive German governments that did their best to create one Germany with only one German.
Again, no agreement, no shared whatever, just politics, education and force.
> the smallness of all such academies, which were more effective in a bygone era characterized by a greater reverence for authority and a feebler understanding of the nature and the power of the engines of language change.
That's where you miss the point entirely. I could provide, and I have in the past on this wall, countless modern examples that prove that Academies are actually effective and extremely successful at imposing their choices.
My favourite is the nice French « baladeur ».
When I was still a child, 90% of French people laughed their heads off the day the French Academy coined this lovely replacement for "Walkman", which is unpronounceable by French norms. 40 years on, "Walkman" is gone and we have this: http://www.amazon.fr/LECTEUR-MP...8078179&sr=8-9 (Hello... Amazon is US... so even US organisations follow the French Academy... hmm)
In French, in the long run, the Academy prevails ALL THE TIME, so it's hardly bygone or small...
And the reason is simple to understand for a non-US person: France, like a majority of states outside the USA, is mainly state-run (much less than Russia or the PRC, but still...)
That means that the State commands all aspects of life: juridical, educational, cultural, economic. And the French state, like the Chinese state and many others, understood very well that language is a key political tool in its hands, and it never misses an opportunity to use it.
So, in France, the Academy is not at all a mere gathering of senile authors, as it is often caricatured in the Anglo-Saxon world. It has the power to advise the state (and this is its prime mission), which, in turn, uses its recommendations to direct the administration and dispatches instructions to 2 million civil servants, including 1 million in our education system. These instructions apply to all instruction manuals, courses, contracts, patents... and they are of course binding.
You might argue that this applies only to the "public" sector of the economy (which happens to be only 50%...) but you'd be wrong, because private organisations and businesses have to follow as well.
For instance, in France, a private contract that doesn't use legally sanctioned French vocabulary is VOID. Write them the way you wish, at your own risk...
>indicates a learned agreement regarding the (arbitrary) meanings of words and the structures that organize them.
Mandarin is the very contrary of an agreement. Although it is derived from actual former dialects, it is a very sophisticated construction by scholars that is being, to this day, successfully imposed on populations through force and cultural imperialism.