TRANG
2015-08-01 20:12
** New feature: Reviewed sentences / Users corpus **

I've been working on a new feature the past couple of months and it's now finally ready to be tested on the dev website (https://dev.tatoeba.org). I think this is a feature that a lot of users will be happy to have, since it's a step towards improving the quality of Tatoeba's data.


# What's new?

1) You can mark sentences as "correct", "incorrect" or "unsure". http://prntscr.com/7zprx9
2) Each user has a corpus. When you mark a sentence, it is added to your corpus. http://prntscr.com/7zps39
3) On the sentence's details page, you will be able to see who marked which sentence. http://prntscr.com/7zps9d


# What does this imply?

A lot of tags are going to become useless, such as "OK", "@change", "@check", "@needs native check", or basically any tag that is suggesting the correctness of a sentence.

Everyone will be able to mark sentences for verification because the new feature is available to everyone, as opposed to tags which are available only to advanced contributors and corpus maintainers.

We're going to shift towards a system that is more tolerant to "wrong" sentences. Not right now, but we'll slowly get there.

Based on how everyone is evaluating the correctness of a sentence, we will be able to calculate a score for each sentence. With this score, we can display an icon next to each sentence so that users can have a quick idea whether or not the sentence can be trusted.


# I need your feedback

What I've implemented so far is only the minimum needed for the feature to be useful. There are still a lot of things I have in mind, but before going further, I'd like to make sure we're starting on solid foundations.

So please test the feature on the dev website (https://dev.tatoeba.org), and let me know if there is anything that you find confusing, or if there's any improvement that you think is very necessary before this feature gets released.

I'm planning to include this feature in the update on August 10th.

Thank you!
alexmarcelo
2015-08-01 21:24 - 2015-08-01 21:30
I personally find these resources very useful, but I would limit people to only "mark" sentences in the language assigned as "native" in their profile, except for Latin, Ancient Greek, Esperanto and the like.
TRANG
2015-08-01 22:49
I don't think such limitations will benefit the project in the long term.

I prefer to go for a more open policy, and let everyone use the feature as they wish, to collect as much data as possible.

We'll keep an eye on how people are marking sentences, we'll find solutions to help/encourage people to mark sentences more accurately, and we'll figure out how to detect false positives (i.e. people who mark correct sentences as incorrect or vice versa).
alexmarcelo
2015-08-02 15:00
Well, then it would be interesting to have explicit written rules or recommendations about how and by whom this resource should be used. I'm Brazilian, living in Spain now and my Spanish is quite good. Does that mean I should feel comfortable to mark sentences in Spanish, too?
TRANG
2015-08-03 21:43
> I'm Brazilian, living in Spain now and my Spanish is quite good.
> Does that mean I should feel comfortable to mark sentences in Spanish, too?

That's up to you. You will know better than me how you feel.

If you do feel comfortable, then you can do it. If you don't feel comfortable, then you don't do it.
bandeirante
2015-08-02 08:44
That is a sensible idea, but I'm still opposed to it. Sometimes, not often but occasionally, a well-versed outsider is just as competent as, if not more competent than, a native speaker. Happens from time to time. Of course, one should be humble about their own ignorance.
alexmarcelo
2015-08-01 21:27
Keeping people from marking their own sentences would be a good thing, too.
Ooneykcall
2015-08-01 21:38
Automatically marking sentences as verified by the user who posted them (if they're a native, anyway) would make more sense, I think.
Impersonator
2015-08-03 10:59
I’m not sure this is a good idea. Not marking your sentence or marking it as 'unsure' will be equivalent to adding @NNC to your own sentence — a very sensible thing to do.
Ooneykcall
2015-08-03 17:57
I don't see why I shouldn't be able to mark my own sentences. I am also a speaker, so my opinion should count as well.
lovermann
2015-08-03 18:44
The main problem is that there's no way to verify whether the user is a native speaker or not, I guess.
PaulP
2015-08-05 08:27
> Keeping people from marking their own sentences would be a good thing, too.

Yes. Yesterday during the tests I marked one of my own sentences as "correct" because I completely forgot that it was my sentence. I unmarked it afterwards, but I would prefer that marking one's own sentences as "correct" be impossible. Marking them as "unsure" could be a good feature though. I guess we all are sometimes unsure about some sentences, even if we are very fluent in that language.
TRANG
2015-08-05 09:12
Why do you feel it's bad to mark your own sentences as "correct"?
PaulP
2015-08-05 09:19
Now we are not allowed to "OK" our sentences. I don't see why the new system would be different.
TRANG
2015-08-05 09:28
But don't you think it would be better if people were able to mark their own sentences as "OK"? It gives an indication that they have double checked it.
Guybrush88
2015-08-05 10:38 - 2015-08-05 10:39
not in every case. I still see self-tagged sentences in English with the 'OK' tag that have typos or missing verbs or any other mistake, so they're clearly not OK
Ooneykcall
2015-08-05 12:10
Marking will be different from tagging though (if I understand correctly), in that many people can mark the same sentence, so marking would be an opinion piece, while tagging is generally factual.
TRANG
2015-08-05 15:13
Contributors are not supposed to be able to tag their own sentences with "OK"... So in theory it's impossible that you've seen such sentences.
Guybrush88
2015-08-05 15:25
I'm referring to sentences by CK, who is the only one who can self-tag his own sentences
TRANG
2015-08-05 16:09
To clarify, CK cannot self-tag OK his own sentences.

These tags were imported at his request, to help him not lose some of the data that he reuses in his projects once we started running the deduplication script.
PaulP
2015-08-05 14:20
> But don't you think it would be better if people were able to mark their own sentences as "OK"? It gives an indication that they have double checked it.

No. It clearly doesn't, as Guybrush already said. Proofreading your own sentences is one of the most difficult things. Because people have the feeling that their own contributions are OK, they don't really double check.
TRANG
2015-08-05 15:44
I agree that for most people it is harder to find something wrong with your own sentences, and therefore when you mark your own sentences as correct, it will be less significant than if others mark it as correct. But it doesn't mean that everybody is incapable of finding mistakes when re-reading their sentences.

Let's say that when you proofread yourself, you can notice 30% of your mistakes, and when somebody else proofreads your sentences they can find 90% of the mistakes.
In the end, even if it's only 30%, you will still find some mistakes while proofreading. It will result in fewer mistakes so it will still be beneficial that you have tried to proofread your sentences.
Selena777
2015-08-05 17:15
Personally I don't see any difference between checking my own sentence a few days or more after it was written and checking someone else's sentence. I can't even always remember that a sentence I come across at random was written by me.
alexmarcelo
2015-08-06 21:58 - 2015-08-06 21:59
> Let's say that when you proofread yourself, you can notice 30% of your mistakes...

If this rating system is to be implemented to point out and correct mistakes, then it's a real white elephant project... we already have more effective ways of doing that.
alexmarcelo
2015-08-06 21:53 - 2015-08-06 21:53
When a person adds a sentence or a translation, doesn't that mean that the person automatically agrees with their own sentence/translation? Wouldn't the positive mark be ambiguous in this case?
tommy_san
2015-08-05 16:01
> Marking them as "unsure" could be a good feature though. I guess we all are sometimes unsure about some sentences, even if we are very fluent in that language.

In that case, it would be better to request other members to evaluate your sentences.
https://tatoeba.org/wall/show_m...#message_23736
Ooneykcall
2015-08-01 21:48
What are we going to do with language varieties? Sentences from one variety (e.g. Australian English) may seem wrong to a person who uses a different variety (e.g. American English). I mean, if most speakers consider a sentence to be 'wrong', that doesn't necessarily mean it's bad. It could be that it is only used by a certain group of speakers. Assigning it a low score based on majority opinion, though, may make it look like it's a bad quality sentence that shouldn't be trusted/used at all, wouldn't it?
TRANG
2015-08-01 22:38
> What are we going to do with language varieties? Sentences from one variety
> (e.g. Australian English) may seem wrong to a person who uses a different variety
> (e.g. American English).

We'll have to trust users to not mark sentences as incorrect when the sentence may be correct in another variety of a language.

Obviously, some people will still do it (on purpose or not). But I think we can manage to implement a system that is smart enough to detect when such cases happen.


> Assigning it a low score based on majority opinion

I'm stopping you right here. The score is not going to be based on the "majority" but rather on the trustworthiness of each user. It's still too soon to work out any score anyway. I don't expect us to be able to calculate a score efficiently before another 6 months, maybe not even before another year, because we'll first need to work out some sort of trust system.

In other words, let's say your trust score in Russian is 100, and you mark a Russian sentence as correct. Then 30 people mark the same sentence as incorrect, but all of them have a trust score of 0. Then your opinion will prevail over the ones of these 30 people.
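
To make the example above concrete, here is a minimal sketch of a trust-weighted score. It is only an illustration, not the actual algorithm (which doesn't exist yet): the function name, the +1/0/-1 values for each choice and the normalization by total trust are all assumptions.

```python
# Hypothetical values for each choice; the real system may use something else.
VOTE_VALUES = {"correct": 1, "unsure": 0, "incorrect": -1}


def weighted_score(reviews):
    """Return a score in [-1, 1] from (choice, trust) pairs.

    Each review counts in proportion to the reviewer's trust score.
    Returns None when the total trust is zero, i.e. when there is
    nothing to base a score on.
    """
    weighted_sum = 0.0
    total_trust = 0.0
    for choice, trust in reviews:
        weighted_sum += VOTE_VALUES[choice] * trust
        total_trust += trust
    if total_trust == 0:
        return None
    return weighted_sum / total_trust


# One user with trust 100 says "correct"; 30 users with trust 0 say "incorrect".
reviews = [("correct", 100)] + [("incorrect", 0)] * 30
print(weighted_score(reviews))
```

With these assumed weights, the single trusted opinion decides the score and the 30 zero-trust opinions have no effect on it.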
Ooneykcall
2015-08-01 22:48
Fine, but then before the verification system is running, we'll need to have a section somewhere explaining what the verification labels are supposed to mean so that people are more likely to use them the intended way.
---
I see, so you seek to establish a proper trustworthiness formula before proceeding with implementing it, which I can only welcome. Then this is a question for a later time, alright.
TRANG
2015-08-01 22:56
> before the verification system is running, we'll need to have a section somewhere
> explaining what the verification labels are supposed to mean so that people are more
> likely to use them the intended way

This is why I need feedback :)

Most people don't read instructions, so I'd like to make the feature as clear as possible, so that even without a detailed explanation of what each choice means, users will still choose correctly.
brauchinet
2015-08-02 09:40
And what about the quality of the translation? A sentence may be fine, but still not accurate as a translation. Sometimes this is difficult to verify, sometimes very easy.
Well, there's always the option of unlinking such sentences, but if we have a rating system should translation issues be handled in it or separately?
TRANG
2015-08-02 12:41
The quality of translations has to be handled separately, and it's not something that is covered yet.
tommy_san
2015-08-01 23:34
"Correct" and "incorrect" sound objective, but I think we'd be able to gather more valuable data by collecting subjective judgments. For example, when an Australian member knows that a particular English expression is used by some, but s/he'd never use it her/himself, we'd miss a precious piece of information if s/he simply voted "correct".

Something like these might be more meaningful.
1. I'd use this myself
2. I'm not sure if I'd use this myself
3. I wouldn't use this myself
4. I'm sure this is wrong
Only the last option would lower the "trustworthiness" of the user who added the sentence.
pullnosemans
2015-08-02 09:18 - 2015-08-02 09:19
I like this idea. is this inspired by the problem of japanese orphans which sound inadequate to you, but you don't think they're grammatically wrong, if I recall correctly?

anyway, this could be useful to indicate that a sentence may be used e.g. to illustrate a certain grammatical pattern, but with the caveat that the sentence in its entirety sounds weird or unnatural. then users would know which sentences to use for mere examples of a certain construction they may be newly acquiring, and which sentences to actually remember as things you could say in a conversation.
TRANG
2015-08-02 13:59
To be honest I think that "correct" and "incorrect" are also subjective opinions. What you mention here is another dimension to the problem.
There is the fact that a user accepts a sentence as correct, and the fact that it is part of their "speech style" or not.

1. I can consider a sentence as correct, and it's something I would use.
"Hello everyone, how are you?"

2. I can consider a sentence as correct, but it's not something I would use.
"Thou shall not pass!"

3. I can consider a sentence as incorrect, and yet, it's something I would use.
"lol wtf did you do" (this could go in category 5 as well)

4. I can consider a sentence as incorrect, and it's not something I would use.
"are everyone hello you how"

5. I may not be sure if a sentence is correct or not, but I'd still use it.
"lol wtf did you do"

6. I may not be sure if a sentence is correct or not, but I wouldn't use it.
"Catch as catch can." (#24932)

And you can potentially add 3 more cases for "things I'm not sure I would use".
- correct + not sure I would use
- not sure if correct + not sure I would use
- incorrect + not sure I would use

It would definitely be interesting data, but at this stage I think it would make things too complicated.
gillux
2015-08-02 18:26
I find that question quite central and I think it should be dealt with *beforehand*. This data will be used as a base for an evaluation system whose mechanism is yet to be clearly defined, so we don’t know what actual data we want to gather. By data, I mean to what questions users are answering, precisely, when clicking on these buttons. I think the approach “let’s gather data and we’ll see later what to do” is wrong, we should go with “let’s define a good evaluation system so that we know what data we need to gather”.
TRANG
2015-08-02 19:45
> to what questions users are answering, precisely, when clicking on these buttons

The question is whether they consider the sentence to be correct or not.


> I think the approach “let’s gather data and we’ll see later what to do” is wrong,
> we should go with “let’s define a good evaluation system so that we know what
> data we need to gather”.

This whole issue could actually be the topic of a PhD thesis. And no matter what I would tell you now, I'm pretty certain I will think of something different in the next few years.

So I don't have an exact idea of the evaluation system, in the sense that I won't be able to tell you precisely the algorithm that will calculate the score of a sentence. But one piece of data that we cannot do without, is the opinions of users about the correctness of a sentence.

This is where other people can tell me that I'm wrong, that there's a better system we can elaborate without the need to know who thinks a sentence is correct, and who thinks it's incorrect. But I personally can't see how we can do without it.

Now regardless of how we will use this data in our future evaluation system, this feature answers an actual need that many users have had for a long time: the need to mark sentences as "checked", or "incorrect" or "to be checked".
To fulfill this need, we told users that they could use tags. They can use the "OK" tag to mark a sentence as correct and they can use a bunch of other tags to mark a sentence as incorrect or unsure. But tagging is not ideal. It's better to have a dedicated system for it, which is what the feature is primarily for.

So we're not just blindly gathering data for a future evaluation system that we haven't specified yet. We're trying to implement an improvement of an existing broken system (i.e. the use of tags to mark a sentence as correct/incorrect).
And it turns out that the data we will collect through this new feature will (in my mind at least) be useful in the future, for an evaluation system.
gillux
2015-08-03 10:36
So you actually have a rough idea of scoring sentences by weighting votes based on trust, which requires asking users to vote, which is why you’re implementing a voting mechanism. But voting may not be the only way to tackle the quality problem. It all depends on how you’re going about it.

Ignoring the question of “what is correct and incorrect (and unsure)” will lead to a major confusion and approximation of the answers. The answers will depend on the language level of users and what they are using Tatoeba’s sentences for, what is “good enough” for them. “Correct” may be interpreted as “learners may still learn something from it”, “free of typos”, “grammatically correct”, “is used among people I know”, “is used in context X (academic, conversational, internet…)”, “is used nowadays”… Negate these assertions and you get as many interpretations for “incorrect”. I’m not even speaking of “unsure”.

Each of these assertions is a characteristic we all partly use to judge “correctness” in our own subjective way. By mixing these opinions, you’re comparing apples and oranges. To me, when a sentence is believed to be incorrect, what matters is not the number of people who think so or who they are, it’s why. X people saying it’s incorrect (being weighted or not) has no value compared to an individual comment demonstrating what’s wrong.

On the other hand, claiming a sentence is correct is way more difficult to defend. A sentence is generally considered correct as long as nobody has a reason to say it’s not. That’s all. I happen to correct OK-tagged sentences that were otherwise considered correct because nobody spotted any error so far. And maybe someone else will in turn contradict me by having yet another thing to say.

Besides, as others mentioned, I fear that dialect minorities on Tatoeba are likely to be threatened by their majority counterparts in such a system, because they are minorities on Tatoeba too. Weighting won’t help distinguish between a valid minority and an irrelevant one in the score.

Because of all this, I’m thinking about the following way to improve the quality of sentences. It’s just an idea, but what if members could only be able to say either (1) “I was unable to find anything wrong with this sentence (proofread?)” or (2) “It’s wrong to me because [insert explanation]”. The more (1), the more likely the sentence is good, but it doesn’t mean it’s correct. Any (2) is to be solved by sentence modification, deletion, tag addition (like regional, slang…), mutual agreement, corpus maintainer decision… Instead of using bare comments, we could use a minimal issues tracker system to easily keep track of all the (2).
Selena777
2015-08-03 17:30
It makes sense if there are several categories of sentences, like:

1) Standard sentences (the most valuable thing for most learners, I think). They are grammatically correct according to the contemporary grammar rules of at least one language variety (like British or American English), they sound natural and not clumsy to almost all native speakers of that variety, they are actively used by them (in colloquial and/or bookish speech), and they are understandable to almost all educated natives, regardless of their age or location.
A sentence would get into this category if, for example, 3 high-level speakers consider it to be so. Actually, there should be only doubtless sentences there; doubtful ones should not be included.
2) Sentences with curse words. I believe this should be a special category, because some people strongly dislike that part of languages, so they could avoid being exposed to it.
3) Common mistakes (tagged with "popular language" in the Russian corpus). These are sentences frequently used by native speakers, but they are not considered "correct" by current grammatical rules, so they are not recommended for use in school or university exams, in compositions, etc.
4) Slang sentences, which are mainly used only by people of some generation or subculture, and often are not understandable to others.
5) Dialect sentences, which contain dialect words or grammar forms that are used only in some regions and seem odd or not understandable to people from other regions.
6) Archaic sentences, which contain archaic words, grammar, spelling or syntax that is not correct according to contemporary grammar rules, or that just sound archaic to most natives.
7) Non-standard sentences, which were intentionally created to be odd, absurd or peculiar, or grammatically correct sentences for which there is no consensus in the corpus about whether they are "natural" or "odd", so they do not fit the "standard" category. So, if the author wishes to keep a sentence "as is", he or she could just mark it as "non-standard".
8) The remaining sentences (bad orphan sentences, bad sentences created by non-native speakers, literal translations, doubtful sentences whose authors are no longer active, and everything else that should be changed or deleted sooner or later should go in this category).

It's debatable how a sentence can get into one category or another. If anyone's interested, I'll write some suggestions later.
TRANG
2015-08-03 21:07
> So you actually have a rough idea of scoring sentences

I don't like to equate this with a voting system because this is not really the main point. I mean, definitely, the first thing that comes to mind is that when you mark a sentence correct, it's equivalent to upvoting it, and when you mark it as incorrect, you're downvoting it.

But for me the actual challenge is to give each user the possibility to create a map of their linguistic world.


> Ignoring the question of “what is correct and incorrect (and unsure)” will lead to
> a major confusion and approximation of the answers.

I wouldn't say this is ignoring the question. I'm very well aware that the definition of what is correct will vary from one person to another and that everyone will use different criteria to decide whether or not a sentence is correct. But that's precisely why I don't think we can force users to comply with a predefined notion of what is correct.

Similarly if we were implementing a feature for users to mark a sentence as inappropriate or not, every user would have a different set of criteria. Just because we don't have a clear definition of what is inappropriate doesn't mean that we can't give each user the option to express their opinion about it.


> what if members could only be able to say either (1) “I was unable to find
> anything wrong with this sentence (proofread?)” or (2) “It’s wrong to me
> because [insert explanation]”.

I think that (1) raises the same kind of problems as asking "is this correct?". Maybe you will see there is a mistake in a certain sentence, and I won't see there is one, because we have a different opinion about that "mistake" (i.e. different opinion about what is correct).

As for (2), I can't imagine forcing people to provide an explanation every time they want to mark a sentence as incorrect. This would put too much burden. People who want to add details about why they think a sentence is wrong will presumably take the initiative to do it via the current comment system.
sacredceltic
2015-08-02 20:38 - 2015-08-03 03:08
The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.
sacredceltic
2015-08-04 09:06
I love the idea of requesting people to blindly click on vaguely defined buttons beforehand and redefine their meaning subsequently !

We should adopt that method in politics...although it probably wouldn't change the policies much.

Obviously, this has been thought through very carefully...
tommy_san
2015-08-02 23:01
How about sentences like these? They don't contain any errors (I hope), so should we mark them as "correct"?

Girls eat rice. (from Duolingo)
I don't remember to post the letter. #258464
The train was seen to come into the station by me. #326212
By next April will you have studied English for ten years? #264206
At school, we have learned that you don’t have to copy the homework. #3431583

Here are some Japanese sentences that sound less than natural to me. I wouldn't recommend learning them, but most of them are not incorrect if you ask me whether they're correct or not.
https://tatoeba.org/sentences_lists/show/3516

If you ask me whether they're good or bad for learners, I can say I think they're bad.
TRANG
2015-08-03 21:21
You can just not mark them at all. Marking sentences is a possibility, not an obligation.
If you don't feel that marking a sentence as "correct", "unsure" or "incorrect" can express what you think about the sentence, then you can simply ignore it.

Another possibility is to mark them as "unsure". Basically if you cannot really decide whether a sentence is correct or incorrect, then unsure is your "in between" answer.

At this point I won't be able to tell you what would be the best thing to do.
Silja
2015-08-03 09:33
I think this is a better approach than just having correct/not sure/incorrect.
tommy_san
2015-08-04 07:59
I've changed my mind a bit. See my new post: https://tatoeba.org/wall/show_m...message_23784.

You could ask yourself whether you'd use a sentence yourself, but I realized this criterion is not clear enough, and then I've come to think it would be best to let each user determine their criteria.

One problem with the question "Would you use it yourself?" resides in the vagueness of "would". Would I use the sentence "I'm pregnant"? I don't think I'd ever use it in my life, but I would perhaps use it if I were a woman. Would English speakers in the US use British spellings? They probably wouldn't actually use them, but they'd use them if they were born in the UK. It doesn't make much sense to stick to what you'd really say, but then many more ifs (if I were an old man, if I were a girl in an animé, etc.) would drive us crazy. I don't want lengthy guidelines about questions like these. It would be enough if members ask themselves, "Do I want this sentence in my collection of good sentences?"
Ooneykcall
2015-08-04 09:26 - 2015-08-04 09:27
So, basically: 'good for use' (I don't feel there is anything wrong with this sentence, language-wise), 'bad for use' (this sounds uncomfortably wrong to me), 'acceptable' (I have an aversion towards this sentence, but I accept others using it when it fits)? Sounds great to me.
pullnosemans
2015-08-02 09:15
are you sure the @change tag is going to become unnecessary? what about sentences that are completely fine, but have missing punctuation or a typo in them?

which actually leads to a more general question: how should we go about such sentences? should they be marked good, or bad, or not at all? what would happen to a sentence that has been voted down because it has an orthographic mistake, then changed to be perfectly good? would it retain its low score?

which again leads to the question: what happens to sentences generally after they have been changed? will their score be reset to 0?
TRANG
2015-08-02 14:12
I don't expect us to calculate any score for a sentence before another year. So I won't talk about this right now.

The problem you're mentioning is the next problem we'll have to solve, which is, how do we keep the reviews up-to-date?

This is something that is not implemented yet, but the plan is that whenever a sentence is modified, each review that has been associated with the sentence before the date of modification will be flagged as deprecated.

There will be an extra category in the corpus page, which will show all the deprecated reviews. This way, each user will be able to check again the sentence, and re-adjust the correctness that they have set for that sentence.

And on the sentence's page, the reviews that are deprecated will be displayed in grey.
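
As a rough sketch of that plan (it is not implemented yet, so the `Review` fields and the `flag_deprecated` function below are hypothetical names, not the real schema), the deprecation step could look like this:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Review:
    # Hypothetical fields; the real table may be organized differently.
    user: str
    correctness: str          # "correct", "unsure" or "incorrect"
    reviewed_at: datetime
    deprecated: bool = False


def flag_deprecated(reviews, modified_at):
    """Flag every review made before the sentence was modified.

    Returns the deprecated reviews, i.e. the ones that would show up in
    the extra corpus category and be displayed in grey on the sentence's page.
    """
    for review in reviews:
        if review.reviewed_at < modified_at:
            review.deprecated = True
    return [review for review in reviews if review.deprecated]


reviews = [
    Review("alice", "incorrect", datetime(2015, 8, 1)),
    Review("bob", "correct", datetime(2015, 8, 5)),
]
# The sentence is edited on August 3rd: only alice's review predates the edit.
stale = flag_deprecated(reviews, modified_at=datetime(2015, 8, 3))
print([review.user for review in stale])
```

Nothing is deleted here: a deprecated review keeps its value until its author checks the sentence again.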
pullnosemans
2015-08-02 09:24 - 2015-08-02 09:30
and by the way, I would still like to see this combined with a request feature. :)

there could be a list of sentences asked to be rated, and a sentence would disappear from the list when it has received a certain number of opinions (maybe 3 or 5).

this could especially bring benefits during the beginning times of the feature, when there are millions of sentences to be evaluated, so the probability of coming across one that has been evaluated is very low. if people can request sentences they want to use to be evaluated, there would be some orientation for the users which sentences to evaluate, because they can just go to a list of sentences where there is a need for evaluation right at the moment.
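
a minimal sketch of how such a request list could work, just to illustrate the idea. the function name, the data layout and the default threshold of 3 are all assumptions, nothing like this exists on the site yet.

```python
def pending_requests(requested_ids, opinion_counts, threshold=3):
    """Return the requested sentences that still need opinions.

    requested_ids: sentence ids that users asked to have rated, in order.
    opinion_counts: dict mapping a sentence id to the number of opinions
        it has received so far (missing ids count as 0).
    threshold: number of opinions after which a sentence leaves the list.
    """
    return [
        sentence_id
        for sentence_id in requested_ids
        if opinion_counts.get(sentence_id, 0) < threshold
    ]


# Sentence ids are made up for the example.
requested = [101, 102, 103]
counts = {101: 3, 102: 1}
print(pending_requests(requested, counts))
```

a sentence that has reached the threshold simply drops off the list, so the list always shows where opinions are needed right now.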
tommy_san
2015-08-04 08:07
+1
sacredceltic
2015-08-02 17:49 - 2015-08-02 17:50
Implementing the "wisdom of crowds" is a crime against prescribed languages, to the benefit of a foul and insipid Globish that will consequently take their place, thanks to criminals like you.

This is a dark day for the Internet and for linguistic diversity.
TRANG
2015-08-02 18:00
Either there is a misunderstanding, or we don't have the same definition of the wisdom of crowds.

See my reply to Ooneykcall.
http://tatoeba.org/eng/wall/sho...#message_23724
sacredceltic
2015-08-02 18:22
Yes, I believe you don't understand what you are doing...

The French Internet will remember you as one of the major players in its destruction.
sacredceltic
2015-08-02 18:26
https://tatoeba.org/fra/sentences/show/2140548

If crowds knew languages correctly, we wouldn't need education.
pullnosemans
2015-08-03 11:47 - 2015-08-03 12:08
but who knows languages, if not the people who speak them?

sorry if I'm too stupid to understand it, but I really can't see how a collective sentence evaluation system harms linguistic variation. isn't it elitist prescription that is harmful to the natural variation of languages?

edit: or are you afraid that non-native speakers will wreck the appearance of languages with their incompetent judgments? if that happens, I don't think the consequence will be the promotion of a Globish, but simply a non-functional evaluation system on tatoeba.
sacredceltic
2015-08-03 12:11 - 2015-08-03 12:15
>but who knows languages, if not the people who speak them?

Did you know your language before learning it from your schoolteachers? You must be some kind of little genius. Not me. I learned at school, from my teachers, who themselves had learned from theirs. Then I read good writers who use my language with intelligence and precision.
Of course, you can learn a language in the street, but don't count too much on finding a job that way...

Indeed, you don't understand the problem.

Let's take a simple example (among thousands):

the French word « agenda » means something different from the English word "agenda" (which means « ordre du jour » in French)
The problem is that plenty of young Anglophile French speakers, who aren't very good at French, are gradually starting to use the English meaning because they don't know the French meaning of the word, or because they find it very "cool" to look like citizens of the world with a single way of thinking, and a single language.

This is not an "evolution" but just ignorance. They are not making a language evolve; they are aping another one, in a perfectly idiotic way.

If we start voting on sentences, the ignorant end up overwhelming the knowledgeable. Then the ignorant use their majority to demote the correct meaning of words and pride themselves on being the knowledgeable ones. That is what is happening in English: the cult of ignorance, where an "alternative" can now have more than 2 outcomes, because most people don't understand what it is about. It is not only laughable and moronic, it also handicaps expression and precision.

Throughout the period when the meanings overlap like this, we find ourselves in a phase of linguistic uncertainty, where we no longer know what a word really means.
People make do with jabbering and understand the opposite of what the people they are talking to mean.

You can call that elitism, but for me, knowing precisely what a word means is simply efficiency and a sharper, more precise way of thinking.

Elitism, on the contrary, is those petty bourgeois who mix up all the languages because they alone have the means to travel, who use them indiscriminately and impose their mimetic misreadings on the majority of the population, which does not constantly pretend to speak the language of the world, but just modestly its own.
pullnosemans
2015-08-03 13:03 - 2015-08-03 13:15
(first of all: is it appropriate in french to use the second person singular when talking online with people you don't know? i noticed that you used the plural form when talking to me, so if it was impolite to address you as "tu" in my last message, i'm sorry.)

i understand the problem better now. i also think that a situation where one language is dominant over another (or, in the case of english, over a multitude of others) is always dangerous, and that it is very important for a group of speakers to protect their language. so i second you in your philosophy of cultivating french and of not absorbing english words indiscriminately.

however, if you say that one doesn't know one's language without being taught by teachers, are you also saying that all the people in the world who live in poverty, without access to education, don't know their languages, as well as the people who have access to education, but not in their native languages? if someone speaks an indigenous language of south america, but the language of instruction at school is spanish, does that person know spanish better than their mother tongue? i think that saying so really helps bring about the death of languages and of linguistic variety.

and if the language influencing young people's french were not english but occitan, would you be against that influence with the same intensity?

finally, what a word means is how people use it. i think it is your right in any case to cultivate your variety of your mother tongue, but if you discredit the use of french by an entire generation, maybe it is you who is pushing aside linguistic variety, more than english is pushing aside the french language that is "pure" or "more precise" from your perspective. maybe it is also you who doesn't understand the precision of the innovations in young people's language, whether inspired by english or by anything else. this isn't an attempt at provocation, only a nudge to examine your point of view.
sacredceltic
2015-08-03 13:29
>i think it is your right in any case to cultivate your variety of your mother tongue, but if you discredit the use of french by an entire generation

But what is an "entire generation"?

Do you know what proportion of the European population is under 25? It's not even a quarter of the population!
But here, they act as if they were the majority of the population and peremptorily decree that anything over 30 on the Internet is obsolete.
And now they are going to build evaluation systems to make the speech of the majority of the languages' speakers even more obsolete.

Besides, it's only a small part of the under-25s who anglicize French: the petty-bourgeois part that can afford to travel and learn by immersion.
Do you think most other young French people speak English? You are seriously kidding yourself if you believe that!
They may mumble a few words taken from fashionable songs, but that's all.

>and if the language influencing young people's french were not english but occitan, would you be against that influence with the same intensity?

Well, no! Because it's not the same thing.
English invades everything through social mimicry by bourgeois youth, and then through its imprecision. English is content to be imprecise. It always has been, because of how widely it has spread. That is not the case for other languages.
It is not a normal linguistic influence, such as the influence of Italian, Spanish, Hebrew, Berber, Arabic (or rather Maghribi, which is improperly called Arabic in France) or other African languages on French.
I myself frequently use words from these languages in my French (I say "toubib" instead of "médecin", for example, in my everyday speech), but I don't distort the meaning of French words in doing so.

Occitan is not a good example, precisely because French came out of the fusion of the langue d'Oc (ancestor of modern Occitan) and the langue d'Oïl (ancestor of Franco-Picard/Walloon), so Occitan has had a major influence on French.

Of course languages influence one another. But English does not influence, it destroys, it annihilates, by flattening everything: meanings, pronunciations, grammatical rules.
It's a bulldozer, driven by young snobs who despise cultures because they claim to be citizens of the world.

lipao
2015-08-03 16:33 - 2015-08-03 16:38
Sacredceltic, as far as I understand French, I agree (and sympathize) with you; I too strongly dislike the English language (not the language itself, but its malignant and disfiguring influence on other languages). (Although I would equally have disliked French, had I lived two hundred years ago, and I will equally dislike any national language that forces itself into the position of a compulsory means of international communication.)

However, I see nothing bad in the proposal. Tatoeba is, in my view, a corpus aiming to capture the (present-day) state of each language, and if the present-day state of this or that language abounds in expressions taken directly from English or heavily influenced by it, then you may well lament, but it is a fact. Tatoeba, moreover, is in many ways geared towards foreign learners, and why *lie* to a learner that this or that expression is not used in the language, and force purisms on them, if that expression, even if an ugly borrowing from a foreign language, is normally used by the majority of speakers?

As you wrote: if one wants a good (so, according to you, sufficiently "pure") linguistic style (whether in a minority language or even, for example, in English itself), one goes to the library and studies good writers of the language in question; one will certainly not pick up such good style in a language course or on the street; there one will pick up the style of the masses. And the masses unfortunately do not consist of good writers.

But whether you like it or not, Tatoeba is more like a language course, or even the street itself, than a library of good writers.

And finally, you write that this is a black day for the French Internet… I myself have no great illusions about Tatoeba's influence. Even if English won a crushing victory here, the majority of people out there will probably not care at all.

Tatoeba is a tool for showing interested people how a given language is generally used in practice, and it yields to general usage.
sacredceltic
2015-08-03 20:37
>I myself have no great illusions about Tatoeba's influence.

https://www.google.fr/search?hl...amp;gws_rd=ssl
sacredceltic
2015-08-03 20:39
>if one wants a good (so, according to you, sufficiently "pure") linguistic style (whether in a minority language or even, for example, in English itself), one goes to the library and studies good writers of the language in question

Babies are content with saying "babababa"
pullnosemans
2015-08-03 16:44 - 2015-08-03 16:46
as for the alleged imprecision of english, attributing certain abstract qualities to individual languages makes no sense to me, but if that's what you think, that's your business.

but as for your concern, again, i think i understand your problem a bit better. it seems you are concerned in particular with the representation of french *on this website*. i can't say anything about that, because i don't know whether there is a group of french speakers who really treat varieties more conservative than their own as obsolete. if that is the case, of course they shouldn't do that. but tell me, do you really believe these people will downgrade examples from other varieties with this new system? are there particular people you know to have a tendency to do that? or will people be careful enough to respect these sentences even though they don't match their taste?
sacredceltic
2015-08-03 20:42
You seem to be unaware that the under-25s, although they represent less than a quarter of the European population, make up 75% of Internet users.
Worse, the anglophile petty bourgeois make up the vast majority of users of language-learning sites.

So it really is a minority of a minority that is going to impose our languages on us!
pullnosemans
2015-08-03 23:14
so their variety is the most relevant one on the internet, isn't it? that's neither bad nor good, it's simply a fact.

the internet is only one of several domains. i don't think it's up to you to decide who should be the majority in a given domain of communication. it's up to no one; it determines itself. if you try to dictate the makeup of internet users, it's you who is trying to impose something.
Silja
2015-08-03 09:40
When I add a new sentence, I can't review it on the add a new sentence page (https://dev.tatoeba.org/fin/sentences/add). Nothing happens when I try to click on the review icons.

Or am I even supposed to review my own sentences? I can do that, if I open an individual sentence page. Is this a bug?

And if I'm supposed to review my own sentences, wouldn't it be better that the sentences I add would be automatically reviewed as correct? I'd expect that if someone adds a sentence to Tatoeba, they must think it's correct.
TRANG
2015-08-03 21:27
You found a bug :)
If you go to the sentence's page, you should be able to mark the sentence.

You can review your own sentences. I personally think as well that it could make sense to mark sentences from advanced contributors and higher as "correct" automatically, when they add a sentence in their native language.

Similarly we could mark some sentences "unsure" automatically, especially sentences that are added by non native speakers.
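
The automatic marking idea above could be sketched roughly as follows. This is only an illustration under assumptions: the role names, the `User` class and the `default_mark` function are hypothetical and not taken from Tatoeba's code, and treating every non-native sentence as "unsure" is just one possible reading of the suggestion.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical role ranking, lowest to highest (names are assumptions).
ROLE_RANK = {
    "contributor": 0,
    "advanced_contributor": 1,
    "corpus_maintainer": 2,
    "admin": 3,
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    native_languages: frozenset = field(default_factory=frozenset)


def default_mark(user: User, sentence_lang: str) -> Optional[str]:
    """Return the mark applied automatically when `user` adds a sentence.

    Assumed rule: advanced contributors and higher get "correct" for
    sentences in a native language; any non-native sentence gets "unsure";
    everything else is left unmarked (None).
    """
    is_native = sentence_lang in user.native_languages
    is_advanced = ROLE_RANK[user.role] >= ROLE_RANK["advanced_contributor"]
    if is_native and is_advanced:
        return "correct"
    if not is_native:
        return "unsure"
    return None
```

With this rule, a regular contributor writing in a native language gets no automatic mark, so the sentence still has to be reviewed by hand.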
Impersonator
2015-08-03 12:18
I believe the new system is a major step forward. Please make sure that the new data is downloadable.
TRANG
2015-08-03 21:29
Yes, it will be :)
sacredceltic
2015-08-03 21:31
The new propaganda, you mean ?
pullnosemans
2015-08-03 23:04 - 2015-08-03 23:17
sssh, it's okay to be emotional, and your point of view isn't all wrong in my opinion, but you're starting to come across like a frustrated 16 year old. this arrogant and destructive tone you sometimes take on doesn't really help anyone take your dystopic visions of a young bourgeois mob taking away the french linguistic hegemony from their parents through a new evaluation feature on tatoeba seriously, at least not me.
Ooneykcall
2015-08-03 23:26
It is nigh impossible to shake Celtic's fierce convictions, so I suggest you don't even try.

The evaluation system is hardly going to disallow classic sentences anyway. The area of what's acceptable may grow towards novel additions, but it won't shrink away from classic/standard speech except slowly by natural evolution. People are not massively giving away the heart of their culture under English influence, since most people naturally cherish their identity and will not go for radical changes. Language dissolution isn't going to happen like that.
sacredceltic
2015-08-04 05:47 - 2015-08-04 08:28
>Language dissolution isn't going to happen like that.

That is happening every day before our eyes, here, on Tatoeba.
Tatoeba is one of the main instruments for the potential wrecking of languages on the Internet, because it positions itself as a reference. A very well indexed reference.
The mistakes made on Tatoeba are the best indexed on the Internet, along with the horribly numerous ones on Wikipedia.
The difference is that everyone knows that the language on Wikipedia is catastrophic, because most articles there are written by illiterates or translated by robots and nobody takes the trouble to correct them (the best way to spread a lie is to write a Wikipedia article).
But Tatoeba presents itself as a reference for contemporary language, through translations.

Search for any anglicism in your language on Google, and you land on Tatoeba right away. And it is going to get worse and worse, with erroneous translations justifying one another and serving as a reference reused everywhere else, spreading like wildfire.

It was through Google searches that I realized, very early on, Tatoeba's enormous potential for harm. You can check it for yourself.
Ooneykcall
2015-08-04 06:23
Anglicisms exist in usage. Borrowings from other languages exist, in general. They are also duly represented on Tatoeba. But I doubt your claim that they are overrepresented is well founded.
It's a rather bad idea to trust any Tatoeba sentence blindly, as do those who use Tatoeba material in bulk on other sites where all additional information, such as the poster's username and sentence tags, is removed. I agree with that freely and wouldn't advise anyone to do so. However, if you limit your trust to long-time contributors who have proved to be reliable, the problem vanishes. I haven't observed core Russian corpus contributors to be indulging in using foreignisms any more than the general modern populace does. (By "modern" I mean people probably up to forty. Of course old-timers speak somewhat differently, but isn't that always the case? I'd be more worried if a 70-year-old and a 20-year-old spoke the same way, since that would seem like stagnation.)
I reiterate that it's surely a poor idea to think that a sentence is good because it can be found on Tatoeba. However, we make no pretence of asserting it, do we? Hence, we needn't hold ourselves responsible for others' folly.
When I check a turn of phrase on Google, I look through multiple unrelated instances to see how it is used. Isn't that done normally?
sacredceltic
2015-08-04 08:09
>However, we make no pretence of asserting it, do we?

and we make no pretence of the contrary and the Tatoeba Corpus gets poured all over as is, with its fake native contributors, its true contributors who are convinced they master 8 languages when they master none, including their own native language, along with the thousands of learners who have created very poor and unnatural sentences that nobody has the time to review.

Just ask Tommy_san how he would rate the average quality of the Japanese corpus on Tatoeba...

No system of trust will ever work, because the majority of people here are neither representative of the languages they handle, nor trustworthy.
Ooneykcall
2015-08-04 09:15
Okay, it's a more difficult situation than I would have liked to remember, you're right on that.
Some sort of scoring system could still be beneficial, I think, but it needs to be figured out well before implementation, otherwise poor use would pull its efficiency down, like we still have lots of poor English and Japanese sentences from the old days of Tanaka corpus, that no-one bothers to correct because it's a huge hassle.
Impersonator
2015-08-04 08:47
Oh, looks like you’ve adopted the tactics of the Russian government. When you don’t like something, you call it 'propaganda'. :)
tommy_san
2015-08-04 07:43
I thought about it again and partly changed my mind.

1. I support the idea of three-grade evaluation.

The system should be simple and easy to use. Otherwise people would soon get tired and stop using it.

2. Each member should determine their evaluation criteria themselves.

As Trang says, there are lots of criteria to evaluate sentences, so it's neither realistic nor desirable for us to name them concretely beforehand. It would be better to let each user determine their policies.

3. We should opt for more general terms than "correct"/"incorrect".

Each member has different criteria to judge the quality of sentences. Correctness is just one aspect of them. There are so many sentences that aren't incorrect in any sense but still don't make good example sentences. At least I'm sure quite a few members here think so. Quality control on Tatoeba has never been only about correct or incorrect. I know no member who adopts a sentence or adds the OK tag just because it's correct.

So we shouldn't limit their choices of criteria by the labels "correct" and "incorrect". If, for example, some members want to collect sentences that are not only correct but also stylistically elegant, they shouldn't be discouraged from doing so. Of course, if some members decide to gather all the correct sentences, they're also encouraged to do so, but that's only one of the possible options.

I'd suggest "+1", "0" and "-1" for the moment. (It could also be something like "good" and "bad". The point is that they shouldn't sound too specific.)
"+1" is for sentences you think are good – sentences you'd like to include if you were to compile a dictionary of some sort on your own.
"-1" is for sentences you think should be changed or deleted – sentences that, in your opinion, no one would or should use. You could, but don't have to, suggest a better sentence and/or explain why you think it's bad.
"0" is for the rest – sentences that are not so bad that you can be sure they need change or deletion, but which, for some reason (even if you can't explain it) you don't feel like including in your collection of good sentences.

4. Each member should be able to give names to their collections if they want.

For example, "Spoken Australian English" or "Any Japanese Sentence Without Obvious Errors".

5. In the future, it would be nice if there are ways to provide more detailed evaluations.

Some users must want to show more than just three-grade evaluations.
One possible solution would be to allow them to create more than one collection per grade.
Another solution would be personal tags. Current tags are sometimes unsuitable for expressing personal opinions because they look like objective facts.
(Another thing you could do is to make multiple accounts for different purposes, as some members already do now, but this is clearly not very convenient.)
The combination of these might enable us to provide our opinions about each sentence more freely without worrying about what other people might think of it.

6. I'm against the idea of "trust scores".

To be honest, I have no idea how it would be possible to score someone's trustworthiness without doing an injustice to the minority, and I'd hate to be in a site where it's openly displayed that I'm distrusted by this or that member, or that I'm less trustworthy than this or that member. However, we could discuss this later.
sacredceltic
2015-08-04 08:03
>I know no member who adopts a sentence or adds the OK tag just because it's correct.

I do so all the time...
tommy_san
2015-08-04 08:06
Really? Even if it sounds awkward?
sacredceltic
2015-08-04 08:17
I didn't say that. Of course not. I adopt only sentences that I would say.
TRANG
2015-08-04 23:26
Thank you for your feedback.


Regarding (3), I think the alternative you are suggesting is interesting.

What I'm wondering is how much of a difference it actually makes to use "good" or "+1" instead of "correct", and if that difference matters a lot or not.

I can't think of situations where I would mark a sentence as "good", and not mark it as "correct".
There are however situations where I would mark as "+1" sentences that I wouldn't mark as "good" or "correct" (I would more likely mark them as "not sure"). That would be for sentences that I personally find linguistically interesting, but that other people may just consider as confusing and useless for learners.

I can imagine myself marking sentences "-1" while not marking them as "bad" or "incorrect": sentences that I feel are inappropriate (for example an insult towards another member that would have been added just for provocation).
I can imagine myself marking a sentence as "bad" while not marking it as "incorrect": sentences that look grammatically correct but for which I cannot find any meaning except a really twisted one.

So my impression is that using +1/-1 will result in a broader way to mark sentences, while using good/bad will be a bit more precise, and correct/incorrect even more precise.
I think I have a preference for "good/bad" but I'm honestly fine with any of these solutions.

But then my question is, how should we reword the tooltips and titles?

1. For the tooltips
- Mark this sentence as "good" / "neither good nor bad" / "bad" (?)
- Mark this sentence as "+1" / "0" / "-1" (?)
- Add this sentence to your collection and mark it as "good" / "neither good nor bad" / "bad" (?)

2. For the titles on the user's collection page.
- Sentences considered good / neither good nor bad / bad by {username} (?)
- Sentences marked as "+1" / "0" / "-1" by {username} (?)


Regarding (4), you already have the possibility to name your collections: by making lists. Now the lists feature is very tedious to use, I'll give you that. But the feature of "naming your collections" already exists. We just need to make it more user friendly.


Regarding the use of "corpus" that you mentioned in another post, I think your point makes sense. We actually use the word "collection" on the homepage for users who are not registered and logged in, for the short description of what Tatoeba is. It would be coherent to use it here as well, unless someone has a better word.
umano
2015-08-05 02:51
I also prefer "good/bad" or "+1/-1" because "correct" and "incorrect" seem like buttons for grammarians (and even they contradict each other), while the other options seem closer to any native speaker of a language. So:

Icon: (✓) or (+1)
Tooltip: Approve this sentence
Filter name: Approved

Icon: (✗) or (-1)
Tooltip: Disapprove this sentence
Filter name: Disapproved

Icon: (?)
Tooltip: Say you are unsure about this sentence
Filter name: Dubious

And in a sentence page you could read:

456 users approved it
321 users disapproved it
123 users are unsure about it

My 2 rupees.
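
The per-sentence counts proposed above could be produced with something like the sketch below. It is a hedged illustration only: the mark names "approve", "disapprove" and "unsure", the `summarize` function and the sample usernames are assumptions, not part of the actual feature.

```python
from collections import Counter

# Hypothetical wording for each mark: (singular, plural).
PHRASES = {
    "approve": ("user approved it", "users approved it"),
    "disapprove": ("user disapproved it", "users disapproved it"),
    "unsure": ("user is unsure about it", "users are unsure about it"),
}


def summarize(marks: dict) -> list:
    """Turn a {username: mark} mapping into the summary lines for a sentence page."""
    counts = Counter(marks.values())
    lines = []
    for mark, (singular, plural) in PHRASES.items():
        n = counts.get(mark, 0)
        lines.append(f"{n} {singular if n == 1 else plural}")
    return lines


# Example with made-up usernames.
for line in summarize({"alice": "approve", "bob": "approve", "carol": "unsure"}):
    print(line)
```

Keeping the raw `{username: mark}` mapping rather than only the totals would still let readers check who marked the sentence, as the sentence details page is meant to show.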
tommy_san
2015-08-05 10:38
Let me ask you two questions.

1. Would it be OK to vote based on a rather special criterion – for example to vote only for good sentences in the Kyoto dialect, or good sentences with Internet slang?

2. Would it be OK to vote against a sentence just because we personally don't like it, or do we require an objectively convincing reason like a grammatical error? In other words, would we be criticized or considered untrustworthy if we vote against a sentence that some others find good? (I'm thinking about awkward Japanese sentences that sound like literal translations from a European language, which I think are misleading for learners of Japanese, but are actually favored by some people, especially intellectuals.)

The phrases "vote for/against" might be useful, by the way.
TRANG
2015-08-05 15:08
1. Yes, it would be okay. If you simply want to limit your collection to a very specific criteria, I don't see how it can impact negatively the whole system.

2. I think to a certain extent it would be okay. If you are able to provide a relevant explanation about why you're "voting against" a certain sentence, then it will be more valuable. But I think it is understandable that you don't want to analyse and justify yourself every single time you mark a sentence. So you shouldn't necessarily be considered untrustworthy if you vote against sentences without bringing a justification. It will already be interesting enough to be able to spot more easily sentences where opinions diverge.

I personally prefer to avoid the use of "vote" because it gives the impression that we have to make a choice to keep a sentence or not, and that's the opposite of what I'm looking for. My hope is that we can move towards a system where we can actually allow more diversity.
tommy_san
2015-08-05 15:34
> 1. Yes, it would be okay. If you simply want to limit your collection to a very specific criteria, I don't see how it can impact negatively the whole system.

Then we should definitely avoid the word "unsure". When these kinds of members mark a sentence "0" (or whatever we'll call it), they're most of the time sure that it doesn't meet their own criteria, and they're also sure that it's not too bad, so they're not unsure about anything.
TRANG
2015-08-07 09:42
One thing I've realised recently is that the question that I want users to answer is "Is there anything you would change in this sentence?"
So "unsure" would actually make sense here.

(✓) Good = There's nothing you would change. You think the sentence is perfectly fine.
(?) Unsure = You're unsure because you don't consider that you have enough knowledge to judge, or because it's just a difficult case to decide on.
(✗) Not good = There's something wrong with the sentence and you would change it or delete it.

I think this is what most people will intuitively do when presented with the option to choose between (✓) (?) and (✗), and I think this is the closest I can come up with, to reflect the type of information that I'd like to gather and that I think a lot of users would like to have.

The option to mark sentences (✓) and (✗) is basically a way to have a more coherent way to mark sentences as OK or not OK than what we have now. It should also help users evaluate more quickly and more accurately whether or not they want to trust a sentence.
Some users will judge that a sentence can be trusted only based on the number of (✓) vs the number of (✗). If you're not too picky about quality, that's probably what you will do instinctively.
Some users will be more careful and look at who exactly marked the sentence as (✓) and who marked it as (✗). They will only trust the sentence if it was marked (✓) by a user that they personally trust.

The option to mark sentences (?) can be used in 2 different ways.
1) You can't decide whether you would change the sentence or not. You feel there may be something wrong, but maybe the sentence is still acceptable enough that you would be fine to keep it as is. The sentence is however interesting enough to you that you feel compelled to mark it.
2) You don't consider you are in a position to decide if there's something to change in the sentence. This would be the case for instance if you're adding a sentence in a language that you are not a native speaker of.

Obviously this is not perfect but I don't think it's possible to anticipate all the problems that this new system will bring in. It's even less possible to design solutions around problems that we only know in theory.
I prefer to go with the assumption that in the majority of the cases, users will mark sentences in a way that is actually relevant and useful for others. We will have to see how it goes for the more edgy cases.
tommy_san
2015-08-07 10:46
When speakers of American English mark simple British English sentences (e.g. "What colour is it?") as "?", what are they unsure about?
TRANG
2015-08-07 12:19
That's something we'll have to ask American English speakers who mark British sentences as unsure.

If I were an American English speaker, I would probably not mark these sentences at all. But if I decided to mark them,
- I would mark them "good" if I accept the variance in spelling.
- I would mark them "not good" if I don't accept it.
- And I would mark them unsure if I consider that there's something questionable, but I haven't gathered enough justifications to decide whether it's bad, or if it's still acceptable.
tommy_san
2015-08-07 13:57
> If I were an American English speaker, I would probably not mark these sentences at all.

Then I think there should be a fourth option for that. Otherwise you'd end up seeing again and again the sentences that you've decided not to mark.

I was thinking of including them all in the collection "0".
TRANG
2015-08-07 19:58
It's a bit too early to tell if we need to introduce a fourth option.

I think it would be fine if you add them in your "0" collection.

My needs and strategy of marking sentences are going to be different from yours. Just because I would personally not mark a sentence doesn't mean that it doesn't make sense to mark it, or that it shouldn't be marked.
sacredceltic
2015-08-07 21:24
so basically, you admit elaborating a scheme to assess the quality of sentences although you have no clue how to do it.
You're just irresponsible because your decisions on the matter will have grave consequences on the indexing, publishing and exposure of these sentences.
Basically, you actually reckon, that you have no clue what you're doing !
TRANG
2015-08-07 22:12
Didn't you know already, I have no idea what I'm doing ever since I started Tatoeba.

Things still worked out.
sacredceltic
2015-08-07 22:32
Or so you claimed...
tommy_san
2015-08-06 00:44
> 1. Yes, it would be okay. If you simply want to limit your collection to a very specific criteria, I don't see how it can impact negatively the whole system.

And this also means that you don't necessarily need to mark a sentence as positive even if you think it's correct or good. This is one of the reasons why I prefer less specific labels. It may also explain why I think members should be able to name their collections (it makes no sense to have to mark a sentence AND put it into a list).
TRANG
2015-08-07 09:37
Okay I misunderstood you. When you mentioned naming collections, I thought you had in mind that each user could create several collections, and name each of these collections.

I think that it will be rather rare to have users who are collecting such specific sentences that they can give a name to their collection. Most of the time, people will be collecting all kinds of sentences. In the rare case where someone is collecting very specific sentences, and they want to explain it to the rest of the community, I suppose they would/could mention it in their profile.
tommy_san
2015-08-07 10:45
> Okay I misunderstood you.

No, you didn't. ☺

For example, I have two "OK" lists and want to keep distinguishing between them.
https://tatoeba.org/sentences_lists/show/3185
https://tatoeba.org/sentences_lists/show/3514

Here are some lists containing sentences CK would mark as "0".
https://tatoeba.org/sentences_lists/show/1394
https://tatoeba.org/sentences_lists/show/914

If we had the option to associate a sentence list with one of the three evaluations, then we could keep using our lists as before and use the new feature at the same time.
TRANG
2015-08-07 12:22
I see, so what you actually want is to say that certain lists will necessarily contain only good sentences, and if you add a sentence to that list, it should be automatically marked as good so that you don't have to add to the list AND mark it as good.
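
A minimal sketch of that idea, assuming a hypothetical data model: the class names `SentenceList` and `UserCollection`, the `evaluation` field and the labels "good"/"unsure"/"bad" are illustrative only and are not taken from Tatoeba's code. A list can optionally be linked to one of the three evaluations, and adding a sentence to a linked list also records that evaluation in the user's collection.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Placeholder labels for the three evaluations (hypothetical names).
EVALUATIONS = {"good", "unsure", "bad"}


@dataclass
class SentenceList:
    name: str
    # None means an ordinary list that is not linked to any evaluation.
    evaluation: Optional[str] = None
    sentence_ids: Set[int] = field(default_factory=set)


@dataclass
class UserCollection:
    # Maps sentence id -> evaluation chosen by this user.
    marks: Dict[int, str] = field(default_factory=dict)

    def add_to_list(self, lst: SentenceList, sentence_id: int) -> None:
        """Add a sentence to a list; if the list is linked, mark it as well."""
        if lst.evaluation is not None and lst.evaluation not in EVALUATIONS:
            raise ValueError(f"unknown evaluation: {lst.evaluation!r}")
        lst.sentence_ids.add(sentence_id)
        if lst.evaluation is not None:
            self.marks[sentence_id] = lst.evaluation


ok_list = SentenceList("OK", evaluation="good")
plain_list = SentenceList("Proverbs")  # no linked evaluation

me = UserCollection()
me.add_to_list(ok_list, 101)     # added to the list AND marked as "good"
me.add_to_list(plain_list, 202)  # only added to the list
```

With this shape, existing lists keep working unchanged, and only lists that are explicitly linked to an evaluation have the side effect of marking.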
tommy_san
2015-08-07 14:01
Yes. Thanks for a nice summary.

There are also some lists for the other two evaluations (not just "good").
https://tatoeba.org/sentences_lists/show/3515
https://tatoeba.org/sentences_lists/show/3516
https://tatoeba.org/sentences_lists/show/3517
sacredceltic
2015-08-06 21:01
So let me summarise this great "languages evaluation scheme":

The buttons will feature Good/Bad/Dunno, and they will subsequently be interpreted as either:

1) I would approve of it (although I'm not a native) / I disapprove of it (although I'm illiterate) / I have no clue about this language but I want to give my opinion anyway because I can pretend to understand the question and to be a citizen of the world.

2) I hate these educated academic people with a degree that I don't have, so I'll click on Bad (= 95% of people who do not have an academic degree) to piss them off / I want to show how young and cool and international I am, because I can travel extensively as I have loads of money from my rich parents, and everything should sound English and the rest is just obsolete (I don't know the meaning of that word, though... it's just too old...) dung. / I have no idea what this is about and there is no interface in my language, but I will pretend there is and click randomly...
umano
2015-08-07 15:24
sacredceltic, I think the use of (✓) (?) and (✗) buttons should be limited to natives. If your native language is French you can't mark Japanese sentences. So, your first point should not be a problem.

As for point 2 in your comment, I don't think this website is used by illiterate people, so that would not be a big problem either. Actually, having these buttons will help people with a good command of their native language to disapprove of sentences from the kind of people you describe.
tommy_san
2015-08-04 08:05
By the way, I have doubts about the way we use the word "corpus" on this site.
Can the "Tatoeba Corpus" really be called a corpus? Linguists usually don't write sentences by themselves when they make corpora, do they? Another drawback is that it sounds too technical.
I'm also not sure if it's a good name for the new feature. I used the word "collection" in the post above. Someone else might come up with a better term.
sacredceltic
2015-08-04 08:34
I agree. I'll try not to do it anymore. It was the administrators who imposed this pretentious vocabulary, and it's true that it is ill-suited.
Impersonator
2015-08-04 08:43
> Can the "Tatoeba Corpus" really be called a corpus?

I believe it can, in a broad sense of the word.

> Linguists usually don't write sentences by themselves
> when they make corpora, do they?

They usually don’t, because it would make the corpus biased. The linguist already has some assumptions, and if she creates the corpus herself, the corpus will reflect her assumptions and will be useless for testing her hypotheses.

But we’re not acting as linguists who test hypotheses against the corpus here. We’re adding sentences that we consider natural, without knowing in advance which hypotheses will be tested.

Of course, our corpus will be biased to some extent. The emphasis on full sentences and the absence of context result in a bias towards certain sentences. So we can use Tatoeba to test only some hypotheses, but not others. Before using Tatoeba to prove or disprove an idea, we must check whether Tatoeba might be biased with respect to it.
saeb
2015-08-04 11:25
Just a small note. If we wanted to use a Bayesian model on this data, it would be impossible to interpret it without knowing explicitly what each answer is about. Otherwise you wouldn't know what bias is being represented. So there are two open questions with the current design: are the questions understood in the intended way by the users answering them, and are the questions explicit enough for the answers to be interpreted meaningfully? Whatever data we collect would be virtually useless if we don't settle those two questions first. So we might want to capture community bias on the prescriptivist/descriptivist spectrum, the spoken/written spectrum or the in-group/out-group spectrum; each would need its own set of very explicit questions.
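
To make the point concrete, here is a minimal sketch of one simple Bayesian model: a Beta-Bernoulli estimate of the share of users who answer "yes" to a question about a sentence. It is an assumption chosen for illustration, not a model anyone in this thread has specified; the function name, the uniform prior and the question wordings are all hypothetical. Votes are keyed by the explicit question they answer, because pooling answers to different questions would give a number with no clear meaning.

```python
def posterior_mean(yes_votes, no_votes, prior_yes=1.0, prior_no=1.0):
    """Posterior mean of a Beta(prior_yes, prior_no) prior updated with yes/no votes."""
    return (yes_votes + prior_yes) / (yes_votes + no_votes + prior_yes + prior_no)


# Hypothetical vote counts for ONE sentence, kept separate per explicit question.
votes_by_question = {
    "Would you say this yourself?": (3, 1),           # (yes, no)
    "Is this acceptable in formal writing?": (1, 3),  # (yes, no)
}

# One estimate per question; the two are never mixed.
estimates = {
    question: posterior_mean(yes, no)
    for question, (yes, no) in votes_by_question.items()
}
```

The same four votes lead to opposite estimates depending on which question they answered, which is why the wording of the question has to be fixed before the data can be interpreted.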
pullnosemans
2015-08-04 14:08
haha, wow. just look at this thread! didn't know tatoeba could be this exciting.
sacredceltic
2015-08-04 19:05
That's just the excitement of newcomers... You calm down quickly, given the level!
sacredceltic
2015-08-04 19:25
Will the "OK" tags be carried over into the (obscure) "new-rating" system?

In that case, those who, like CK, have the rare privilege of approving their own sentences (even though his sentences contain as many mistakes as anyone else's...) will immediately get the maximum score under the new system...

If so, will this apply to the real-fake natives? The fake-real ones? The fake-fake ones?

According to which criteria exactly?
sysko
2015-08-06 11:46
(I'm on my mobile phone, so I haven't been through the whole thread)

Wouldn't it be more accurate to have

* I would say it
* I would not say it

because if you mark something as incorrect, you have to propose a correction (and maybe let people +1 a correction)?
And especially for languages like French or English that are widely spoken across the world, regional differences may be interpreted as 'incorrect', with the result that regional specificities or archaic forms get wiped out and only the minimal common core is kept.

Also, it would be better to keep 'OK' (correct) as something only done by a minority of approved people (yeah, how they get approved is where the trolling begins), both native and with an approved mastery of the language,
so that, for example, a construction like

apres que je sois arrive chez moi ... would not have 'OK' but a lot of 'I would say it', because it's a mistake a lot of people make, especially native French speakers, while the correct version, which does not sound natural to a lot of French people, would have 'OK' but a lot of 'I would not say it'.

(That's just an example, so no need to comment on it, e.g. that more and more people consider it correct, as that's not the subject.)

Especially since this kind of voting system will soon be used by people to downvote vulgar/religious/offensive statements, or just as a means of retaliation.
TRANG
2015-08-07 09:49
Welcome back :)

Tommy_san mentioned something like this (I would use it vs I wouldn't use it) in his message here:
http://tatoeba.org/eng/wall/sho...#message_23729
And my response:
https://tatoeba.org/eng/wall/sh...#message_23739
tommy_san
2015-08-07 10:52
See also this one.
https://tatoeba.org/eng/wall/sh...#message_23785

Would you say "Je suis heureuse"?
sysko
2015-08-07 12:08
(damn my message didn't get posted the first time)

I **would** say it, assuming I’m a woman or given the right context, hence the importance of the "would".

I think this way we can use the "knowledge of the masses" to reflect the only thing it is able to measure reliably:

=> how widely a given sentence is accepted as natural (as said before, in contrast to "correct", since we often make "natural" mistakes) by a given population.
I.e. with anonymised stats we could have something like
* 65% of British English-speaking users (1) would say it
* 10% of American English-speaking users (1) would say it
* 0.5% of Australian English-speaking users (1) would say it

(1) => users who have voted. Note that I say "users" and not "people", as we'll always be limited to people a) having a connection to the internet, b) using Tatoeba and c) voting.
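
As a rough illustration of the statistics described above, the sketch below computes, per language variety, the share of voting users who answered "I would say it". The function name `acceptance_by_group`, the input format and the sample votes are assumptions made up for this example; the counts are chosen so that they reproduce two of the figures above.

```python
from collections import defaultdict


def acceptance_by_group(votes):
    """Return {group: (percent_would_say, number_of_voters)}.

    `votes` is an iterable of (group, would_say) pairs, one per user
    who actually voted; users who did not vote are simply absent.
    """
    voters = defaultdict(int)
    would_say_count = defaultdict(int)
    for group, would_say in votes:
        voters[group] += 1
        if would_say:
            would_say_count[group] += 1
    return {
        group: (100.0 * would_say_count[group] / voters[group], voters[group])
        for group in voters
    }


# Made-up sample: 20 British English voters (13 yes), 20 American English voters (2 yes).
sample_votes = (
    [("British English", True)] * 13
    + [("British English", False)] * 7
    + [("American English", True)] * 2
    + [("American English", False)] * 18
)

stats = acceptance_by_group(sample_votes)
```

Returning the number of voters alongside each percentage keeps the caveat in footnote (1) visible: a rate based on a handful of voters says very little.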


And this way we could keep the great and holy "correct" for people with the authority for it. (I, for one, would recognize as such clearly identified people, not hiding behind a nickname, with easily checkable credentials: writers, native teachers of the language, academics, etc. I know it would be a very scarce resource, but hey, here we're talking about giving an absolute approval: "it is correct, period".)
tommy_san
2015-08-07 13:54 - 2015-08-07 13:56
OK, so almost 100% of male French speakers would say "Je suis heureuse" (if they were female) and almost 100% of female French speakers would say "Je suis heureux" (if they were male). That'd be great to know, but aren't you interested in the percentage of male English speakers who'd actually say "It's a lovely day" themselves?

> * 65% of British English-speaking users (1) would say it
> * 10% of American English-speaking users (1) would say it
> * 0.5% of Australian English-speaking users (1) would say it

But American English speakers *would* probably say it if they were British, right? And you'd possibly say anything if you were uneducated or non-native. So if we use this criterion, we'll really need a clear rule about this "would".
sysko
2015-08-07 21:25 - 2015-08-07 21:27
Interesting remark. I think this raises the question of where to draw the line between an easy definition and a complex one that brings more precise data.

I would say that, when making the decision, one can't imagine oneself as belonging to another linguistic group, whereas we consider other factors (sex/age/mood/location) as "changeable", with the female/male, young/old and formal/informal distinctions expressed by means of the "tags" feature.

In order to improve the chance of people understanding the feature, the very first time someone clicks on it, we could pop up a short text: "by clicking this, you mean that you would say it regardless of your age/sex/mood" (or something phrased better, of course).

Beyond that, I think Saeb made an insightful comment about how to handle the unavoidable case of people using it without understanding it.