TRANG, January 23, 2015 at 5:03:06 PM UTC

** Language skill in profile **

In the more distant future, there are some changes I would like to implement in order to improve the quality of the sentences in Tatoeba's corpus.
While thinking about them, I wondered how to define someone's language skill efficiently, because one thing is sure: we will need to add a feature for members to indicate what languages they speak and how well they speak it. But what kind of information will we need exactly?

We can go for something like "beginner", "intermediate", "advanced", "fluent", "native". Or we could go with a scale from 0 to 10, where 0 is when you know nothing and 10 is when you are an expert in the language.
But these labels and scores can be rather vague, and everyone evaluates their own level differently.

Then there's the question of written versus oral skills.
In my case, for instance, I cannot speak, write or read Vietnamese, but I can understand it when it is spoken. Not everything, but I can follow a casual conversation. How would I rate my Vietnamese on a single scale?

We could perhaps follow the CEFR:
http://en.wikipedia.org/wiki/Co..._for_Languages
Or we could use something else. I'm not sure if there are other frameworks out there that attempt to define the language skill of a person.

In any case, before we implement a feature that allows members to indicate the languages they know, we need to discuss and decide on a standard way to evaluate a contributor's language skill. If you have anything to say on this issue, I'm listening.

Pfirsichbaeumchen, January 23, 2015 at 5:29:23 PM UTC (edited January 24, 2015 at 1:18:06 PM UTC)

I'd try to keep it simple and go for something like "beginner", "intermediate", "advanced", "native level". In your case, "(mostly) passive knowledge" may be imprecise but would probably be enough, as details could be given individually in the profile description.

I think it is most important that members indicate their native language in their profiles and that contributions from non-native speakers are automatically marked as needing verification.

Edit: I'd like to add that I hope a way for users to indicate their native (or strongest) language in their profiles will be implemented in the not-too-distant future.

odexed, January 23, 2015 at 5:45:28 PM UTC

Maybe we could recommend an online test for those who can't estimate their level precisely, for example:
http://www.testden.com/challeng...FeHVcgodZTsA3A

I believe there are similar links for other languages, too.

sabretou, January 23, 2015 at 7:12:01 PM UTC

I prefer the ILR scale to CEFR: https://en.wikipedia.org/wiki/ILR_scale

ILR is simpler and has fewer levels (more levels mean more vagueness and more incorrect assessments).

Perhaps we could make our own system. These scales are generally designed with dedicated speakers of a single language in mind. On Tatoeba, however, many of us are language nerds and therefore polyglots to varying degrees.

Going off your Vietnamese example, we could use a system where each language is listed on a user's profile along with a table of icons representing writing, speaking, reading and listening, each colour-coded to represent the level: grey for no ability, red for poor ability, orange for medium, green for good. Native languages are marked with a special gilded row, a star or a similar mark.

So mine would look like this:

Marathi (gold) : [L: Green] [R: Yellow] [S: Green] [W: Yellow]
English: [L: Green] [R: Green] [S: Green] [W: Green]
Hindi: [L: Green] [R: Yellow] [S: Green] [W: Yellow]
Urdu: [L: Yellow] [R: Red] [S: Yellow] [W: Red]
Japanese: [L: Red] [R: Red] [S: Red] [W: greyed]

If that's too complicated, we could keep this as a 'fine-tune' option and let users pick from a simpler colour-coded system based on ILR: 0: Grey, 1: Red, 2: Orange, 3: Yellow, 4: Green, 5: Gold (picking one of these automatically fills all the LRSW icons with that colour).
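A minimal Python sketch of the simpler option, under stated assumptions: the ILR-to-colour mapping is the one given in the post, while the names `ILR_COLOURS`, `simple_profile`, `fine_tune` and `render` are hypothetical and not part of Tatoeba's code. Picking one ILR level fills all four LRSW icons, and the 'fine-tune' step overrides individual skills afterwards.

```python
# Hypothetical sketch: ILR level (0-5) -> colour, as listed in the post.
ILR_COLOURS = {0: "grey", 1: "red", 2: "orange", 3: "yellow", 4: "green", 5: "gold"}
SKILLS = ("L", "R", "S", "W")  # listening, reading, speaking, writing


def simple_profile(ilr_level: int) -> dict[str, str]:
    """Picking one ILR level fills all four LRSW icons with that level's colour."""
    if ilr_level not in ILR_COLOURS:
        raise ValueError(f"ILR level must be between 0 and 5, got {ilr_level}")
    colour = ILR_COLOURS[ilr_level]
    return {skill: colour for skill in SKILLS}


def fine_tune(profile: dict[str, str], **levels: int) -> dict[str, str]:
    """Return a copy of `profile` with individual skills set to other ILR levels."""
    updated = dict(profile)
    for skill, level in levels.items():
        if skill not in SKILLS:
            raise ValueError(f"unknown skill {skill!r}")
        updated[skill] = simple_profile(level)[skill]
    return updated


def render(language: str, profile: dict[str, str]) -> str:
    """Format one profile row in the [L: ...] [R: ...] style used in the post."""
    icons = " ".join(f"[{skill}: {profile[skill]}]" for skill in SKILLS)
    return f"{language}: {icons}"


# Start from ILR 3 for every skill, then lower reading and writing to ILR 1.
urdu = fine_tune(simple_profile(3), R=1, W=1)
print(render("Urdu", urdu))  # Urdu: [L: yellow] [R: red] [S: yellow] [W: red]
```

Under these assumptions, the quick pick and the fine-tune option share one data shape (a colour per skill), so a member can start simple and refine later without any conversion.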

gillux, January 23, 2015 at 10:03:10 PM UTC (edited January 23, 2015 at 10:05:13 PM UTC)

> we will need to add a feature for members to indicate what languages they speak and how well they speak it. But what kind of information will we need exactly?

To answer this question, we need to ask ourselves what we need this information for. If we need to know how well someone masters a language so that his or her contributions can be classified as trusted or not (for instance, automatically tagged as NNC), only two levels are needed: native or not.

If we consider language skill as a way to tell users in what languages one is able to communicate with them, then I’d add a third level: able to read messages from other users.

Of course users are free to add more details about their knowledge in their profile description, like your oral understanding of Vietnamese, but I see no need to store more details in specific form areas if they are not gonna be used for your initial objective (improving the quality of the sentences).
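To make gillux's point concrete, here is a minimal Python sketch, assuming each member stores one level per language. Only the native/non-native distinction decides whether a sentence would be flagged for checking; the extra "able to read messages" level matters only when choosing a language to write to someone in. The `Level` enum, both function names and the use of ISO 639-3 codes as keys are assumptions for illustration, not Tatoeba's actual data model.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical per-language levels following gillux's suggestion."""
    NATIVE = "native"
    NON_NATIVE = "non-native"
    CAN_READ = "able to read messages"  # extra level, used only for communication


def needs_native_check(profile: dict[str, Level], sentence_lang: str) -> bool:
    """Only native vs. non-native matters for trusting a sentence."""
    return profile.get(sentence_lang) is not Level.NATIVE


def can_read_messages_in(profile: dict[str, Level], lang: str) -> bool:
    """Assumption: any declared level implies the member can at least read messages."""
    return lang in profile


# Hypothetical profile keyed by ISO 639-3 codes.
profile = {"fra": Level.NATIVE, "eng": Level.NON_NATIVE, "deu": Level.CAN_READ}
print(needs_native_check(profile, "eng"))    # True: non-native contribution
print(can_read_messages_in(profile, "deu"))  # True
```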

Pfirsichbaeumchen, January 24, 2015 at 7:38:26 AM UTC

+1

mraz, January 24, 2015 at 9:52:39 AM UTC

+1

Silja, January 24, 2015 at 11:55:45 AM UTC

Gillux has a valid point here: what is the aim of indicating language skills?

If the sole purpose is to distinguish native speakers from non-native ones, then we only need to ask users to specify their strongest language and nothing more. We should ask for the strongest language, since one's native language is not always one's strongest. This would also be the most useful information for the project. The simplest implementation would be a required field where users indicate their strongest language: a drop-down list of all the languages available on Tatoeba plus "other", with a text field for typing in the language name for those whose strongest language is not yet available here.
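The required field Silja describes could be validated roughly as in this minimal Python sketch. It assumes the drop-down submits a language code or the literal value "other"; `AVAILABLE_LANGUAGES` is a hypothetical three-language subset, and `validate_strongest_language` is an illustrative name, not an existing Tatoeba function.

```python
# Hypothetical subset of the languages available on Tatoeba.
AVAILABLE_LANGUAGES = {"eng": "English", "fra": "French", "fin": "Finnish"}
OTHER = "other"


def validate_strongest_language(choice: str, other_name: str = "") -> str:
    """Validate the required 'strongest language' field and return the language name.

    `choice` is the value picked in the drop-down; `other_name` is the free-text
    field, which is only used when the member picks "other".
    """
    if choice == OTHER:
        name = other_name.strip()
        if not name:
            raise ValueError('Please type in your strongest language after choosing "other".')
        return name
    if choice not in AVAILABLE_LANGUAGES:
        # Also rejects an empty choice, since the field is required.
        raise ValueError("Please choose your strongest language.")
    return AVAILABLE_LANGUAGES[choice]


print(validate_strongest_language("fin"))              # Finnish
print(validate_strongest_language(OTHER, "  Ainu  "))  # Ainu
```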

If we also want general information about users' overall language skills, then we need a more sophisticated system where users can indicate all the language skills they have. As has already been mentioned, users can describe their general language skills in more detail in their profile descriptions.

However, providing a structured way to indicate language skills would be more user-friendly. Many users (including me) have tried to list all the languages they know in their profiles and to define their proficiency level in each. If we go for a structured representation of language skills, it would of course not be required information. Those who want to use it could do so, but anyone could leave it empty, or change or remove the information whenever they want.

The language proficiency system could be something like what sabretou described: users rate their speaking, listening, writing and reading abilities individually. I also like sabretou's idea of colour coding. If we go for overall language skills, I think we shouldn't create a system of our own but rather use an existing one. CEFR is OK for this.

TRANG, January 24, 2015 at 1:48:00 PM UTC

> I see no need to store more details in specific form areas if they are not gonna be used
> for your initial objective (improving the quality of the sentences).

Yes, you are right. I realize that I wrote my message with improving the quality of the sentences in mind, but I didn't want to limit the discussion to that initial objective.

You mentioned, for instance, that language skills could be used to show which languages others can use to communicate with a member. I didn't have this in mind when I wrote the post, but it's actually very useful to know. I've often found myself browsing a user's comments to try to figure out whether they would understand me if I wrote in English, or whether I would need to use another language, and if so, which one.

Maybe others will think of more things that could be useful, but aren't necessary for improving the quality of the sentences.

sacredceltic, January 24, 2015 at 1:04:58 PM UTC

>we will need to add a feature for members to indicate what languages they speak and how well they speak it.

Have you planned a lie detector? Because there are a few liars here, even if they have sometimes managed to make people believe they were native speakers and to get themselves stamped as such by CK...

And have you planned a way to detect mythomaniacs?
Most people overestimate their language skills.

TRANG, January 24, 2015 at 2:06:08 PM UTC

It's a problem I have thought about, yes. But I don't have a 100% reliable solution to offer.

I don't think many people will intentionally lie about their level, but I do think many people will either overestimate or underestimate it, without any bad intentions.

What would you suggest to address this problem? How do you detect that someone really is a native speaker of a given language?

sacredceltic, January 24, 2015 at 3:54:47 PM UTC

>How do you detect that someone really is a native speaker of a given language?

I simply don't, and I maintain that those who grant themselves this power are deceiving us.
"Native" doesn't mean anything anyway. There are native speakers who have not mastered their native language. In fact, that's the majority of the population; that's why education was invented, with more or less success in this respect.
Unfortunately, even those who have received only a very limited or very poor education take it upon themselves to write sentences on the Internet.

Tomorrow, in the very near future in fact, it will be Google that decides whether a sentence is correct or not. And to do so, Google will rely on corpora such as Tatoeba.
The errors they contain will become the new orthodoxy.