cueyayotl, March 7, 2016 at 1:59:29 AM UTC

I wonder if we couldn't add Valencian as a separate language, distinct from Catalan, due to political reasons? Valencian speakers are usually very insistent that their language is distinct from Catalan, and I wonder how many potential members we may have lost due to our current classification scheme, an issue brought to my attention by our new Valencian user (isnamar).

The main issue is that they share the same ISO 639-3 code: CAT.

CK, March 7, 2016 at 2:33:48 AM UTC, edited October 30, 2019 at 10:47:30 AM UTC

[not needed anymore- removed by CK]

cueyayotl, March 8, 2016 at 12:42:33 AM UTC

They are probably as far apart as Galician and Portuguese (which HAVE two different ISO 639-3 codes) and probably more dissimilar than, say, varieties of English.

You are right, though. We would get plenty of duplicates and near duplicates (they use different spelling conventions). But, in the present state, we would definitely be repelling potential Valencian-speaking users who are not comfortable with the "Catalan" classification of their language. We would be throwing out a big chunk of our corpus... a shame, as Valencian is quickly being spoken less and less.

I wonder if there is something more clever that we could do... maybe have both languages (and flags)... but catalog all sentences under the code "CAT" as usual. There MUST be something...

Ricardo14, March 8, 2016 at 1:12:15 AM UTC

Maybe if we had a Valencian speaker who contributed Valencian sentences, we could do something, I guess.

More info: https://en.wikipedia.org/wiki/Valencian

cueyayotl, March 8, 2016 at 1:16:02 AM UTC

We do. New user: isnamar

He was the one who brought the issue to my attention.

Ricardo14, March 8, 2016 at 2:29:44 AM UTC

I created a list - https://tatoeba.org/eng/sentences_lists/show/5812

@isnamar @cueyayotl

Ricardo14, March 8, 2016 at 2:34:43 AM UTC

==► https://github.com/Tatoeba/tatoeba2/issues/1048

CK, March 8, 2016 at 2:40:09 AM UTC, edited October 30, 2019 at 10:47:16 AM UTC

[not needed anymore- removed by CK]

cueyayotl, March 8, 2016 at 6:09:45 AM UTC, edited March 8, 2016 at 6:10:47 AM UTC

@Ricardo14, I think CK may be on to something. We certainly shouldn't add new languages without an ISO 639-3 code, but the split "Catalan/Valencian" with split flag to suit COULD work. It's better than nothing, and these changes are revertible if we for some reason discover a problem with the change. Let's run it past Trang and see what she thinks.

@CK, I think most of us are used to seeing the Union Jack representing the English language. As a more fitting comparison, it is probably a tad bit more irritating than British people seeing the US flag representing English (which happens a lot here in Asia).

sacredceltic, March 8, 2016 at 7:48:50 AM UTC, edited March 8, 2016 at 7:49:25 AM UTC

Carving political borders doesn't define a language. Especially since, in this case, the "Province of Valencia" has been carved by the Spanish state, without consulting speakers of its languages.

Various attempts to fragment language communities have been and are still cunningly used by central states to justify arbitrary enforced political borders and divide communities. This is also the case of Romanian in Moldova (which has been carved out of Romania by Stalin) and many other areas of the world.

In the Province of Valencia, there is a majority of speakers of Catalan. Of course, one can always debate - often for political reasons - that the local variant constitutes another dialect (not the inhabitants of any two towns speak exactly the same one anywhere on earth) but it's abusive to call it a different language, and, in the case of Valencian, ISO was not fooled.

In the city of Valencia itself, there are speakers of Catalan and Castilian, and both languages are often mixed. That doesn't define a third language.
Among famous Catalan writers, many are from Valencia itself.

Now, that doesn't justify flags anyway. I think flags are not helpful on Tatoeba. Worse, they have a nasty nationalist competitive effect, which is noxious to the quality of sentences.
Flags also help the bullying and erasing of minority languages.

cueyayotl, March 8, 2016 at 8:11:19 AM UTC

> Carving political borders doesn't define a language... I think flags are not helpful on Tatoeba.

Political borders SHOULDN'T define a language, though we still have the SIL (the ISO people) classifying languages by "country of origin". If everyone had yours and CK's point of view of the flags, there wouldn't be a problem. Unfortunately, it's not the case, and so the Valencian variety of Catalan ends up being grossly underrepresented on this site. (I checked... we didn't have any clearly Valencian sentences before this new user isnamar)

> Flags also help the bullying and erasing of minority languages.

True. But, removing flags here on Tatoeba is not an option... at least not for now. We must do something though; your opinion is important too. Otherwise the Catalan flag we have now could very well "bully and erase" the potential to have a large Valencian corpus.

CK, March 8, 2016 at 8:34:14 AM UTC, edited October 30, 2019 at 10:47:09 AM UTC

[not needed anymore- removed by CK]

alexmarcelo, March 8, 2016 at 11:51:35 AM UTC

This may sound naive, but how difficult would it be, if possible, to implement flag-changing icons (e.g. animated GIFs)?

odexed, March 8, 2016 at 11:56:02 AM UTC

I don't think the flag is the problem. This is how everyone can set their own flag for any language they want.

https://tatoeba.org/eng/wall/sh...#message_25111
https://tatoeba.org/eng/wall/sh...#message_25112

alexmarcelo, March 8, 2016 at 11:57:53 AM UTC

That's not something every user can or would be willing to do. It's extra work.

odexed, March 8, 2016 at 12:09:45 PM UTC, edited March 8, 2016 at 12:10:37 PM UTC

It's not that hard to do either. If you set Brazilian flag for Portuguese, other people from Portugal can say they don't like that change. I think that people who have a personal problem with some flag can change it as suggested. And people who accept things as they are don't care about it. As for me, I like the way Tatoeba looks now and I've gotten used to it. I believe it's better not to make such radical changes.

alexmarcelo, March 8, 2016 at 12:21:15 PM UTC, edited March 8, 2016 at 12:26:03 PM UTC

> I think that people who have a personal problem with some flag can change it as
> suggested.
My point is that people who try Tatoeba for the first time wouldn't even know about the existence of this procedure, and they'll probably leave if they don't like the interface before someone tells them about it.

> If you set Brazilian flag for Portuguese, other people from Portugal can say they
> don't like that change.
So why do Brazilians, Angolans, etc have to take the rap?

odexed, March 8, 2016 at 12:27:04 PM UTC

I believe that a better solution would be to show a different flag to different people depending on their IP address. It's not that hard to find out the country this way. So people with Brazilian addresses would see the Brazilian flag and people from Portugal would see the Portuguese flag. As for other people, I would prefer to leave all flags the way we are accustomed to seeing them.
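
A minimal Python sketch of that idea, assuming the viewer's country has already been resolved from the IP address by some geolocation step that is not shown here. The table names and icon identifiers are hypothetical and are not taken from Tatoeba's code.

```python
# Hypothetical tables: default icon per ISO 639-3 code, plus
# per-country overrides for languages spoken in several countries.
DEFAULT_ICON = {"por": "pt", "eng": "gb", "cat": "cat"}
COUNTRY_OVERRIDES = {
    "por": {"BR": "br", "PT": "pt"},
    "eng": {"US": "us", "GB": "gb"},
}


def icon_for(lang_code, viewer_country=None):
    """Pick the icon shown to one viewer for one language.

    viewer_country is an ISO 3166-1 alpha-2 code (or None if unknown).
    Viewers from a country with no override see the usual default icon.
    """
    overrides = COUNTRY_OVERRIDES.get(lang_code, {})
    if viewer_country in overrides:
        return overrides[viewer_country]
    # Fall back to the default icon, or to the bare code if none exists.
    return DEFAULT_ICON.get(lang_code, lang_code)
```

The language code attached to each sentence is untouched in this sketch; only the icon chosen at display time varies with the viewer.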

TRANG, March 8, 2016 at 12:41:18 PM UTC

> how difficult would it be, if possible, to implement flag-changing icons
> (e.g. animated GIFs)?

Not too difficult, but I would veto this idea. It would be bad design.

CK, March 8, 2016 at 11:54:30 AM UTC, edited October 30, 2019 at 10:47:01 AM UTC

[not needed anymore- removed by CK]

odexed, March 8, 2016 at 12:14:15 PM UTC, edited March 8, 2016 at 12:17:22 PM UTC

I'm quite sure that most people wouldn't like this change. For example, I don't even know what 'ron' language code stands for. For new users this would cause a lot of trouble.

CK, March 8, 2016 at 12:24:27 PM UTC, edited October 30, 2019 at 10:46:53 AM UTC

[not needed anymore- removed by CK]

sacredceltic, March 8, 2016 at 3:47:14 PM UTC

>However, at least for English and Japanese, the language names are based on the "original" countries that spoke them.

No, the flag for English is currently the Union Jack, which represents the union of 4 peoples and linguistic communities : Scot, Welsh, Gaelic and English speakers...

Choosing to make it represent but one of them is a gross simplification with vast political implications.

CK, March 8, 2016 at 3:52:21 PM UTC, edited October 30, 2019 at 10:46:33 AM UTC

[not needed anymore- removed by CK]

sacredceltic, March 8, 2016 at 4:10:26 PM UTC

The flag of England is the Cross of Saint George, red on white... Scots, Welsh and Gaels have nothing to do with it...

User55521, March 9, 2016 at 7:24:18 PM UTC, edited March 9, 2016 at 7:24:59 PM UTC

I strongly oppose using Latin-script language codes because I like the diversity of the writing systems, and Latin script is a threat to this diversity.

I make a conscious effort to mark languages using Cyrillic or Chinese characters in my comments (#4579872, #1971830) exactly because I don't like Latin-script language codes.

I agree that flags are not the best solution, but Latin-script language codes arenʼt any better.

TRANG, March 8, 2016 at 12:35:18 PM UTC

> But, removing flags here on Tatoeba is not an option... at least not for now.

I usually refer to what you call flags as "language icons" because for me, the point is not to have flags but to have a visual representation of a language.

Removing flags in the sense of not using country flags to represent a language, is an option.
Removing flags in the sense of removing the icon and simply having a string to represent languages is also still an option, but not one that I'm in favor of.

Icons are a powerful compact way to represent a concept. They're necessary to design good user interfaces. The difficulty is to find/design the right icon.

We use country flags for many of the language icons because that's one of the easier and commonly used solutions. When it comes to the "popular" languages, if you ask yourself what image people will intuitively associate to a certain language, country flags often come to mind.

But it doesn't have to be a country/region flag, never had to be, and cannot be applied for every language. We have several languages for which the icon is not a country/region flag. What's important is that it is an image that people who speak the language can recognize as a representation of their language.

Therefore feel free to review the icon for Catalan. I'm completely fine with changing language icons that don't reflect properly a language. We don't always get it right on the first try.

alexmarcelo, March 8, 2016 at 11:43:50 AM UTC, edited March 8, 2016 at 12:33:47 PM UTC

If we want to make Tatoeba really inclusive, then we should think about another standard for languages, since ISO 639-3 doesn't distinguish between some languages and dialects.

I've personally seen several colleagues complain that they are not willing to contribute because Tatoeba doesn't include their language (judging from the flag). And they do have a point. Brazilian Portuguese differs much more from European Portuguese than, say, American English differs from British English.

I've lived in Valencia for several months and, believe me, cueyayotl's got a point when he says we're losing potential contributors. In this particular case, I think the flag issue is much bigger than the language code itself.

If we still insist on using ONE flag per language, then a much more intelligent idea would be to use the flag of the country with the highest number of native speakers. For example, the Brazilian population is 20 times higher than the Portuguese population, so, by logic, we're losing 20 potential Brazilian contributors for 1 Portuguese contributor.

The short-term solution would be to abolish the use of country flags or include more. The long-term solution would be to adopt another language standard for Tatoeba. Of course no language standard can possibly represent every single dialect in the world, but I'm pretty sure one could find something at least less exclusive than ISO 639-3.

That we would have lots of near duplicates is no trouble. "Banana" is "banana" in dozens of languages and yet doesn't make them the same.

TRANG, March 8, 2016 at 12:43:55 PM UTC

> If we want to make Tatoeba really inclusive, then we should think about another standard
> for languages, since ISO 639-3 doesn't distinguish between some languages and dialects.

If there's a more complete/accurate/appropriate language categorization standard out there, we can definitely consider it. But is there one?

sacredceltic, March 8, 2016 at 1:24:49 PM UTC

I think the crux of the problem is actually political. This is particularly true of "Valencian" Catalan, whose claim to separate status has emerged just as Catalan independence approaches...
I bet that if we had two corpora of Catalan sentences, 99% of the sentences would be common to both...
Personally, though, I understand that peoples and speakers refuse to line up behind flags they reject.
So the problem is less one of defining languages, which ISO does very well, than one of flags.
Either we move to several flags per language, or (my preference) we remove the flags and replace them with language codes. Other sites have done it without any problem.

gillux, March 8, 2016 at 1:59:13 PM UTC

> Other sites have done it without any problem.

Which ones?

sacredceltic, March 8, 2016 at 2:21:47 PM UTC

http://flagsarenotlanguages.com...ing-languages/

CK, March 8, 2016 at 2:40:30 PM UTC, edited October 30, 2019 at 10:46:45 AM UTC

[not needed anymore- removed by CK]

sacredceltic, March 8, 2016 at 2:55:31 PM UTC

Well, as I said in French below : this is a very political matter, with or without flags anyway. So of course it is hotly debated. But flags actually fuel even more the problem, because they're adding an additional layer of polemics to the linguistic one.
By sheer respect for people, flags should be done away with.

sacredceltic, March 8, 2016 at 3:29:25 PM UTC, edited March 8, 2016 at 3:30:00 PM UTC

We already had this debate on the Wall several years ago (and I was already very hostile to flags).
At the time I posted a link to a site, which I can no longer find, created by an ingenious Italian (as I recall) who proposed a system combining 3 signs (circles, squares or triangles... perhaps stars too...) in various colors, drawn at random, to represent languages. The combination then stayed fixed forever, so that the site's users would get used to it for good.
It was a brilliant way to solve the problem of national and cultural sensitivities.
Its only flaw was that it was not universal. But that doesn't seem so hard to generalize: it only takes gathering people of good will.
We should look into whether there is an international initiative of this kind that would define a universal standard for representing languages graphically without flags. Perhaps the W3C has already thought about the question. Major web players like Google or Facebook have a prime interest in it. Perhaps we should ask them about it and, if no such initiative exists, propose it to them within the W3C?
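
A rough Python sketch of such a flag-free icon scheme. It swaps in a different technique from the one described: instead of drawing the combination at random once and storing it, it derives the combination from a hash of the language code, so the same code always yields the same icon. The shape and color lists are made up for illustration.

```python
import hashlib

# Hypothetical palettes; the original site's actual signs are unknown.
SHAPES = ["circle", "square", "triangle", "star"]
COLORS = ["red", "green", "blue", "yellow", "black", "white"]


def icon_for_language(lang_code, signs=3):
    """Return a fixed list of (color, shape) pairs for a language code.

    The result looks arbitrary but is stable: it depends only on the
    code, so users can get used to it permanently.
    """
    digest = hashlib.sha256(lang_code.encode("utf-8")).digest()
    icon = []
    for i in range(signs):
        # Use two digest bytes per sign: one for shape, one for color.
        shape = SHAPES[digest[2 * i] % len(SHAPES)]
        color = COLORS[digest[2 * i + 1] % len(COLORS)]
        icon.append((color, shape))
    return icon
```

Two languages could land on the same combination with this approach, so a real scheme would still need a uniqueness check or a hand-maintained table.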

alexmarcelo, March 8, 2016 at 1:56:58 PM UTC, edited March 8, 2016 at 1:59:47 PM UTC

The only one I know of is Linguasphere.
http://www.linguasphere.info/lc...e-welcome.html

For Portuguese, for example, the classifications can be found here:
http://www.linguasphere.info/lc...ndex_o-p-q.pdf

alexmarcelo, March 8, 2016 at 2:10:06 PM UTC, edited March 8, 2016 at 2:32:57 PM UTC

Another feasible solution is to set the flag according to the speaker's country/community. If I register as a Brazilian, it makes sense that my sentences exhibit the Brazilian flag. It's just aesthetics; this wouldn't cause any trouble to the language classification. Its code and the results in a search would still be the same.

At any given time one could easily assign, say, all my Portuguese sentences the Brazilian flag; or isnamar's sentences the Valencian flag, even if they've been written long ago.

What are the cons?
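
The proposal in this post can be sketched in Python as follows. This is only an illustration under assumptions: the `author_community` profile field, the `DEFAULT_FLAG` table and the function names are hypothetical and do not describe how Tatoeba stores sentences.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default flags, used when the author declared no community.
DEFAULT_FLAG = {"por": "pt", "cat": "catalonia"}


@dataclass(frozen=True)
class Sentence:
    text: str
    lang: str                                # ISO 639-3 code, unchanged
    author_community: Optional[str] = None   # e.g. "BR" or "valencia"


def display_flag(sentence):
    """Flag shown next to a sentence: the author's, else the default."""
    if sentence.author_community:
        return sentence.author_community.lower()
    return DEFAULT_FLAG.get(sentence.lang, sentence.lang)


def search_by_language(sentences, lang):
    """Search ignores the flag entirely and filters on the code only."""
    return [s for s in sentences if s.lang == lang]
```

Because the flag is computed from the author's profile at display time, changing or reverting it for all of a user's existing sentences needs no change to the sentences themselves.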

gillux, March 8, 2016 at 1:53:47 PM UTC

> That we would have lots of near duplicates is no trouble. "Banana" is "banana" in dozens of languages and yet doesn't make them the same.

I do think it’s a problem when it comes to sentences that are equal in writing and meaning in different language varieties. Let’s say I want to search for sentences containing the word “banana” in English. (English is NOT a good example, but it’s just an example.) If I need to look it up in British English, then American English, and then Australian English etc. to find different sentences, it’s not usable. Equally, if I need to enter the sentence “I like bananas.” in all the English varieties, it’s very cumbersome.

A possible way to deal with this problem could be a hierarchical classification of languages and varieties, to allow some sentences to belong to the global “English” category, and others to one or more specific variant like “British English” or “American English”. This way, one could add or search for sentences in either English or one of its variants. At the moment, we’re using tags for this, but they are not systematically used, not usable by regular users and not visible enough. We could have a sort of subflag or something to indicate the language along with its possible variety. Do you think the Valencian problem could be solved by something like this?

Note that the ISO 639-6 standard was meant to classify language varieties, but it’s dead. We may as well establish our own classification scheme.
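
A small Python sketch of the hierarchical idea, to make it concrete. The variety labels such as `eng/GB` are invented for this example and do not come from any standard; sentences are plain `(text, label)` pairs.

```python
# Hypothetical hierarchy: each variety label points to its parent language.
PARENT = {
    "eng/GB": "eng",
    "eng/US": "eng",
    "cat/valencian": "cat",
}


def belongs_to(label, query):
    """True if label equals query or is a variety underneath it."""
    while label is not None:
        if label == query:
            return True
        label = PARENT.get(label)  # climb one level; None at the top
    return False


def search(sentences, query, word):
    """Find texts containing word, filed under query or its varieties."""
    return [
        text
        for text, label in sentences
        if belongs_to(label, query) and word.lower() in text.lower()
    ]
```

With this layout, "I like bananas." is entered once under the generic `eng` label, and a search in `eng` also returns sentences filed under `eng/GB` or `eng/US`. Whether a search in a specific variety should additionally return the generic sentences is a design decision this sketch leaves open.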

alexmarcelo, March 8, 2016 at 2:04:25 PM UTC

> I do think it’s a problem when it comes to sentences that are equal in writing and
> meaning in different language varieties.
You've answered yourself. The use of metadata would be a solution. The same way we have tools to link sentences together, we could have tools to link varieties together.

> We may as well establish our own classification scheme.
I couldn't agree more.

sacredceltic, March 8, 2016 at 2:44:21 PM UTC

And how are you going to classify Portuño? Under Portuguese or under Spanish?

I'm telling you: it's a real mess. Don't even dip your little finger into it!

alexmarcelo, March 8, 2016 at 2:54:30 PM UTC, edited March 8, 2016 at 2:55:26 PM UTC

I'm not trying to propose a perfect classification system. I'm just trying to propose one that is a little less biased.

The current system of classification by flag is inadequate and, to some extent, disrespectful.

sacredceltic, March 8, 2016 at 3:03:14 PM UTC

I agree about the lack of respect. It also encourages misplaced nationalist behavior.
But to go from there to defining our own classification of languages!
ISO is the least bad forum that has been found internationally for that.
Don't forget that the definition of languages, and their very names, are political instruments for the oppression of entire peoples.
Why do you think the People's Republic of China insists so much that "Mandarin", which is the language of only part of China's citizens, be called "Chinese"? (And it manages very well to impose its policy on Tatoeba...)

alexmarcelo, March 8, 2016 at 3:18:54 PM UTC, edited March 8, 2016 at 3:31:36 PM UTC

Yes, I know. There are several Chinese languages.

I think that for Tatoeba, upholding this language-country relationship is a very serious problem. Unfortunately, it's an old debate and I know it is very unlikely that things will change... all for aesthetic reasons.

But if we are going to use flags, we should at least give users the option of choosing the flag of their own country. I think that would be a good start.

sacredceltic, March 8, 2016 at 3:34:51 PM UTC

Or else the compromise proposed by some people on the site I linked: have an animated GIF flag in which all the relevant flags follow one another in turn... So you would find, for example, the Belgian flag, among others, in the GIF for French, Dutch and German...

alexmarcelo, March 8, 2016 at 8:47:45 PM UTC

Actually, that was my first idea, but TRANG has already rejected it...

TRANG, March 8, 2016 at 11:55:34 PM UTC

> I know it is very unlikely that things will change...
> all for aesthetic reasons.

To me it is not a matter of aesthetic but a matter of usability. Icons make UI's more intuitive, more comfortable to use. Imagine that instead of the icons in the sentence's menu, you had "T" instead of the translate icon, "E" instead of the edit icon, "F" instead of the favorite icon, "L" instead of the lists icon, "A" instead of the audio icon...

As I've said in my reply to cueyayotl[1],
"we use country flags for many of the language icons because that's one of the easier and commonly used solutions. When it comes to the "popular" languages, if you ask yourself what image people will intuitively associate to a certain language, country flags often come to mind.

But it doesn't have to be a country/region flag, never had to be, and cannot be applied for every language. We have several languages for which the icon is not a country/region flag. What's important is that it is an image that people who speak the language can recognize as a representation of their language."


Your suggestion of letting users choose the language icon based on their country is pretty much equivalent to introducing some sort of categorization. If a language needs to have two different icons, each of the icons will be considered as representing two different things (not necessarily two different languages, but two different *something*). So you are adding a layer of categorization.

I am not as reluctant as sacredceltic to the idea of defining our own categorization, but I understand very well his point of view.

There wasn't always this restriction for languages to have an ISO 639-3 code. We have two languages (Toki Pona and CycL) that are not part of the ISO 639-3 categorization but are nonetheless present in the corpus because there was a time when we were just adding any language people requested us to add.
Then we realized that defining what is a language, defining the boundaries between the languages, and deciding how to name the languages, was not a job we were in the position to handle at all.

Several years have passed, and sure, now we have a bit more experience and a bit more knowledge when it comes to language categorization. Does that make us ready to tackle the problems we couldn't years ago?
I personally don't think we're ready because we don't have a group of people, with enough expertise and authority, with a good infrastructure and a good process, who are committed to doing the job that the SIL is doing.

So for the case of Portuguese, similarly to Catalan, the only viable solution I see today is that we redesign the icon so that both Brazilian and Portuguese people can acknowledge that it represents their language. If you want to push for this change to be done, then you (and anyone) are encouraged to send your icon suggestions to Ricardo14.

-----

[1] https://tatoeba.org/eng/wall/sh...#message_25727

alexmarcelo, March 9, 2016 at 12:47:04 AM UTC, edited March 9, 2016 at 12:55:32 AM UTC

TRANG, I acknowledge all the effort you've been putting into this project all this time. I've experienced myself several improvements since I joined Tatoeba almost five years ago. I didn't mean to sound rude, I was just trying to make myself heard.

The solution of adding a mixed icon for, say, Portuguese (Portugal and Brazil) sounds good to me, but it probably does NOT for people of other nationalities. So my question is: how can the "flag categorization" be bad if we have been using it all this time, except for one country alone? When you're looking for a subtitle for a movie on the Internet, it always comes along with a flag indicating the dialect it's been translated into. I always run away from the Portuguese flag in these situations... believe me, I'd understand a Spanish subtitle better than a Peninsular Portuguese one...

I think this diversification would not only be useful and positive, but also aesthetically beautiful. Users would probably feel more motivated to translate more sentences in order to make their flags more popular. Besides, it's easily reversible in case we don't get used to it.

Please make me understand: why keeping one country flag for all users is better than allowing each user to use the flag of their own country?

TRANG, March 9, 2016 at 10:56:29 AM UTC

> Please make me understand: why keeping one country flag for all users
> is better than allowing each user to use the flag of their own country?

Because by doing that, we would consider that a language can be divided into several parts, and each part can be represented by a country. That doesn't work. And you have to understand as well that using two different images will mean for users that they are dealing with two different things.

Let's imagine that somehow we've implemented your suggestion and all your sentences are now displayed with the Brazilian flag.
I have no knowledge about Portuguese and I'm browsing Portuguese sentences. I see some sentences have the flag of Brazil and some sentences have the flag of Portugal.
What I will assume is that sentences that have the flag of Portugal are sentences specific to the Portuguese spoken in Portugal, and the sentences that have the flag of Brazil are sentences that are specific to the Portuguese spoken in Brazil.
What about the sentences that could be used both in Brazil and in Portugal? How do I figure it out, as a non Portuguese speaker?

That is just one problem, but I hope it's enough to make you see that it is not a good option.

alexmarcelo, March 9, 2016 at 12:27:06 PM UTC, edited March 9, 2016 at 12:50:14 PM UTC

> What I will assume is that sentences that have the flag of Portugal are
> sentences specific to the Portuguese spoken in Portugal

This is exactly what most potential users think when they find the Portuguese flag and not the Brazilian flag. Have you never considered that, by using the flag of Portugal alone, you might lead people to think EU-Portuguese is the only variant supported? Your argument is problematic for a very simple reason: you're suggesting that the mere addition of alternative flags would cause "language categorization", when by putting thousands of sentences under the same flag you're already doing that.

> What about the sentences that could be used both in Brazil and in Portugal?
> How do I figure it out, as a non-Portuguese speaker?

A flag would just show where the author of a sentence comes from. This could be made clear in the site documentation. Besides, if you're a non-Portuguese speaker interested in Portuguese sentences, you probably know which variant you're interested in. This is especially true for languages with enormous linguistic differences, like BR-Portuguese and EU-Portuguese. Many EU-Portuguese sentences are garbage for people trying to learn Portuguese to live and work in Brazil, and vice versa. As simple as that.

Using the flag of a specific country to represent a language is wrong. Using the flag of a specific country to indicate the homeland of a specific user is not.

I'm afraid Tatoeba is just not as open to diversity as once claimed.

TRANG, March 9, 2016 at 3:09:24 PM UTC, edited March 9, 2016 at 9:18:35 PM UTC

It seems to me that you are mixing up problems.
1. The problem of language classification.
2. The problem of visual representation of a language.

My point is that I don't want Tatoeba to take responsibility for language classification. We don't have the structure to handle these decisions. That's why we currently rely on the ISO 639-3 classification.
If ISO 639-3 defined EU-Portuguese and BR-Portuguese as two different languages, then we would have two different icons, one for each. But it defines them as one and the same language, so we can only have one icon.

If we decide to introduce two language icons for Portuguese, then we are making the decision not to follow the ISO 639-3 classification, but to follow our own classification, where there's some sort of "meta language" that is Portuguese, and this meta language is split between EU-Portuguese and BR-Portuguese.

Your problem is actually about language classification: you disagree with ISO 639-3. For you, EU-Portuguese is so different from BR-Portuguese that they should be separated in one way or another. For this problem, you have to go talk to the SIL, and tell them "Guys, EU-Portuguese sentences are garbage for people trying to learn Portuguese to live and work in Brazil, and vice versa. Why don't you define them as two different languages?"

Meanwhile, here at Tatoeba, we don't deal with language classification; we don't have the structure, expertise, or authority for it.
We do, however, have the problem of visually representing each language that we support with a 30 x 20 pixel icon. As of now, each language can only have one icon. We are aware that the categorization could be more fine-grained, but we won't make icons for sub-categories. It's a lot of work, and we don't even have official sub-categories for languages. We allow users to add tags, but these tags have no official status and are not standardized. We shouldn't even be the ones defining these sub-categories, because again, we are not responsible for language classification. That is why, in the case of Portuguese, we won't use two different icons based on whether the author comes from Portugal or from Brazil. But we can change the current icon, which is the flag of Portugal, into an icon that would include both the EU-Portuguese and BR-Portuguese communities.

> A flag would just show where the author of a sentence comes from.

This is information that belongs to the user profile, not to the sentence. A user could come from Portugal and write a sentence in BR-Portuguese. A user might also not want to reveal where they specifically come from or grew up, simply because they don't want this information to be public. And if a user comes from Belgium and we display the Belgian flag, what language would that be? Dutch? French? German?

I hope I was able to be clear enough because I don't know how else to explain this ^^'

alexmarcelo, March 9, 2016 at 6:25:52 PM UTC

Clear enough. Thank you.

sacredceltic, March 10, 2016 at 8:31:45 AM UTC, edited March 10, 2016 at 11:42:39 AM UTC

I think you are confusing things: the SIL is not an international body charged with defining language classifications. ISO does that. The SIL is an NGO, active in language education, with its own objectives, which takes part in this work but does not oversee it.
In the end, the classification of languages is a geopolitical act and gives rise to very heated debates. Objective criteria for differentiating languages and dialects have been worked out there, and the classification is based on those criteria.

For two languages to be distinguished under this classification, it is not enough for their idiomatic expressions to differ (which is the case for the French of France and of Canada, for example); they must differ substantially in vocabulary, syntax, and grammar (this is assessed quantitatively).

Languages and cultures have to be distinguished. People can have very different cultures and use the same language, give or take.
The perception of strangeness, or even of a lack of mutual comprehension, is not an objective criterion in this case.
The fact is that some Dutch-speaking Belgians do not understand one another, and that I myself do not understand some Quebecers when I hear them. I have also seen Germans from the north and the south who did not understand each other. Yet these people are objectively using the same language. It is the combination of a strange accent and unfamiliar idiomatic expressions that gives an impression of difference. But that impression turns out to be superficial once you move to writing and reading.

Yesterday I heard a Belgian use a legal expression I did not know. As a result, I did not even understand the words used, which turned out to be simple once I had understood them, so I had to have it repeated several times. The person took offence because, knowing I was French, they thought I was looking down on their French (the minority syndrome). I had the greatest difficulty explaining that I had never heard that expression before and that, as a consequence, I had not even recognized the words used, and that what I had heard was so unfamiliar to me that I had not even been able to group the syllables into something I could understand. The person remained offended, insisting that it was good French (which I never disputed...)
It was not a problem of accent, or only marginally so, but a problem of lack of familiarity.
So people can often speak the same language and not understand each other. Languages are imperfect tools, and it is a great pity that so many people are unaware of this and glorify their languages as if they were absolute perfection. In doing so, they underestimate the many communication problems that languages cause.

Nothing in this world is perfect.

TRANG, March 10, 2016 at 2:12:07 PM UTC

Yes, you are right. Thank you for the clarification. Indeed, the SIL is not responsible for defining the classification rules, only for applying them.

cueyayotl, March 9, 2016 at 2:16:41 AM UTC

The Tatoeba project has grown considerably, and will continue to do so. As we add more and more languages, the issues become more apparent, and more complicated. The SIL has done amazing work in categorizing exactly WHAT is a language and WHAT is a dialect. If we use any categorization, it is imperative that we at least use their work as a backbone. This does not mean, however, that their work is perfect. In fact, there are a lot of ad hoc classifications they have created due to a lack of research. This is very true in Colombia, where over 50 years of conflict impeded linguistic research in the area. Even in Mexico, to give an example, I've seen a dialect of Zapotec classified as ISO 639-3 ZAB, when it resembles more ZAW (in fact, the speakers of this dialect CAN'T understand what is considered the standard for ZAB, but they understand a considerable amount of ZAW). Amastan has given good examples of bad classifications for Algerian languages as well.

If we do choose to strictly follow the SIL's ISO 639-3 codes, we should at least be lenient in our naming conventions: "Catalan/Valencian" instead of "Catalan", "Namtrik" instead of "Guambiano" (ISO 639-3 GUM: an issue I brought up some time ago), definitely using what natives call their language, rather than a misinformed classification imposed on them by linguists who hadn't done all their research. We should also consider changing our names for languages when users of certain variants emerge. Take for example, Emilian (https://en.wikipedia.org/wiki/Emilian_dialect). I don't think any native Emilian speakers call their language that. All of our Emilian sentences are currently in the Reggiano dialect, and so there hasn't been any problem. But, what if a very diligent (capable of adding thousands of perfectly good sentences to our corpus), yet very closed-minded Piacentino speaker insists on contributing in Piacentino, which is part of the Emilian dialect-continuum? Will we really deny them and lose them? But, then again, renaming ISO 639-3 EGL ("Emilian") as "Bolognese/Ferrarese/Mantovano/Modenese/Parmigiano/Piacentino/Reggiano/Vogherese" is pushing it, isn't it?

I agree with sacredceltic about the "flags". As per my original example, simply changing the language icon of ISO 639-3 CAT to something more neutral and naming the language Catalan/Valencian would solve this problem. As a language-enthusiast, I am not at all fond of the idea of separating Catalan and Valencian into two distinct languages (and neither is our native Valencian user, by the way). Now, CK is definitely on to something with the new icons, but we may have to stylize them somehow: maybe with an algorithm derived from the actual 3-letter ISO 639-3 code, or something. The flags we have now are very pretty; anyone who walks behind me at work and sees all the flags (and characteristic green header) does a double-take. If we are to replace the language icons, we need something just as flashy. But, as TRANG has mentioned: what better to identify "popular" languages than flags?
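
To make the "algorithm derived from the actual 3-letter ISO 639-3 code" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how Tatoeba's icons are produced: it assumes an icon could start from a background colour computed from a hash of the code, and `icon_color` is a hypothetical helper name.

```python
import hashlib


def icon_color(iso_code: str) -> str:
    """Return a deterministic hex colour for a 3-letter code (hypothetical helper)."""
    code = iso_code.strip().lower()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a 3-letter code, got {iso_code!r}")
    # Hash the code so that the same language always gets the same colour.
    digest = hashlib.sha256(code.encode("ascii")).digest()
    # Use the first three bytes of the digest as red, green and blue.
    red, green, blue = digest[0], digest[1], digest[2]
    return f"#{red:02x}{green:02x}{blue:02x}"


# The same code always yields the same colour, whatever its letter case.
print(icon_color("CAT"), icon_color("egl"))
```

A hash keeps the mapping stable and needs no hand-made artwork, but nothing guarantees that two languages get visibly different colours, so a real icon would probably still need the letters drawn on top.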

Isnamar, our native Valencian user, would prefer to have a system in which multiple "language icons" are displayed, showing us in what varieties of the language such a sentence is allowed. He gave the following three examples:
1. El gos és meu. (CAT, VAL)
2.1. El noi fiu això. (CAT)
2.2. El xiquet va fer això. (VAL)
I let him know what sacredceltic has made so painfully obvious in his comments: defining linguistic varieties is a very political process. Even for Spanish, how would we divide dialects? By country? Chiapaneco Spanish in Mexico is closer to Guatemalan Spanish than the Spanish spoken here in Sinaloa, Mexico.
This may be why the ISO 639-6 classification died, as gillux put it so well. So, the hierarchical classification of languages may be impossible.
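
For what it's worth, Isnamar's three examples can be sketched as data, which also shows where the difficulty lies. This is a rough Python illustration under my own assumptions, not Tatoeba's actual data model: every sentence stays filed under the single code "cat", and the `varieties` field and the `accepted_in` helper are hypothetical. Note that "VAL" is just the label used in the examples above, not an ISO 639-3 code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sentence:
    text: str
    iso_code: str  # the single ISO 639-3 code the sentence is filed under
    varieties: frozenset = field(default_factory=frozenset)  # hypothetical labels


SENTENCES = [
    Sentence("El gos és meu.", "cat", frozenset({"CAT", "VAL"})),
    Sentence("El noi fiu això.", "cat", frozenset({"CAT"})),
    Sentence("El xiquet va fer això.", "cat", frozenset({"VAL"})),
]


def accepted_in(sentences, variety):
    """Return the texts of the sentences marked as valid in the given variety."""
    return [s.text for s in sentences if variety in s.varieties]


print(accepted_in(SENTENCES, "VAL"))
```

Storing the labels is the easy part. Somebody still has to decide which labels exist and which sentences deserve them, and that is exactly the political question.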

gillux, March 9, 2016 at 1:41:59 PM UTC, edited March 9, 2016 at 1:44:05 PM UTC

> But, then again, renaming ISO 639-3 EGL ("Emilian") as "Bolognese/Ferrarese/Mantovano/Modenese/Parmigiano/Piacentino/Reggiano/Vogherese" is pushing it, isn't it?

So ultimately, the mere naming of a language can be a political issue too.

It’s not a perfect solution, but what if we had something like “language presentation” pages for each language (or one for all the languages), that would explain what our current language icons and names are unable to express? I’m thinking about a short page that could be reached by clicking on the flag or the language name (or even a large tooltip). Its goal would be to:
• define the language more clearly when its name is ambiguous or the language encompasses several varieties (which would also solve what I said here [1])
• explain that we’re using flags and single names only for their ability to efficiently convey information

It’s just about making what is currently obscure clearer: that we’re using language names and language icons that are sometimes incorrect, but that this choice shouldn’t be interpreted as political and that all the language varieties are welcome.

[1] https://github.com/Tatoeba/tatoeba2/issues/936

cueyayotl, March 10, 2016 at 12:27:38 AM UTC

That's good! I wonder if we could get that explanation across as new users are adding the languages that they speak. Then, when they click on one of our languages, there would be some sort of notice beside the selection box explaining that this "language" encompasses such and such varieties.

sacredceltic, March 8, 2016 at 2:41:21 PM UTC, edited March 8, 2016 at 2:41:49 PM UTC

>We may as well establish our own classification scheme.

This is an extremely political subject. Defining languages is worse than having to define borders. Do you want to try with Crimea?
I cannot advise you strongly enough against managing your own classification. It would open a Pandora's box that will unleash torrents of violence and hatred...

gillux, March 8, 2016 at 6:13:04 PM UTC, edited March 8, 2016 at 6:16:48 PM UTC

To make things clear: I was only talking about adding a classification on top of the existing ISO one, not about reclassifying all the languages. My aim is to offer a way of highlighting, within the existing languages, the variations that do not on their own justify a fully separate language (notably for Valencian, and while we are at it, other similar cases). It does not claim to solve every case, such as that of Portuñol.

Like alexmarcelo, I think that the fact that the problem of language classification in general is unsolvable is no reason to give up from the outset. The current situation (classification according to the ISO standard) is shaky anyway, so why not try to make it less bad? The flag problem, for its part, remains untouched, with or without a classification of variations. Tatoeba clearly presents itself as a platform in favour of language diversity, so I do not think we would expose ourselves to "torrents of violence and hatred" by trying to promote it.

sacredceltic, March 8, 2016 at 6:30:49 PM UTC

I really think that, even with the best intentions in the world, defining "dialects" is a political act. For example, if I were the enemy of the Basques, or of the Berbers, I would claim that these languages do not really exist and are in fact a variety of dialects. That is precisely what states that claim to be "Arabic-speaking", such as Morocco or Algeria, do: divide and rule.
I am convinced that this is exactly the case with "Valencian", whose greatest writers are in fact counted among the fathers of Catalan...
I am quite sure the Spanish government would be delighted to see the Catalan languages multiply...