Is there some central policy on minority languages and dialects? I encountered some Taishanese inside the Cantonese collection, which to a layman is clearly a different language. On the other hand, I find the Kölsch dialect of German as a distinct selection in the language list (which is great, because they ARE quite different). What to do when one's language of interest is not in the list?
Hi and welcome,
here is the answer to your question: http://en.wiki.tatoeba.org/arti...nguage-request
Thank you! Yeah, many of the Chinese languages did not receive a 639-3 code, although they are at least as remote from one another as Kölsch is from German (which DOES have a -3: ksh). While certainly making things easier on the maintenance side, it would likely make sense to add languages even when they did not get a -3. As it is now, they are just contaminating the collections of related languages (As this Taishanese sentence in my Cantonese Anki collection showed me). So is the official policy to just put them into a collection where they do not really belong, or to put them into a public list for all eternity?
Also, the reason that it makes the language selection list unwieldy would probably be reason to think of a way to better represent a potentially huge (thousands) list :) I don't have a great idea in that regard either, unfortunately… Grouping into language families might be difficult for laypeople to navigate; maybe an auto-complete search box?
> Also, the reason that it makes the language selection list unwieldy would probably be reason to think of a way to better represent a potentially huge (thousands) list :)
We started working on a solution for this, but it’s still incomplete. You can, however, use it by going to your settings page and enabling the advanced language selector option.
> So is the official policy to just put them into a collection where they do not really belong, or to put them into a public list for all eternity?
The policy is not to add them in the first place. Quoting AlanF¹:
“Sentences in any language, whether or not they are constructed, can be added if and only if they have an ISO 639-3 (three-letter code). Esperanto is in, Kah is not. The "unknown language" category is for sentences in languages that have an ISO 639-3 code, but that we have not yet added as recognized languages.”
Considering that some dialects have received an ISO 639-3 code while others have not², it’s quite an unfair policy. But I understand that admins don’t want to take on the heavy responsibility of allowing or denying language additions.
¹ http://tatoeba.org/eng/wall/sho...#message_19666
² https://en.wikipedia.org/wiki/I...Macrolanguages
It is a fact that there is a policy in the People's Republic of China of denying the existence of certain minority languages in order to impose national unity. The PRC lobbies international institutions such as the ISO very strongly in order to impose its nationalistic views.
This specific Chinese situation should be taken into account.
> As this Taishanese sentence in my Cantonese
> Anki collection showed me
There is currently only one Hoisanese sentence in the whole corpus, so it shouldn’t cause much trouble. If you really need to separate it, you can use tags right now.
Also, you *can* apply to get more codes added for Chinese languages. The problem is, you need to prove it is a separate language, and describe its position within the Yue family, and 'which to a layman is clearly a different language' will not be enough.
You may be interested in this discussion: http://meta.wikimedia.org/wiki/...ipedia_Teochew ; the most important part of it is this answer: http://www-01.sil.org/iso639-3/...s_2008-083.pdf
As you can see, the code for Teochew was not added, not because they consider it mutually intelligible with Hokkien, but because they don’t have enough information about how Minnan (nan) should be broken down: it’s clearly visible that Teochew and Hokkien should get separate ISO codes, but it’s not clear what the broader picture is (where should the other Minnan dialects belong?).
>it’s clearly visible that Teochew and Hokkien should get separate ISO codes
Although they don't... find the culprit!