orion17, November 12, 2015 at 2:13:19 AM UTC

I wonder when the Sundanese language was added to Tatoeba. I saw that all sentences have no owner and the sentences are not good...

cueyayotl, November 12, 2015 at 2:45:19 AM UTC (edited November 12, 2015 at 3:08:52 AM UTC)

These sentences were contributed by a native speaker using the Southern-Priangan dialect of Sundanese.

I am trying to convince him to join the project and contribute for himself, but have thus far failed.

If you would like his contact information, please send me a private message.

I have sent you a private message with 160 translations provided by the same person which I didn't add to Tatoeba. Please let me know if they are good, or if you believe that they should just be deleted. Thank you :)

Amastan, November 12, 2015 at 6:51:23 AM UTC

Cueyayotl,

That's exactly what I intend to do to document some threatened languages and dialects (especially Berber dialects), i.e. I'd send sentences in vehicular languages (English, standard Arabic, dialectal Arabic, French, etc.) and ask the native speakers to translate them for me, then publish them on Tatoeba.

Why would I do that? Just because almost all my attempts to convince those native speakers to join Tatoeba have failed. And I'm not interested in documenting endangered languages with just a few dozen sentences, as is the case here. My 100.000 Berber sentences are enough to let a person learn basic Berber. So I think that every threatened language should have at least 100.000 example sentences, and if their native speakers aren't interested in saving them, then there are other people who do care about the survival of these languages. Of course, a non-native speaker might mishear and make grammatical errors or spelling mistakes, but I think that Tatoeba is the perfect tool to correct such errors. People should keep working all their lives to improve things, and a dying dialect like the Blida Atlas Berber (Beni Salah in Ethnologue) won't wait for the few elderly people who still speak it to learn how to read and write so that they can contribute sentences to Tatoeba. They'll all die before this miracle happens.

cueyayotl, November 12, 2015 at 8:05:43 AM UTC

I totally agree with you. If I could, I WOULD go out and get 100.000 sentences of each endangered language. Unfortunately, I am now in South Korea, without access to any of these languages. Next year, I plan to go back to Cambodia for a year. I have studied Cham [CJA] (among other minority languages) for quite some time now, completely puzzled as to how to document them (nobody writes them). I DID manage to bring Ngeq [NGT] to Tatoeba, though I am entirely dissatisfied with the amount of data I collected (I DO have more in my notes, though). Now, armed with more technology, I am excited to return and record as MUCH audio as I possibly can. Sa'och [SCQ] (spoken in Veal Renh commune: about 10.7193N, 103.8275E) died out during my stay in Cambodia, but hopefully other languages don't have to.

I think 100.000 sentences is MORE than enough to learn the basics of a language :P
But, it is a very nice goal.

How many people speak Blida Atlas Berber? Since we follow SIL's conventions for naming languages, we WOULD have to classify Blida Atlas Berber under the name 'Chenoua', though I am sure that, as long as no one is contributing in another dialect of 'Chenoua', TRANG would allow it to be named "Blida Atlas Berber (Chenoua)" [Cf: Odia (Oriya)], at least temporarily :)

Amastan, November 12, 2015 at 8:30:47 AM UTC

Cueyayotl,
Thanks for your message,

The Blida Atlas dialect is close to but not the same as the Chenoua dialect. Blida Atlas residents call it "tacelḥit", and they've recently started to call it "taqbaylit" (exactly as Kabyles from Kabylie refer to their dialect). The Blida Atlas dialect is a "bridge" or part of the language continuum between Kabylie and the Dahra mountains where the Chenoua dialect is spoken. One notable difference between the Blida Atlas dialect and the Chenoua dialect is the post-verbal negation particle:

In Kabyle (my dialect), a verb in the negative form is preceded by "ur" (the pan-Berber pre-verbal negative particle) and followed by "ara" (the post-verbal negative particle). Thus:

"Ssneɣ" (I know) becomes "ur ssineɣ ara" (I don't know) in Kabyle.

In Chenoua, like in other Zenata dialects, the verb is followed by a "c" [sh]:

"Ur ssineɣ c" (I don't know).

In the Blida Atlas dialect, a verb is generally followed by a "k" or "ka":


"Ur ssineɣ k" or "ur ssineɣ ka"

Although we have been informed that there were some villages where they pronounced it "c" like in the Chenoua area.

All these things have to be urgently documented. In addition to that, the Blida Atlas dialect now has a radio program on Radio Metidja (just 1 hour per week), and more and more people are interested in re-learning it. Therefore, the most basic thing we should provide them with is a corpus that education experts might rely on to try to revive this dying dialect.

Whenever I have time, I visit the Blida Atlas to record elderly speakers. However, they're quite hard to find, although the area is just 50 kilometers away from Algiers, Algeria's capital and largest city.

There's a question I'd like to ask you, though:

Do you think that using Tatoeba's example sentences to document these languages is a way that would allow us to document them very quickly?

I try to use TB's sentences alongside the "classical method" which consists in recording conversations and accounts from native speakers.

Could you confirm, please?

cueyayotl, November 12, 2015 at 9:22:32 AM UTC

I know that it isn't the same as the Chenoua dialect, but the SIL STILL considers what you would call the Blida Atlas Dialect as ISO 639-3 CNU (which it calls 'Chenoua' language). Maybe you should send SIL a message persuading them to give the Blida Atlas Dialect a separate ISO 639-3 code, otherwise it COULD cause problems later. I would support that decision 100% :)

The grammatical information you gave is very interesting. I have notes from Teotitlán del Valle Zapotec, which the SIL classifies under ISO 639-3 ZAB, though it is COMPLETELY unintelligible with Guelavía Zapotec (under the SAME ISO 639-3 code), but mutually intelligible with Mitla Zapotec (ISO 639-3 ZAW). I don't know who exactly came up with some of these linguistic divisions, but it is time we locals made some changes. This, unfortunately, is why my Colombian friend I spoke of before did not want to join.

Anyway, as for your question, use a combination of both. In a "classical method" account/conversation, it may be difficult to add translations in other languages, as many concepts are completely lost in translation. That is OK, you have several languages into which you can try to translate, so that we can have a good idea of what the original expression was. As most of the people who speak these nearly-extinct languages have never taught their language, they are not familiar with the scheme in which we learn Indo-European languages, namely "I eat." "You eat." "He eats." etc. I have been unfortunate enough to ask for translations of these consecutively and come up with no apparent pattern after changing the verb a few times (usually it is because they changed the aspect or modality of the verb between sentences). Sentence to sentence translation without context is generally very difficult for these people. What I recommend is to CREATE a story, or outline for a conversation, using sentences from Tatoeba. Maybe even use the same outline on a different day to get a different translation. Ask questions, and try your hardest to comprehend. Keep in mind that in your translations, 1st and 2nd person can get switched around, so try to pick up on those patterns SOON. I cannot tell you how many times I asked how to say something like "Your name is Tom." and received the equivalent of "My name is Tom."
One last thing: record EVERYTHING!! Even if you have HOURS of audio, record ABSOLUTELY EVERYTHING that they say. You won't believe how much gold will slip by when you are not recording.

I'm excited for you, I really am :D

Amastan, November 12, 2015 at 3:54:09 PM UTC

Good afternoon, Cueyayotl,

To have an idea about my work and the Blida Atlas dialect, visit this link:

https://www.youtube.com/watch?v=6wOb6xncVew

The video was filmed and posted by a young French-born researcher whose parents are from the area. He himself is working on a dictionary of the Blida Atlas Berber; however, his resources are limited for the time being, since his dictionary is mainly based on bibliographical works that date back 100 years (mainly Destaing and Laoust). My work will consist in recording as many hours as possible from the last elderly people who still speak the dialect fluently (it's very hard to find them). Maybe I'll send an excerpt of a recording to the SIL to convince them.

The reason why they consider the Blida Atlas dialect as part of the Chenoua dialect is a study that Laoust published in the early 20th century (Étude sur le dialecte berbère du Chenoua comparé avec ceux des Beni-Menacer et des Beni-Salah), yet the French title reads "Study of the Berber dialect of the Chenoua region with a comparison with those [i.e. separate] dialects of the Beni-Menacer and the Beni-Salah". So I don't understand why linguists (including those of INALCO, among whom there are many Berbers) stubbornly consider the Blida Atlas dialect (Beni Menacer and Beni Salah) as part of the Chenoua dialect.

Another piece of information: the Blida Atlas dialect is mutually intelligible with both Kabyle (of Kabylie) and the Chenoua dialect, and all the Northern Berber dialects are mutually intelligible to varying degrees. I think that this is due to the fact that, until a relatively recent period (10th century AD), Northern Berber was the main lingua franca in the northern part of North Africa.

>>>> Anyway, as for your question, use a combination of both.

Great! That's what I've been doing so far. Yet, I too have an idea about the problems you mentioned. Recently, I worked with a friend who spoke another endangered dialect called taɛemmuct. The dialect is spoken in the Algerian commune of Amoucha, located between Bejaia and Setif, in the southeastern part of Kabylie, an area where Kabyle borders Arabic. The subdialect is very interesting because it's very similar to Shawi, yet it's spoken in Kabylie. I took English sentences from Tatoeba and I had to translate them into Algerian dialectal Arabic so that my friend could understand them. I managed to have some sentences translated into his dialect, yet I noticed that sentences to be translated into non-standardized and non-modernized languages should be related or adapted to the traditional way of life of that language's community. Therefore, you can't ask the speakers of such languages to translate a sentence like "Tom took my credit card" for you, because a credit card simply doesn't exist in communities living in remote mountains or deserts. That's why I'm planning to start working on a gigantic "basic corpus" suitable for any endangered language, including a language like South American Yanomamo.

>>>> I cannot tell you how many times I asked how to say something like "Your name is Tom." and received the equivalent of "My name is Tom."

It's the same for me, especially with elderly people. Yet when I manage to have a few hundred sentences translated, grammatical patterns start to emerge from the corpus. I wish all the speakers of these languages had a certain level of literacy, but in most cases, the best speakers (generally monolingual) are unfortunately illiterate.

Another question:
How do you document the full conjugation pattern of a language? Would you rely on your 1000's of hours of recordings to try to understand how verbs are conjugated in some language, or do you ask speakers (who can do it) to conjugate any new verb you come across?

cueyayotl, November 16, 2015 at 5:40:55 AM UTC

Good afternoon, ⵓⵎⴰⵔ

What a fantastic video! Are you able to transcribe it? It seems phonologically close enough to Kabyle to be able to transcribe without many problems.

So that's why... languages DO change a lot within 100 years, but it seems that even in Émile Laoust's time they were already different :/ there is no excuse: if they were already different enough to be classified as distinct languages before, then certainly now, they should be classified as distinct languages (and have distinct ISO 639-3 codes).

>> I'm planning to start working on a gigantic "basic corpus" suitable for any endangered language, including a language like South American Yanomamo.

In some languages, you can't count past '2'. Others don't have colors. Forget asking a Yanomamo native how to say "I rode my camel across the desert." :) I guess there could never be a list where all the sentences applied to all forms of traditional life, so I suppose it just comes down to creating a list of sentences without neologisms/technology :)
Neologisms can pose problems, as is the case for 'credit card' as you mentioned before. In Mexico, most tribal languages would just use the Spanish word, as I would assume the Algerian tribal languages would just use the Arabic word. It is OK to have these translations, but it wouldn't be very helpful in the long run. (I do remember a clever translation of 'ATM' I heard in one of the Mixtec languages, though)
It'd be great to have such a "basic corpus" of sentences suitable for daily village life. I really wonder if just using a collaborative list would suffice. Also... be careful when you are linking sentences. If you translated some of Tatoeba's sentences into Algerian Arabic FIRST, the Blida Atlas dialect should be linked to Algerian Arabic only, until you understand this dialect well enough to conclude that it is a proper translation of the English.

>> How do you document the full conjugation pattern of a language?

If you find ANYONE who can conjugate a new verb you come across, HOLD ON TO THEM LIKE GOLD! In your case, the languages you wish to document are similar to your own, so the patterns will emerge much more quickly. Some languages will be more Arabic-influenced than others, but since you speak both Arabic and a Northern Berber language, you shouldn't have too many problems (unless one of the languages you wish to document has had some Nilo-Saharan influence as well).
At any rate, you must intensively research any languages within the same language family that have been documented and see the documented verb patterns, as well as those of any language that may have influenced it in order to get an idea of what meanings verbs can convey in another language. Languages like Chinese (any), Vietnamese, Khmer, etc. do not conjugate their verbs, but others, such as Korean may not conjugate by grammatical number, but DO have a very rich conjugation scheme based on tense, modality AND aspect. There are literally THOUSANDS of ways to conjugate a single verb in Korean, and I have yet to see a full list (though if I see a verb in a sentence, by all the conjugation patterns I have learned, I could identify all the 'data' it contains :) ).
Research as much as you can, and always remember the context in which a sentence was translated, as it can help give clues as to the deeper meanings of a particular instance of a verb.

gleki, November 16, 2015 at 8:11:04 AM UTC (edited November 16, 2015 at 10:04:08 AM UTC)

> I know that it isn't the same as the Chenoua dialect, but the SIL STILL considers what you would call the Blida Atlas Dialect as ISO 639-3 CNU (which it calls 'Chenoua' language). Maybe you should send SIL a message persuading them to give the Blida Atlas Dialect a separate ISO 639-3 code, otherwise it COULD cause problems later. I would support that decision 100% :)

This looks like a strong indication that Tatoeba contributors in some ways exceed the authors of the official ISO set in expertise.

I see no reasons to lag behind ISO in this regard.

I doubt any living person is an expert in Ithkuil, but I have no doubt that people are able to speak what belongs to the continuum of these "Berber dialects".

TRANG, November 14, 2015 at 7:37:28 PM UTC

> Just because almost all my attempts to convince those native speakers to join Tatoeba
> have failed.

I'd be interested to know more about why these attempts have failed.

Is it because they just don't feel interested in the project in general? Or is it because they found Tatoeba too complicated to use?
Is there anything we could change in Tatoeba that would have made them more likely to join?

Amastan, November 15, 2015 at 10:44:59 AM UTC

Trang,
Thanks for your message,

>>>> Is it because they just don't feel interested in the project in general? Or is it because they found Tatoeba too complicated to use?

No one complained about difficulties related to the use of Tatoeba. Some of them ("Uyezjen" from Ghardaia, northern Sahara, who speaks the Mozabite dialect, "Yecca" from Kabylie, who speaks the same dialect as me) have joined the project and contributed thousands of sentences together. Others have just contributed a few dozen sentences, then they left forever.

The problem with speakers of non-standardized languages, especially those who live in poorer countries, is the lack of motivation. Sometimes, they don't have the means to contribute: no Internet at home, too busy with work, no knowledge of computing, etc. However, there's a general lack of motivation, perhaps due to the fact that they have negative ideas about their minority languages, and therefore feel that such languages 'don't deserve' so much trouble. Even informants (those we visit to make recordings) have, in some cases, to be motivated in various ways (food, presents, help to find a job, etc.). Some of them would ask me directly: Are we going to earn money from that? And sometimes, even material incentives aren't useful. I've tried to convince two young guys from the Aures area (northeastern part of Algeria) to work with me and even make a dictionary together. Even with that project of a dictionary of the Aures dialect that could be financially profitable for them, they weren't very interested.

That's why some of us would eventually end up contributing thousands of sentences of languages and dialects that are not their native language, waiting for better days when there would be more people from those communities who would have enough time and motivation to contribute directly to Tatoeba.

Yet I've also thought about something that Tatoeba could some day do to help languages that don't have a writing system: having exclusively oral corpora for these languages, waiting for the day when such languages would be transcribed or written.

orion17, November 14, 2015 at 1:02:05 AM UTC

Although he is a native speaker, you should make sure that he also understands the source language, because the quality of the translation does not depend only on whether he is a native speaker or not...

TRANG, November 14, 2015 at 7:40:02 PM UTC

I'd like to ask you the same question that I've asked Amastan, about your attempts to convince people to join Tatoeba.

https://tatoeba.org/eng/wall/sh...#message_24884

> I'd be interested to know more about why these attempts have failed.
>
> Is it because they just don't feel interested in the project in general? Or is it because they
> found Tatoeba too complicated to use?
> Is there anything we could change in Tatoeba that would have made them more likely
> to join?

cueyayotl, November 16, 2015 at 4:33:01 AM UTC

Thank you, TRANG.

>> Is it because they just don't feel interested in the project in general? Or is it because they found Tatoeba too complicated to use?

The reasons for not joining have been plenty. In MY case, most stem from people being too lazy or not having proper motivation. Though there ARE those who have complained that Tatoeba was too complicated to use. Example: user jeronimoconstantina (who initially did have problems with the interface) invited fellow Kapampangan native speaker reyjay1, who simply couldn't figure it out, and to this day has not contributed anything to Tatoeba.

Most people who refuse to join say that they WILL join someday, but retain a "let somebody else do it" attitude. Others say that there is no way that they would work for free. Others don't want to work for others (as bizarre as this sounds, it DOES happen. I've encountered a native Abkhaz speaker who was building his own English-Abkhaz corpus... when I asked if he could volunteer on Tatoeba and then use Tatoeba as a source on his site, he apologized because he did not work for others). And, still others don't see the value of Tatoeba (often with the excuse that 'crowdsourcing' can never yield a quality corpus). I cite, for example, user ManguPurty. He has tried his hardest to get more native Ho speakers to join, but being just 17 years old, others see him as being too young to take an initiative in something with real value. If I had the funds, I would fly to ManguPurty's village and do the recordings myself, as this Ho language is a language I have a genuine interest in (being an Austroasiatic language, related to Khmer and Vietnamese). The Ho language, unfortunately, has never had a standardized orthography... even in their Varang Kshiti script, or in Devanagari.

So, I have to second Amastan's motion of having 'exclusively oral corpora for these languages', and not just that, but also the possibility of multiple recordings of the same sentence (as on forvo.com).

I'm not sure what else could be done to attract more users...

Ricardo14, November 20, 2015 at 2:17:11 AM UTC

Facebook. That's the key, I think.
Since it's the largest community in the world, we could reach out to people there and try to figure out how they make it attractive to everybody.
Since Tatoeba isn't a "social network" like Facebook, we wouldn't have as many members as they do, but we could use it to convince people to join Tatoeba (or even just let them know that it exists) and learn a bit from it.