Wall (7,412 posts)
Tips
Before asking a question, please read the FAQ.
We aim to maintain a friendly atmosphere for civilized discussion. Please read the rules of conduct.
ecorralest101
5 hours ago
IdiomHunter
5 hours ago
ecorralest101
7 hours ago
Tamajeq1286
13 hours ago
teto
21 hours ago
Tamajeq1286
yesterday
Tamajeq1286
yesterday
IdiomHunter
yesterday
Tamajeq1286
2 days ago
Tamajeq1286
2 days ago
I am currently speaking with some native speakers, and Quechua is not a single language, but rather a collection of dialects. In particular, the linguistic differences are sometimes so pronounced that speakers of a certain dialect might not understand those speaking another. Southern Quechua (known natively as urin qichwa) is by far the most widespread and spoken variety within the Quechua language family. I am currently contributing sentences in Cusco-Collao Quechua, which is a Southern Quechua variety. Southern Quechua (especially the Cusco variety or the literary standard) is usually the recommended variety for anyone wishing to start learning the language. I was wondering: wouldn't it be better to separate Quechua into 'Quechua (Southern)' and 'Quechua (dialectal)'? This way, we could ensure greater cohesion within the sentence corpus. From what I have seen here, a large part of the Quechua on this site is Southern.
Any opinions on this? I personally believe that separating Southern Quechua (from Cusco) 🇵🇪, Kichwa of Ecuador 🇪🇨, and all the other varieties from one another would be a great improvement. As I said, the Cusco variety is the most standardized one and the one we should probably invest the most effort in.
Hello,
I agree with you. Tatoeba should certainly separate these varieties, as they differ a lot from each other. I don't speak Quechua, but I admire the beauty of the indigenous languages in our region of Latin America. Each language deserves respect and recognition.
Subject: Request to add Tawallammat Tamajaq [ttq] to Tatoeba
Dear Tatoeba Team,
I would like to request the addition of a new language to Tatoeba. Here are the required details:
Language name: Tawallammat Tamajaq
ISO 639-3 code: ttq
SIL link: https://iso639-3.sil.org/code/ttq
My Tatoeba username: Tamajeq1286
Public sentence list with 100+ sentences: https://tatoeba.org/ar/sentences_lists/show/174933
Tawallammat Tamajaq is a Southern Tuareg language with ~1.3 million speakers in Niger, Mali, and Nigeria. The list contains 100+ original sentences, not duplicates, and follows the standard Latin orthography with Ǝ ɣ š.
Thank you for your support in preserving and promoting this language.
Best regards,
Tamajeq1286
Further supporting information for adding Tawallammat Tamajaq [ttq]:
To demonstrate the digital readiness of this language and the active support for it, note that there is already a substantial contribution on Glosbe, with over 250,000 translated entries spanning Arabic, English, and French. These are recent contributions, so they might not be fully reflected in Glosbe's public statistics yet:
- Tamajaq to Arabic: https://glosbe.com
- Tamajaq to English: https://glosbe.com
- Tamajaq to French: https://glosbe.com
Additionally, an electronic dictionary for Tamajaq is fully available. We can actively leverage this resource as a reference to contribute, verify, and expand high-quality sentence pairs directly on Tatoeba once the language is approved.
Thank you for considering this request!
Dear Tatoeba Team,
Following up on my language request for Tawallammat Tamajeq [ttq], I would like to inform you that I have a clean corpus of over 7,000 parallel sentences ready for import.
These sentences were previously contributed by me to Glosbe under my official profile and platform (Tamajeq.com). Since I am the original author and creator of this data, there are no copyright issues, and the corpus fully complies with Tatoeba's CC-BY licensing terms.
Once the language code is activated, I can provide the dataset in the required CSV/TXT format for a developer mass-import. Thank you!
Inclusion of Tawallammat Tamajeq can be tracked on https://github.com/Tatoeba/tatoeba2/issues/3312
As for the dataset, you can submit it to team@tatoeba.org or open a new issue on Github with the file attached.
Dear gillux,
I would like to express my deepest gratitude, utmost respect, and sincere appreciation to you and the Tatoeba team for approving my request and opening the GitHub issue so quickly. Your support in preserving and promoting this language means a great deal to our community.
Regarding the language icon for Tawallammat Tamajeq (ttq), I would like to kindly request using the national flag of Niger 🇳🇪 instead of the previous artwork. Since the vast majority of ttq speakers live in Niger and the language is officially recognized there, using the Nigerien flag will make it much easier for our community to identify their language on the website.
Thank you once again for your incredible understanding, support, and flexibility!
Best regards,
Tamajeq1286
I can’t judge the relevance of either icon, but I’d like to point out the broader context to help decide.
Tawallammat Tamajeq (ttq) is part of the family of Tuareg languages. The SIL lists 4 of them: https://iso639-3.sil.org/code/ttq
Tatoeba already supports Tahaggart Tamahaq (thv) https://tatoeba.org/sentences/s...ne/indifferent which is another language used in Niger. The two other Tuareg languages Tayart Tamajeq (thz) and Tamasheq (taq) are not yet supported.
Tatoeba also supports the Hausa language (hau). This language uses the flag of Nigeria with letter code on the side. (It used to use https://en.wikipedia.org/wiki/F...usa_people.svg and it is not clear why it was changed).
I previously submitted a cultural symbol, but some local linguists suggested that using the flag of Niger would be more accurate and preferable than simply a cultural symbol, since Tawallammat Tamajaq is a national language in Niger. I also suggest using 🇳🇪. Thank you very much.
Dear gillux,
I previously informed you that I have a large collection of sentences translated into English, Arabic, and French. Some are available as plain text in my notes, some in spreadsheets, and some I previously converted to PDF. You kindly informed me that I could send them by email. My question now is: what formats should I use to send the sentences to you? I would appreciate a more detailed explanation of the most suitable and fastest method for this task, given the large number of sentences. It's important that you avoid a slow, manual approach while maintaining your established grammar guidelines.
Best regards,
Tamajeq1286
Hello everyone,
Thank you for this great initiative and your efforts to support the Tawallammat Tamajaq (ttq) language.
To contribute to this discussion and showcase the richness of the language, I would like to share this unique Tawallammat Tamajaq pangram. It contains all the official letters and modified characters used in the standard Tamasheq alphabet:
"Xa bălla imuzărăn naŋŋin wər gəmmiyăn âr taggaẓt dəɣ Məššina fəl ad agəẓ ṭǝma n ăɣrǝm han măqqăja săkṣoḍnen."
This sentence is a perfect example of the language's structure and can be very useful for testing fonts, character rendering, or language data in the project.
Best regards!
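For anyone who wants to use a sentence like the pangram above for font or rendering tests, here is a minimal Python sketch that lists the distinct letters a string actually contains, together with their Unicode code points. The function names `letter_inventory` and `describe` are made up for this illustration and are not part of any Tatoeba tooling. A listing like this makes it easier to spot look-alike letters such as U+0259 (schwa) and U+01DD (turned e), which render almost identically but are different characters.

```python
import unicodedata
from collections import Counter


def letter_inventory(text):
    """Count each distinct letter in `text` (hypothetical helper).

    The text is NFC-normalized first so that a base letter followed by a
    combining mark (e.g. "a" + U+0306) is counted as the single
    precomposed letter when one exists. Case is ignored.
    """
    normalized = unicodedata.normalize("NFC", text).lower()
    # str.isalpha() is False for spaces, punctuation, and any combining
    # marks that could not be composed, so those are skipped.
    return dict(Counter(ch for ch in normalized if ch.isalpha()))


def describe(text):
    """Print one line per distinct letter: code point, Unicode name, count."""
    for ch, count in sorted(letter_inventory(text).items()):
        name = unicodedata.name(ch, "UNKNOWN")
        print(f"U+{ord(ch):04X}  {ch}  {name}  x{count}")


if __name__ == "__main__":
    # Short sample taken from elsewhere in this thread.
    describe("Ma tăxallăkăd?")
```

Passing the full pangram to `describe` gives the complete letter list to compare against the intended alphabet.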
Hello Tatoeba team,
I would like to ask a question: why does the Tatoeba team take so long to accept a new language request?
Dear gillux,
Is the following format suitable for sentences that I save in a text file and send to you by email?
Ma dər tănsed? -> How did you sleep?
Ma dər toled? -> How are you?
Ma tăxallăkăd? -> How are you?
Man-ăwen-nəkk? -> How are you doing?
Tăggoḍăyăm? -> Are you thankful?
Ăytedăn-năwăn ma dər olăn? -> How are your people?
Bărarăn-nəkk ma dər olăn? -> How are your children?
Bărarăn-năm ma dər olăn? -> How are your children?
Kăyy d-ălmăz? -> You and the sunset?
Əgleɣ. -> I am leaving.
Proposal to update the Laz language flag icon
I would like to propose an update to the flag icon used for the Laz language. The current file (lzz.svg) displays a fictional design mistakenly copied from an old Wikipedia asset. Wikipedia editors have since officially marked that asset as a hoax and removed it from factual history pages (archived on Wikimedia Commons as 'File:Fictitious flag of Lazistan Sanjak.svg').
I propose replacing it with the widely recognized cultural flag of Lazona (the name of the Laz land in the Laz language):
- design and emblem: A blue field (Black Sea) with a green stripe (nature) between thin white stripes (peace) at the bottom. A white Borjgali centered in the top blue section, serving as an ancient solar symbol and traditional emblem common to the broader Kartvelian cultural family (Megrelian, Svan, and Georgian).
Why this update is beneficial:
- native verification: It is actively used by language activists and is featured by a native speaker in the educational video on the official Omniglot Laz Language Page (https://omniglot.com/writing/laz.htm) explaining its cultural significance.
- regional acceptance: Because the majority of the Laz population resides within Turkey, the Turkish language query "Lazistan Bayrağı" was specifically checked to scrutinize potential regional sensitivities. The search heavily surfaces this design as a peaceful cultural identifier, confirming that this flag represents organic cultural acceptance by the population itself rather than any political or controversial move. I performed this check explicitly to abide by Tatoeba's strict guidelines regarding neutral, conflict-free language icon choices.
- technical standards: I have built a fully compliant, optimized SVG file. The Borjgali emblem has been preserved to the absolute best detail that a micro size of 30x20 pixels can allow, and the final file footprint is well under 2KB as required.
You can view the live flag artwork proposal directly at this address:
https://itty.bitty.site/#/data:...eha5n/ByHFgGo=
Thank you for your time and support in keeping Tatoeba's language assets accurate!
Hi teto, thank you for bringing this up. Your proposal looks sound and legit, I think we should change the flag. You can keep track of the progress of the flag change on Github: https://github.com/Tatoeba/tatoeba2/issues/3308
Hi gillux,
Thank you for confirming the flag change.
Hi gillux and the community,
Thank you so much for updating the Laz language flag!
I noticed that the Tatoeba page has been updated, and it is wonderful to see this beautiful new flag.
The content of this message goes against the rules, so it has been hidden. Only admins and the author of the message can see it.
Dear gillux,
I am currently organizing and cleaning up a database of over 10,000 sentences in Tawallammat Tamajeq (ttq) and their English translations using my phone's Notes app. I want to ensure the format is perfectly compatible with your import system.
From a data-processing perspective, I assume a single-line delimiter format is best. Would this format be suitable for you?
Format:
Ma tăxalăkă? -> How are you?
Lammădăɣ Tamajeq. -> I am learning the Tamajeq language.
(Note: If you prefer a different delimiter like a Tab (TSV), a pipe (|), or a specific symbol like (@), please let me know and I will adjust it).
Additionally, given the large volume of data (10,000+ lines), I will send the final output as an attached plain text (.txt) file rather than pasting it into the email body, unless you advise otherwise.
Thank you for your guidance and for your work on the platform!
Best regards,
Tamajeq1286
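As an illustration of the delimiter question above, here is a minimal Python sketch that reads lines in the "sentence -> translation" form and rewrites them as tab-separated values (TSV), one of the alternatives mentioned in the post. The thread does not say which format the Tatoeba import actually expects, so the choice of TSV and the names `parse_pairs` and `to_tsv` are assumptions made only for this example.

```python
import csv
import io

# Separator used in the sample lines of the post (an assumption about
# the final file; adjust if a different delimiter is agreed on).
DELIMITER = " -> "


def parse_pairs(lines):
    """Yield (sentence, translation) tuples from 'sentence -> translation' lines.

    Blank lines are skipped. A line without the delimiter, or with an
    empty side, raises ValueError so that problems surface before sending.
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        sentence, separator, translation = line.partition(DELIMITER)
        if not separator or not sentence.strip() or not translation.strip():
            raise ValueError(f"line {number}: expected 'sentence -> translation'")
        yield sentence.strip(), translation.strip()


def to_tsv(lines):
    """Return the parsed pairs as TSV text, one pair per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(parse_pairs(lines))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        "Ma tăxalăkă? -> How are you?",
        "Lammădăɣ Tamajeq. -> I am learning the Tamajeq language.",
    ]
    print(to_tsv(sample), end="")
```

Validating every line before export keeps a 10,000-line file from hiding a few malformed entries, whichever delimiter ends up being used.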
The Quechua language is composed of many roots and suffixes. This means that words sharing a similar meaning have similar roots. For example, Paqarimushan / Paqarin / Paqarillanpi are all different forms of the concept of morning and/or dawn. The search bar looks for exact matches, which means that if I search for 'Paqarin', I only find results containing exactly that term. If I wanted to search for all sentences containing the root 'Paqari-', I would not be able to do so. In my opinion, this is a limitation.
You can likely use the OR operator in searches like this:
Paqarimushan|Paqarin|Paqarillanpi
Click the "Help" link above the search bar for other advanced search techniques.
What CK says about using a vertical bar to search for a variety of words is correct, but limits you to the words you specified. You may find it helpful to use an asterisk as a wildcard to let you match a variety of leading and trailing strings. For example, *paqar* will find words that contain the fragment "paqar".
The help page at the "Help" link will indeed provide you with information, and the part about the asterisk is here:
https://en.wiki.tatoeba.org/art...contain-someth
However, much of the information on the page is geared towards the 25 languages, including English, that support a stemmer, a tool written outside Tatoeba that strips common grammatical prefixes and suffixes within a language. Thus, in English, a search for "living" will also find sentences with "live" and "lived". Unfortunately, the search engine that Tatoeba uses doesn't have a stemmer for Quechua, and writing one would be a substantial project. Hopefully, using the asterisk in your searches will accomplish what you need.
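To make the two suggestions above concrete, here is a small Python sketch that builds the query strings described in this thread: alternatives joined with a vertical bar, and a fragment wrapped in asterisks. The helper names are hypothetical. The `local_wildcard_match` function is only a rough local emulation built on Python's `fnmatch` module and is not how Tatoeba's search engine works internally; it is there to show the difference between an exact word and a wildcard pattern.

```python
from fnmatch import fnmatchcase


def or_query(words):
    """Join alternative word forms with '|' as suggested in the thread."""
    return "|".join(words)


def wildcard_query(fragment):
    """Wrap a fragment in asterisks so it can match inside longer words."""
    return f"*{fragment}*"


def local_wildcard_match(pattern, sentence):
    """Rough local emulation: does any word of `sentence` match `pattern`?

    Matching ignores case and surrounding punctuation. This uses shell-style
    patterns from fnmatch, which is an assumption for illustration only.
    """
    pattern = pattern.lower()
    words = (word.strip(".,;:!?\"'").lower() for word in sentence.split())
    return any(fnmatchcase(word, pattern) for word in words)


if __name__ == "__main__":
    forms = ["Paqarimushan", "Paqarin", "Paqarillanpi"]
    print(or_query(forms))          # query listing each form explicitly
    print(wildcard_query("paqar"))  # query matching any word containing the root
    for form in forms:
        # An exact pattern matches only one form; the wildcard matches all three.
        print(form, local_wildcard_match("paqarin", form),
              local_wildcard_match("*paqar*", form))
```

The OR query has to list every form in advance, while the wildcard query also catches forms of the root that were not anticipated.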
https://tatoeba.org/zh-tw/users/for_language/eng
It seems like users who were active within the last week are no longer being prioritized in the display.
I confirm this issue: https://github.com/Tatoeba/tatoeba2/issues/3314
Should users be allowed to add original sentences in ancient or dead languages? Some contributors have been writing original Classical Chinese sentences, but I don't think we have a reliable means of verifying their correctness. Here are a few examples:
https://tatoeba.org/zh-tw/sentences/show/1326209
https://tatoeba.org/zh-tw/sentences/show/1733573
https://tatoeba.org/zh-tw/sentences/show/1635843
https://tatoeba.org/zh-tw/sentences/show/1844943
https://tatoeba.org/zh-tw/sentences/show/2949725
Since Classical Chinese is a dead language, my personal view is that we should only accept sentences sourced from historical texts.
Members of Tatoeba have been contributing original sentences in dead languages for a long time, including Latin, ancient Greek, old French, middle French, ancient Hebrew, old Spanish, old Turkish, just to name a few. Changing the policy now would not be very practical, and some sentences may have been sourced from historical texts without attribution.
Of course there are no "native speakers" of these languages who may claim they are correct and natural, but I think we can approach a similar quality level based on non-native checks. I have a feeling that people who go all the way learning dead languages must be so passionate about it that they can be trusted.
I wonder if we currently have any corpus maintainers who are maintaining dead languages, though.
You can see here that unfortunately no dead languages have corpus maintainers: https://tatoeba.org/de/stats/native_speakers . But this problem isn't limited to dead languages; many widely spoken living languages are also unmaintained.
It seems that list is not up to date. There is more than one corpus maintainer and more than two advanced contributors for Mandarin Chinese.
I have updated the list. Related issue: https://github.com/Tatoeba/tatoeba2/issues/3313
Note that the numbers under "Contributors" are not to be taken seriously because they are highly affected by spam accounts.