Wall (6,084 threads)
** Milestone Passed **
The Tatoeba Project now has over 9 million sentences.
Good job, everyone!
Ergulis (Josef): https://tatoeba.org/deu/user/profile/Ergulis
[DEU] Korpuspflegerkandidat für Tschechisch
Josef kandidiert als Korpuspfleger für Tschechisch, um bei der Korrektur fehlerhafter Sätze nicht mehr aktiver Mitglieder zu helfen. Wie immer ist jeder eingeladen, sich hierzu in einer Privatnachricht zu äußern (einfach auf die Verknüpfung am Ende dieser Nachricht klicken).
[ENG] Corpus Maintainer Candidate for Czech
Josef has offered himself as a corpus maintainer for Czech to help apply necessary corrections to sentences owned by inactive members. As usual, you are all invited to give us your feedback in a private message (click on the link at the end of this message).
[EPO] Kandidato por iĝi bontenanto de la ĉeĥa frazaro
Josef kandidatas por iĝi bontenanto de la ĉeĥa frazaro kaj do helpi la korektadon de eraraj frazoj de anoj ne plu aktivaj. Kiel ĉiam ni invitas ĉiun komenti pri tio en privata mesaĝo (simple alklaku la ligilon je la fino de ĉi tiu mesaĝo).
... ist jetzt Korpuspfleger für das Tschechische.
... is corpus maintainer for Czech now.
... nun estas frazara bontenanto de la ĉeĥa frazaro.
Thank you very much for granting me the corpus maintainer post. I really appreciate that.
Dear Hungarian-speaking users!
As of today I have resigned from my position as corpus maintainer, because I do not wish to assist in the nosedive in quality of the Hungarian corpus. In the future I do not wish to take part in maintaining the Hungarian corpus in any form. Thank you all for your cooperation!
How many sentences that had a "change" tag on them did you correct?
Lemdet qbel "ilugan n tira" d wamek ara teddum ɣer wannar-a, mulac d aserwet kan. Ma yella ad nesseɣtay kan, d aḍeggeṛ n wakud akked tezmert n wallen d weɛrur ara tent-yaɣen... Mulac ad ɣ-tḥettmem ad nettlusu nnwaḍeṛ-nni n tqeṛɛet n lgazuz (am win akken-nni meskin!)! Cuḥḥet-aɣ ttxil-wet/kent!
« Vava atan dagi » blayak gan Salahnamar sometir kan Kotava.
Alles Gute zum Geburtstag, lieber Paul! Many happy returns of the day! Feliĉan naskiĝotagon! 🎉
Danke schön, liebe Lisa! Koran dankon, thanks a lot!
We have over 200 thousand Japanese sentences now!
A reason for great joy. Hooray! おめでとうございます。大喜びですね😊
Congratulations to everyone!
Of these Japanese sentences, ...
67,731 are owned by the 51 native Japanese speakers.
117,460 are "orphans" (sentences with no owners)
The rest are owned by non-native Japanese speakers.
For those who are seriously studying Japanese, I would recommend that you only trust the ones owned by native Japanese speakers.
It'd be great if we could get more regular Japanese speakers on board with this project
I appreciate all the work that bunbuku and small_snow have done in maintaining the Japanese corpus, and feel like they deserve more help
Edit: forgot to mention tommy_san and I’m probably forgetting others
No one in their right mind would do that. You would have to have a lot of free time to go over 117,460 sentences.
I worded it wrong, sorry
It happens. 👍
Until we run out of native-owned Japanese sentences for you to translate, I don't think you need to pressure anyone into proofreading all these.
I know how boring and tedious this can be. I proofread all the English sentences in the Tanaka Corpus up to a given length and adopted the ones that I thought sounded good and natural, editing the ones that could easily be edited, since they weren't already linked to various languages, which complicates editing. I ignored the long, multi-sentence examples, thinking I would eventually proofread those later.
Here are some sentences you could translate, owned by three of our Japanese contributors who have lots of sentences.
bunbuku's Japanese sentences with no translations.
small_snow's Japanese sentences with no translations
tommy_san's Japanese sentences with no translations
I mentioned it because I originally came here from Clozemaster, and the orphan sentences turned up more often than not in the Japanese section, and some of them made no sense to me
I’m not worried about running out of sentences to translate
Clozemaster is not good enough for learning a language.
iKnow has better sentence examples for words in order: https://iknow.jp/content/japanese
I use clozemaster. I have an extension to mark words.
The 'English from Magyar' course contains not only translations of Hungarian sentences but also translations of translations, and if a sentence is later changed on Tatoeba, it remains unchanged in Clozemaster.
I think the quality of Clozemaster varies from language pair to language pair. The English->Russian pair is good, and has helped me quite a bit, but there are other pairs that have the problems you describe.
Learning Norwegian is the same kind of fun; the database hasn't been updated from Tatoeba in a long time. On the other hand, I have enough prior knowledge to notice quite a few of the typos, and it isn't the only method I use.
I stopped using Clozemaster months ago and switched to Anki today, using confirmed sentences from here
I also have a wide array of resources for Japanese
I just mentioned them because I found out about this through there.
(In the meantime, the Kabyle competitors also shot past a milestone like a rocket: 300,000 sentences!)
Is this a bot account? https://tatoeba.org/eng/user/profile/Bilgi
I don't know about you guys, but I think new users shouldn't be allowed to add so many sentences at once.
>I think new users shouldn't be allowed to add so many sentences at once.
It belongs to me. I could have added those sentences with my main account, but I preferred to create a separate one that is dedicated to that content. I have some other "special purpose" accounts, too. Since I use the same IP address, anyone who has access to the IP logs can see that it's me. Maybe I should slow down a bit. Sorry for the inconvenience.
Thanks for replying, soliloquist. I thought it was a troll spamming the Turkish Corpus.
Thanks for your concern.
** Updated Projects **
Tab-delimited Bilingual Sentence Pairs
Bilingual Sentence Pairs - Selected Sentences from the Tatoeba Corpus
** Related Stats **
Native-speaker-owned sentences in each language that are linked to English sentences on List 907
Counts by Username of Native Speaker-Owned Sentences Linked to Sentences on List 907
Thanks to all of you who help make these projects possible.
Found a search engine bug.
The result of searching "sweep" does not include sentences with "swept".
Can anyone make a proper issue on GitHub?
Is it a bug? I don't think so. Just one of the irregular verbs.
Do you see 'saw' or 'seen' if you search for the word 'see'?
I had no idea that it doesn't work with irregular verbs. Forgive me please.
Forgive you for what?
You assumed that this is a bug. I would say it is not. I'm not a programmer or anything; I'm just saying what I've seen.
To elaborate: We're talking about inexact ("fuzzy") matches that involve stemming, a feature supported for about two dozen languages, which are listed on this page:
The stemmer works by removing common suffixes from both the search words and the words contained in sentences to see whether the remaining parts of the words match. For instance, if you remove "-ing" from "walking" and "-ed" from "walked", the results match (and also match the bare form "walk").
There are two things to know about the stemmer:
(1) It doesn't belong to us, so the best we can do is pass along reports, and I don't think we even do that for individual words.
(2) It is not supposed to be 100% accurate. We expect to see both false positives (for example, "sing" matches "singe") and false negatives ("sweep" doesn't match "swept", as you saw).
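To make the suffix-removal idea concrete, here is a toy sketch in Python. It is not the stemmer the search engine actually uses: the suffix list, the minimum stem length, and the names `toy_stem` and `fuzzy_match` are all made up for illustration. Even so, it shows the same kinds of hits and misses described above.

```python
# Toy suffix-stripping stemmer. This is NOT the real stemmer behind the search;
# the suffix list and the minimum stem length are assumptions for illustration.
SUFFIXES = ("ing", "ed", "es", "s", "e")
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, unless the remainder would be too short."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def fuzzy_match(query: str, candidate: str) -> bool:
    """Two words match when their stems are identical."""
    return toy_stem(query) == toy_stem(candidate)


print(fuzzy_match("walk", "walking"))    # True
print(fuzzy_match("walking", "walked"))  # True
print(fuzzy_match("sing", "singe"))      # True: a false positive
print(fuzzy_match("sweep", "swept"))     # False: a false negative
```

Because the toy version only looks at word endings, it cannot know that "swept" is a form of "sweep", and it cannot tell that "singe" is a different word from "sing".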
You can explicitly combine word forms with a vertical bar in order to compensate for false negatives, and you can use the equals sign to specify an exact match in order to compensate for false positives. Note that forms recognized by the stemmer do not necessarily have to be verbs. For instance, "beauty" will match "beautiful".
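If you build search strings in a script, the two workarounds could be sketched like this. The helper names `any_form` and `exact` are hypothetical, and putting the equals sign directly in front of the word is my assumption about how the operator is written, so check it against the search help before relying on it.

```python
def any_form(*forms: str) -> str:
    """Join word forms with a vertical bar so that any one of them may match."""
    return "|".join(forms)


def exact(word: str) -> str:
    """Prefix a word with an equals sign to ask for an exact match.

    The prefix position of the equals sign is an assumption.
    """
    return "=" + word


# Compensate for a false negative: list the irregular forms yourself.
print(any_form("sweep", "sweeps", "sweeping", "swept"))  # sweep|sweeps|sweeping|swept

# Compensate for a false positive: ask for the exact word only.
print(exact("sing"))  # =sing
```

The resulting strings are meant to be pasted into the search box as they are.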
For irregular verbs, you need to search like this.
Here is a page set up with all forms of the English verbs included in the search links.
Stemming in Kabyle is a complex process because of its morphology.
Nouns and adjectives in Kabyle have a state: the free state or the annexed state. The beginning of the word changes depending on the state.
Verbs are also highly inflected, so it's hard to recover the stem from an inflected form.
We are working on a stemmer and a lemmatizer for the Kabyle language. We will share the work when it is done.
Interesting. Have you been in touch with the people at Manticore Search, who produce our search engine?
I've just heard about it. Is there any link, any contact information?
Manticore Search's stemming system is actually based on Snowball https://snowballstem.org/ , which is a specialized programming language for writing stemmers as well as a collection of stemmers for various languages.
AFAIK, Kabyle is an Afroasiatic language distantly related to Arabic, so you might want to have a look at Snowball's Arabic stemmer: https://github.com/snowballstem...hms/arabic.sbl
As for sharing your work, see their CONTRIBUTING.rst file https://github.com/snowballstem...NTRIBUTING.rst and maybe also have a look at what it took to get some other languages added (the most recent addition being Yiddish: https://github.com/snowballstem/snowball/pull/137 )
Thank you @Yorwba for sharing. I'll take a look.
Indeed, Kabyle belongs to the Berber family, which is part of the same Afroasiatic family as Arabic, Hebrew, and Amharic. The derivational and inflectional systems are quite similar. But the script is different: Kabyle is written in the Latin script.
I will look into the project.