Wall (6,088 threads)
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
We have over 200 thousand Japanese sentences now!
A reason for great joy. Hooray! Congratulations. What a delight 😊
Congratulations to everyone!
Of these Japanese sentences, ...
67,731 are owned by the 51 native Japanese speakers.
117,460 are "orphans" (sentences with no owners)
The rest are owned by non-native Japanese speakers.
For those who are seriously studying Japanese, I would recommend that you only trust the ones owned by native Japanese speakers.
It'd be great if we could get more regular Japanese speakers on board with this project
I appreciate all the work that bunbuku and small_snow have done in maintaining the Japanese corpus, and feel like they deserve more help
Edit: forgot to mention tommy_san and I’m probably forgetting others
No one in their right mind would do that. You would have to have a lot of free time to go over 117,460 sentences.
I worded it wrong, sorry
It happens. 👍
Until we run out of native-owned Japanese sentences for you to translate, I don't think you need to pressure anyone into proofreading all these.
I know how boring and tedious this can be. I proofread all the English sentences in the Tanaka Corpus up to a given length and adopted the ones that I thought sounded good and natural, editing the ones that could easily be edited, since they weren't already linked to various languages, which complicates editing. I ignored the long, multi-sentence examples, thinking I would eventually proofread those later.
Here are some sentences you could translate owned by 3 of our Japanese contributors with lots of sentences.
bunbuku's Japanese sentences with no translations.
small_snow's Japanese sentences with no translations
tommy_san's Japanese sentences with no translations
I mentioned it because I originally came here from Clozemaster, and the orphan sentences turned up more often than not in the Japanese section, and some of them made no sense to me
I’m not worried about running out of sentences to translate
Clozemaster is not good enough for learning a language.
iKnow has better sentence examples for words in order: https://iknow.jp/content/japanese
I use clozemaster. I have an extension to mark words.
The 'English from Magyar' pair contains not only translations of Hungarian sentences but also translations of translations, and if a sentence is later changed in Tatoeba, it remains unchanged in Clozemaster.
I think the quality of Clozemaster varies from language pair to language pair. The English->Russian pair is good, and has helped me quite a bit, but there are other pairs that have the problems you describe.
Learning Norwegian is the same kind of fun; the database hasn't been updated from Tatoeba in a long time. On the other hand, I have enough prior knowledge to notice quite a few of the typos, and it isn't the only method I use.
I stopped using Clozemaster months ago and switched to Anki today using confirmed sentences from here
I also have a wide array of resources for Japanese
I just mentioned them because I found out about this through there.
(In the meantime, the Kabyle competitors also passed a milestone like a rocket: 300,000 sentences!)
Is this a bot account? https://tatoeba.org/eng/user/profile/Bilgi
I don't know about you guys, but I think new users shouldn't be allowed to add so many sentences at once.
>I think new users shouldn't be allowed to add so many sentences at once.
It belongs to me. I could have added those sentences with my main account, but I preferred to create a separate one that is dedicated to that content. I have some other "special purpose" accounts, too. Since I use the same IP address, anyone who has access to IP logs can see that it's me. Maybe I should slow down a bit. Sorry for the inconvenience.
Thanks for replying, soliloquist. I thought it was a troll spamming the Turkish Corpus.
Thanks for your concern.
** Updated Projects **
Tab-delimited Bilingual Sentence Pairs
Bilingual Sentence Pairs - Selected Sentences from the Tatoeba Corpus
** Related Stats **
Native-speaker-owned sentences in each language that are linked to English sentences on List 907
Counts by Username of Native Speaker-Owned Sentences Linked to Sentences on List 907
Thanks to all of you who help make these projects possible.
Found a search engine bug.
The result of searching "sweep" does not include sentences with "swept".
Can anyone make a proper issue on GitHub?
Is it a bug? I don't think so. Just one of the irregular verbs.
Do you see 'saw' or 'seen' if you search for the word 'see'?
I had no idea that it doesn't work with irregular verbs. Forgive me please.
Forgive you for what?
You assumed that this is a bug. I would say it is not a bug. I'm not a programmer or anything; I'm just saying what I've seen.
To elaborate: We're talking about inexact ("fuzzy") matches that involve stemming, a feature supported for about two dozen languages, which are listed on this page:
The stemmer works by removing common suffixes from both the search words and the words contained in sentences to see whether the remaining parts of the words match. For instance, if you remove "-ing" from "walking" and "-ed" from "walked", the results match (and also match the bare form "walk").
There are two things to know about the stemmer:
(1) It doesn't belong to us, so the best we can do is pass along reports, and I don't think we even do that for individual words.
(2) It is not supposed to be 100% accurate. We expect to see both false positives (for example, "sing" matches "singe") and false negatives ("sweep" doesn't match "swept", as you saw).
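As a rough illustration of the suffix-stripping idea described above, here is a minimal sketch in Python. It is a toy, not the stemmer the search engine actually uses: the suffix list, the minimum stem length, and the function names `toy_stem` and `fuzzy_match` are all assumptions made up for this example.

```python
# Toy suffix-stripping stemmer (illustration only, NOT the real search stemmer).
# The suffix list and the minimum stem length are assumptions for this sketch.
SUFFIXES = ("ing", "ed", "e", "s")
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Strip the first listed suffix that leaves a long enough stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return word


def fuzzy_match(query: str, candidate: str) -> bool:
    """Two words match when their stems are identical."""
    return toy_stem(query) == toy_stem(candidate)


print(fuzzy_match("walking", "walked"))  # True: both reduce to "walk"
print(fuzzy_match("sing", "singe"))      # True: a false positive
print(fuzzy_match("sweep", "swept"))     # False: a false negative
```

Even this crude version shows why an irregular form like "swept" is missed: no suffix rule turns it into "sweep".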
You can explicitly combine word forms with a vertical bar in order to compensate for false negatives, and you can use the equals sign to specify an exact match in order to compensate for false positives. Note that forms recognized by the stemmer do not necessarily have to be verbs. For instance, "beauty" will match "beautiful".
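The two workarounds can be sketched as small helpers that build query strings. This is only an illustration: the helper names `any_form` and `exact` are made up here, and placing the equals sign directly in front of the word is an assumption based on the exact-form operator in Manticore Search's query syntax.

```python
def any_form(*forms: str) -> str:
    """Join word forms with a vertical bar so that any one of them may match."""
    return "|".join(forms)


def exact(word: str) -> str:
    """Prefix a word with an equals sign to ask for an exact match."""
    return "=" + word


print(any_form("sweep", "swept"))  # sweep|swept
print(exact("sing"))               # =sing
```

The first string compensates for the false negative ("swept" is listed explicitly), and the second avoids the false positive (only "sing" itself should match).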
For irregular verbs, you need to search like this.
Here is a page set up with all forms of the English verbs included in the search links.
Stemming in Kabyle is a complex process because of its morphology.
Nouns and adjectives in Kabyle have a state: free state or annexed state. So the beginning of the word changes depending on the state.
Verbs are also highly inflected, so it's hard to find the stem from an inflected form.
We are working on a stemmer and a lemmatizer for the Kabyle language. We will share the work when it is done.
Interesting. Have you been in touch with the people at Manticore Search, who produce our search engine?
I've just heard about it. Is there any link, any contact information?
Manticore Search's stemming system is actually based on Snowball https://snowballstem.org/ , which is a specialized programming language for writing stemmers as well as a collection of stemmers for various languages.
AFAIK, Kabyle is an Afroasiatic language distantly related to Arabic, so you might want to have a look at Snowball's Arabic stemmer: https://github.com/snowballstem...hms/arabic.sbl
As for sharing your work, see their CONTRIBUTING.rst file https://github.com/snowballstem...NTRIBUTING.rst and maybe also have a look at what it took to get some other languages added (the most recent addition being Yiddish: https://github.com/snowballstem/snowball/pull/137 )
Thank you @Yorwba for sharing. I'll take a look at this.
Indeed, Kabyle belongs to the Berber branch of the same (Afroasiatic) family as Arabic, Hebrew, Amharic, etc. The derivational and inflectional systems are quite similar, but the script is different: Kabyle is written in Latin script.
I will look at the project.
Congratulations to the Kabyle team for the work done.
It would be nice if these had translations
Kabyle sentences without translations = 252,055
Kabyle sentences with translations = 86,876
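For a sense of scale, the two counts above can be turned into a total and a translated share. A quick sketch (the variable names are just for illustration):

```python
# Counts quoted in the post above.
without_translations = 252_055
with_translations = 86_876

total = without_translations + with_translations
translated_share = with_translations / total

print(f"Total Kabyle sentences: {total:,}")                # 338,931
print(f"Share with translations: {translated_share:.1%}")  # 25.6%
```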
Indeed, but that is mostly up to non-natives, since people are encouraged to translate only, or mostly, into their strongest language.
(Some of the short sentences are without punctuation. Maybe they should have some.)
Certainly, I'll try to translate most of them every day.
Thank you for reminding me of that. ;-)
** Stats & Graphs **
Tatoeba Stats, Graphs & Charts have been updated:
Thanks. Would you consider adding the original sentence ratio (ocnt/scnt) into the chart like you do with tcnt/scnt? The "Base of Sentences" part on the Downloads page probably has the necessary data.
I can't promise but I'll try to implement this feature some day.
Just take your time, thank you.
Thank you, volunteers of the Kabyle language.
So I've just returned to Tatoeba after a long absence (like, years) and I'm super happy to see you're all still here, going from strength to strength. What a wonderful project this is! That's all I wanted to say. :-)
Happy to read this.
Welcome back among us, Sir.
Welcome back! We're happy you've decided to return. 😊
All accounts without a (nick)name and with an unfilled profile seem suspicious.
If a person is serious and honest with Tatoeba and its members, then it shouldn't be any problem to share some information about themselves.
That doesn't mean anything. I can enter any name I like.
I'm the one who decides how good I am at which language. What decides in the end is how others see my activity. If I give myself 4 stars in something I can't contribute anything sensible to, it will show.
A name isn't like that; I can conceal it indefinitely, whether I'm harmful or worth my weight in gold.
Just because I haven't given my TAJ number (Hungarian social security number) doesn't mean I have to look suspicious. There are plenty of "users" on the site who are unwanted, because, for example, they are just advertising profiles, and their profiles are very much filled in. They even have a nice picture, with a logo.
Here is what could be introduced so that at least the advertising profiles dwindle: for example, a more complex registration; allowing a website link only once the user has already contributed something to the project; banning abbreviations such as Inc., Ltd., etc. in names; banning the word 'store' in usernames; banning a profile name identical to the username; and banning links to websites containing the same character string as the username, e.g. www.xyz.com with the username xyz.
I don't know anything about filtering out bots, but there is surely a technical way to do that too; and if it is resource-intensive, it is worth investigating conclusively only the occasional suspicious user (users who write many sentences in a short time and do no translating).
It's not worth raising the barrier to participation by requiring extra work.
** Stats - 2020-11-15 **
This week only includes the 117 usernames with native-speaker contributions that have linked sentences since the last exported data, one week ago.