Wall (6,034 threads)

Pfirsichbaeumchen, November 22, 2020 at 8:40:01 AM UTC

Ergulis (Josef): https://tatoeba.org/deu/user/profile/Ergulis

[DEU] Korpuspflegerkandidat für Tschechisch

Josef kandidiert als Korpuspfleger für Tschechisch, um bei der Korrektur fehlerhafter Sätze nicht mehr aktiver Mitglieder zu helfen. Wie immer ist jeder eingeladen, sich hierzu in einer Privatnachricht zu äußern (einfach auf die Verknüpfung am Ende dieser Nachricht klicken).

[ENG] Corpus Maintainer Candidate for Czech

Josef has offered himself as a corpus maintainer for Czech to help apply necessary corrections to sentences owned by inactive members. As usual, you are all invited to give us your feedback in a private message (click on the link at the end of this message).

[EPO] Kandidato por iĝi bontenanto de la ĉeĥa frazaro

Josef kandidatas por iĝi bontenanto de la ĉeĥa frazaro kaj do helpi la korektadon de eraraj frazoj de anoj ne plu aktivaj. Kiel ĉiam ni invitas ĉiun komenti pri tio en privata mesaĝo (simple alklaku la ligilon je la fino de ĉi tiu mesaĝo).

[JPN] チェコ語コーパス整備員候補者
Josefさんは、非アクティブなメンバーが所持する例文を必要に応じて改善できるチェコ語コーパス整備員になりたいと志願しています。従来通り、プライベートメッセージでフィードバックをお寄せください。(このメッセージの文末にあるリンクをクリックしてください。)

https://tatoeba.org/private_mes...rsichbaeumchen

Pfirsichbaeumchen, November 29, 2020 at 1:11:54 AM UTC

Josef ...

... ist jetzt Korpuspfleger für das Tschechische.
... is now a corpus maintainer for Czech.
... nun estas frazara bontenanto de la ĉeĥa frazaro.

Igider, November 28, 2020 at 4:56:59 PM UTC (edited November 28, 2020 at 5:12:07 PM UTC)

Iwiziwen/ Tiwiziwin,

Lemdet qbel "ilugan n tira" d wamek ara teddum ɣer wannar-a, mulac d aserwet kan. Ma yella ad nesseɣtay kan, d aḍeggeṛ n wakud akked tezmert n wallen d weɛrur ara tent-yaɣen... Mulac ad ɣ-tḥettmem ad nettlusu nnwaḍeṛ-nni n tqeṛɛet n lgazuz (am win akken-nni meskin!)! Cuḥḥet-aɣ ttxil-wet/kent!

Tanemmirt.

Luce, November 28, 2020 at 11:57:52 AM UTC

« Vava atan dagi » blayak gan Salahnamar so metir kan Kotava.

Pfirsichbaeumchen, November 28, 2020 at 2:11:15 AM UTC (edited November 28, 2020 at 2:12:24 AM UTC)

@PaulP

Alles Gute zum Geburtstag, lieber Paul! Many happy returns of the day! Feliĉan naskiĝotagon! 🎉

PaulP, November 28, 2020 at 6:55:22 AM UTC

Danke schön, liebe Lisa! Koran dankon, thanks a lot!

inkepa, November 24, 2020 at 8:55:27 AM UTC (edited November 24, 2020 at 8:59:19 AM UTC)

I made a small script on Google Colab to download and search through the Toki Pona sentences for easy-to-detect possible errors. It was fun to write :)

https://github.com/increpare/ta...ellcheck.ipynb
(it's a notebook link - code is mixed in with results. If you scroll down you can see the reports)

I think I'll modify it to check for errors in new sentences only, so I can give it a run every Sunday and see what new things have appeared :)

Would there be any use for something like this for other languages?
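
To give a rough idea of what an "easy-to-detect" check might look like, here is a minimal sketch. It is not the code from the notebook. It assumes that Toki Pona is written with only 14 letters and that capitalised words are proper names to be skipped; the function name is made up for illustration.

```python
import re

# Assumption: Toki Pona uses only these 14 letters.
TOKI_PONA_LETTERS = set("aeioujklmnpstw")

def suspicious_words(sentence):
    """Return lowercase words containing a letter outside the assumed alphabet."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscore).
    words = re.findall(r"[^\W\d_]+", sentence)
    # Capitalised words are assumed to be proper names and are skipped.
    return [w for w in words if w.islower() and not set(w) <= TOKI_PONA_LETTERS]

print(suspicious_words("mi moku e banana"))   # ['banana']
print(suspicious_words("jan Maria li pona"))  # []
```

A check like this only catches stray letters; it says nothing about whether a word actually exists in the language.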

soliloquist, November 24, 2020 at 7:14:33 PM UTC

Thank you. It's really a good idea.

> Would there be any use for something like this for other languages?

I had created some accounts to monitor spelling errors in the Turkish corpus using vocabulary items.

https://tatoeba.org/eng/sentences/show/8243466

It would be nice to have a similar script that collects the vocabulary items of those accounts (they can be added to the script manually), searches them in the Turkish corpus and reports the results.

inkepa, November 25, 2020 at 12:21:57 AM UTC

Oh if I understand correctly: you have an account where you manually suggest spelling changes. And it would be nice to have an account that would scrape all these suggestions (or where you could enter them yourself) and then search automatically for where else in the corpus they might apply?

inkepa, November 25, 2020 at 2:50:49 AM UTC (edited November 25, 2020 at 4:59:20 AM UTC)

@soliloquist

I did a quick script that uses your list of replacements (it ignores ones containing "*", "...", and "/" for now [what does the * mean?])

scroll down here to see some sample results:

https://github.com/increpare/ta...ellcheck.ipynb

How does that look to you? If it's right I can expand it to cover the other rule types as well (and you could add future rules yourself using your arrow notation).

(if you click on the "open in colab" link at the top of the doc you can re-run it all yourself, though it takes a few minutes to do everything (go to runtime -> run all ) )

soliloquist, November 25, 2020 at 10:41:04 AM UTC

Thanks a million for adapting the script for Turkish. The reason I put asterisks on some of the vocabulary items was to include their suffixed forms. Otherwise the search finds only exact matches, without suffixed ones. For example,

herşey -> her şey

Without the asterisk, the search finds only 'herşey' but not the suffixed forms like 'herşeyi', 'herşeye', 'herşeyde' etc. So it's kind of a stemming support. Your script does a great job as is, but if you could find a way to include other forms, it would be much better.

Thank you very much again.
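
The difference between an exact match and a match that includes suffixed forms can be sketched with a regular expression. This is only an illustration, not the script's actual code: the function name and the three sentences are made up, and `\w*` is assumed to be a good enough stand-in for "any suffix".

```python
import re

def find_matches(stem, sentences, include_suffixed=True):
    """Return sentences containing `stem`, optionally followed by a suffix."""
    # With include_suffixed, "\w*" lets further letters follow the stem
    # (the effect of the asterisk); without it, only the bare word matches.
    tail = r"\w*" if include_suffixed else ""
    pattern = re.compile(r"\b" + re.escape(stem) + tail + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

# Made-up sentences for illustration, not taken from the corpus.
sentences = ["Herşey yolunda.", "Herşeyi biliyorum.", "Her şey yolunda."]

print(find_matches("herşey", sentences, include_suffixed=False))
# ['Herşey yolunda.']
print(find_matches("herşey", sentences))
# ['Herşey yolunda.', 'Herşeyi biliyorum.']
```

Note that `re.IGNORECASE` does not follow Turkish casing rules for I/ı and İ/i, so a real checker would probably need extra care there.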

inkepa, November 25, 2020 at 11:49:58 AM UTC

What's the difference between

"Haked... -> Hak ed..."

and

"* Öğe -> Öge"

then?

Oh, is there no difference? I see when I click on them they both search for "blah*"

soliloquist, November 25, 2020 at 1:13:33 PM UTC

> What's the difference between
> "Haked... -> Hak ed..."

The latter is the correct form. '...' means that the word certainly has a suffix so you can think of it as 'haked*'.

> and
> "* Öğe -> Öge"

Sorry for the confusion. The asterisk at the beginning of that example has a different meaning. There are several other examples like that. You can ignore them. We have two Turkish language associations that disagree about the spelling of some words, and that word is one of them. I put an asterisk at the beginning of those to indicate that. I have now replaced the asterisks with ※ to avoid confusion.

Examples like,

...pde -> ...pte

...sden -> ...sten

show suffix errors. Such words end with those letters without having other suffixes. They can be searched on Tatoeba by putting an asterisk before them. (*pde, *sden etc.). If it's difficult to add them to the script, you can ignore them. Just having suffix support would be sufficient.
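
A suffix rule such as "...pde -> ...pte" could be applied word by word along these lines. This is a hedged sketch with made-up function names and a made-up sentence, not the script's real implementation.

```python
import re

def suggest_suffix_fix(word, wrong_ending, right_ending):
    """Return the corrected word if it ends in wrong_ending, else None."""
    # Require at least one letter before the ending, as "..." implies.
    if len(word) > len(wrong_ending) and word.endswith(wrong_ending):
        return word[: -len(wrong_ending)] + right_ending
    return None

def check_sentence(sentence, wrong_ending, right_ending):
    """Map each flagged word in the sentence to its suggested correction."""
    fixes = {}
    for word in re.findall(r"\w+", sentence):
        fixed = suggest_suffix_fix(word, wrong_ending, right_ending)
        if fixed:
            fixes[word] = fixed
    return fixes

# Made-up sentence for illustration.
print(check_sentence("Çocuk mektepde kaldı.", "pde", "pte"))
# {'mektepde': 'mektepte'}
```

Any word that merely happens to end in those letters is flagged too, so the hits still need a human review.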

inkepa, November 25, 2020 at 1:59:46 PM UTC

Ah ok! I've added support for "..." rules. There are some false positives (e.g. "...sden" matches Dresden), but that's ok I guess.

Here's the report by itself:
https://gist.github.com/increpa...719d623ebeda23

I also wasn't sure if "Haked... -> Hak ed..." should also be triggered if it sees "Haked" by itself, or if it *needs* to be a prefix of something else. Right now "haked" itself also triggers the replacement recommendation.


(here's the notebook with all the code + generated report at the bottom - https://colab.research.google.c...ellcheck.ipynb
If you go there next week and hit run-all (takes about 5-10 minutes to run) you'll get an updated report based on an updated version of the Tatoeba database (the downloadable data files get updated on Saturdays).

If you make your own copy you can also add/edit rules. - they're right at the top. Just edit them and run all.

If you don't want to that's ok, but I just want to tell you everything you might need so you're not reliant on me in case I'm busy with other stuff.

If that's all too much though just drop me a line and I can give it a rerun)

inkepa, November 25, 2020 at 2:06:11 PM UTC (edited November 25, 2020 at 2:19:01 PM UTC)

oh, I noticed there's some big group of sentences that're messed up/clogged together - it's possibly getting confused because some people use two single quotes ' for a double quotation mark "

(example
https://tatoeba.org/deu/sentences/show/7980614
- is that a common Turkish habit? :P )

I can work on fixing that...

edit: fixed, and the report/code updated.
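
For what it's worth, one simple way to deal with doubled single quotes is to normalise them before any further processing. This is an assumption about a possible fix, not a description of what the notebook actually does, and the function name is hypothetical.

```python
def normalize_quotes(text):
    """Replace two consecutive single quotes with one double quotation mark."""
    return text.replace("''", '"')

# Made-up sentences for illustration.
print(normalize_quotes("Tom ''Evet'' dedi."))  # Tom "Evet" dedi.
print(normalize_quotes("Mary'ye sordum."))     # Mary'ye sordum.
```

A lone apostrophe, as in a suffixed proper name, is left untouched.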

soliloquist, November 25, 2020 at 3:38:19 PM UTC

> - is that a common Turkish habit? :P )

No, it's just a trivial and stylistic error. :-) But it's better to standardize them to avoid having duplicates. Thanks for picking them up.

soliloquist, November 25, 2020 at 3:38:29 PM UTC (edited November 25, 2020 at 3:40:17 PM UTC)

Thanks. The script now detects the suffix errors (in which the error is in the suffix itself) nicely, but misspelled words with suffixes (in which the error is in the root, not in the suffix) are still ignored.

For example,

herşey -> her şey

It only detects #8904790, but it should also detect #9094761 and #8871066. They have the same error, but I guess it is set to search terms as 'whole words' so the suffixed forms are ignored. Many of the terms are in a similar situation. Only the ones without suffixes are found.

Yorwba, November 25, 2020 at 4:22:20 PM UTC

In that case, the rule should be written as

herşey... -> her şey...

I guess. (But is there a difference between rules ending in ... and other rules, then?)

@inkepa: You might also want to take a look at Aiji's notebooks: https://tatoeba.org/eng/wall/show_message/34004

inkepa, November 25, 2020 at 4:45:43 PM UTC

@Yorwba hey, that's cool - I didn't know about it! :)

soliloquist, November 25, 2020 at 5:45:48 PM UTC

> In that case, the rule should be written as
> herşey... -> her şey...

Yes. Thanks for clarifying.

> But is there a difference between rules ending in ... and other rules, then?

No, not really, as long as ... is placed correctly.

inkepa, November 25, 2020 at 4:43:02 PM UTC (edited November 25, 2020 at 4:46:06 PM UTC)

@soliloquist Oh is it really that all rules "A -> B" are secretly "A... -> B..." ?

soliloquist, November 25, 2020 at 5:46:36 PM UTC (edited November 25, 2020 at 6:24:59 PM UTC)

That's it! Here's the new list. I hope it works this time.

...çca -> ...çça
...çce -> ...ççe
...çci -> ...ççi
...çcı -> ...ççı
...çcu -> ...ççu
...çcü -> ...ççü
...çda -> ...çta
...çdan -> ...çtan
...çde -> ...çte
...çden -> ...çten
...fca -> ...fça
...fce -> ...fçe
...fci -> ...fçi
...fcı -> ...fçı
...fcu -> ...fçu
...fcü -> ...fçü
...fda -> ...fta
...fdan -> ...ftan
...fde -> ...fte
...fden -> ...ften
...hca -> ...hça
...hce -> ...hçe
...hci -> ...hçi
...hcı -> ...hçı
...hcu -> ...hçu
...hcü -> ...hçü
...hda -> ...hta
...hdan -> ...htan
...hde -> ...hte
...hden -> ...hten
...kca -> ...kça
...kce -> ...kçe
...kci -> ...kçi
...kcı -> ...kçı
...kcu -> ...kçu
...kcü -> ...kçü
...kda -> ...kta
...kdan -> ...ktan
...kde -> ...kte
...kden -> ...kten
...pca -> ...pça
...pce -> ...pçe
...pci -> ...pçi
...pcı -> ...pçı
...pcu -> ...pçu
...pcü -> ...pçü
...pda -> ...pta
...pdan -> ...ptan
...pde -> ...pte
...pden -> ...pten
...sca -> ...sça
...sce -> ...sçe
...sci -> ...sçi
...scı -> ...sçı
...scu -> ...sçu
...scü -> ...sçü
...sda -> ...sta
...sdan -> ...stan
...sde -> ...ste
...sden -> ...sten
...şca -> ...şça
...şce -> ...şçe
...şci -> ...şçi
...şcı -> ...şçı
...şcu -> ...şçu
...şcü -> ...şçü
...şda -> ...şta
...şdan -> ...ştan
...şde -> ...şte
...şden -> ...şten
acenta... -> acente...
aç gözlü... -> açgözlü...
açıkca... -> açıkça...
adele... -> adale...
afedersin... -> affedersin...
aksesuvar... -> aksesuar...
aktrist... -> aktris...
akıl almaz... -> akılalmaz...
Alaman... -> Alman...
allerji... -> alerji...
alt üst... -> altüst...
alış veriş... -> alışveriş...
aliminyum... -> alüminyum...
ambülans... -> ambulans...
ampül... -> ampul...
ana okulu... -> anaokulu...
antreman... -> antrenman...
apandist... -> apandisit...
aperatif... -> aperitif...
Arabça... -> Arapça...
arasıra... -> ara sıra...
ardarda... -> art arda...
Arjentin... -> Arjantin...
atmış... -> altmış... (mostly false positives)
avusturalya... -> avustralya...
ayırım... -> ayrım...
Azarbaycan... -> Azerbaycan...
Azarbeycan... -> Azerbaycan...
Azerbeycan... -> Azerbaycan...
banço... -> banjo...
başbaşa... -> baş başa...
başı boş... -> başıboş...
belkide... -> belki de...
benle... -> benimle...
beysbol... -> beyzbol...
bir kaç... -> birkaç...
bir çok... -> birçok...
birarada... -> bir arada...
birden bire... -> birdenbire...
Biritan... -> Britan...
birsürü... -> bir sürü...
Brazilya... -> Brezilya...
bu gün... -> bugün...
bugünki... -> bugünkü...
bulüz... -> bluz...
burda... -> burada...
buyrun... -> buyurun...
büfte... -> bifte...
büyük anne... -> büyükanne...
büyük baba... -> büyükbaba...
can kurtaran... -> cankurtaran...
cimnastik... -> jimnastik...
çarşanba... -> çarşamba...
çeki düzen... -> çekidüzen...
çokaz... -> çok az...
çoşku... -> coşku...
dahada... -> daha da...
deniz aşırı... -> denizaşırı...
değilmi... -> değil mi...
deyer... -> değer...
deyme... -> değme...
doğumgünü... -> doğum günü...
döküman... -> doküman...
döğdü... -> dövdü...
döğüş... -> dövüş...
dünki... -> dünkü...
düz taban... -> düztaban...
eczahane... -> eczane...
eksoz... -> egzoz...
elele... -> el ele...
entellektüel... -> entelektüel...
eposta... -> e-posta...
eylence... -> eğlence...
eylenmek... -> eğlenmek...
fantazi... -> fantezi...
farked... -> fark ed...
farket... -> fark et...
farzed... -> farz ed...
farzet... -> farz et...
Fırans... -> Frans...
filim... -> film...
fonksyon... -> fonksiyon...
fotograf... -> fotoğraf...
gardrop... -> gardırop...
gurup... -> grup...
gök kuşağı... -> gökkuşağı...
gök yüzü... -> gökyüzü...
göz yaşı... -> gözyaşı...
gözardı... -> göz ardı...
gözkulak... -> göz kulak...
gözüpek... -> gözü pek...
haftasonu... -> hafta sonu...
haked... -> hak ed...
haket... -> hak et...
hakket... -> hak et...
halbu ki... -> halbuki...
hastahane... -> hastane...
hava alanı... -> havaalanı...
hava limanı... -> havalimanı...
hem fikir... -> hemfikir...
hemde... -> hem de...
her hangi... -> herhangi...
hergün... -> her gün...
herkez... -> herkes...
herne... -> her ne...
heryer... -> her yer...
herzaman... -> her zaman...
herşey... -> her şey...
hiç bir... -> hiçbir...
hiçkimse... -> hiç kimse...
hiçte... -> hiç de...
humani... -> hümani...
ısraf... -> israf...
iki yüzlü... -> ikiyüzlü...
ilk okul... -> ilkokul...
insan oğlu... -> insanoğlu...
insiyatif... -> inisiyatif...
israr... -> ısrar...
istakoz... -> ıstakoz...
istambul... -> istanbul...
itibariyle... -> itibarıyla...
iyiki... -> iyi ki...
kamu oyu... -> kamuoyu...
kapşon... -> kapüşon...
kareografi... -> koreografi...
kaysı... -> kayısı...
klavuz... -> kılavuz...
klüp... -> kulüp...
kolleksiyon... -> koleksiyon...
kominist... -> komünist...
kompartman... -> kompartıman...
koperatif... -> kooperatif...
kılınç... -> kılıç...
kıral... -> kral...
kıraliyet... -> kraliyet...
kıraliçe... -> kraliçe...
kırallık... -> krallık...
kızarkadaş... -> kız arkadaş...
kızkardeş... -> kız kardeş...
Kürdçe... -> Kürtçe...
labaratuar... -> laboratuvar...
labaratuvar... -> laboratuvar...
madem ki... -> mademki...
mahçup... -> mahcup...
makina... -> makine...
malesef... -> maalesef...
malolmak... -> mal olmak...
Marry... -> Mary...
Mary'e... -> Mary'ye...
Mary'i... -> Mary'yi...
Mary'le... -> Mary'yle...
matamatik... -> matematik...
menejer... -> menajer...
metod... -> metot...
meyva... -> meyve...
meşkul... -> meşgul...
motorsiklet... -> motosiklet...
müdahele... -> müdahale...
müsade... -> müsaade...
müstehak... -> müstahak...
mütevazi... -> mütevazı...
nerde... -> nerede...
neyseki... -> neyse ki...
nufus... -> nüfus...
okur yazar... -> okuryazar...
onaltı... -> on altı...
onbeş... -> on beş...
onbir... -> on bir...
ondokuz... -> on dokuz...
ondört... -> on dört...
oniki... -> on iki...
onla... -> onunla...
onsekiz... -> on sekiz...
onyedi... -> on yedi...
onüç... -> on üç...
orda... -> orada...
orjinal... -> orijinal...
orta okul... -> ortaokul...
oysa ki... -> oysaki...
pantalon... -> pantolon...
parlemento... -> parlamento...
pastahane... -> pastane...
pekaz... -> pek az...
pekçok... -> pek çok...
penbe... -> pembe...
perşenbe... -> perşembe...
peşpeşe... -> peş peşe...
pilaj... -> plaj...
postahane... -> postane...
proğram... -> program...
rasgele... -> rastgele...
rasgelm... -> rast gelm...
raslantı... -> rastlantı...
sarfed... -> sarf ed...
sarfet... -> sarf et...
sarmısak... -> sarımsak...
senle... -> seninle...
sivri sinek... -> sivrisinek...
sohpet... -> sohbet...
sueter... -> süveter...
süpriz... -> sürpriz...
şöför... -> şoför...
tabi ki... -> tabii ki...
taktir... -> takdir...
terked... -> terk ed...
terket... -> terk et...
tesbih... -> tespih...
tesbit... -> tespit...
traş... -> tıraş...
Türküye... -> Türkiye...
umru... -> umuru...
ünüversite... -> üniversite...
ünvan... -> unvan...
vaz geç... -> vazgeç...
yada... -> ya da...
yanlız... -> yalnız...
yanyana... -> yan yana...
yanısıra... -> yanı sıra...
yinede... -> yine de...
yüksek okul... -> yüksekokul...
yüz ölçüm... -> yüzölçüm...
zıttı... -> zıddı...
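
A list in this arrow notation could be turned into search patterns roughly as follows. This is a sketch under assumptions, not the notebook's code: the function names and the three rule kinds ("suffix", "prefix", "exact") are made up for illustration, and a trailing parenthesised note on a rule is simply dropped.

```python
import re

def parse_rule(line):
    """Split 'wrong -> right' into (kind, wrong, right).

    kind is 'suffix' for '...x -> ...y', 'prefix' for 'x... -> y...',
    and 'exact' for a plain 'x -> y' without dots.
    """
    line = re.sub(r"\s*\(.*\)\s*$", "", line)  # drop a trailing "(note)"
    wrong, right = (part.strip() for part in line.split("->", 1))
    right = right.replace("...", "")
    if wrong.startswith("..."):
        return "suffix", wrong[3:], right
    if wrong.endswith("..."):
        return "prefix", wrong[:-3], right
    return "exact", wrong, right

def rule_to_regex(kind, wrong):
    """Build a case-insensitive pattern that finds the misspelling."""
    core = re.escape(wrong)
    if kind == "suffix":   # letters, then the wrong ending, then a word break
        return re.compile(r"\w+" + core + r"\b", re.IGNORECASE)
    if kind == "prefix":   # the wrong stem, optionally followed by a suffix
        return re.compile(r"\b" + core + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + core + r"\b", re.IGNORECASE)  # whole word only

for line in ["...pde -> ...pte", "herşey... -> her şey...", "onla -> onunla"]:
    print(parse_rule(line))
# ('suffix', 'pde', 'pte')
# ('prefix', 'herşey', 'her şey')
# ('exact', 'onla', 'onunla')
```

Keeping the rule kind explicit makes it easy to drop the "..." from a single rule when prefix matching turns out to be too greedy for it.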

inkepa, November 25, 2020 at 6:28:24 PM UTC (edited November 25, 2020 at 6:28:33 PM UTC)

Your new list had a *lot* of false positives because of
"onla... -> onunla..." triggering on "onlar"/"onlarla"/etc. :P

( https://gist.github.com/increpa...61871880cff504 if you really want to see)

So I removed the "..." from the onla rule, which gives

https://gist.github.com/increpa...829039a9c1f5f5

How's that look?

soliloquist, November 25, 2020 at 6:31:45 PM UTC (edited November 25, 2020 at 6:36:12 PM UTC)

You're right. We should remove ... from it. That would give a lot of false positives.

It should only include exact matches.

onla -> onunla

(without ...)

soliloquist, November 25, 2020 at 6:35:23 PM UTC

The suffixes now seem to be handled well. #8904790, #9094761 and #8871066 are all listed. Thank you very much.

inkepa, November 25, 2020 at 7:49:05 PM UTC

My pleasure :)

belkacem77, November 27, 2020 at 6:37:10 PM UTC

I wrote a Python script to detect some syntactic errors (affixes, bad letters, punctuation, etc.). It's a basic script designed only for Kabyle, which I shared on GitHub.

Igider, November 27, 2020 at 7:03:00 PM UTC

Great! I look forward to that.
It would be very helpful for the Kabyle language as well as for other languages.
Keep going, @belkacem77

fjay69, November 15, 2020 at 8:17:15 AM UTC

We have over 200 thousand Japanese sentences now!
現在、20万以上の日本語の文があります!

Pfirsichbaeumchen, November 15, 2020 at 8:24:11 AM UTC (edited November 15, 2020 at 8:26:34 AM UTC)

A reason for great joy. Hooray! おめでとうございます。大喜びですね😊

Igider, November 15, 2020 at 9:04:00 AM UTC (edited November 15, 2020 at 9:04:25 AM UTC)

Congrats!! よくやった!Ayyuz!

Hybrid, November 15, 2020 at 12:53:46 PM UTC

Congratulations to everyone!

small_snow, November 15, 2020 at 10:16:26 PM UTC

I see; https://tatoeba.org/jpn/sentenc...&sort_reverse=
✨🎉

CK, November 16, 2020 at 12:40:43 AM UTC (edited November 16, 2020 at 12:41:09 AM UTC)

Of these Japanese sentences, ...

67,731 are owned by the 51 native Japanese speakers.

117,460 are "orphans" (sentences with no owners)

The rest are owned by non-native Japanese speakers.

For those who are seriously studying Japanese, I would recommend that you only trust the ones owned by native Japanese speakers.

DJ_Saidez, November 16, 2020 at 1:20:23 AM UTC (edited November 16, 2020 at 3:08:54 AM UTC)

(redacted)

It'd be great if we could get more regular Japanese speakers on board with this project

I appreciate all the work that bunbuku and small_snow have done in maintaining the Japanese corpus, and feel like they deserve more help

Edit: forgot to mention tommy_san and I’m probably forgetting others

bill, November 16, 2020 at 1:27:59 AM UTC

No one in their right mind would do that. You would have to have a lot of free time to go over 117,460 sentences.

DJ_Saidez, November 16, 2020 at 1:32:59 AM UTC

I worded it wrong, sorry

bill, November 16, 2020 at 1:36:04 AM UTC

It happens. 👍

CK, November 16, 2020 at 1:53:52 AM UTC (edited November 16, 2020 at 2:18:41 AM UTC)

Until we run out of native-owned Japanese sentences for you to translate, I don't think you need to pressure anyone into proofreading all these.

I know how boring and tedious this can be. I proofread all the English sentences in the Tanaka Corpus up to a given length and adopted the ones that I thought sounded good and natural, editing the ones that could easily be edited, since they weren't already linked to various languages, which complicates editing. I ignored the long, multi-sentence examples, thinking I would eventually proofread those later.

Here are some sentences you could translate owned by 3 of our Japanese contributors with lots of sentences.

https://tatoeba.org/eng/sentenc...n&user=bunbuku
bunbuku's Japanese sentences with no translations.

https://tatoeba.org/eng/sentenc...ser=small_snow
small_snow's Japanese sentences with no translations

https://tatoeba.org/eng/sentenc...user=tommy_san
tommy_san's Japanese sentences with no translations

DJ_Saidez, November 16, 2020 at 5:16:22 AM UTC

I mentioned it because I originally came here from Clozemaster, and the orphan sentences turned up more often than not in the Japanese section, and some of them made no sense to me
I’m not worried about running out of sentences to translate

Cabo, November 16, 2020 at 11:02:16 AM UTC (edited November 16, 2020 at 11:05:06 AM UTC)

Clozemaster is not good enough for learning a language.
iKnow has better sentence examples for words in order: https://iknow.jp/content/japanese

I use Clozemaster. I have an extension to mark words.
'English from Magyar' contains not only translations from Hungarian but also translations of translations, and if a sentence is later changed on Tatoeba, it remains unchanged in Clozemaster.

AlanF_US, November 16, 2020 at 6:04:11 PM UTC

I think the quality of Clozemaster varies from language pair to language pair. The English->Russian pair is good, and has helped me quite a bit, but there are other pairs that have the problems you describe.

Thanuir, November 16, 2020 at 7:27:30 PM UTC

Learning Norwegian is the same kind of fun; the database has not been updated from Tatoeba in a long time. On the other hand, I have enough prior knowledge that I notice quite a few spelling errors, and it is not the only method I use.

DJ_Saidez, November 17, 2020 at 4:13:10 AM UTC (edited November 17, 2020 at 4:13:35 AM UTC)

I stopped using Clozemaster months ago and switched to Anki today, using confirmed sentences from here
I also have a wide array of resources for Japanese

I just mentioned them because I found out about this through there.

maaster, November 22, 2020 at 7:09:03 AM UTC (edited November 22, 2020 at 9:49:10 AM UTC)

Congratulations!

(In the meantime, the Kabyle competitors also passed a milestone like a rocket – 300,000 sentences!)

K_hina, November 22, 2020 at 4:10:53 PM UTC

Thank you @maaster.

Selyan, November 23, 2020 at 11:18:36 AM UTC

@maaster
Thank you

Igider, November 23, 2020 at 11:32:16 AM UTC

Thank you @Maaster.
Regards.

bill, November 22, 2020 at 2:56:11 PM UTC (edited November 22, 2020 at 2:59:44 PM UTC)

Is this a bot account? https://tatoeba.org/eng/user/profile/Bilgi

I don't know about you guys, but I think new users shouldn't be allowed to add so many sentences at once.

soliloquist, November 22, 2020 at 3:16:24 PM UTC (edited November 22, 2020 at 3:50:17 PM UTC)

>I think new users shouldn't be allowed to add so many sentences at once.

It belongs to me. I could have added those sentences with my main account, but I preferred to create a separate one that is dedicated to that content. I have some other "special purpose" accounts, too. Since I use the same IP address, anyone who has access to the IP logs can see that it's me. Maybe I should slow down a bit. Sorry for the inconvenience.

bill, November 22, 2020 at 3:38:02 PM UTC

Thanks for replying, soliloquist. I thought it was a troll spamming the Turkish Corpus.

soliloquist, November 22, 2020 at 3:39:03 PM UTC

Thanks for your concern.

CK, November 22, 2020 at 3:56:35 AM UTC

** Updated Projects **

http://www.manythings.org/anki/
Tab-delimited Bilingual Sentence Pairs

http://www.manythings.org/bilingual/
Bilingual Sentence Pairs - Selected Sentences from the Tatoeba Corpus

** Related Stats **

http://tatoeba.ueuo.com/stats-2...907-links.html
Native-speaker-owned sentences in each language that are linked to English sentences on List 907

http://tatoeba.ueuo.com/stats-2...ed-to-907.html
Counts by Username of Native Speaker-Owned Sentences Linked to Sentences on List 907


Thanks to all of you who help make these projects possible.

fjay69, November 18, 2020 at 2:33:17 PM UTC (edited November 18, 2020 at 11:32:55 PM UTC)

Found a search engine bug.
The result of searching "sweep" does not include sentences with "swept".
Can anyone make a proper issue on GitHub?

Cabo, November 18, 2020 at 5:02:41 PM UTC

Is it a bug? I don't think so. Just one of the irregular verbs.
Do you see 'saw' or 'seen' if you search for the word 'see'?

fjay69, November 18, 2020 at 5:10:46 PM UTC

I had no idea that it doesn't work with irregular verbs. Forgive me please.

Cabo, November 18, 2020 at 6:02:33 PM UTC

Forgive you for what?
You assumed that this was a bug. I would say it is not a bug. I'm not a programmer or anything; I'm just saying what I've seen.

AlanF_US, November 19, 2020 at 2:22:25 PM UTC (edited November 19, 2020 at 2:24:17 PM UTC)

To elaborate: We're talking about inexact ("fuzzy") matches that involve stemming, a feature supported for about two dozen languages, which are listed on this page:

https://en.wiki.tatoeba.org/art...w/text-search#

The stemmer works by removing common suffixes from both the search words and the words contained in sentences to see whether the remaining parts of the words match. For instance, if you remove "-ing" from "walking" and "-ed" from "walked", the results match (and also match the bare form "walk").

There are two things to know about the stemmer:

(1) It doesn't belong to us, so the best we can do is pass along reports, and I don't think we even do that for individual words.
(2) It is not supposed to be 100% accurate. We expect to see both false positives (for example, "sing" matches "singe") and false negatives ("sweep" doesn't match "swept", as you saw).

You can explicitly combine word forms with a vertical bar in order to compensate for false negatives, and you can use the equals sign to specify an exact match in order to compensate for false positives. Note that forms recognized by the stemmer do not necessarily have to be verbs. For instance, "beauty" will match "beautiful".
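As a rough sketch, the two workarounds can be written as small string helpers in Python. The helper names `any_form` and `exact` are hypothetical, and I am assuming the equals sign goes directly in front of the word; the helpers only build the text you would type into the search box.

```python
def any_form(*forms: str) -> str:
    """Join word forms with a vertical bar so that any of them may match."""
    return "|".join(forms)


def exact(word: str) -> str:
    """Prefix a word with an equals sign to ask for an exact match."""
    return "=" + word


print(any_form("sweep", "swept"))  # compensates for a false negative
print(exact("sing"))               # compensates for a false positive
```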

CK CK 10 days ago November 18, 2020 at 11:33:16 PM UTC link Permalink

For irregular verbs, you need to search like this.

sleep|slept
swim|swam|swum

Here is a page set up with all forms of the English verbs included in the search links.

http://tatoeba.ueuo.com/eng-verbs.html

belkacem77 belkacem77 7 days ago, edited 7 days ago November 21, 2020 at 1:28:55 PM UTC, edited November 21, 2020 at 5:31:24 PM UTC link Permalink

Stemming in Kabyle is a complex process because of its morphology.

Nouns and adjectives in Kabyle have a state: the free state or the annexed state. So the beginning of the word changes depending on the state.

Verbs are also highly inflected, so it's hard to find the stem from an inflected form.

We are working on a stemmer and a lemmatizer for the Kabyle language. We will share the work when it is done.

AlanF_US AlanF_US 7 days ago November 21, 2020 at 3:33:38 PM UTC link Permalink

Interesting. Have you been in touch with the people at Manticore Search, who produce our search engine?

belkacem77 belkacem77 7 days ago November 21, 2020 at 5:29:11 PM UTC link Permalink

I've just heard about it. Is there any link, any contact information?

Yorwba Yorwba 7 days ago November 21, 2020 at 7:12:34 PM UTC link Permalink

Manticore Search's stemming system is actually based on Snowball https://snowballstem.org/ , which is a specialized programming language for writing stemmers as well as a collection of stemmers for various languages.

AFAIK, Kabyle is an Afroasiatic language distantly related to Arabic, so you might want to have a look at Snowball's Arabic stemmer: https://github.com/snowballstem...hms/arabic.sbl

As for sharing your work, see their CONTRIBUTING.rst file https://github.com/snowballstem...NTRIBUTING.rst and maybe also have a look at what it took to get some other languages added (the most recent addition being Yiddish: https://github.com/snowballstem/snowball/pull/137 )

belkacem77 belkacem77 7 days ago, edited 7 days ago November 21, 2020 at 7:23:36 PM UTC, edited November 21, 2020 at 7:24:14 PM UTC link Permalink

Thank you @Yorwba for sharing. I'll take a look at this.
Indeed, Kabyle belongs to the Berber branch of the Afroasiatic family, the same family as Arabic, Hebrew, Amharic, etc. The derivational and inflectional systems are quite similar. But the script is different: Kabyle is written in the Latin script.
I will look into the project.

alemfarid alemfarid 8 days ago November 20, 2020 at 9:14:46 PM UTC link Permalink

Congratulations to the Kabyle team for the work done.

CK CK 8 days ago November 21, 2020 at 12:20:43 AM UTC link Permalink

It would be nice if these had translations.

https://tatoeba.org/eng/sentenc...filter=exclude
Kabyle sentences without translations = 252,055

https://tatoeba.org/eng/sentenc...s_filter=limit
Kabyle sentences with translations = 86,876

Thanuir Thanuir 7 days ago November 21, 2020 at 7:42:14 AM UTC link Permalink

Indeed, but that is mostly up to non-native speakers, since people are encouraged to translate only or mostly into their strongest language.

(Some of the short sentences are without punctuation. Maybe they should have some.)

Igider Igider 7 days ago, edited 7 days ago November 21, 2020 at 9:05:34 AM UTC, edited November 21, 2020 at 9:06:23 AM UTC link Permalink

Certainly, I'll try to translate most of them every day.
Thank you for reminding me of that. ;-)