Tips

Here you can ask general questions like how to use Tatoeba, report bugs or strange behavior, or simply socialize with the rest of the community.

Before asking a question, make sure to read the FAQ.

Wall (4605 threads)

puzz
2017-01-04 20:48 - 2017-01-04 20:49
Hi all,

I discovered Tatoeba a few months ago and I really like the idea of this project (and I also try to be a regular contributor for Croatian).

I'm a developer and I toyed with the idea of using the Tatoeba database for a language learning app. After a few experiments I decided to implement it as an Android app.

So, just in case somebody wants to try it (and maybe give me some feedback), the application is available here: https://play.google.com/store/a...10000sentences

I know the design could be better, but I worked on this project alone and I'm no designer. At least I tried hard to make it simple and intuitive to use. There are also other areas where I know it needs some improvements (for example the stats "page"), but it will come with time.

Hope you like it.

PS. Of course, the application is free and open source. The source can be downloaded here: https://github.com/tkrajina/10000sentences
halfdan
2017-01-05 13:01
There's already an app / website that does something very similar: https://www.clozemaster.com/

Yours looks very nice though!
puzz
2017-01-05 17:52 - 2017-01-05 19:30
Thanks!

Yes, I'm aware of Clozemaster. I saw the announcement on reddit. At the time I already had a proof of concept for my project. It was a web app, but I decided to discontinue it because it relied on the idea that people would upload voice recordings and synchronise them with texts. And getting people to do that was too time-consuming (I'm not good at community building).

After a user uploaded a recording, there were a couple of quizzes for each recording/text. One of them was this "guess next word". I still have the website running here: https://130.211.134.78/exercise...16586728765022 (this example is in Armenian). But I'll probably shut it down (or open-source it) sooner or later.

Later I realised there is already this cool community (Tatoeba) and that "guess the word" could be a nice small Android app, so I simplified the idea and rewrote it, and this is the result.
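
For readers curious what a "guess the word" exercise over a sentence corpus might look like, here is a minimal sketch. It is not the app's actual code; the function names `make_cloze` and `check_guess` and the blank marker are assumptions made purely for illustration.

```python
import random

BLANK = "_____"  # hypothetical marker shown in place of the hidden word


def make_cloze(sentence, rng=None):
    """Hide one randomly chosen word and return (prompt, answer)."""
    rng = rng or random.Random()
    words = sentence.split()
    if not words:
        raise ValueError("sentence has no words")
    index = rng.randrange(len(words))
    answer = words[index]
    prompt = " ".join(words[:index] + [BLANK] + words[index + 1:])
    return prompt, answer


def check_guess(guess, answer):
    """Compare a guess to the hidden word, ignoring case and edge punctuation."""
    punctuation = ".,!?;:\"'"
    return guess.strip().strip(punctuation).lower() == answer.strip(punctuation).lower()


if __name__ == "__main__":
    prompt, answer = make_cloze("Tom loves Mary.", random.Random(0))
    print(prompt)
    print(check_guess(answer, answer))
```

Passing a seeded `random.Random` makes the choice of hidden word repeatable, which is convenient when trying the sketch out.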
TRANG
2017-01-05 20:19
I'm always happy to see people reuse Tatoeba's data :)

Nice to know your app is open source as well!
captcrouton
2017-01-09 11:35
I downloaded 10,000 sentences and it has a nice addictive quality about it. I've done about 60 sentences in Ukrainian, which I'm learning.

I'm not sure what the read sentence button is supposed to do. Does it give you the audio if it's available? That would be helpful but as far as I can see, none of my sentences had that.

Keep up the good work. I'll let you know when I hit my first thousand.
puzz
2017-01-09 11:53
Yes, the "Read sentence" button should read that sentence aloud. But it depends on whether your Android phone has the TTS package for that language. If not, there are some external apps which provide voices for languages missing on most Android phones (you can search the Google Play store). I use Acapella TTS, but I can't see Ukrainian there :(

If your phone *has* Ukrainian TTS but you still have no voice, maybe it's just that the media volume is too low. When you use the volume buttons, by default Android changes the "Notifications" volume. You need to change the "Media" volume.

Thanks for the feedback! And, yes, I'd be happy if you let me know when you reach the first 1000 (I'm about to do that today for one of my (two) languages).
captcrouton
2017-01-13 12:46
I really enjoy this app. I'm about 350 sentences in. I teach English to Ukrainian children and was wondering about producing a kid friendly version that I could infuse with vocabulary from our lessons. I don't know anything about development. Is this the right venue to have this conversation?

One thought I had while I was playing. It would be nice to be able to export useful sentences into a personal phrasebook. I haven't fully explored all the features. Is that something that's already there?

Again, it's a great app when I'm waiting for a bus or an elevator (or for my wife to get ready)

Have a great day.

Selena777
2017-01-15 08:03
Hello.
Yes, you can add sentences to your personal favourites list (there is a heart-shaped icon above every sentence). You can also create lists of sentences, private or public (for example, "English sentences which are appropriate for young kids", "Ukrainian beginner's level sentences" and so on).
deniko
2017-01-18 17:12
Hi captcrouton,

I'm not 100% convinced using tatoeba's sentences to teach kids is a very good idea. I translated a number of sentences I wouldn't want my - or any other children - to see.
captcrouton
an hour ago
Deniko,

Yes, I've discovered this using this app. I guess I've only used Tatoeba for examples of words I was learning or teaching. After reading a random pool of about 1500 sentences, I definitely won't be bringing Tatoeba into the classroom. It still is a great resource though.
Ricardo14
yesterday
** Orphan sentences **

According to https://tatoeba.org/eng/activit...pt_sentences/, there are 189,407 "orphan" sentences on Tatoeba.

Perhaps we can try to "adopt" some of them, correct or delete (if necessary).
Some of them are not even sentences.

Orphan sentences in English - 48,236 sentences - https://tatoeba.org/eng/activit..._sentences/eng

Orphan sentences in French - 4,757 sentences - https://tatoeba.org/eng/activit..._sentences/fra

Orphan sentences in Ukrainian - 1,547 sentences - https://tatoeba.org/eng/activit..._sentences/ukr

Orphan sentences in Spanish - 907 sentences - https://tatoeba.org/eng/activit..._sentences/spa

Orphan sentences in Hungarian - 506 sentences - https://tatoeba.org/eng/activit..._sentences/hun


Orphan sentences in Russian - 89 sentences - https://tatoeba.org/eng/activit..._sentences/rus

Orphan sentences in Turkish - 19 sentences - https://tatoeba.org/eng/activit..._sentences/tur

Orphan sentences in German - 1 sentence - https://tatoeba.org/eng/activit..._sentences/deu

Orphan sentences in Esperanto - 1 sentence - https://tatoeba.org/eng/activit..._sentences/epo

*****

maybe not even possible (?) to solve

Orphan sentences in Japanese - 123,240 sentences - https://tatoeba.org/eng/activit..._sentences/jpn

Orphan sentences in Korean - 348 sentences - https://tatoeba.org/eng/activit..._sentences/kor
Aiji
23 hours ago
I am 100% behind you on this matter.
For French (and English and Japanese), the difficulty comes from the "historical" sentences, the jpn<>eng<>fra set where there are MANY wrong sentences to correct or unlink. But the French sentences are currently being taken care of (by me).

There is the same thing with the @needs native check (but for French, it is almost finished :) )
Ricardo14
18 hours ago
I have finished the Portuguese and I'm going to finish the Spanish sentences tagged as @NNC. :D Then I'll take a look at a couple of others.

> the difficulty comes from the "historical" sentences

I see. I wish there were a solution for that. We can't delete them because they may be right (they might be said on the streets, for example).
Aiji
22 hours ago
[BUG]
When browsing sentences in advanced search, the audio icon is shown for all the sentences (main, direct translation, indirect translation). If I link one sentence, all the icons for the translations disappear.
Pfirsichbaeumchen
4 days ago
Corpus Maintainer for Polish

Jeedrek: https://tatoeba.org/user/profile/jeedrek.

Jeedrek is applying to become a corpus maintainer for Polish. Please don't hesitate to use the link below to tell us your opinion. The main responsibility of a corpus maintainer is to correct, in compliance with our rules, erroneous sentences of members who are no longer active.

Send us a message: http://tatoeba.org/private_mess...sichbaeumchen.
Pfirsichbaeumchen
yesterday
Jeedrek is now a corpus maintainer.
Ricardo14
yesterday
Congratulations, Jeedrek! Good luck!
Wezel
yesterday
Congrats! I’m very glad that at last someone can clean up the Polish corpus.
herrsilen
yesterday
Congratulations!
mraz
yesterday
Dear Jeedrek,
Congratulations! I wish you good health and successful work.
Regards, mraz
jeedrek
yesterday - yesterday
Thank you all!
I am happy to help improve Tatoeba's quality. There's a lot to do in the Polish corpus!
deyta
3 days ago - 3 days ago
OsoHombre's sentences need a +18 warning, or even a +21 one.

Osohombre's sentences contain extreme sex, death, suicide, violence and even religion.
When I translate my psychology is breaking down.
OsoHombre
3 days ago
With all due respect, you're completely mistaken about that. You're free to translate whatever you want and leave whatever you want. No one is obliging you to translate my sentences. What I'm doing is translation, not +18 stuff and blasphemy.
OsoHombre
3 days ago
> When I translate my psychology is breaking down.

Take antidepressants and go get some fresh air. That would be better for you, I think. Let normal people do a normal job on a normal website.
OsoHombre
3 days ago
Although I'm not supposed to talk about my personal life here, I inform you that I am an observant Muslim, a family man and a father, yet this doesn't prevent me from writing about such things. So please don't make things seem what they are not.
deyta
3 days ago
Back-and-forth arguing is not allowed here.

There is no need for a show of strength.
We are not at war.

I am an ordinary person.
There are many people who visit this site.

I just expressed my personal opinion.
Somebody had to say something.

This is not "Urban Dictionary".

Let's suppose you have added 100,000 English sentences.
I would not want them all to be of that kind.

At the very least, there should be a more effective warning system.
Thanks.
OsoHombre
3 days ago
To all Muslims on Tatoeba, let me clarify some things here:

First of all, I'm an observant Muslim myself. I pray five times a day, and I observe Ramadan. I'm not crazy, not a drug addict. I neither smoke nor do I drink alcohol, so please read this carefully:

1- It is not illicit (haram) to write or translate example sentences like some of those I've posted on this website. Why? Because Prophet Muhammad (peace be upon him) said that "actions are with intentions" (إنّما الأعمال بالنّيات), therefore if I give an example like 'Cocaine is good', this doesn't necessarily mean that I'm a cocaine addict and that I'm praising the "virtues" of cocaine and encouraging people to take cocaine. OK?

2- Besides, my intention on this website is to seek knowledge and spread it, not to encourage people to be criminals, drug addicts, pedophiles, abusers, gamblers, rapists, perverts, fornicators, or adulterers. Publishing sentences that are related to crimes or major sins (كبائر) doesn't mean that I do what I write about (my personal life doesn't have anything to do with anything I publish) and my intentions are just to show people how to express or translate information from one language into another. How is an Arabic learner or translator supposed to know what coke, meth, dope or necrophilia are called or how those words are used in different contexts if we don't give example sentences about them?

3- Don't be like those people to whom everything is illicit and taboo. You can be a Muslim and talk about sins without necessarily doing them or feeling guilty of talking about them. I think that our intention here on Tatoeba is not to promote anything except translation. Therefore (and this is a message to all Muslim youth), please don't confuse talking about something and doing it.

So please don't misunderstand, don't misinterpret, don't extrapolate, and don't exaggerate things. In addition to that, don't involve me personally, as a simple contributor, in the stuff I write about, no matter how 'horrible' or 'disgusting' or 'shocking' it may be. If I'm a journalist and I'm writing about how a psychopathic killer cut the body of a child into pieces, this doesn't mean that I think like the psychopath that did it or that I did it myself. Please come back to your reason.

Thank you and Allah (God) bless you
Aiji
2 days ago
Religion has nothing to do with the work we are doing here. Therefore, religious topics or discussions do not belong on this wall. If you want to discuss the matter with someone, please use private messages.
I am saying that with total respect for everybody's beliefs, be they religious, political or private. And I am saying that so everybody can continue what they are doing in peace. As you said yourself, we are here to contribute to a common thing, not to debate beliefs or how one should follow them. This is a personal thing, and hence should stay personal.
deniko
yesterday - yesterday
Hi deyta,

> Osohombre's sentences contain extreme sex, death, suicide, violence and even religion.

Those topics are not forbidden here, moreover, I don't think they should be forbidden.

This site is for people who learn languages, and, I believe, also for translators who can hone their skills or use it to find examples of certain phrases. If a respectable newspaper can use a sentence like "Tom converted to Islam", "Mary was sexually assaulted by Tom", "Tom was gang-raped", "Fadil was shot six times", etc., there is no reason such sentences should not be allowed here.

Moreover, even explicit profanity should not be forbidden here. This is the way people talk. We need to be able to understand that and translate that to other languages.

> When I translate my psychology is breaking down.

Having said what I said above, I can relate. I never saw anything particularly nasty from OsoHombre, but there were a lot of unpleasant sentences from other users I didn't want to translate. So I just ignored them. You don't have to translate everything.
Impersonator
yesterday - yesterday
> I don't think they should be forbidden

I don’t think anyone argues for forbidding such topics. I believe the idea is to have a better classification for such sentences (“+18 warning”), and a way to hide them for users who find them disturbing.

I’m not sure a “+18 warning” is the best way to do this. Perhaps the existing tag system can handle this, and we just need a way to hide sentences that carry certain tags throughout the site. Perhaps we also need a way to assign some tags to a sentence when submitting it (because otherwise it would be visible for some time before someone gets to tag it).
deniko
yesterday - yesterday
Fair enough.

I was thinking about something like that myself, but not to protect us, translators (I believe we are all mature adults and can easily ignore what we don't like), but for cases like this one:

https://tatoeba.org/eng/wall/sh...#message_27902

So basically @captcrouton uses the Tatoeba corpus to teach Ukrainian children English. That's where we need tags like "Kids Friendly" or "Not 18+".
TRANG
yesterday
To me this is not a problem you solve on a global scale, but on an individual scale. I don't think we can assign a group of people whose job would be to decide what is "mature" content. We don't have the same sensitivity to sexual, obscene, violent content. Each user should have their own custom filter.

For instance, let's say you consider that a user is adding too many sentences that bother you (no matter the reason): you could put them on your "blacklist", and you (and just you) wouldn't see their sentences anymore. This kind of feature would lead to less controversy than a moderator team that can decide which sentences should be hidden from the whole world. But this kind of feature is also technically harder to implement.
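
To make the idea of a per-user filter concrete, here is a rough sketch that combines the personal "blacklist" with the tag-based hiding suggested earlier in this thread. Nothing here reflects how Tatoeba is actually implemented; `Sentence`, `UserFilter` and `visible_sentences` are hypothetical names, and the sample data is made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sentence:
    text: str
    author: str
    tags: frozenset = frozenset()


@dataclass
class UserFilter:
    """Preferences that belong to one user and affect only what that user sees."""
    blocked_authors: set = field(default_factory=set)
    hidden_tags: set = field(default_factory=set)

    def allows(self, sentence):
        if sentence.author in self.blocked_authors:
            return False
        # Hide the sentence if it carries any tag this user chose to hide.
        return not (sentence.tags & self.hidden_tags)


def visible_sentences(sentences, user_filter):
    """Return only the sentences that pass this user's personal filter."""
    return [s for s in sentences if user_filter.allows(s)]


if __name__ == "__main__":
    corpus = [
        Sentence("I live in Canada.", "alice"),
        Sentence("An example with a hidden tag.", "bob", frozenset({"mature"})),
        Sentence("Another sentence.", "carol"),
    ]
    mine = UserFilter(blocked_authors={"carol"}, hidden_tags={"mature"})
    for s in visible_sentences(corpus, mine):
        print(s.text)  # prints only "I live in Canada."
```

Because the filter lives with the user rather than with the sentence, nobody has to decide on behalf of everyone else what counts as "mature".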
OsoHombre
2 days ago
Is there a feature to block a person from sending you abusive messages? I have received 2 private messages containing abusive words from a Tatoeba user. I would like to report that and block that user for good.
deniko
2 days ago - 2 days ago
I would suggest taking screenshots of those messages and sending them to an admin you're comfortable sharing that with.

I don't think there is a feature to block/ignore/blacklist a certain user though, as well as certain keywords.
OsoHombre
2 days ago
I think we shouldn't blacklist keywords. If someone uses very crude words, they need to be visible and legible to everybody to see how crude the message was. I tried to take a screenshot but I don't know how to attach it to the message.
deniko
2 days ago
Right, you can't really attach anything to messages, and I don't think it would be a good idea to use an image sharing website in this particular case.

Probably just send it as a quoted text then. I'm not sure how admins can address such an issue, probably issue a warning or something, a few warnings = ban. I don't think abusive PMs are acceptable here.
OsoHombre
2 days ago
We are kind of vulnerable in this case. If the website allows social interaction through PM's, then there is a risk of abusive behavior and/or harassment. Therefore Tatoeba has to do something about that. I have read in Tatoeba's rules something like we're not allowed to say bad things about users in public (on the wall or in comments), and this prevents me from copying and pasting that user's rude language on this wall to publicly denounce them. Copying and pasting his texts as quotes may not be trusted by admins either. Anyone could alter somebody else's statements and make them uglier than the original ones. I need to send screenshots as proof.
Guybrush88
yesterday - yesterday
then using an external image sharing website should be the best solution for this

(I edited because I forgot to specify it was about the screenshot part)
deniko
yesterday
You can ask for an admin's email address and send the screenshots there.

Also, I'm not sure whether Tatoeba encrypts private messages; I suspect they are not encrypted and are stored in the database as is. In that case, at least those who have direct access to the database can always look at those messages directly if hard proof is required.
OsoHombre
yesterday
I will e-mail an admin about that and see what they can do. Thank you for your answers and suggestions of help.
TRANG
yesterday - yesterday
You can send an email to community-admins@tatoeba.org when you need to report something.

Edit: or rather when you need to report *someone* who is causing trouble to you or to the community. To report technical issues or such, you should write on the Wall, or to team@tatoeba.org.
OsoHombre
9 days ago
Tomorrow is Valentine's Day when people celebrate love. Here are a few of my sentences about love for those who are interested in translating them:

http://tatoeba.org/eng/sentence...mp;sort=random

I would appreciate any effort to help me.
Ricardo14
9 days ago
Let's do it ;D
CK
9 days ago - 9 days ago
Here are all the English sentences that I've proofread that have either the word "love" or the word "Valentine".

https://tatoeba.org/eng/sentenc...d&list=907


The following is the same advanced search, limited to sentences that haven't yet been translated into any language.

https://tatoeba.org/eng/sentenc...p;sort=created


You can further fine-tune this advanced search by choosing your own native language under the "Translations" after the "Exclude". This will then show you all proofread English sentences that have not yet been translated into your own language.

For example, here is the same search to find those sentences that are not yet translated into Portuguese.

https://tatoeba.org/eng/sentenc...p;sort=created

This is the same search fine-tuned for native Arabic speakers.

https://tatoeba.org/eng/sentenc...p;sort=created
Ricardo14
8 days ago
Thanks, CK!
PaulP
8 days ago - 8 days ago
I see many near duplicates there. When we have the sentence „Tom doesn't love me.”, what sense does it make to create „Fadil doesn't love me.” and to start translating it into many languages?
Aiji
8 days ago
I think that is not a new problem unfortunately.
In the future, I hope we may get a generic function for names, countries, etc.
I've recently had to adopt maybe twenty times the "same" translation of "What is the average salary in [country name]". Made me think about the same thing you're mentioning but I thought that in the future these sentences could be all merged into "What is the average salary in [generic country]".
Of course, such a feature involves many difficulties specific to each language but I think it was already discussed in the past.
deyta
8 days ago
+1
OsoHombre
7 days ago
Aiji:
I am personally against the adoption of what you called [generic country]. Why? Because the website already has 300 languages and one could be interested in knowing what 'Kenya' is called in all 300 languages, then what Russia, Australia, Egypt and South Africa are called. Therefore, and for the sake of having a rich and diverse corpus in each language, I strongly recommend that we don't adopt standard names for persons, cities, administrative subdivisions and countries. After all, the world can't be revolving around just one city. I need to contribute example sentences specific to some cities like Istanbul, Jerusalem and Bangkok, and in a city like Boston, you wouldn't find mosques and Buddhist temples like those you find in other places of the world; besides, there are no bazaars or pyramids in or near Boston. In my opinion, people live in different parts of the world, they represent their own language communities and they should be free to use the languages they know to express their own linguistic reality.
Ricardo14
7 days ago
+1

Also, there are some linguistic points here...
As everybody knows, not all languages use the Latin script, and even some that do use it do not "apply" it in the same way to proper names, for example.

some examples:

Tom - In Hungarian, there's also "Tomi" https://tatoeba.org/eng/sentenc...eng&to=hun - https://tatoeba.org/eng/sentences/show/4381219

Τομ - Greek - https://tatoeba.org/eng/sentenc...eng&to=ell (no change)

Том - Russian - https://tatoeba.org/eng/sentenc...eng&to=rus (no change)

but

Ricardo

Greek - Ριχάρδος, Ριχαρδε, Ριχαρδον
Russian - Рикардо
Turkish - Ricardo, Ricardı

There are many more examples, but this shows us that strictly using just 'Tom' and 'Mary' might not give us all the examples and show us all the 'beauties' of each language. Besides, there's a practical question here. For example, how can I say 'Cyprus' in Greek, Russian, Hungarian and Turkish? Anime comes from Japan, Russia will hold the next FIFA Confederations Cup, etc.
CK
8 days ago - 8 days ago
Using standard names does help us more quickly get groups of translations all linked to the same pattern.

For example, someone writes a sentence in German with "Tom" that is translated into Spanish, and then someone translates that Spanish sentence into Japanese which is then translated into Polish, etc. Eventually, we get a lot of sentences that are indirectly translated which potentially can be linked together.
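
As a rough illustration of how such chains could be followed, here is a small sketch that walks translation links breadth-first to collect sentences reachable only indirectly. The link data and the names `build_links` and `indirect_translations` are hypothetical; this is not how Tatoeba itself computes indirect translations.

```python
from collections import deque


def build_links(pairs):
    """Turn (a, b) translation pairs into a symmetric adjacency map."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def indirect_translations(links, start):
    """Return sentences reachable from `start` but not directly linked to it."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in links.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start} - links.get(start, set())


if __name__ == "__main__":
    # German -> Spanish -> Japanese -> Polish, as in the example above.
    links = build_links([("deu", "spa"), ("spa", "jpn"), ("jpn", "pol")])
    print(sorted(indirect_translations(links, "deu")))  # ['jpn', 'pol']
```

Every sentence in that result is a candidate for a direct link, which a human would still need to check.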

We will always have some newcomers contributing "non-standard names", but perhaps we should discourage people from just flooding the database with a new set of names like this.

There's a well-known proverb that's perhaps applicable.

https://tatoeba.org/eng/sentences/show/1422381
Don't change horses in midstream.

I assume that it's possible to talk about people named Tom, Mary, John and Alice in Arabic. We transcribe foreign names into Japanese all the time and it doesn't seem to be a problem here in Japan.

http://www.biography.com/people...mous-named-tom
http://www.biography.com/people...ous-named-mary
http://www.biography.com/people...ous-named-john
http://www.biography.com/people...us-named-alice

Perhaps some of you will find the following interesting.
This shows you how it's possible to change "wildcard" names.

Sentence Patterns: Substitutions
http://aitstudy.com/sub/

Of course, the computer programming would be a little more complex for languages that make grammatical changes to names.

I plan to “harvest” any of the good English sentences with names that aren’t the standard wildcard names and resubmit them with the wildcard names.

We already have 132,906 English sentences with these 4 standard wildcard names.
http://tatoeba.org/sentences/se...e&from=eng

119,675 of these are on my list of proofread English sentences (List 907).
https://tatoeba.org/eng/sentenc...amp;sort=words

94,067 of these have audio.
https://tatoeba.org/eng/sentenc...amp;sort=words


Reference:

Wildcards Used to Help Avoid Too Many Near Duplicates
http://bit.ly/tatoebawildcards

These are the guidelines that I try to follow and that many other members follow.
Aiji
8 days ago
The last link you showed is what I think would be the best for here. If we could have some generic <male name>, <female name>, etc., each language could have "local" names. For the four names you're giving for example, in French it would more naturally be Thomas/Tom, Marie, Jean, Alice.

Of course a name is a name, but whether to translate names is always open to question, so wildcards would be a good compromise in my opinion.
Having wildcards would also be a good occasion to show/read some names of different countries (for example, there aren't so many Johns in France...), which is a good and important cultural aspect of sentence corpora.
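
A minimal sketch of what generic name slots with "local" names could look like follows. The placeholder keys, the `LOCAL_NAMES` table and `fill_pattern` are assumptions for illustration only; the French names are the ones suggested above. Plain substitution like this would not be enough for languages that inflect names.

```python
# Hypothetical per-language name tables for four generic slots.
LOCAL_NAMES = {
    "eng": {"male1": "Tom", "female1": "Mary", "male2": "John", "female2": "Alice"},
    "fra": {"male1": "Thomas", "female1": "Marie", "male2": "Jean", "female2": "Alice"},
}


def fill_pattern(pattern, lang):
    """Fill generic name slots such as {male1} with that language's local names."""
    return pattern.format(**LOCAL_NAMES[lang])


if __name__ == "__main__":
    print(fill_pattern("{male1} loves {female1}.", "eng"))  # Tom loves Mary.
    print(fill_pattern("{male1} aime {female1}.", "fra"))   # Thomas aime Marie.
```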
OsoHombre
yesterday
To be completely frank, when I first read the word "harvest" I was a little bit shocked. I imagined aliens visiting the Earth to "harvest" human children or something. Although our sentences can later be used under the Creative Commons Attribution 2.0 license (CC-BY 2.0), I consider that our sentences shouldn't be massively copied and re-written differently just to have some name replaced with another name for the sheer sake of willing to promote a set of standard names (something that Tatoeba doesn't officially recognize at all) with another set of standard names. Imagine someone else thinking that way and deciding to re-write all of the 132,906 sentences that contain Tom and Mary just to replace those two names with different names. Wouldn't this result in a disaster of cataclysmic proportions? Wouldn't this make the website a mere collection of near-identical sentences that are completely boring to translate? And... doesn't this completely contradict our aims to avoid near-identical sentences? After all, if someone gives themselves the right to re-write 3,000 sentences for the mere sake of using Tom instead of Jamal, then why shouldn't somebody have the right to re-write 300,000 sentences using Jamal or Shinzo instead of an unofficial standard name? Don't members have equal rights here? This is why one should really think before they act.
deniko
8 days ago
While I think it's a good idea to MOSTLY use a few standard names/countries/nationalities, I don't think this should be formalized in Tatoeba's rules, or even that users should be discouraged from using different names.

It's often useful to see patterns of how different names/nationalities work in different cases. I'll use Ukrainian as an example, but this applies to other Slavic languages as well. I'll use transliteration instead of writing in Cyrillic.

Tom = Tom
Mary = Meri
Other names I'm going to use: Yulia (female), Andriy (male), Boria (male)

Tom loves Mary = Tom kohaye Meri.

Knowing this pattern, how can you come up with this:

Mary loves Tom = Meri kohaye Toma (note the ending, Tom->Toma)
Andriy loves Yulia = Andriy kohaye Yuliu (Yulia->Yuliu)
Yulia loves Andriy = Yulia kohaye Andriya (Andriy->Andriya)
Yulia loves Boria = Yulia kohaye Boriu (Boria->Boriu)

You do need more names than four if you want to understand how they work in different combinations.
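
To show why simply swapping names is not enough, here is a small sketch that uses only the forms listed above. The table `OBJECT_FORM` and both function names are made up for illustration; a real system would need proper morphological rules rather than a lookup table.

```python
# Object forms copied from the examples above (transliterated Ukrainian).
OBJECT_FORM = {
    "Tom": "Toma",
    "Meri": "Meri",
    "Yulia": "Yuliu",
    "Andriy": "Andriya",
    "Boria": "Boriu",
}


def naive_loves(subject, obj):
    """Plain name substitution, which ignores the changed ending."""
    return f"{subject} kohaye {obj}"


def loves(subject, obj):
    """Use the object form of the name, as in the examples above."""
    return f"{subject} kohaye {OBJECT_FORM[obj]}"


if __name__ == "__main__":
    print(naive_loves("Meri", "Tom"))  # Meri kohaye Tom  (wrong ending)
    print(loves("Meri", "Tom"))        # Meri kohaye Toma
    print(loves("Yulia", "Andriy"))    # Yulia kohaye Andriya
```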

That's more than true for countries and nationalities as well.

Even in French, if you know how to say:

I live in France = J'habite en France.

how can you come up with the following:

I live in Canada = J'habite au Canada.
I live in the United States = J'habite aux États-Unis.

So while this pair:
I live in France.
I live in Canada.

looks like a near-duplicate in English, it doesn't look like a near duplicate in French:
J'habite en France.
J'habite au Canada.

So please don't formalize anything regarding using names, countries, and nationalities. Part of the experience you get learning a language is looking at those patterns, when you throw in different names in very similar sentences and see how they behave.
OsoHombre
7 days ago
Deniko:
> looks like a near-duplicate in English, it doesn't look like a near duplicate in French:
J'habite en France.
J'habite au Canada.

You have given a perfect example to illustrate the point. Have you noticed how the French preposition changes before each country name? Besides, I would like to add the following: online Arabic resources are not always practical and complete. In terms of names of countries, this website offers us a possibility to show Arabic learners how to properly use the name of a country or a city in different contexts. This is just one argument among many others that I have in favor of letting users enjoy the freedom of expressing themselves freely in the languages they use.

> So please don't formalize anything regarding using names, countries, and nationalities.

I personally would be surprised if Tatoeba formalized this. Opening the website to 300 different languages and formalizing such a thing are two very contradictory things, I think. It is not in Tatoeba's interest to do it. It is better for Tatoeba to be a good and huge collaborative project similar to Wikipedia instead of formalizing such things. That said, I believe that every member of the project is free to choose whatever they like, but I demand that my freedom of expression be respected on this website, and as I have said many times already, as long as Tatoeba doesn't formally require the use of standard names, I will not use them and, once again, it is not in Tatoeba's interest to do it. Tatoeba looks like a great and ambitious project and such a project needs to open up to this huge and diverse world where more than 7,000 languages are still spoken.

Guybrush88
7 days ago
This is also the case for Italian. With countries, one can also learn their gender: for example, Brazil and France have no grammatical gender in English, but in Italian Brazil is masculine ("il Brasile") and France is feminine ("la Francia"), so having different possibilities may be useful for learners, since they can see the different genders a country can have in Italian.
OsoHombre
7 days ago
Guybrush:
The world is so rich, and in some languages even a person's name can be modified by the syntactic environment via a demonstrative or a preposition. For example, Arabic nouns (including Arabic proper nouns) are affected by what is called 'huruuf al-jar' (or what may be referred to as prepositions). In a sentence like 'I saw Fadil' and a sentence like 'I told Fadil', the Arabic grammatical form of the noun Fadil is different:

رأيتُ فاضلاً
قلت لفاضلٍ
Transcription: Ra'aytu Fadilan. Qultu liFadilin.

A learner needs to observe such changes, and if we limit our standard-name choice to Tom and Mary, Arabic wouldn't be able to display such grammatical features. In fact, we shouldn't base our choices on just one language. Let the world be a natural world and not be affected by choices related to technical criteria. In fact, limiting standard names to two English names could affect both the educational and scientific value of Tatoeba's corpus.
odexed
7 days ago
I may be wrong but I think huruf al-jar is when we use prepositions like "في البيت" but you gave some good examples of إعراب
OsoHombre
7 days ago
'Harf al-jarr' is part of the terminology of Arabic traditional grammar. In modern grammar, we may refer to it as a preposition. In traditional grammar, they call it 'harf al-jarr' because when it immediately precedes a noun, it causes it to end with an /i/ or the 'kasra' diacritic.
OsoHombre
7 days ago
CK:
> Using standard names does help us more quickly get groups of translations all linked to the same pattern.

This brings us back to an eternal debate between natural-language supporters and computational linguists who tend to think that everything should be standardized and mechanized for the purpose of developing language software. My opinion is that we should let people express themselves in a natural way. If my friend's name is Fadil, then I prefer to write about Fadil and not Tom. If my city is Cairo or Athens, then I prefer to use my city's name. Linguistically speaking, this is much more natural and interesting than recommending (thank God, not imposing) that people use a limited set of standard names. This could even block people's imagination, I think. If I lived in a small Indian village, I would be writing about that village and all the surrounding villages and towns, not about Boston or New York, which I know nothing about.
Hybrid
6 days ago - 6 days ago
>we should let people express themselves in a natural way.

I agree. We shouldn't be forced to use wildcards. I don't want Anne of Green Gables to become Mary of Boston :)
OsoHombre
5 days ago
Hybrid:
It was a good example. Thank you.
OsoHombre
7 days ago - 7 days ago
CK:
> We will always have some newcomers contributing "non-standard names", but perhaps we should discourage people from just flooding the database with a new set of names like this.

Discourage people? I don't think that the words 'Tatoeba' and 'discourage' would make a good recipe. I have already explained my reasons, and what I expect you guys to do is to respect my choices and opinions as long as I'm being correct and logical, and not to use any coercive measures against me (as a motivated contributor) or any other user who wants to enjoy the freedom of being a member of their own language community. I think that this is a basic human right that's recognized by international institutions and that every member of Tatoeba should enjoy. To tell the truth, I was even a little bit shocked when I read the word 'discourage'.

> These are the guidelines that I try to follow and that many other members follow.

Yes, but I'm part of 7 billion other members of this planet Earth that maybe want to see things differently. Even if I'm the only one to think like that, I modestly think that 7,000 language communities should be represented in a much better way, not just with Tom, Mary and Boston.
gillux
6 days ago
I strongly believe that we should not change our way of writing sentences for technical reasons. Programs should adapt to languages, not the opposite.

How about relating near-duplicate sentences with a fuzzy matching algorithm? So that for example, on a given sentence page, one could see a list of near-duplicates, along with their translations. I believe such an algorithm could be quite effective, even if it can’t be perfect.
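
A minimal sketch of what such fuzzy matching could look like, assuming Python's standard `difflib` module and a word-level similarity threshold of 0.7. The function name `near_duplicates`, the threshold, and the sample sentences are illustrative assumptions, not anything Tatoeba actually runs:

```python
from difflib import SequenceMatcher


def near_duplicates(target, sentences, threshold=0.7):
    """Return (ratio, sentence) pairs similar to `target`, most similar first.

    Similarity is computed on lowercased, whitespace-separated words, so
    "Tom went to Boston." and "Mary went to Boston." share 3 of 4 words.
    """
    target_words = target.lower().split()
    matches = []
    for sentence in sentences:
        if sentence == target:
            continue  # skip the sentence itself
        ratio = SequenceMatcher(None, target_words, sentence.lower().split()).ratio()
        if ratio >= threshold:
            matches.append((ratio, sentence))
    return sorted(matches, reverse=True)


# Hypothetical sample corpus
corpus = [
    "Tom went to Boston.",
    "Mary went to Boston.",
    "Fadil went to Cairo.",
    "I like apples.",
]
result = near_duplicates("Tom went to Boston.", corpus)
```

Splitting on whitespace only suits languages that separate words with spaces; other languages would need their own tokenization, which is part of why such an algorithm can't be perfect.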
OsoHombre
5 days ago
Gillux:
Although programming isn't my concern at all, I like your idea of developing an algorithm to solve the problem of near-identical sentences. My point here is: let programmers find solutions to technical problems and not "bother" normal contributors about that, because, frankly speaking, technological problems should not dictate their requirements on natural people speaking natural languages in a natural way. Those who want to develop talking robots and AI should solve their challenging problems by copying nature. They should not shape nature (and simplify it) in such a way to make it "easier" for them to develop their increasingly sophisticated programs. Just one more note: Gillux, I'm just expressing my view. I'm not being confrontational and I don't want this to turn into an argument, OK? It's just my frank opinion that happens to be very different from some other people's opinion.
deyta
5 days ago
+1
I agree with you.

Something must be done
If there are no limitations or solutions, Tatoeba can turn into garbage.
OsoHombre
5 days ago
Deyta:
In my opinion, Tatoeba can't turn into garbage. As long as there are many admins and what the site refers to as 'corpus maintainers', there is no risk of the site becoming a dump or a useless resource. It's up to Tatoeba to make sure that its open-to-the-public interface is used properly, but this should be done while preserving its friendly and inviting atmosphere. After all, I think that if the website aims to grow (I'm sure it does) to become like Wikipedia, it needs to have the capacity to manage its quality, but it also needs to do so without undermining the freedom of its contributors.
Aiji
3 days ago
I hadn't read your post well enough before but saying
"I plan to “harvest” any of the good English sentences with names that aren’t the standard wildcard names and resubmit them with the wildcard names."
is unacceptable as long as we don't have a real wildcard feature on the site itself. Until then, there is no genericity (in the computer sense), so there is no reason to change people's preferences arbitrarily. Most of the arguments were given in the thread. So I think there is nothing to "harvest" for change.
OsoHombre
3 days ago
+1.
OsoHombre
7 days ago
Paul:
I understand the issue with what you refer to as near-duplicates. I only have a human brain and I can't memorize and guess all the potential near-duplicates that exist on the website. I also guess that every user is free to choose what to translate and what not to translate. That's exactly what I do personally. Still, I will try to avoid what I think could become near-duplicates by avoiding what seems to be simple sentences that just anyone could make like 'X loves Y' or 'X went home.'
OsoHombre
7 days ago
To Paul and everybody:
Oh, there is just one point I would like to warn about, if I may. I have noticed that CK re-wrote some of my 'Fadil-and-Layla' example sentences using the names Tom and Mary. I strongly object to this because, in a context where we all wish to avoid unnecessary near-duplicates and the unnecessary translation work they cause, re-writing 'Fadil-and-Layla' sentences as 'Tom-and-Mary' ones technically constitutes an intentional creation of near-duplicates. So I personally think that if near-duplicates are created unintentionally, this is OK, but if they are created intentionally just to 'standardize' the corpus no matter what, this would only aggravate the issue of near-duplicates.
OsoHombre
7 days ago
To Paul and everybody (2):

And what if...

...there were two or three Spanish speakers who decided to only use Spanish names like Pedro, Santiago and Carmen, they contributed thousands of sentences using these names, and then some day there came a contributor who decided to translate all their sentences into English? Should the English translator be asked to only use Tom and Mary, thus making it impossible for them to find a sentence to translate from those Spanish sentences? Should the Spanish-speaking contributors be asked to re-write their thousands of sentences using Tom and Mary? Or should their thousands of sentences be ignored by any potential English translator altogether? And in case the English translator re-submits translations using Tom and Mary and then the translations are back-translated into Spanish, wouldn't that result in a huge number of near-duplicates in Spanish? I honestly think that this choice of standardizing names is a non-viable solution. The world and human language are far too big and rich to be reduced to a pair of English names and an American city.
AlanF_US
7 days ago - 7 days ago
OsoHombre, I think your points are solid. Although I understand that restricting names to a small set can have certain benefits, it's certainly true that it can have drawbacks as well. I agree that homogeneity makes working with the corpus boring, whether you're reading existing sentences or contributing new ones. There's only so much time that I can spend in a world composed of one man, one woman, and one city before I want to jump out a window (figuratively speaking). I feel the same way when I experience many sentences based on the same grammatical paradigm: "I smiled." "You smiled." "He smiled." "She smiled." But the same would apply to a paradigm that varied only the place name: "He went to Boston." "He went to Paris." "He went to London." Or the first name: "John went to Boston." "Violeta went to Boston." "Ivan went to Boston." And on and on.

In my personal view (emphasis on "personal"), Tatoeba achieves the most when it tries to do what isn't already being done (often more effectively) elsewhere. Basic grammar is well suited to a textbook, or to a website with a small number of authors that emulates a textbook. Lists of geographical names can be found in a dictionary or atlas or other reference. There are undoubtedly other references for personal names. (Naturally, these are harder to obtain for minority languages, but they are still likely to exist.) But a textbook cannot capture the sheer variety of sentences that you can get from thousands of contributors around the world -- as long as they are challenging themselves not to come up with the easiest type of sentences to produce. Variety can be hard to achieve, but it's rewarding. Maybe it means starting with an external list of vocabulary items that we are not already covering, or items that Tatoeba members have specifically requested, or items that the contributor has found missing. Or maybe it involves tapping some personal source of creativity that could be different for each individual.

What I am trying to say, in short, is that varying personal and geographical names is helpful in increasing the variety and usefulness of our sentences, though it can't do the job by itself.

Several of the administrators had a discussion about the subject of names recently. I don't think that Trang will mind if I quote from her e-mail:

"It's fine [to use your] own sets of names, as long as [you do] it with common sense. We don't actually have standard names. There was never an official statement that contributors should be urged to use Mary, Tom or John over other names. We just have names that are more commonly used than others. ...

[O]verall, everyone has a different opinion about what names to use and how to translate names. So we definitely cannot make any rules about this.

If a contributor feels it is more important to use names that are more connected to their language/culture/identity, than to try to make sentences that 'fit' better in the Tatoeba corpus and/or are more 'convenient', it is their decision, and it is okay."
Aiji
7 days ago
So much has been said above that I will reply to your post (which makes a lot of sense) so people may see it.

First of all, I am not for one solution or the other. I was just giving arguments for the wildcards option because CK mentioned it. Therefore I am certainly not for the option CK recommends and I am certainly not against it. But since so many counter-arguments came, let's play the game.
That being said, there was a point somewhere arguing the difference between linguists and computer scientists or something like that. Being part of the two worlds, I find this argument clearly irrelevant. And I would say that the problem comes from the opposite side, that is people who can't see in both worlds.

I have said that we could use wildcards for names, countries, cities, and I maintain my point. I also have said that this would imply a huge programming mess that I can easily imagine (that is the part of the computer world) as such a programming would have to consider the thing in each language independently (that is for the linguist world) making the things nearly impossible to extend to everything anyway.
However, not once have I said that we MUST use wildcards and ONLY wildcards for the sake of Tatoeba. As I said in the introduction, I am against that. That is a simplistic approach that could only work for simple languages. I did not say that we should use ONLY four names, I have said the total opposite: "each language can have his set of local names", not "his set of four names".
So if you want to use Farid Layla Jamal Mohamad Kaori Hitomi Daisuke Daiki Pedro Carla Paul Ken Lin or whatever the hell you want, I WILL encourage you for sure (my first message was only nine lines long and I think I mentioned the cultural aspect, didn't I).

When I am told the difference between « la France » and « le Canada », thank you, I think I know the difference in my own mother tongue. But again, irrelevant. If I have a French country wildcard of ten countries (or more), five masculine and five feminine, problem solved (again, that is the computer part). Then we have to deal with another issue, which is that a corpus of sentences is not a dictionary. If you have a sentence with a representative set (that is, every possibility of the language is represented several times in the wildcard) and you don't find Kenya, you can 1) take a dictionary or 2) request a sentence with Kenya, the way people request vocabulary. I'll stop here because the difference between corpora is a different debate.

And finally, there is another wonderful Tatoeba feature, which is that you can add as many translations as you want. So if a wildcard is appropriate, select the wildcard. If not, do not select the wildcard. I can't see any problem here. In French and several other Latin languages, the <name wildcard> is probably the most obvious. In Japanese, the <country wildcard>, the <citizen name wildcard>, etc. are also obvious wildcards. But in a tricky situation, I may find the wildcard is not well-adapted, so I will add another translation. And then we have 1 wildcarded sentence + 2 non-wildcarded sentences VS 18 non-wildcarded sentences.
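
To make the wildcard idea a little more concrete, here is a rough sketch under stated assumptions. Tatoeba has no such feature; the data layout, the names `COUNTRY_WILDCARD_FR` and `expand_country_wildcard`, and the template syntax are all hypothetical. The sample set is deliberately limited to singular country names starting with a consonant, where French uses "en" for feminine and "au" for masculine names:

```python
# Hypothetical French country wildcard: (name, grammatical gender)
COUNTRY_WILDCARD_FR = [
    ("France", "f"),
    ("Canada", "m"),
    ("Italie", "f"),
    ("Japon", "m"),
]


def expand_country_wildcard(template, countries):
    """Fill a template such as "J'habite {prep} {country}." for every entry.

    Simplification: only handles singular names beginning with a consonant
    ("en" + feminine, "au" + masculine). Plural names and masculine names
    beginning with a vowel need other rules and are not covered here.
    """
    sentences = []
    for name, gender in countries:
        prep = "en" if gender == "f" else "au"
        sentences.append(template.format(prep=prep, country=name))
    return sentences


expanded = expand_country_wildcard("J'habite {prep} {country}.", COUNTRY_WILDCARD_FR)
```

Even this toy version needs language-specific rules, which illustrates the point that such programming would have to treat each language independently.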

Long story short, I am for you doing what you want to do and clearly against imposing the names of one country on every language. This would be an attack against the cultural aspect of languages. Personally, I think a name is a name, so if I write a translation, I do not translate the name. If I write the sentence by myself, I use the name I want. A good wildcard option would NOT restrain anything, it would just be ONE MORE possibility (but again, programming load, etc.) that people could use, not must use. If somebody changes your sentences without your permission, you should strongly oppose that and tell people here.
OsoHombre
6 days ago
Aiji:
I will answer your post point by point:

> Therefore I am certainly not for the option CK recommend and I am certainly not against it.

Neither am I against CK's option. What I ask for is that other people and I be left alone if we want to use our own names. I want to enjoy my freedom as a native Arabic speaker. If I feel that I don't want to have anything to do with Tom, Mary, and Boston as standard names, then I should be free to do that and my choice should be respected. So the idea is simple: everyone should be absolutely free to choose their own path.
OsoHombre
6 days ago
Aiji:
"And I would say that the problem comes from the opposite side, that is people who can't see in both worlds."

A problem? Hahaha... I'm no computer programmer and that obviously puts me in the other category, i.e. people who want to contribute natural sentences without taking into account programming criteria, but I enjoy my freedom as a person who produces speech without worrying about whether a computer is going to understand it or not. That's my personal choice and as far as I remember, Tatoeba doesn't oblige its users to take programming criteria into account either.

OsoHombre
6 days ago
Aiji:
> I WILL encourage you for sure (my first message was only nine lines long and I think I mentioned the cultural aspect, didn't I).

Thank you. Encouragement in the good sense is always welcome. Besides, I think that there is no reason for you to get upset. In my reply, I wasn't 'reproaching' you personally for believing in or wanting to impose something on somebody else. I took advantage of my reply to you to reply to the other people who kept messaging me at every opportunity they got, both publicly and in private, to ask me to adopt that limited set of standard names. So please calm down and be cool. My apologies if my message sounded too direct or personal towards you, but please try to understand that I use this wall to deal with general issues, ideas, and attitudes, not to quarrel personally with users.

OsoHombre
6 days ago
Aiji:
> When I am told the difference between « la France » et « le Canada », thank you I think I know the difference in my own mother tongue.

I don't even know French well. I am among the few people that were trained in English in my own country although French is the dominant foreign language here. The example sounded perfect to illustrate the point to other people, not to a French native speaker.
OsoHombre
6 days ago
Aiji:
One final remark:

I'm here as a friend and quarreling is the last thing I want on a website like this. You also need to understand that to me, a public message on a wall is not necessarily addressed only to the person I reply to but to everybody as well. My apologies if there was any misunderstanding.
OsoHombre
6 days ago
Alan:
I will reply to your message point by point.

> Re: jumping out of the window.
I like your sense of humor.

> But the same would apply to a paradigm that varied only the place name: "He went to Boston." "He went to Paris." "He went to London." Or the first name: "John went to Boston." "Violeta went to Boston." "Ivan went to Boston." And on and on.

I understand your concern and Paul's. As I said before, I only have a human brain (as most of us do on this website). However, in order to avoid nearly identical sentences that are boring to translate, I will do everything I can to provide new sentences with new words. I need my own corpus to be lexically and grammatically rich to have interesting and challenging patterns for my co-workers, my students and myself to translate (I am already inviting all the folks around me to take part in the translation of the sentences into Arabic).

OsoHombre
6 days ago
Alan and Trang:
> If a contributor feels it is more important to use names that are more connected to their language/culture/identity, than to try to make sentences that 'fit' better in the Tatoeba corpus and/or are more 'convenient', it is their decision, and it is okay."

I just breathed a sigh of relief and I would like to thank Trang for writing these wonderful lines and Alan... thank you very much for publishing them. I think that today is my happiest day on Tatoeba although I've only been here for a couple of weeks.

شكرا لكما من صميم قلبي و أنا ممتنّ لكما لاتّخاذ هذا القرار الصائب و الحكيم. هذا هو أسعد يوم لي في موقعكم.

My thanks also go to Admin Pfirsichbaeumchen who had informed me that the admins were talking about the matter.

I can now rest assured and continue to work in peace (at last) on this website that (thank goodness) guarantees the freedom of expression of people of different language communities which is absolutely necessary for the promotion of every language.

Besides, this freedom would also increase the scientific value of Tatoeba's corpus, so long live language diversity and long live the freedom of expression of every language community on this earth.

Amastan
6 days ago
I've read everything and it was quite rich and long (whew!). I think that OsoHombre is right. If Tatoeba doesn't officially "urge" contributors to use wildcard names, then why should someone bother using them if they don't want to? A few years ago, I decided to stop using them, but I kind of backed down a little bit, although I wasn't really convinced that I had made the right decision and I somewhat knew that some day this issue would arise again. Now I'm no longer as active as before, but I have always wanted to express some ideas in that sense. Not only do I think that it's not necessary to urge contributors to strictly use those wildcard names and nothing but those names (even if 98% of "us" actually use them), but I also agree with OsoHombre that this might threaten linguistic diversity on Tatoeba.

As a member of the Berber-speaking community, I belong to a community that has its own names that date back to thousands of years ago. Many of our revived names are actually those of our kings in ancient times like Massinissa and Jugurtha (who fought against the Romans) and we have many children that proudly bear those names today.

https://en.wikipedia.org/wiki/Masinissa
https://en.wikipedia.org/wiki/Jugurtha

My Tatoeba nickname itself is a Berber name borne by a famous Tuareg king.
https://en.wikipedia.org/wiki/Moussa_Ag_Amastan

Until recently, Algerian authorities banned Berber names and civil registrars refused to record them. Now, with the recognition of Berber as an official language both in Morocco and Algeria, the situation is changing little by little. I'm sure that there are many other communities that are facing the same problem around the world, so letting people pick the names they like on Tatoeba might be very helpful to these communities as well.

I too disagree with words such as "discourage" and also "flooding the site with X or Y". I think that Tatoeba is like a huge aquarium where every fish is free to swim as they like as long as they abide by the rules of the project. If some want to continue to use wildcard names of their own will, others should also be allowed to use whatever they want as long as it's OK with the rules.

To OsoHombre:
أهلا بك في موقعنا. بإمكانك أن تراسلني بالعربية أيضا.
OsoHombre
5 days ago
Amastan,
Thanks for your support.
شكرا لدعمك.
OsoHombre
5 days ago
I hardly noticed the word 'flooding' in CK's message. I totally disagree with such a word, too. I think that if there is equality among this website's members, every member has the right to contribute sentences in 'big' numbers. If a person who refuses to use Tom and Mary has their work labeled as 'flooding', why shouldn't a person who uses them have their work labeled like that as well? This isn't a personal attack against anyone, but I like to dot my i's and cross my t's.
Aiji
2 days ago
[BUG?]
When using the "OK, unsure, not OK" system, once I have clicked on one of them, I am unable to change (or cancel) my choice without reloading the page.
Does this happen to others too?
Guybrush88
2 days ago
yes, it happens also to me
sharptoothed
2 days ago
** Sentences & Translations Stats **

These stats, graphs & charts have been updated:

http://tatoeba.j-langtools.com/transtop/
http://tatoeba.j-langtools.com/graphs.html
http://tatoeba.j-langtools.com/userchart/
OsoHombre
2 days ago
Thank you for publishing these useful stats.
My shares have gone down this week.
deniko
2 days ago
@deyta & @duran made it happen - Turkish is now in second place in the main Tatoeba language chart! What impressive work. They translated 6000 sentences in one week, according to your data.
OsoHombre
2 days ago
I would like to congratulate my Turkish brothers on the impressive work they've done. I look forward to seeing Turkish become the first language on Tatoeba. As for me, I must now focus on enriching the Arabic corpus, which needs much, much work.
deyta
2 days ago - 2 days ago
Thanks.

I hope the Ukrainian and Arabic languages rise to the top.
Tatoeba is always open to new people and ideas.

Sağolun.

Umarım Ukrayna ve Arap dilleri en üst sıralara yükselir.
Tatoeba her zaman yeni kişi ve düşüncelere açık.
OsoHombre
2 days ago
That's why I love Tatoeba.
sharptoothed
2 days ago
Really looks like that. I hope my data is accurate enough. :-)
Guybrush88
2 days ago
thanks
OsoHombre
2 days ago
Today's featured word:

Woman - امرأة، مرأة

https://tatoeba.org/eng/sentenc...amp;sort=words

I'd appreciate it if you helped me have these sentences translated into your languages.