Pandaa Pandaa 20 days ago, edited 18 days ago 2019-11-02 09:42:50, edited 2019-11-03 13:40:23 link permalink

Useless words in Tatoeba:
Tom ~ 373025 results in English (30.16%); all: 930661 (11.77%)
Mary ~ 146934 results in English (11.88%); all: 235951 (2.98%)
Boston ~ 15383 results in English (1.24%); all: 36270 (0.46%)
Sami ~ 53492 results in English (4.33%); all: 74598 (0.94%)
Layla ~ 18740 results in English (1.52%); all: 22535 (0.28%)

29 hours later:
Tom ~ 373040 results in English (+15); all: 930754 (+93) (28092 a year)
Mary ~ 146942 results in English (+8); all: 235984 (+33) (9968 a year)
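
The yearly figures above appear to be a straight-line extrapolation of the 29-hour increase. A minimal sketch of that arithmetic, assuming a 365-day year (the function name `yearly_rate` is made up for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def yearly_rate(added: int, hours: float) -> int:
    """Extrapolate a count observed over `hours` hours to a full year."""
    return round(added * HOURS_PER_YEAR / hours)


# Increases across all languages, taken from the post above.
print(yearly_rate(93, 29))  # "Tom": 28092
print(yearly_rate(33, 29))  # "Mary": 9968
```

This only projects the rate seen in one 29-hour window; it says nothing about how the rate varies over a longer period.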

Thanuir Thanuir 19 days ago 2019-11-02 15:30:12 link permalink

Tatoeba runs on a volunteer basis, so the right solution is to use a wider range of places and names in your own sentences. When translating into Finnish, I sometimes keep the foreign form of a name, and usually I replace it with the corresponding Finnish name. That adds a touch of diversity too. In new sentences I use a variety of Finnish personal names and place names, if I use names at all.

I suggest the same for Hungarian: in the new sentences you write, use Hungarian people and places. When you translate, skip sentences whose names you find too common, or translate them using the corresponding Hungarian names. For example, the site https://www.behindthename.com/name/tom-1 gives Tamás and Tomi as the Hungarian equivalents of Tom.

Pandaa Pandaa 19 days ago, edited 19 days ago 2019-11-02 16:02:01, edited 2019-11-02 16:14:40 link permalink

It is one thing for me to translate Tom as Tamás and Mary as Mária when many people simply don't adapt names at all; they leave it as Tom, even though that name may not even be used where they live. And then you can say that variety is needed, but they won't bother: it stays Tom, and that's that.
I can get stuck too when I have to translate a Sami or a Layla (I may not have translated a single such sentence yet). They could perhaps become Sámuel and Leila (which I would not call usual Hungarian names at all; they are not exactly common), and with these hardly anyone feels any urge to translate. Their very rarity contributes to the fact that fifty thousand Sami sentences will not get translated, and perhaps not even five hundred.
And Boston is boring; it is probably the same everywhere (en: Boston / hu: Boszton or Boston / de: Boston ...). If it were London, I would say the same (en: London / hu: London / de: London ...). But with something like England (en: England / hu: Anglia / de: England / sp: Inglaterra), there can already be differences.

Before writing a name, one should decide whether it is needed there or not.
Look, "Tom" appears in 30% of the English sentences.
In my opinion, a name is by no means needed that often in sentences.

There is no such name as Tom in Hungarian, yet there are 14728 sentences with it, while there are only 2103 with Tamás.

The first mistake was that someone started writing names in one sentence after another.
The second was that translators, depending on the language, were either lazy or helpless when it came to translating the names.
The third mistake was that while many people spoke out loudly against the names last time, you-know-who lay low, and once 'the danger had passed', came back to life, is promoting again, and is once more manufacturing Tom sentences (not writing, but manufacturing), perhaps proportionally even more than ever.

The third seems the most serious.

Thanuir Thanuir 19 days ago 2019-11-02 17:02:39 link permalink

I don't see much value in a huge number of Tom sentences either. Names as such are part of language use and therefore also a natural part of Tatoeba.

Someone (probably several people) has added many Tom sentences because they considered it useful and valuable. If you don't find it useful or valuable, then use other names and perhaps leave the Tom sentences untranslated.

The core principle of Tatoeba-style volunteer work is: let others add what they want, and add yourself what you want to see.

GuidoW GuidoW 18 days ago 2019-11-03 16:04:49 link permalink

I don't see your point. Why are they useless in your opinion, and why is the difference of 29 hours important to anyone?!

AlanF_US AlanF_US 17 days ago, edited 17 days ago 2019-11-04 13:45:28, edited 2019-11-04 14:18:38 link permalink

These proper nouns are not bad in themselves, but they displace others. Every time someone writes a sentence with the name "Tom", they're not writing a sentence with another name (or with a pronoun). There is the same imbalance regarding months, days of the month, language names, and so on. As a result, it becomes very difficult to learn other ones. In addition, in Slavic languages, native names are declined, but foreign ones are not, so sentences containing, say, "Мэри" ("Mary") deprive learners of the ability to work with these declensions. Furthermore, the preponderance of two Christian Anglocentric names makes the collection of sentences homogeneous and boring. This affects not only Tatoeba, but other sites that use our sentences, such as Clozemaster.

The reason for the preponderance of these names is that they are the only ones used by the biggest contributor in English (which is the most extensively translated language here). His claim is that this reduces the number of near-duplicates ("Tom went to the store", "John went to the store", and so on), which may be true for short sentences that differ only in terms of proper nouns. However, it doesn't eliminate other types of near-duplication. More importantly, this contributor has never acknowledged the other problems raised by this lack of a variety of names. Other contributors claimed that requesting the use of other names was "banning" and "censorship", despite the fact that large numbers of sentences with "Tom" and "Mary" already existed and no proposal was being made to change or remove them.

As for citing the number of sentences added in the last few hours, I believe Pandaa's point was that sentences with these names are still being added at a high rate.

There have been many discussions about this on the Wall. During the most recent one, Trang, the founder of the site, asked whether people could stop adding sentences with these names, in view of the fact that so many of them already existed. While many people supported Trang's request, people who wanted to continue to add sentences with these names added overwhelming numbers of such sentences while the discussion was in progress. The discussion was eventually abandoned. However, people are still urged to contribute sentences with a variety of names of people, months, days of the week, and so on.

You can find these discussions if you page back through older threads on the Wall.

AmarMecheri AmarMecheri 16 days ago 2019-11-05 11:25:13 link permalink

Hi!
As far as I am concerned, I have never had any problem or qualms (although I am not practicing at all) about translating sentences of the Tom and Mary type, even though I found it boring after so much repetition.
Since there are clearly people who refuse to translate Christian names (and/or those of other faiths), why not prefer ADAM and EVE, who had (perhaps) no religion in their time? Why all these futile squabbles, even about flags, about expressions, and so on?
All in all, we are here to exchange our respective languages: we are here for science. That's all (folks), I think. We should take it easy and say "cheese".
Greetings to all the contributors.

shekitten shekitten 16 days ago 2019-11-05 11:57:34 link permalink

I don't think it's that anyone refuses to translate Christian names. For myself, I have translated countless Tom/Mary sentences, and I continue to translate ones that were made pre-2019.

It's just that there are already so many Tom/Mary sentences that I, personally, don't feel that they enrich the corpus anymore. I would much rather translate Sami/Layla sentences (despite not being Arab) because I feel these enrich the corpus. I would feel the same way about any other pair of names besides Tom and Mary.

belkacem77 belkacem77 16 days ago 2019-11-05 13:11:41 link permalink

@AmarMecheri

I find that keeping Tom, Mary, Boston, October... and other proper nouns used by CK is a good idea for me. It allows me to make replacements easily with my programs when using the sentences in other projects, and it avoids repetitive sentences.

I can't understand why others are using the same (English) sentences and just replacing the proper nouns with Sami and Layla, etc.

It's not interesting to get the same sentences with just different proper nouns.
I think we have to (re)define the objectives of Tatoeba.

I find that English proper nouns for persons and places sound good (to the ear), so we can keep them in the Kab corpus. But we can create our own sentences (different sentences) with our proper nouns just for recordings (audio).

It's my point of view.
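
The kind of program-driven replacement mentioned above can be sketched roughly as follows. This is not belkacem77's actual tooling, only an illustration; the function name `replace_names` and the example mapping (Tom to Tamás, Mary to Mária, the Hungarian equivalents mentioned earlier in the thread) are assumptions.

```python
import re


def replace_names(sentence: str, mapping: dict[str, str]) -> str:
    """Swap whole-word placeholder names according to `mapping`."""
    if not mapping:
        return sentence
    # \b keeps "Tom" from matching inside words such as "Tomorrow".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], sentence)


# Hypothetical mapping, for illustration only.
mapping = {"Tom": "Tamás", "Mary": "Mária"}
print(replace_names("Tom and Mary like apples.", mapping))
```

A plain string swap like this is only safe where names are not inflected; in languages that decline names, the replacement would also have to produce the right case ending.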

AmarMecheri AmarMecheri 16 days ago, edited 16 days ago 2019-11-05 14:40:22, edited 2019-11-05 14:45:08 link permalink

@belkacem77
I was answering in general, without aiming at anyone, since all anglophones use Tom & Mary. This is not a problem for me (I have translated a great many sentences containing these names), but I DON'T CLAIM THAT IT BOTHERS NOBODY. It doesn't bother me. Nor you. All the better. And if it suits you for your algorithms, it's even better. If I like a phrase, neither Tom nor Mary can stop me from translating it.
Greetings,
again, to all Tatoeba's contributors

Thanuir Thanuir 16 days ago, edited 16 days ago 2019-11-05 15:32:30, edited 2019-11-05 15:32:51 link permalink

Different names are declined in different ways in several languages - some Finno-Ugric and Slavic ones, at least. This means that systematically changing or replacing them is very demanding.


Different names are transliterated in different ways between writing systems, maybe also based on language.


Some names have specific connotations or are parts of idioms.


Some sentences concerning history or politics should use specific given names.


Many cities and countries have different names in different languages. Months and weekdays also have different names. Learning these is a part of learning a language. Sometimes the names of the months or the days are compound words, which adds to their value.


Many language learning sources use Tatoeba as a source of material. These benefit from having native names, as learning what names look like makes life easier in a foreign country. Learning how the names are pronounced is part of language learning.


Translating sentences with varied names is more interesting, both at Tatoeba and when using a language learning service (like Anki or Clozemaster).


When creating voice input, it is easier to have native names, as those are easier to pronounce. Having foreign names is valuable, too. Having a mix of both is the best.

AmarMecheri AmarMecheri 16 days ago, edited 16 days ago 2019-11-06 00:07:24, edited 2019-11-06 00:08:05 link permalink

@belkacem77
You said:
>> "I find that keeping Tom, Mary, Boston, October...and other proper nouns used by CK is a good idea for me."
I never said otherwise!
I'll make no further comment about the substitution of "Tom & Mary" by other proper nouns.
I don't know how to do that.
You already know my opinion.
Good luck with your work on language development in computer science.

maaster maaster 17 days ago, edited 17 days ago 2019-11-05 05:10:07, edited 2019-11-05 05:25:12 link permalink

After that discussion, nothing happened.
I haven't translated on Tatoeba for a while; I don't find sentences worth translating (perhaps one in a hundred). And I am not the only one.
If Tatoeba's quality is low, who will translate besides the fanatics?

(some examples in place of Tom: I, the boss, the butcher, the vicar, my niece, the father of one of my classmates, granny, most of them, a guy on the street, a protestor, a right-wing politician, a shop assistant, agents, a conservative journalist; the only thing, the last night, the most complicated way, the easiest solution - and many thousands of other words)

Pfirsichbaeumchen Pfirsichbaeumchen 17 days ago, edited 17 days ago 2019-11-05 05:26:46, edited 2019-11-05 05:38:39 link permalink

What makes sentences worth translating? Does a sentence automatically lose its worth if it contains the name "Tom" no matter how rich it is otherwise, or is there more to it? These sentences should be rather diverse: https://tatoeba.org/deu/sentenc...o=&sort=random (untranslated German ones). How are they not worth translating, for example?

🙂

maaster maaster 17 days ago 2019-11-05 06:29:50 link permalink

YOU can see that I always translate some of the German sentences, but only the original ones that I consider usable. These days one can again find quite a few interesting ones.
(But when I find sentences like "Tom is very rich" or "Tom is my neighbor", or sentences copied out of encyclopedias, I am ready to shoot myself on the spot.)
Tatoeba sentences are either far too simple or too encyclopedic: sentences for kindergarten, or sentences that will never be used.
And sentences that one would use in everyday conversation, or could read in newspapers, or that foreign workers could use (at work), can hardly be found.

AmarMecheri AmarMecheri 16 days ago, edited 16 days ago 2019-11-05 15:01:21, edited 2019-11-05 15:03:12 link permalink

@maaster
It's a pity you don't read Kabyle: some of our sentences are very original, especially the proverbs, riddles, and everyday expressions.
This is why I systematically translate my Kabyle sentences into French (and sometimes English when I can). I do that to make them visible and accessible to speakers of other languages, provided that they understand French and/or English.
Please feel free to comment on, add to, or link some of your sentences to mine and to other Kabyle sentences written by our team.
You are welcome.
AmarMecheri

Pandaa Pandaa 16 days ago, edited 16 days ago 2019-11-05 15:20:32, edited 2019-11-05 15:28:40 link permalink

We're not buying anything! :)

AmarMecheri AmarMecheri 16 days ago, edited 16 days ago 2019-11-06 00:19:26, edited 2019-11-06 00:29:37 link permalink

Ah, if you're not buying anything ... maybe ... nobody has anything to sell!

jegaevi jegaevi 17 days ago, edited 17 days ago 2019-11-05 08:31:52, edited 2019-11-05 08:32:23 link permalink

@Pfirsichbaeumchen

>>>Does a sentence automatically lose its worth if it contains the name "Tom" no matter how rich it is otherwise

No. My problem is not that there are a lot of sentences containing Tom, Mary or Boston. I'm OK with that if, as you said, the sentence is rich otherwise.
But sentences like:
I wonder whether Tom has already told Mary she doesn't have to do that.
Tom said that Mary knew that he wanted to do that by himself.
Do you think Tom would mind if Mary did that?
Tom and Mary told me they like doing that.
Do you think anyone will care if Tom and Mary don't do that?
I know Tom is planning on doing that tomorrow.
Tom doesn't plan on doing that anytime soon.

There are a lot of these. I'm not saying that they are useless, but there are too many of them. I only had to go through about 4 pages to find all these.
But that's just one example. There are countless sentences like: Tom likes apples. Tom likes basketball. Tom and Mary like apples. Tom, Mary, John and Alice all like apples. Etc. It's just boring and repetitive. It's like they are all made from the same mold. And because of all these I have to go through around 10 pages to find 10 good sentences to translate.

TRANG TRANG 16 days ago 2019-11-05 11:53:28 link permalink

> I have to go trough around 10 pages to find 10 good sentences to translate.

You can solve this by refining the search query to exclude some words, like soliloquist suggested below:
https://tatoeba.org/eng/wall/sh...#message_33422

You could also solve this by choosing to translate sentences from a specific user. You can use sharptoothed's website to find a user who has contributed sentences in a certain language: https://tatoeba.j-langtools.com/allstats/

Then you simply go to the page that lists sentences of that user. For instance I found this user whose sentences might be more interesting to you:
https://tatoeba.org/eng/Sentenc.../mmorelfmc/eng

In general, the problem of having "good" sentences to translate shouldn't be the responsibility of those who create sentences. It is pretty much impossible to guess what will be interesting to translate because the criteria differ from one person to another. You and I will not have the same definition of what is interesting. So it is our own responsibility to filter the sentences according to our taste. And it is Tatoeba's job to provide the proper features to filter the sentences.

I think Tatoeba already has sufficient features for that but the main problem is that they are not intuitive enough.

Pandaa Pandaa 16 days ago, edited 16 days ago 2019-11-05 13:45:24, edited 2019-11-05 13:49:08 link permalink

True, the many similar sentences that say next to nothing are also a problem.
Translating a few of them may even be good while you are still at the very beginning of learning a language; I too translate things from Brazilian Portuguese or Spanish such as "Our cow doesn't give milk." (Although it is also true that I won't translate "The neighbor's cow doesn't give milk." or "Your cow doesn't give milk.")

However, while it may not bother others, what bothers me primarily is the names:
- their untranslated variants (such as the Cyrillic transliteration in Russian), which are not native to the language
- their sheer number, for which I see no justification

A few examples where it is good, perhaps even essential, to have a name in the sentence:
- Hey, you! - Me? - Yes, you! Tell me your name! - I'm Tamás. - Well, Tamás, would you like to earn a lot?

- Excuse me, who are you looking for? - I have an appointment with Tamás Mór. - Was it for nine o'clock? - Yes. - Are you Marianna Kovács? - Yes, I am. - All right, I'll let him know right away. Please wait a moment.

'Do you know Tamás?' 'Have you met Tamás yet?'
'I'd like to introduce my new classmate to you. This is Tamás.'

(Of course names have their place: "In the end I sent him down to the shop." (the speaker surely knows who is meant); "In the end I sent Tamás down to the shop." (because, for example, he has three more siblings; that is acceptable.))
...BUT when Tom and Mary can be swapped for HE and SHE in an instant without changing the meaning of the sentence at all, that can be depressing.

I refuse to believe that the name Tom, or any name at all, has to appear in 30% of the English sentences.
Names never occur in such numbers in a conversation, except perhaps when looking through a class photo, when granny asks the name of everyone in the picture.

AmarMecheri AmarMecheri 16 days ago, edited 16 days ago 2019-11-06 00:41:46, edited 2019-11-06 00:42:23 link permalink

@Pfirsichbaeumchen

Well then!
With Tatoeba, we learn at least two languages at a time, in addition to ours!

soliloquist soliloquist 17 days ago 2019-11-05 07:37:15 link permalink

The following search shows some random English sentences that are not yet translated into Hungarian, and also don't have the words Tom, Mary, Boston, Australia, Canada, French, John, Alice, Bob, Jane, Dan, Linda, Sami, Layla, Mennad, Algeria, Berber or Kabyle. And there are lots of sentences without them. If you don't like translating sentences with such wildcard names, you could just bookmark this link.

https://tatoeba.org/eng/sentenc...o=&sort=random

I had to add a letter to the search because it gives an error unless there is some inclusive input. I chose the letter A, which is quite frequent as an initial letter, but you could change it to another.

AlanF_US AlanF_US 16 days ago 2019-11-05 17:29:11 link permalink

Thanks! That search does come up with more interesting sentences.

CK CK 16 days ago, edited 16 days ago 2019-11-06 00:16:06, edited 2019-11-06 00:50:09 link permalink

Here is a similar search with over 180,000 English sentences with audio that haven't yet been translated into Hungarian.

Search: a*|*a|*a*|*e|e*|*e*|i*|*i|*i*|o*|*o|*o*|u*|*u|*u*|y*|*y|*y* -tom -mary -boston -australia -canada -french -john -alice -bob -jane -dan -linda -sami -layla -mennad -algeria -berber -kabyle

Limited: to English with audio

Excluding: sentences that already have a direct Hungarian translation

Showing: indirect Hungarian translations if they exist

https://tatoeba.org/eng/sentenc...o=&sort=random
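
For anyone who wants to adapt the word list, a query string of the same shape can be assembled with a small script. This is only a sketch: the helper `build_query` is hypothetical, and the reading of `|` as "or", `*` as a wildcard, and a leading `-` as exclusion is inferred from the search shown above, not checked against the search documentation.

```python
def build_query(excluded: list[str], letters: str = "aeiouy") -> str:
    """Build an 'any word containing a vowel, minus these words' query."""
    # Inclusive part: for each letter, match words that start with it,
    # end with it, or contain it anywhere.
    wildcards = "|".join(f"{c}*|*{c}|*{c}*" for c in letters)
    # Exclusive part: one "-word" term per unwanted word, lowercased.
    exclusions = " ".join(f"-{word.lower()}" for word in excluded)
    return f"{wildcards} {exclusions}".strip()


print(build_query(["Tom", "Mary", "Boston"]))
```

The resulting string would be pasted into the search box by hand; the other settings (language, audio, translation filters) still have to be chosen on the search page itself.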


** Alternative searches

These are the same as above, but with different limitations aimed at quality control. These both get over 300,000 results.

Limited to English sentences on List 907, but not limited to ones with audio

https://tatoeba.org/eng/sentenc...o=&sort=random

Limited to sentences owned by members claiming to be native English speakers, but not limited to ones with audio

https://tatoeba.org/eng/sentenc...o=&sort=random


If you are a native speaker of another language and want similar results, you can click one of these links, fine-tune the "More search criteria" on the right side of the page by changing "Hungarian" to your own native language, and then bookmark the resulting page to reuse this random search.