Wall (5,665 threads)
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
I think there's an overabundance of the general use vernacular ''American past simple'' and a serious shortage of the past perfect (the present perfect as well but that's less problematic). I mean, is everyone okay with that? Just wondering, because sometimes there's a subtle difference in meaning.
For example, sentence #6353363: Tom said he didn't have any idea why Mary did that.
With past perfect tense: ''Tom said he didn't have any idea why Mary had done that.''
In e.g. Dutch this would produce two different translations:
(if past simple) Tom zei dat hij geen idee had waarom Mary dat deed.
Mary was still doing it, or did it as a habit, when Tom said it.
(if past perfect) Tom zei dat hij geen idee had waarom Mary dat had gedaan.
Mary had finished doing it before Tom said what he said. This is similar to the English past perfect, which is used to make it clear that one event happened before another.
Many translators solve this by providing two different translations, but this isn't being done consistently across Tatoeba.
Yes, you are right that the "American simple past" is quite overabundant, but that is because most of those sentences belong to the same contributor, and that is how he speaks. Other examples include:
“Mary said she doesn’t have very much money.” (#6417800)
My recommendation would be to guess the most obvious meaning (if it seems ambiguous) and translate it accordingly into correct indirect speech in your target language. For example, I would assume that “Tom said he didn’t have any idea why Mary did that” means “Tom said he didn’t have any idea why Mary had done that” and not “Tom said he didn’t have any idea why Mary was doing that” or “Tom said he didn’t have any idea why Mary did that occasionally”. Thus, I would translate it as „Tom sagte, er habe keine Ahnung, warum Maria das getan habe“ into German. Maybe I would ask for confirmation as to what he was really trying to express in that particular case before translating it. The answer is indeed usually, “It can be either.”
> I think there's an overabundance of the general use vernacular ''American past simple'' and a serious shortage of the past perfect (the present perfect as well but that's less problematic).
Do you mean that you're having trouble finding sentences in the past perfect, perhaps because you want to translate them? Or that you wish the distribution of sentences were different: more diverse, perhaps, or closer to the frequencies found in some external corpus (and if so, which one)?
If you want to find sentences in the past perfect, you can do a search like this:
had << *bed|*ced|*ded|*ved|*ted|*zed
which you can expand to other combinations ending in "ed". It's not a perfect search (so to speak), but it does find some instances of the past perfect. Note that you can't simply shorten it to had << *ed, because a wildcard must be accompanied by a string of at least three characters.
You could try this to find "had" and contractions like "you'd" and "I'd" followed by commonly used irregular verbs or by any word ending in "ed":
d|had NEAR/1 been|beaten|become|begun|bet|blown|broken|brought|built|burst|bought|caught|chosen|come|cost|cut|dealt|done|drawn|drunk|driven|eaten|fallen|fed|felt|fought|found|flown|forgotten|frozen|gotten|given|gone|grown|hung|had|heard|hidden|hit|held|hurt|kept|known|laid|led|left|lent|let|lain|lit|lost|made|meant|met|paid|put|read|ridden|rung|risen|run|seen|sold|sent|set|shaken|stolen|shone|shot|shown|shut|sung|sunk|sat|slept|slid|spoken|spent|sprung|stood|stuck|sworn|swept|swum|swung|taken|taught|torn|told|thought|thrown|understood|woken|worn|woven|won|written|*aed|*bed|*ced|*ded|*eed|*fed|*ged|*hed|*ied|*jed|*ked|*led|*med|*ned|*oed|*ped|*qed|*red|*sed|*ted|*ued|*ved|*wed|*xed|*yed|*zed
Either search can be sorted with the fewest words first or with the most words first.
To find sentences not yet translated into your own native language, you can fine-tune any of the above by choosing "exclude" and your own native language under "Translations" on the right side of the advanced search results.
For example, this is the first search above, limited to sentences not yet translated into Dutch.
** Some Ideas for Seeing More of These Sentences Than the 1,000-Sentence Limit Allows **
- Limit the searches to ones with audio first.
- And then, limit the search to ones without audio.
- Do both of the above for all the versions of the searches above.
For more sentences, you can also change the list from "proofread" to "unspecified."
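For anyone working offline with a downloaded copy of the English sentences, roughly the same idea as the second query can be written as a Python regular expression. This is my own approximation, not a reimplementation of Tatoeba's search engine; the participle list is deliberately shortened, and the name `looks_past_perfect` is made up for illustration.

```python
import re

# Abbreviated, illustrative list of irregular past participles; extend it with
# the rest of the words from the search query above.
IRREGULAR_PARTICIPLES = [
    "been", "done", "gone", "seen", "taken", "given", "known", "made",
    "told", "thought", "found", "left", "heard", "written", "had",
]

# "had" or any word contracted with 'd (straight or curly apostrophe), then
# whitespace, then an irregular participle or a word ending in "ed".
PAST_PERFECT = re.compile(
    r"\b(?:had|\w+['’]d)\s+(?:"
    + "|".join(IRREGULAR_PARTICIPLES)
    + r"|\w{2,}ed)\b",
    re.IGNORECASE,
)


def looks_past_perfect(sentence: str) -> bool:
    """Rough heuristic: True if the sentence seems to contain a past perfect."""
    return PAST_PERFECT.search(sentence) is not None
```

Like the query it mimics, it gives false positives: "I'd need more time" matches because "need" ends in "ed", even though "I'd" means "I would" there.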
>Do you mean that you're having trouble finding sentences in the past perfect, perhaps because you want to translate them?
Nope, but thanks anyway. I just started wondering about it after seeing so many ambiguous uses of the ''past simple'' while translating random sentences. I know that CK uses American English, so that's probably why. Maybe the past perfect isn't used as much as it is in German, Italian, Dutch, etc.?
#2790613 is another example; look at the German translation: ''was er getan hat'' (''what he has done'').
So, Pfirsichbaeumchen's reply is a good pointer.
Just to be clear: The examples we've been talking about have dealt with indirect speech ("He|she said|thought that..."). And yes, at least in US English (and maybe in English from elsewhere), we don't think very much about the distinction between "Mary said she didn’t have very much money" and "Mary said she doesn’t have very much money." The assumption is that if she didn't have much money at the time that she made the statement, she probably doesn't have much money in general. If she had suddenly won the lottery since then, the sentence would emphasize that through other means, such as adverbial phrases ("At the time, Mary said she didn't have very much money"). But in the absence of such context, as Pfirsichbaeumchen said, you can generally translate it either way.
>"Mary said she didn’t have very much money" and "Mary said she doesn’t have very much money."
That's something else entirely, though. In Dutch it works the same way: ''Mary zei dat ze niet veel geld -had-'' and ''Mary zei dat ze niet veel geld -heeft-'' both, in the absence of adverbial phrases, mean the same thing.
Since Tatoeba is a volunteer project, it usually isn't worth complaining that there is too much of something. Instead, it's better to add more of whatever you see too little of yourself.
So: write new Dutch sentences that use the underrepresented tense. When you translate sentences, systematically add all the possible translations, or alternate between them, or give priority to the underrepresented ones.
(The names "Tom" and "Mary" are also overrepresented in many languages. I recommend using other names as well, and especially adding sentences with names typical of Dutch.)
CK, I wonder why English-Esperanto doesn't show up in your list.
Given the very nature of Esperanto as a constructed language, there are very few "native" speakers. (There is no "Esperantlando" country yet!) Still, with over 610,000 sentences, the fifth-largest language in the corpus, it is not negligible either. (Understandably, not all of them would make it to List 907.)
Your list includes many languages with only a few hundred sentences (e.g. Albanian, Cebuano, Tamil). With millions of Esperanto speakers worldwide, it is quite possible that some speakers of those languages know Esperanto, so making English-Esperanto available could give them another avenue for exploring English.
A question regarding Tatoeba's export of Japanese-English sentence pairs, as hosted on Jim Breen's page (ftp.monash.edu.au/pub/nihongo):
Is the script that generates that file public?
Especially the part creating the accompanying "B" lines with furigana, dictionary forms and such.
I'm guessing a tokenizer like MeCab or Juman is involved, but would it be possible to share the parameters used, so that the same output format can be recreated?
There's quite a lot of software available that reads this format, so it would be excellent if one could use it with custom input.
IIRC, the current tokenizer is Jim Breen himself, manually annotating sentences in this format. I guess it would be possible to take the tokenization that e.g. MeCab outputs and massage it into something more like the "B line" format, but it wouldn't be of similar quality.
You could ask Jim via a private message. His username is JimBreen.
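Here is a rough sketch of what that massaging might look like. It assumes MeCab's default IPADIC text output (surface form, a tab, then comma-separated features with the dictionary form in the seventh field) and only does the easy part: dictionary form plus {the form as it appears}, with auxiliary verbs glued onto the word they follow and punctuation dropped. It produces no readings, sense numbers, or the other annotations added by hand, and `mecab_to_b_line` is a hypothetical name.

```python
# Hypothetical helper: reshape MeCab (IPADIC) text output into B-line-like tokens.
# Assumed line format: surface<TAB>pos,pos1,pos2,pos3,ctype,cform,base,reading,pron

INFLECTING = {"動詞", "形容詞", "助動詞"}  # verbs, i-adjectives, auxiliary verbs
AUXILIARY = "助動詞"
SYMBOL = "記号"


def mecab_to_b_line(mecab_output: str) -> str:
    tokens: list[list[str]] = []  # each entry: [dictionary form, form as it appears]
    prev_pos = None
    for line in mecab_output.splitlines():
        if not line or line == "EOS":
            continue
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        pos = fields[0]
        # Unknown words may have "*" as dictionary form; fall back to the surface.
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        if pos == SYMBOL:
            prev_pos = pos
            continue  # drop punctuation
        if pos == AUXILIARY and tokens and prev_pos in INFLECTING:
            tokens[-1][1] += surface  # glue e.g. た onto 食べ -> 食べた
        else:
            tokens.append([base, surface])
        prev_pos = pos
    # Write "base{surface}" only when the two differ (roughly the B-line convention).
    return " ".join(b if b == s else f"{b}{{{s}}}" for b, s in tokens)
```

Fed the MeCab output for 猫が食べた。, this would give 猫 が 食べる{食べた}.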
I was wondering what the policy on sentences containing URLs is, if there is one. I can easily imagine people abusing them, but I can also easily imagine ways to control that. Just curious.
Hello, everyone!
I have a question: when I search for something in German, I'm only shown 1,000 results or sentences. How can I find all the matches, all 489,215 of them? Thanks
If you just want to see new sentences, you can search for random sentences. You probably haven't seen all of the ones that can be displayed. https://tatoeba.org/eng/sentenc...&sort_reverse=
If you want to translate sentences, set the search to exclude sentences that already have a translation. https://tatoeba.org/eng/sentenc...&sort_reverse=
In both cases, change the search term in the menu in the right sidebar, and run the search from that right-sidebar menu too (this is in the desktop view; I don't know about other devices).
If you want to use the sentences for some other purpose, it's probably best to download the database and search for sentences there. Others can help with that better than I can.
If you need all the German sentences, you have to download either http://downloads.tatoeba.org/ex...tences.tar.bz2 or http://downloads.tatoeba.org/ex...tailed.tar.bz2 and then extract the German sentences yourself (see https://tatoeba.org/deu/downloads for more info).
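Here is a minimal sketch of that filtering step in Python. It assumes each row of the CSV inside the archive is tab-separated with the sentence ID first, the language code second and the text third (my understanding of the layout described on the downloads page; please check it), and that German is coded as "deu". The function names are hypothetical.

```python
import io
import tarfile
from typing import Iterable, Iterator


def filter_language(lines: Iterable[str], lang: str) -> Iterator[tuple[str, str]]:
    """Yield (id, text) for rows whose second tab-separated column equals `lang`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[1] == lang:
            yield fields[0], fields[2]


def sentences_from_archive(path: str, lang: str = "deu") -> Iterator[tuple[str, str]]:
    """Stream matching rows straight out of a .tar.bz2 export without unpacking it."""
    with tarfile.open(path, "r:bz2") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".csv"):
                raw = archive.extractfile(member)
                if raw is not None:
                    text = io.TextIOWrapper(raw, encoding="utf-8")
                    yield from filter_language(text, lang)
```

Assuming you saved the first archive locally as sentences.tar.bz2, iterating over `sentences_from_archive("sentences.tar.bz2")` yields every German sentence as an (ID, text) pair, with no 1,000-result limit.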
** Members' Languages **
Sorted by native language, and can be resorted by username.
Sorted by username
My friends, is there a way to look for members whose native language is X and who speak Y?
Example: native English speakers who speak Portuguese.
I am afraid that at present, there is no way to do that other than checking their profiles.
Not exactly what you're looking for, but you can find all profiles explicitly mentioning English and Portuguese:
Deniko's idea is nice. I'd never thought of doing it that way.
Another way is to download the user_languages.tar.bz2 from tatoeba.org and browse it offline.
Here is data from that file, sorted by username and then by claimed language level. You can download this and browse it. (5 = native language)
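If you'd rather script it than browse, something like the following answers the "native in X, speaks Y" question. It assumes the CSV inside user_languages.tar.bz2 has tab-separated columns for language code, skill level, username and details, in that order; that column order is my assumption, so check it against the file first. Level 5 is taken to mean native, as stated above, and the function name is hypothetical.

```python
from typing import Iterable

NATIVE = "5"  # per this thread, skill level 5 means native language


def native_x_speaks_y(lines: Iterable[str], native: str, speaks: str) -> list[str]:
    """Usernames listing `native` at level 5 and also listing `speaks` at any level."""
    natives, speakers = set(), set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed rows
        lang, level, username = fields[0], fields[1], fields[2]
        if lang == native and level == NATIVE:
            natives.add(username)
        if lang == speaks:
            speakers.add(username)
    return sorted(natives & speakers)
```

Assuming the unpacked file is called user_languages.csv, passing `open("user_languages.csv", encoding="utf-8")` with "eng" and "por" would list native English speakers who also claim Portuguese.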
З Днем народження, Денисе! Happy birthday, Denis! 😊
Thanks a lot :) It's amazing you haven't forgotten to use the vocative case, nice one!
Happy birthday, Deniko!!!!
Thanks a lot, Ricardo :)
On the occasion of the Amazigh New Year 2970, which coincides with the second year since its constitutional recognition by the Algerian state, I send my wishes for happiness and prosperity, first to the activists of greater Tamazgha, then to the linguists who spare no effort to make Tamazight the instrument of North African unity, and finally to all of humanity that holds to the values of living together with respect for our differences.
Aseggaz ameggaz i waytma d sitma! (Happy New Year to my brothers and sisters!)
Happy Amazigh New Year to everyone in Algeria and around the world.
Here is a list of English translations of the twenty most viral sentences on Tatoeba in 2019.
Bedouins live in the desert.
We live in a society.
This is not his handwriting.
Tom has friends in Germany.
My name is Omid.
The cat is big.
My name is Dilshad.
They will not pass!
The queen must die.
Can you understand Tom?
Is she Italian?
Marina is from Russia and Clarissa is from Sweden.
My bicycle is red.
Tom's cat is sick.
Kigali is the capital of Rwanda.
My cat meows a lot.
Tom works at a hospital.
India is my country.
My father didn't know her.
The big difference from YouTube is that on Tatoeba, Tom is even more popular than cats! No mention of Mary in the top 20, though...
These may not have really been viral, since almost all translations for the first sentence are by one member.
[#7697543] Bedouins live in the desert.
I didn't check the others.
'Tom' accounts for half of the personal names on this list (4 of 8), and for two-thirds of the often-male names (4 of 6; Omid and Dilshad seem not to be gender-specific). That is still far too many for the purposes of name diversity.
On the other hand, according to this list, 'Mary' and other overused 'standard' words did not spread widely.
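The arithmetic behind those fractions can be checked with a few lines of Python. Which words count as personal names is a judgment call made by hand from the list above (places and "Bedouins" excluded).

```python
from collections import Counter

# Personal names in the top-20 list, picked out by hand. Assumption: Kigali,
# Rwanda, India and "Bedouins" are not counted as personal names.
names = ["Tom", "Omid", "Dilshad", "Tom", "Marina", "Clarissa", "Tom", "Tom"]
# Omid and Dilshad are grouped with Tom as "often male", as in the post.
often_male = {"Tom", "Omid", "Dilshad"}

counts = Counter(names)
tom_share_of_all = counts["Tom"] / len(names)                           # 4 / 8
tom_share_of_male = counts["Tom"] / sum(counts[n] for n in often_male)  # 4 / 6

print(f"{tom_share_of_all:.2f} {tom_share_of_male:.2f}")  # 0.50 0.67
```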
'Tom' is the most frequently used non-stopword in the French corpus.
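A claim like this can be checked against the French sentences from the downloads page with a simple frequency count. The stopword set below is a tiny made-up sample, not a real French stopword list, and the tokenization (lowercased runs of word characters) is deliberately naive, so treat the result as a rough check only.

```python
import re
from collections import Counter
from typing import Iterable

# Tiny illustrative stopword set; a real check needs a full French list.
STOPWORDS = {
    "le", "la", "les", "l", "de", "des", "du", "un", "une", "et", "est",
    "il", "elle", "je", "tu", "a", "à", "en", "que", "qui", "ne", "pas",
    "ce", "c", "d", "j", "n", "se", "s",
}


def top_non_stopwords(sentences: Iterable[str], k: int = 5) -> list[tuple[str, int]]:
    """Return the k most frequent lowercased words that are not in STOPWORDS."""
    counts = Counter(
        word
        for sentence in sentences
        for word in re.findall(r"\w+", sentence.lower())
        if word not in STOPWORDS
    )
    return counts.most_common(k)
```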