Wall (5418 discussions)

CK
9 hours ago
** Stats - 2019-07-20 - The Number of English Sentences on List 907 That Have Translations by Native Speakers **

http://tatoeba.byethost3.com/stats-190720.html

The last time I generated these stats was in 2017.
Aiji
15 hours ago - 15 hours ago
UX-ly speaking (yes, I'm a creative word-maker), I think it would be a good idea to add a question-mark pop-up to the right of the "Order" select box in the advanced search, explaining what this mysterious "Relevance" option means. Right now, it's pretty much 100% unclear.
AlanF_US
13 hours ago
I agree.
CK
4 days ago - 4 days ago
I'm looking for volunteers to help me tag English sentences.

I have a large list of sentences that need both the "imperative" tag and the "present simple" tag.

If you think it might be interesting to tag such sentences, and perhaps translate them at the same time, please send me a private message, and I'll send you a list of clickable links to sentences.

https://tatoeba.org/eng/private_messages/write/CK


Note that you need to be an "advanced contributor" (or higher) to tag sentences.
https://en.wiki.tatoeba.org/art...d-contributors
Smoky
3 days ago
Do you provide free snacks and drinks to volunteers?
soliloquist
3 days ago
I thought you were using some bot/script for the 'List 907' tag. It requires a lot of effort to tag that many sentences one by one. I, too, have thousands of sentences that need to be tagged, but it's discouraging having to visit each sentence's page.

Let's hope the mass-tagging feature will be implemented in the future.

https://github.com/Tatoeba/tatoeba2/issues/785
Guybrush88
2 days ago
I agree; it would be a very useful feature for me too, since I tag Italian sentences from time to time.
soliloquist
2 days ago
Yes, that would be a great time-saver for you, too. You're the second most active user after CK in terms of tagging. (+182,756)

https://tatoeba.j-langtools.com...art/chart2.php
Ricardo14
2 days ago
Me too. I do want to tag many sentences.

For example, whenever a Portuguese sentence begins with "Eu", it would be tagged as "1st Person Singular".
Austrália, França, Canadá - country
amo, quero, vivo, estou - presente do indicativo

and so on...
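Rules like these could be sketched in a few lines of Python. This is only a minimal illustration, not Tatoeba's mass-tagging feature (which the GitHub issue above merely requests): the word lists are the hypothetical examples from the post, and `tokenize` and `suggest_tags` are invented helper names.

```python
import re

# Hypothetical rule tables built from the examples in the post;
# real lists would need to be far longer.
COUNTRIES = {"austrália", "frança", "canadá"}
PRESENTE_DO_INDICATIVO = {"amo", "quero", "vivo", "estou"}


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into lower-cased word tokens (accented letters included)."""
    return re.findall(r"\w+", sentence.lower())


def suggest_tags(sentence: str) -> set[str]:
    """Return the tags these rules would suggest for one Portuguese sentence."""
    tokens = tokenize(sentence)
    tags = set()
    if tokens and tokens[0] == "eu":  # sentence begins with "Eu"
        tags.add("1st Person Singular")
    if COUNTRIES.intersection(tokens):
        tags.add("country")
    if PRESENTE_DO_INDICATIVO.intersection(tokens):
        tags.add("presente do indicativo")
    return tags


if __name__ == "__main__":
    for s in ["Eu vivo no Canadá.", "Ela quer ir à França.", "Tom mora em Boston."]:
        print(s, "->", sorted(suggest_tags(s)))
```

Such rules can only suggest tags: "Eu e o Tom vamos..." begins with "Eu" but is not first person singular, so a human would still need to confirm each one.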
CK
2 days ago
Note that it is possible to search for Portuguese sentences starting with "Eu."

https://tatoeba.org/eng/sentenc...eng&sort=words
Ricardo14
2 days ago
I have updated the request on GitHub - https://github.com/Tatoeba/tato...ment-512976729
soliloquist
2 days ago - 2 days ago
Thanks!

Edit: I made a suggestion.

https://github.com/Tatoeba/tato...ment-512998686
CK
yesterday
Here is another way to visit pages that need to be tagged with both "imperative" and "present simple".

http://tatoeba.byethost3.com/ta...mperative.html
It works on a Macintosh using Chrome, but not in Firefox or Safari.

If it works for you, it might be fun, but if not, sorry.

If you click any of the buttons to get to the next page, you will hear a text-to-speech reading of the upcoming sentence while the page is still loading.

Note that none of the sentences here duplicate the ones I've sent to volunteers, so those volunteers don't need to worry about overlapping with sentences that other members may tag this way.
MacGyver
6 days ago - 6 days ago
If you're looking for words to use in new example sentences for Tatoeba, look for the '<-- increase' arrows. They mark words that should appear more often in the Tatoeba Corpus. The second column of numbers indicates how many times that word should appear in Tatoeba to have the same (proportional) frequency it has in the OpenSubtitles.com corpus.

I downloaded a file from http://opus.nlpl.eu/OpenSubtitles-v2018.php with 441.5M sentences in English and wrote a script to create a frequency list of words (the list is unformatted, i.e., "don't" is split into "don" + "t", etc.). I did the same thing with all the English sentences owned by native speakers here in the Tatoeba corpus. After comparing the two frequency lists, I compiled a file that gives you an idea of how the frequency of words in Tatoeba/English compares to that in the OpenSubtitles/English corpus:

A sample of the file:
word \t occurrences in Tatoeba \t how many there should be (proportionally)
tom 359494 723.9471614662318
i 303688 284673.5980557653
to 294034 174335.06447556426
that 234251 105369.07199955775
the 199956 231733.21163537377 <-- increase
t 197499 101493.03356767741
you 176213 305392.1646249775 <-- increase
mary 142565 625.8323390462842
a 130684 149839.41303818647 <-- increase
do 120631 46119.76310532704
he 106705 57203.45769471371
is 106175 74813.6430558621
and 89045 104790.98523257893 <-- increase
was 76168 43791.06207458618
s 75846 152917.0533331018 <-- increase
in 73907 75330.66885961362 <-- increase
it 70138 140934.43701484663 <-- increase
of 68495 90226.5530729227 <-- increase
she 64731 27836.557998979915
be 60860 43264.01316312008
they 59631 32147.915102217754
me 56818 68010.83329256669 <-- increase
have 53539 48189.83034799129
said 53431 8299.380929768524
we 49383 72168.9096779763 <-- increase
know 49071 41497.24193626721
don 48290 43850.854478915724
for 47362 52364.353226404884 <-- increase
what 46221 73483.70680135512 <-- increase
didn 44793 11179.517051257248
think 42518 19066.90288432038
this 41598 60412.53263081548 <-- increase
are 37926 42973.06544886179 <-- increase
can 37084 39846.78412853941 <-- increase
with 36257 38338.81981074213 <-- increase
his 35673 17064.111051060263
on 35404 51944.97959418119 <-- increase
not 34972 43740.29882476427 <-- increase
her 33423 21676.844915118265
m 32329 48312.26568311476 <-- increase
my 31459 51048.86660395457 <-- increase
want 29463 20022.39637034032
told 28727 5296.513973554256
like 27819 30102.40119012533 <-- increase
at 27399 24364.425735303917
did 27199 17909.338412841887
has 25722 10076.924460296785
going 25540 15317.881523817805
as 24937 17656.581783517522
go 23405 28750.83973746701 <-- increase


The file: https://github.com/sidfc/Langua...ng_ordered.txt

There are also files for: deu, fra, spa, por, ita, pol, rus
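A minimal sketch of this kind of comparison, assuming that words are lower-cased runs of ASCII letters (so "don't" becomes "don" + "t") and that the "should be" figure is the Tatoeba total scaled by the word's relative frequency in the reference corpus. The actual script is not shown in the post, so `word_counts` and `compare` are hypothetical names.

```python
import re
from collections import Counter


def word_counts(sentences):
    """Count lower-cased runs of ASCII letters, so "don't" -> "don" + "t"."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z]+", sentence.lower()))
    return counts


def compare(tatoeba, reference):
    """Yield (word, count, expected, flag) for every word seen in Tatoeba.

    expected = how many times the word would occur in Tatoeba if it had the
    same relative frequency there as in the reference corpus.
    """
    tatoeba_total = sum(tatoeba.values())
    reference_total = sum(reference.values())
    for word, count in tatoeba.most_common():  # most frequent first
        expected = tatoeba_total * reference[word] / reference_total
        flag = "<-- increase" if count < expected else ""
        yield word, count, expected, flag


if __name__ == "__main__":
    tatoeba = word_counts(["Tom doesn't know.", "Tom said no.", "I know."])
    reference = word_counts(["I don't know.", "You know what?", "I know you."])
    for word, count, expected, flag in compare(tatoeba, reference):
        print(word, count, expected, flag)
```

Under this reading, a word is flagged whenever its Tatoeba count is below the proportional figure, which is consistent with the sample above ("in": 73907 < 75330.67 is flagged; "at": 27399 > 24364.43 is not).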
Thanuir
6 days ago
These are all very common words that already appear in huge numbers of sentences. Is there a particular reason to actively add many more just to match their frequencies in the OpenSubtitles database?
CK
6 days ago
I wondered the same thing.

Here are the first 100 words with "<-- increase".

the, you, a, and, in, it, of, me, we, for, what, this, are, can, with, on, not, my, like, go, him, your, there, if, about, here, all, one, get, out, up, from, good, just, but, no, them, an, so, let, now, more, say, got, where, see, come, back, some, too, something, take, people, right, make, our, way, or, well, into, please, look, give, over, off, find, new, must, little, other, put, first, after, down, love, old, years, things, night, am, even, believe, man, two, life, away, being, nothing, came, wrong, these, father, understand, feel, looking, wait, stop, because, thing, call

Likely it would be more useful to find words that are high on word frequency lists that are missing from the Tatoeba Corpus. Perhaps you could generate such lists, putting the words in frequency order.
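This suggestion amounts to filtering a ranked list by set membership. Here is a minimal sketch, assuming a frequency-ranked word list is already available (for example, one taken from a published frequency list) and that the Tatoeba sentences are plain strings. `vocabulary` and `missing_in_frequency_order` are hypothetical names, and the tokenization (lower-cased ASCII letter runs) is a simplifying assumption.

```python
import re


def vocabulary(sentences):
    """Set of lower-cased words (runs of ASCII letters) found in the sentences."""
    words = set()
    for sentence in sentences:
        words.update(re.findall(r"[a-z]+", sentence.lower()))
    return words


def missing_in_frequency_order(ranked_words, corpus_words, limit=None):
    """Keep the words of a frequency-ranked list that never occur in the corpus.

    The input order (most frequent first) is preserved, so the result is
    already in frequency order.
    """
    missing = [w for w in ranked_words if w.lower() not in corpus_words]
    return missing if limit is None else missing[:limit]


if __name__ == "__main__":
    ranked = ["the", "you", "know", "riddance", "decoy"]  # toy frequency list
    tatoeba = vocabulary(["You know the answer.", "I know."])
    print(missing_in_frequency_order(ranked, tatoeba))
```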


Objectivesea
6 days ago
CK wrote: “Likely it would be more useful to find words that are high on word frequency lists that are missing from the Tatoeba Corpus. Perhaps you could generate such lists, putting the words in frequency order.”

I strongly agree with this suggestion. There are various frequency dictionaries for individual languages published by Routledge or by the Leipziger Universitätsverlag. Typically, these dictionaries list a large number of words (with definitions to prevent confusion with respect to homographs that have distinct meanings) — 5,000 or 10,000 or so. It would be nice to find words on the Routledge or Leipzig lists that are not yet found in the Tatoeba database for that particular language. These missing words could then be arranged in frequency order based on one of these published frequency dictionaries.

As CK notes, the order of the first 100 words or so is not very significant; the frequency depends to a great degree on the particular database selected — whether words are taken from newspaper text, from fiction works, from scientific papers, or transcribed from oral conversations. Indeed, when compiling concordances to works like the Bible or Shakespeare, etc., the most frequent 100 or so words in that corpus are placed on a “stop list” to be ignored by the computer preparing the concordance.

When learning a language, however, it can be very helpful to prioritize the most common words. Thus, even knowing 1,000 or 2,000 words can dramatically boost one's ability to understand that language and to speak fluently. Because Esperanto has extremely regular word-formation rules, a vocabulary of as few as 600 or 700 Esperanto words can be the equivalent of knowing 2,000 words in German, French, Russian or Spanish, etc.

An interesting article (at https://glanier.wordpress.com/2...arning-greek/) points out that introductory Greek courses often focus on the 310 most frequent words encountered in the New Testament, which enables a student “to read 80% of the NT without using a dictionary.”
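The "80% of the text" figure is a coverage ratio: the share of running words in a corpus that are accounted for by its N most frequent words. A minimal sketch of that calculation follows, using a hypothetical `coverage` helper and the same simplified tokenization (lower-cased ASCII letter runs). It does not reproduce the article's New Testament figure.

```python
import re
from collections import Counter


def coverage(text, top_n):
    """Fraction of running words in `text` covered by its `top_n` most frequent words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / len(tokens)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for n in (1, 2, 5):
        print(n, coverage(sample, n))
```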

If our user @MacGyver were able to generate, say, lists of the most common words (in frequency order, from rank 100 to 1,000) for English, Italian, Russian, Turkish and Esperanto (the five languages on Tatoeba that currently have the most sentences) and compare the lists with the Tatoeba database, we would learn which particular high-frequency words are underrepresented there. Contributors motivated to create sentences could then focus on sentences that use those words. I think this might greatly improve the utility of Tatoeba for language learners who follow the strategy of learning the most frequently spoken words first.
Thanuir
6 days ago
Quite unrelated, but I had to look up what a concordance is and what compiling one means. Could you add a sentence or two to that effect to Tatoeba? That is precisely the kind of material an advanced learner finds useful.

I also added "compile" and "concordance" to my vocabulary.
MacGyver
5 days ago - 5 days ago
I got a few lists of words online and compiled the following files:

A list with the top ~3k words ordered by frequency of occurrence in the OpSub corpus: https://github.com/sidfc/Langua...atoeba_v01.txt

A list with the top ~10k words ordered by frequency of occurrence in the OpSub corpus: https://github.com/sidfc/Langua...atoeba_v01.txt

They are organized as follows:
column 1 = the word
column 2 = occurrences of the word in OpenSubtitles.com (the file is ordered by this column)
column 3 = occurrences of the word in Tatoeba (only sentences by native users were considered)
column 4 = how many times the word 'should' appear in Tatoeba
column 5 = filled in only for words that have fewer than 50% of the occurrences they 'should have' in Tatoeba
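Here is one way to sketch these columns. It assumes that column 4 is the Tatoeba total scaled by the word's relative frequency in the reference corpus, and that column 5 is a marker set when the Tatoeba count falls below half of column 4. The post does not say what column 5 actually contains, so the `"LOW"` marker and the `build_rows` name are placeholders.

```python
from collections import Counter


def build_rows(reference, tatoeba, top_n=3000, threshold=0.5):
    """Return (word, ref_count, tatoeba_count, expected, marker) rows.

    Rows follow the reference corpus's frequency order (column 2). The
    placeholder marker is set only when the Tatoeba count is below
    `threshold` times the proportional expectation.
    """
    reference_total = sum(reference.values())
    tatoeba_total = sum(tatoeba.values())
    rows = []
    for word, ref_count in reference.most_common(top_n):
        tatoeba_count = tatoeba[word]  # Counter returns 0 for unseen words
        expected = tatoeba_total * ref_count / reference_total
        marker = "LOW" if tatoeba_count < threshold * expected else ""
        rows.append((word, ref_count, tatoeba_count, expected, marker))
    return rows


if __name__ == "__main__":
    reference = Counter({"the": 60, "you": 30, "decoy": 10})  # toy data
    tatoeba = Counter({"the": 40, "you": 5, "tom": 55})       # toy data
    for row in build_rows(reference, tatoeba):
        print(*row, sep="\t")
```

Words that never occur in Tatoeba still get a row (with count 0), because the rows are driven by the reference list. That is what makes zero-occurrence words like those listed below visible.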

As an example, the words 'indistinct', 'limitation', 'restriction', 'annulment', 'inaudible', 'flare', 'abduction', 'depot', 'decoy', 'deposition', 'cheater', 'retainer', 'hypothetically', 'caress', 'rebound', 'sleepover', 'riddance', 'relive', 'proxy', 'onward', 'visitation', 'envoy', 'reptile', 'viewer', 'proclaim', 'retrieval', 'canvass', 'caterer', 'abduct' and 'withhold' have ZERO occurrences on this site (considering only sentences by natives).

I only use OpSub to measure the frequency of words, not as a source of words (there are too many wrong words in there). So, there isn't much I can do in order to generate a good/useful list of most frequent words (in any language).
MacGyver
5 days ago - 5 days ago
* UPDATE *

Now using data from the British National Corpus.

A list with the top 30k words ordered by frequency of occurrence in the BNC corpus: https://github.com/sidfc/Langua...atoeba_v01.txt

You need to look at the second column of numbers (third column from left to right) to find words that have a low number of occurrences in Tatoeba.
Ricardo14
2 days ago
MacGyver - Would it be possible to generate a list of Portuguese words that haven't been "posted" on Tatoeba yet?
sharptoothed
5 days ago
** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
CK
5 days ago
With so many non-native submissions last week, the "Count only native contributions" option is worth clicking. https://tatoeba.j-langtools.com/userchart/?nonly=1
deniko
5 days ago
Is it just me, or does the site seem to be down?
sharptoothed
5 days ago
I see no problem.
deniko
5 days ago - 5 days ago
Weird, this is what I see when I try to open it:

https://i.imgur.com/J9Q312c.png

It might be our firewall, of course, though everything else seems to be working fine.

Obviously, I see the same when I go to the main page:

https://tatoeba.j-langtools.com/allstats/
sharptoothed
5 days ago - 5 days ago
Maybe your ISP is having connectivity problems. Try running the tracert/traceroute utility from your computer. You should see something like this:
https://2whois.ru/?t=traceroute...-langtools.com
deniko
5 days ago
tracert doesn't seem to work from my computer - probably, again, because of some proxy settings.

It turned out I can open it from my phone just fine, so it does seem like a problem with my proxy server.

Guybrush88
2 days ago
Thanks
sharptoothed
2 days ago
You're welcome :-)
maaster
15 days ago
I've stopped contributing to Tatoeba because of my colleague mraz, as others have done as well.
If he continues to unlink my Hungarian-Hungarian sentence pairs that have the same meaning, I'll systematically delete all my translations and all my sentences.

I wrote about the problem months ago, but nothing happened. Since then, many Hungarian members have given up on Tatoeba.
Pfirsichbaeumchen
15 days ago
Sent a private message.
mraz
15 days ago - 15 days ago
Pandaa
14 days ago - 14 days ago
Pandaa
15 days ago
May I ask what this quarrel between the two of you is about?
maaster
6 days ago - 5 days ago
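It's not just between the two of us. It's just that the others, apparently fed up with the whole thing, quietly left, following the saying "the wise one gives way"; I chose the role of the donkey. (The full Hungarian saying is "okos enged, szamár szenved": the wise one gives way, the donkey suffers.)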
Thanuir
14 days ago
It would be a great loss to the database if you deleted your sentences. I hope you two reach some kind of agreement; or, if you end up leaving, closing your account and turning off the email notifications would be enough.

...

I would suggest a ceasefire, meaning that you neither link nor unlink each other's sentences. I'd assume the problems stem from differing interpretations of what linking means. Hopefully you have already tried to discuss the matter. Perhaps someone else could act as a mediator?
AlanF_US
14 days ago
Pfirsichbaeumchen said that she sent a private message, so I assume she's dealing with the situation. I hope it can be resolved to everyone's satisfaction.
Objectivesea
5 days ago
I echo the comment of AlanF_US. Sometimes two or more really bright people can accidentally "rub each other the wrong way," like bamboo stems rubbing against each other in the forest, and give rise to an unintended fire. Let's all do our best to reduce friction and also try to help make Tatoeba continue to grow as an innovative help for language learners all over the world. The contributions of many can overcome the limitations of a few. Please let's not allow a temporary irritation with one or two contributors to reduce the great utility of the overall project. Working together, I know that we can make Tatoeba better and better.
jegaevi
5 days ago
Please don't delete your sentences! It would be such a shame to lose them! This passive aggression won't lead anywhere. Couldn't this be resolved somehow? I'd be very sorry if you left Tatoeba.
maaster
5 days ago - 5 days ago
I delete the pairs that get unlinked. In each of them, one sentence is hard to understand because it is written with an uncommon way of expressing things, so that people who may be interested in Hungarian can get to know such expressions too (translations of sentences like "Tom lives in Boston" don't make that possible). It is paired with an easy-to-digest explanatory sentence, so once the two are unlinked, the whole thing loses its point.
The truth is, I'm sorry about them too, but unfortunately formality, not thinking, has won out.
This is roughly what I would have expected from others as well: e.g., if someone writes a saying or idiom, they should explain it, so that using T. makes sense.
Because as it is, you end up searching with Google more than you use T.
(A solution could be chemotherapy, but I'm afraid it's too late.)

It's hard to quit, the way it must be for a smoker to give up cigarettes. I've already tried twice, but I keep relapsing. I don't see the point in continuing; it just eats up my time.
Pandaa
5 days ago - 5 days ago
I don't understand why they had to be unlinked.
After all, I even gave an example showing that such pairs exist even in C*'s sentence collection.
#7472853
#6226100

* x, y, z... etc.
CK
6 days ago - 6 days ago
English Vocabulary Study (With Links to Tatoeba.org)

http://tatoeba.byethost3.com/vocab/

This is something I put together in October of 2016.

Older members may not remember this, and new members may not have seen it yet.

Ricardo14
6 days ago
Would you guys like to have a group on Telegram?
Telegram is really great and easy to use. Besides, it keeps other users from seeing your phone number.
odexed
6 days ago
Good idea, you could create it and put a link here.
CK
7 days ago - 7 days ago
There are over 18,000 English sentences with audio that have no translations.

Sort: Last Created
https://tatoeba.org/eng/sentenc...e&sort=created

Sort: Random
https://tatoeba.org/eng/sentenc...de&sort=random

Perhaps you would enjoy translating some of these into your own native language.

18,529 out of 433,558 (4.27%) had no translations on July 14, 2019 at 9:00 UTC.

If you want to see the sentences that have the most-recently uploaded English audio files, then you can browse my list at http://tatoeba.org/eng/sentence...direction=desc . The newest audio files are at the top.
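The count and percentage above could be reproduced along the following lines. This sketch assumes that three things have already been loaded: the language of each sentence, the set of sentence IDs with audio, and the translation links (for example, from Tatoeba's downloadable exports; the loading step and the exact file layout are not covered here). The helper names are hypothetical, and any link in either direction is treated as a translation.

```python
def untranslated_with_audio(languages, with_audio, links, lang="eng"):
    """IDs of `lang` sentences that have audio but appear in no link.

    languages:  dict mapping sentence ID -> language code
    with_audio: set of sentence IDs that have audio
    links:      iterable of (sentence_id, translation_id) pairs; both ends
                count, so the direction a link is stored in doesn't matter
    """
    linked = {sid for pair in links for sid in pair}
    return sorted(
        sid for sid in with_audio
        if languages.get(sid) == lang and sid not in linked
    )


def percentage(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100 * part / whole


if __name__ == "__main__":
    languages = {1: "eng", 2: "eng", 3: "fra", 4: "eng"}  # toy data
    with_audio = {1, 2, 4}
    links = [(1, 3), (3, 1)]
    print(untranslated_with_audio(languages, with_audio, links))  # [2, 4]
    print(f"{percentage(18529, 433558):.2f}%")  # 4.27%
```

The second line of output confirms the arithmetic in the post: 18,529 out of 433,558 is 4.27%.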
CK
8 days ago
A Selection of English Sentences with 20 or More Alternative Translations in One Language

http://tatoeba.byethost3.com/al...019-07-13.html

I just did this for fun.
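A selection like this starts by counting direct translations per sentence and target language. This is a minimal sketch, assuming a mapping from sentence ID to language code and a list of (sentence ID, translation ID) link pairs have already been loaded. The name `many_alternatives` is hypothetical, and the post does not say whether indirect translations were also counted.

```python
from collections import Counter


def many_alternatives(links, languages, source_lang="eng", minimum=20):
    """Map (sentence_id, target_lang) -> count for sentences with at least
    `minimum` direct translations into a single target language.

    links: iterable of (sentence_id, translation_id) pairs. Duplicates are
    ignored, and only pairs with the source-language sentence first are
    counted, so links are assumed to be listed in both directions (or at
    least with the source sentence first).
    """
    counts = Counter()
    for sentence_id, translation_id in set(links):
        if languages.get(sentence_id) != source_lang:
            continue
        target = languages.get(translation_id)
        if target is None or target == source_lang:
            continue  # skip unknown IDs and same-language links
        counts[(sentence_id, target)] += 1
    return {key: n for key, n in counts.items() if n >= minimum}


if __name__ == "__main__":
    languages = {1: "eng", 2: "eng", 200: "deu"}  # toy data
    languages.update({100 + i: "deu" for i in range(20)})
    links = [(1, 100 + i) for i in range(20)] + [(2, 200)]
    print(many_alternatives(links, languages))
```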