
Wall (threads: 5,415)

sharptoothed
1 day ago
** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
CK
1 day ago
With so many non-native submissions last week, the "Count only native contributions" option is worth clicking. https://tatoeba.j-langtools.com/userchart/?nonly=1
deniko
1 day ago
Is it just me, or does the site seem to be down?
sharptoothed
23 hours ago
I see no problem.
deniko
23 hours ago - 23 hours ago
Weird. This is what I see when I try to open it:

https://i.imgur.com/J9Q312c.png

It might be our firewall, of course, but everything else seems to be working fine.

Obviously, I see the same thing when I go to the main page:

https://tatoeba.j-langtools.com/allstats/
sharptoothed
23 hours ago - 23 hours ago
Maybe your ISP is having connectivity problems. Try running the tracert / traceroute utility from your computer. You should see something like this:
https://2whois.ru/?t=traceroute...-langtools.com
deniko
23 hours ago
tracert doesn't seem to work from my computer - probably, again, because of some proxy settings.

It turned out I can open it from my phone just fine, so it does seem like a problem with my proxy server.

maaster
11 days ago
I have stopped contributing to Tatoeba because of my colleague mraz, as others have also done.
If he continues to unlink my Hungarian-Hungarian sentence pairs that have the same meaning, I'll systematically delete all my translations and all my sentences.

I wrote about the problem months ago, and nothing happened. Since then, many Hungarian members have given up on Tatoeba.
Sent a private message.
mraz
11 days ago - 11 days ago
Pandaa
9 days ago - 9 days ago
Pandaa
11 days ago
May I ask what this squabble between you two is about?
maaster
2 days ago - 1 day ago
It isn't just between the two of us. It's just that the others, apparently fed up with the whole thing, quietly left on the principle that "the wise give way"; I chose to play the donkey.
Thanuir
10 days ago
It would be a great loss to the database if you deleted your sentences. Hopefully you will reach some kind of agreement; or, if you do end up leaving, closing your account and turning off the email notifications would be enough.

...

I would suggest a truce: neither of you links, or removes links from, the other's sentences. I'd assume the problems stem from differing interpretations of what linking means. Hopefully you have already tried to discuss it. Perhaps someone else could act as a mediator?
AlanF_US
10 days ago
Pfirsichbaeumchen said that she sent a private message, so I assume she's dealing with the situation. I hope it can be resolved to everyone's satisfaction.
Objectivesea
1 day ago
I echo the comment of AlanF_US. Sometimes two or more really bright people can accidentally "rub each other the wrong way," like bamboo stems rubbing against each other in the forest, and give rise to an unintended fire. Let's all do our best to reduce friction and also try to help make Tatoeba continue to grow as an innovative help for language learners all over the world. The contributions of many can overcome the limitations of a few. Please let's not allow a temporary irritation with one or two contributors to reduce the great utility of the overall project. Working together, I know that we can make Tatoeba better and better.
jegaevi
1 day ago
Please don't delete your sentences! It would be such a shame to lose them! This passive aggression won't get anyone anywhere. Couldn't this be resolved somehow? I would be very sorry if you left Tatoeba.
maaster
1 day ago - 1 day ago
I will delete the pairs that get unlinked. In each of them, one member is a hard-to-understand sentence written with an uncommon turn of phrase, so that people who might be interested in Hungarian can get to know such expressions too (translations of "Tom lives in Boston"-type sentences don't make that possible), and it is paired with an easy-to-digest explanatory sentence. Once they are unlinked, the whole thing loses its point.
The truth is, I'm sorry to lose them too, but unfortunately formality, not thinking, has won out.
This is roughly what I would have expected from others too: e.g., if someone writes an idiom-like sentence, they should explain it, so that there is some point in using T.
Because as it is, you have to search on Google more than you actually use T.
(A solution could be chemotherapy, but I'm afraid it's too late.)

It's hard to quit, the way it must be for a smoker to quit cigarettes. I've already tried twice, but I keep relapsing. I don't see the point in continuing; it just eats up my time.
Pandaa
1 day ago - 1 day ago
I don't understand why they had to be unlinked.
After all, I even gave an example showing that such pairs exist even in C*'s sentence collection.
#7472853
#6226100

* x, y, z... etc.
MacGyver
2 days ago - 2 days ago
If you're looking for words to write example sentences for Tatoeba, then you should look at the arrows '<-- increase'. They indicate the words that should appear more in the Tatoeba Corpus. The second column of numbers indicates how many times that word should appear in Tatoeba in order to have the same frequency (proportionally) it has in the OpenSubtitles.com corpus.

I downloaded a file from http://opus.nlpl.eu/OpenSubtitles-v2018.php with 441.5M sentences in English and wrote a script to create a frequency list of words (the list is unformatted, i.e., don't -> don + t, etc.). I did the same thing with all the sentences in English owned by native speakers here in the Tatoeba corpus. After comparing the two frequency lists, I compiled a file that gives you an idea of how the frequency of the words in Tatoeba/English compares to that of the OpenSubtitles/English corpus:

A sample of the file:
word \t occurrences in Tatoeba \t how many there should be (proportionally)
tom 359494 723.9471614662318
i 303688 284673.5980557653
to 294034 174335.06447556426
that 234251 105369.07199955775
the 199956 231733.21163537377 <-- increase
t 197499 101493.03356767741
you 176213 305392.1646249775 <-- increase
mary 142565 625.8323390462842
a 130684 149839.41303818647 <-- increase
do 120631 46119.76310532704
he 106705 57203.45769471371
is 106175 74813.6430558621
and 89045 104790.98523257893 <-- increase
was 76168 43791.06207458618
s 75846 152917.0533331018 <-- increase
in 73907 75330.66885961362 <-- increase
it 70138 140934.43701484663 <-- increase
of 68495 90226.5530729227 <-- increase
she 64731 27836.557998979915
be 60860 43264.01316312008
they 59631 32147.915102217754
me 56818 68010.83329256669 <-- increase
have 53539 48189.83034799129
said 53431 8299.380929768524
we 49383 72168.9096779763 <-- increase
know 49071 41497.24193626721
don 48290 43850.854478915724
for 47362 52364.353226404884 <-- increase
what 46221 73483.70680135512 <-- increase
didn 44793 11179.517051257248
think 42518 19066.90288432038
this 41598 60412.53263081548 <-- increase
are 37926 42973.06544886179 <-- increase
can 37084 39846.78412853941 <-- increase
with 36257 38338.81981074213 <-- increase
his 35673 17064.111051060263
on 35404 51944.97959418119 <-- increase
not 34972 43740.29882476427 <-- increase
her 33423 21676.844915118265
m 32329 48312.26568311476 <-- increase
my 31459 51048.86660395457 <-- increase
want 29463 20022.39637034032
told 28727 5296.513973554256
like 27819 30102.40119012533 <-- increase
at 27399 24364.425735303917
did 27199 17909.338412841887
has 25722 10076.924460296785
going 25540 15317.881523817805
as 24937 17656.581783517522
go 23405 28750.83973746701 <-- increase


the file: https://github.com/sidfc/Langua...ng_ordered.txt

There are also files for: deu, fra, spa, por, ita, pol, rus
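The script itself isn't posted, so the following is only a minimal sketch of how such a comparison could work, assuming a plain lowercase tokenizer that splits on every non-letter (which reproduces the "don't -> don + t" behavior) and the proportional scaling implied by the "how many there should be (proportionally)" column. The function names and the toy data are hypothetical.

```python
import re
from collections import Counter

def word_counts(sentences):
    """Count words after lowercasing and splitting on every non-letter
    character, so "don't" is counted as "don" + "t"."""
    counts = Counter()
    for sentence in sentences:
        counts.update(w for w in re.split(r"[^a-z]+", sentence.lower()) if w)
    return counts

def compare(tatoeba, reference):
    """Return (word, actual, expected, note) rows ordered by Tatoeba count.

    expected = reference count * (Tatoeba total / reference total), i.e. the
    count the word would have if it were as frequent, proportionally, as in
    the reference corpus. The note is "<-- increase" when actual < expected.
    """
    scale = sum(tatoeba.values()) / sum(reference.values())
    rows = []
    for word, actual in tatoeba.most_common():
        expected = reference.get(word, 0) * scale
        note = "<-- increase" if actual < expected else ""
        rows.append((word, actual, expected, note))
    return rows

# Toy input, made up for illustration only.
tatoeba = word_counts(["Tom doesn't like it.", "Tom likes the cat."])
reference = Counter({"the": 6, "it": 2, "tom": 1, "cat": 1})
for word, actual, expected, note in compare(tatoeba, reference):
    print(f"{word}\t{actual}\t{expected:.2f}\t{note}")
```

With a real subtitle corpus as the reference, names like "tom" and "mary" end up far above their expected counts while function words such as "the" fall below them, which is the pattern visible in the sample.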
Thanuir
2 days ago
These are all very common words with huge numbers of sentences. Is there a particular reason to actively add many more of them just to match the frequencies in the OpenSubtitles database?
CK
2 days ago
I wondered the same thing.

Here are the first 100 words with "<-- increase".

the, you, a, and, in, it, of, me, we, for, what, this, are, can, with, on, not, my, like, go, him, your, there, if, about, here, all, one, get, out, up, from, good, just, but, no, them, an, so, let, now, more, say, got, where, see, come, back, some, too, something, take, people, right, make, our, way, or, well, into, please, look, give, over, off, find, new, must, little, other, put, first, after, down, love, old, years, things, night, am, even, believe, man, two, life, away, being, nothing, came, wrong, these, father, understand, feel, looking, wait, stop, because, thing, call

Likely it would be more useful to find words that are high on word frequency lists that are missing from the Tatoeba Corpus. Perhaps you could generate such lists, putting the words in frequency order.
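Generating such a list is mostly a set lookup. A minimal sketch, assuming the frequency list is already ranked (most frequent first) and that the words used in native speakers' Tatoeba sentences are available as a set; both inputs and the function name are hypothetical.

```python
def missing_words(frequency_list, tatoeba_words):
    """Return the words of a frequency-ordered list (most frequent first)
    that never occur in Tatoeba, keeping the list's original order."""
    seen = {w.lower() for w in tatoeba_words}
    return [w for w in frequency_list if w.lower() not in seen]

# Hypothetical inputs: a ranked word list and the set of words found in
# native speakers' Tatoeba sentences.
ranked = ["the", "indistinct", "you", "annulment"]
in_tatoeba = {"the", "you", "tom"}
print(missing_words(ranked, in_tatoeba))  # ['indistinct', 'annulment']
```

Because the output keeps the ranking of the input list, its first entries are the most frequent words that Tatoeba is still missing.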


Objectivesea
1 day ago
CK wrote: “Likely it would be more useful to find words that are high on word frequency lists that are missing from the Tatoeba Corpus. Perhaps you could generate such lists, putting the words in frequency order.”

I strongly agree with this suggestion. There are various frequency dictionaries for individual languages published by Routledge or by the Leipziger Universitätsverlag. Typically, these dictionaries list a large number of words (with definitions to prevent confusion with respect to homographs that have distinct meanings) — 5,000 or 10,000 or so. It would be nice to find words on the Routledge or Leipzig lists that are not yet found in the Tatoeba database for that particular language. These missing words could then be arranged in frequency order based on one of these published frequency dictionaries.

As CK notes, the order of the first 100 words or so is not very significant; the frequency depends to a great degree on the particular database selected — whether words are taken from newspaper text, from fiction works, from scientific papers, or transcribed from oral conversations. Indeed, when compiling concordances to works like the Bible or Shakespeare, etc., the most frequent 100 or so words in that corpus are placed on a “stop list” to be ignored by the computer preparing the concordance.
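To illustrate what compiling a concordance with a stop list involves, here is a minimal keyword-in-context sketch. It simply treats the N most frequent words of the text itself as the stop list. Real concordance tools may build their stop lists differently, and the function name and parameters are hypothetical.

```python
import re
from collections import Counter

def concordance(text, stop_list_size=100, width=3):
    """Keyword-in-context concordance: map each word to the snippets
    (up to `width` words on each side) where it occurs, ignoring the
    `stop_list_size` most frequent words, i.e. the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    stop_list = {w for w, _ in Counter(words).most_common(stop_list_size)}
    index = {}
    for i, word in enumerate(words):
        if word in stop_list:
            continue
        snippet = " ".join(words[max(0, i - width): i + width + 1])
        index.setdefault(word, []).append(snippet)
    return index

entries = concordance("The cat saw the dog and the dog saw the cat.",
                      stop_list_size=1, width=1)
print(entries["cat"])  # ['the cat saw', 'the cat']
```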

When learning a language, however, it can be very helpful to prioritize the most common words. Thus, even knowing 1,000 or 2,000 words can dramatically boost one's ability to understand that language and to speak fluently. Because Esperanto has extremely regular word-formation rules, a vocabulary of as few as 600 or 700 Esperanto words can be the equivalent of knowing 2,000 words in German, French, Russian or Spanish, etc.

An interesting article (at https://glanier.wordpress.com/2...arning-greek/) points out that introductory Greek courses often focus on the 310 most frequent words encountered in the New Testament, which enables a student “to read 80% of the NT without using a dictionary.”
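A claim like "310 words let you read 80% of the NT" is a token-coverage figure: the share of all running words in a text that belong to its N most frequent words. The sketch below shows that calculation on a toy token list. It does not reproduce the New Testament figure, and it counts raw word forms, whereas a course would probably count dictionary forms (lemmas).

```python
from collections import Counter

def coverage(tokens, n):
    """Fraction of running words (tokens) in a text that belong to the
    text's n most frequent word types."""
    if not tokens:
        return 0.0
    top_n = Counter(tokens).most_common(n)
    return sum(count for _, count in top_n) / len(tokens)

tokens = "a a a a b b c d".split()
print(coverage(tokens, 1), coverage(tokens, 2))  # 0.5 0.75
```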

If our user @MacGyver were able to generate, say, lists of the most common words (frequency order from 100 to 1,000) for English, Italian, Russian, Turkish and Esperanto (the five languages on Tatoeba that currently have the most sentences) and compare the lists with the Tatoeba database, we would learn which particular high-frequency words are underrepresented in the Tatoeba database. Contributors motivated to create sentences could then focus on sentences using those words. I think this might greatly improve the utility of Tatoeba for language learners who use the strategy of learning the most frequently spoken words first.
Thanuir
1 day ago
Quite unrelated, but I had to check what a concordance or compiling one means. Could you add a sentence or two to this effect to Tatoeba? These would be precisely the kind of material that an advanced learner finds useful.

I also added "compile" and "concordance" to my vocabulary.
MacGyver
1 day ago - 1 day ago
I got a few lists of words online and compiled the following files:

A list with the top ~3k words ordered by frequency of occurrence in the OpSub corpus: https://github.com/sidfc/Langua...atoeba_v01.txt

A list with the top ~10k words ordered by frequency of occurrence in the OpSub corpus: https://github.com/sidfc/Langua...atoeba_v01.txt

They are organized as follows:
column 1 = the word
column 2 = occurrences of the word in OpenSubtitles.com (the file is ordered by this column)
column 3 = occurrences of the word in Tatoeba (only sentences by native users were considered)
column 4 = indicates how many times the word 'should' appear in Tatoeba
column 5 = shown only for words that have less than 50% of the occurrences they 'should' have in Tatoeba
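The column 5 rule amounts to a simple filter. The sketch below assumes each row holds columns 1 to 4 as a tuple and reproduces only the "less than 50%" condition. What column 5 actually displays isn't specified, so the helper names and the placeholder rows are hypothetical.

```python
def below_half(tatoeba_count, expected):
    """Column 5 condition: the word has less than 50% of the occurrences
    it 'should' have in Tatoeba."""
    return expected > 0 and tatoeba_count < 0.5 * expected

def underrepresented(rows):
    """rows are (word, opsub_count, tatoeba_count, expected) tuples, i.e.
    columns 1-4; keep the rows for which column 5 would be shown."""
    return [row for row in rows if below_half(row[2], row[3])]

# Placeholder rows with made-up numbers, for illustration only.
rows = [("word_a", 5000, 0, 12.0), ("word_b", 9000, 20, 25.0)]
print([row[0] for row in underrepresented(rows)])  # ['word_a']
```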

As an example, the words 'indistinct', 'limitation', 'restriction', 'annulment', 'inaudible', 'flare', 'abduction', 'depot', 'decoy', 'deposition', 'cheater', 'retainer', 'hypothetically', 'caress', 'rebound', 'sleepover', 'riddance', 'relive', 'proxy', 'onward', 'visitation', 'envoy', 'reptile', 'viewer', 'proclaim', 'retrieval', 'canvass', 'caterer', 'abduct' and 'withhold' have ZERO occurrences on this site (considering only sentences by native speakers).

I only use OpSub to measure the frequency of words, not as a source of words (there are too many wrong words in there). So, there isn't much I can do in order to generate a good/useful list of most frequent words (in any language).
MacGyver
1 day ago - 1 day ago
* UPDATE *

Now using data from the British National Corpus.

A list with the top 30k words ordered by frequency of occurrence in the BNC corpus: https://github.com/sidfc/Langua...atoeba_v01.txt

You need to look at the second column of numbers (third column from left to right) to find words that have a low number of occurrences in Tatoeba.
CK
1 day ago - 1 day ago
English Vocabulary Study (With Links to Tatoeba.org)

http://tatoeba.byethost3.com/vocab/

This is something I put together in October of 2016.

Older members may not remember this, and new members may not have seen it yet.

Ricardo14
2 days ago
Would you guys like to have a group on Telegram?
Telegram is really great and easy to use. Besides, it keeps your phone number hidden from other users.
odexed
2 days ago
Good idea, you could create it and put a link here.
CK
2 days ago - 2 days ago
There are over 18,000 English sentences with audio that have no translations

Sort: Last Created
https://tatoeba.org/eng/sentenc...e&sort=created

Sort: Random
https://tatoeba.org/eng/sentenc...de&sort=random

Perhaps you would enjoy translating some of these into your own native language.

18,529 out of 433,558 (4.27%) had no translations on July 14, 2019 at 9:00 UTC.
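The percentage can be checked with a single division. The variable names below are only for illustration; the two counts are the ones given in the post.

```python
untranslated = 18_529  # English sentences with audio but no translations
total = 433_558        # denominator as given in the post
share = untranslated / total
print(f"{untranslated:,} out of {total:,} ({share:.2%})")
# prints: 18,529 out of 433,558 (4.27%)
```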

If you want to see the sentences that have the most-recently uploaded English audio files, then you can browse my list at http://tatoeba.org/eng/sentence...direction=desc . The newest audio files are at the top.
CK
3 days ago
A Selection of English Sentences with 20 or More Alternative Translations in One Language

http://tatoeba.byethost3.com/al...019-07-13.html

I just did this for fun.
mmorelfmc
6 days ago - 6 days ago
Here is a situation where I would need the more experienced users to propose how to resolve it.

The problem is with Sentence #13214:

< Bin, aimes-tu le baseball ? >

The author used "Bin" which is an expression from Québec:

http://www.je-parle-quebecois.c...n/ben-bin.html

(Also, the pronunciation of "in" in French doesn't really have an equivalent in English or Esperanto [I don't know about other languages]. It is not "bin" as in "storage bin". See
https://french.stackexchange.co...-a-back-vowel)

Simply put, it means "Well" in English, as in:

< Well, do you like baseball? >

or "Nu" in Esperanto, as in:

< Nu, ĉu vi ŝatas la basbalon? >

or "Alors" in more international French, as in:

< Alors, aimes-tu le baseball ? >

However, everyone who translated the original sentence assumed that "Bin" was the name of a person. This resulted in some very silly sentences like:

Do you like baseball, Bin?
Bin, houd je van honkbal?
¿Te gusta el béisbol, Bin?
Ĉu vi ŝatas la basbalon, Bin?

and with the Galician sentence, "Bin" was even replaced by "Bill".

How do we go about fixing this mass hallucination?

soweli_Elepanto
6 days ago
Personally, I don't think anything needs to be done about it. Those "wrong" translations may be just as good as the "right" ones. Your case is not unique. For example, the name "Tom", when translated _from_ Russian, can be understood as "volume", and "Tom's" can be understood not only as "of volume" but also as "volumes".
mmorelfmc
6 days ago
What happens with cases like the "Tom-confused-as-volume"? If an original English sentence is about Tom but the Russian translation makes it about volume, then the translation is false. Is there a mechanism (perhaps a tag? a marker?) to indicate that the meaning of a translated sentence differs significantly from the original sentence?
Impersonator
6 days ago
> If an original English sentence
> is about Tom but the Russian translation
> makes it about volume, then
> the translation is false.

Russian would be about *both* Tom and volume. The word 'tom' means 'volume' in Russian, and if 'volume' is placed at the beginning of the sentence, then it will be written with a capital letter.

If Russian is only about a volume, then it's a mistake and it should be unlinked from the translation.
Aiji
5 days ago
Considering "Bin" as a name or as the equivalent of "Ben" would both give correct translations. Of course, one could argue that "Bin" is a silly name, etc., but basically I don't think there is anything wrong (as long as the misunderstanding is not systematic).

We can also use tags; for example, we have a "français du Canada" tag. However, nobody would see it when translating, because it isn't displayed in a results list.

And finally, one can leave a comment to say that Bin = Ben.
Objectivesea
5 days ago
These sorts of misapprehensions can often be prevented by supplying a usage note in a comment with an unusual word's actual meaning when creating the original sentence.
mmorelfmc
5 days ago
My friend, you speak the truth.
Thanuir
4 days ago
Contribute the translations you deem as correct. A sentence can mean several different things.
deniko
5 days ago
Tag auto-completion.

A while ago I made a mistake and tagged a sentence with the tag "matheamatics" instead of "mathematics". I immediately removed that tag, and added the correct one, but now every time I start typing "math" to add a tag, "matheamatics" is on the list:

https://i.imgur.com/g7Zkwq8.png

There are currently no sentences for the tag "matheamatics".

https://tatoeba.org/eng/tags/sh..._tag/10425/eng

Can auto-completion be modified so that it doesn't suggest tags with zero sentences?
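One way the requested behavior could work is sketched below, assuming the suggester can see how many sentences each tag currently has. This is not taken from Tatoeba's actual code; the `tag_counts` dictionary, its numbers (apart from the zero for "matheamatics") and the function name are hypothetical.

```python
def suggest_tags(prefix, tag_counts, limit=10):
    """Suggest tags whose names start with `prefix` (case-insensitive),
    skipping tags currently attached to zero sentences; most-used first."""
    p = prefix.lower()
    matches = [(name, n) for name, n in tag_counts.items()
               if n > 0 and name.lower().startswith(p)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in matches[:limit]]

# Hypothetical counts; "matheamatics" is the misspelled tag with no sentences.
tag_counts = {"mathematics": 120, "matheamatics": 0, "math": 5, "music": 40}
print(suggest_tags("math", tag_counts))  # ['mathematics', 'math']
```

Filtering at suggestion time leaves the empty tag in the database, so nothing is lost if it gets used again later.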
CK
6 days ago - 5 days ago
Screenshot from 5 years ago.

https://prnt.sc/odea8o
Number of Sentences 2014-07-11 at 22.56.43.png

Screenshot from today.

https://prnt.sc/odj9b2
Number of Sentences 2019-07-11 at 18.48.08.png

Timestamps are Japan Standard Time (UTC+9)
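Because Japan does not observe daylight saving time, converting these JST stamps to UTC is a fixed nine-hour shift. A minimal sketch using Python's standard `datetime` module, assuming the "YYYY-MM-DD at HH.MM.SS" pattern of the screenshot file names; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time: UTC+9, no DST

def jst_to_utc(stamp):
    """Convert a screenshot-style stamp such as '2014-07-11 at 22.56.43',
    read as JST, to a UTC datetime."""
    local = datetime.strptime(stamp, "%Y-%m-%d at %H.%M.%S").replace(tzinfo=JST)
    return local.astimezone(timezone.utc)

for stamp in ("2014-07-11 at 22.56.43", "2019-07-11 at 18.48.08"):
    print(stamp, "JST ->", jst_to_utc(stamp).strftime("%Y-%m-%d %H:%M:%S UTC"))
```

For the two screenshots this gives 13:56:43 UTC on 2014-07-11 and 09:48:08 UTC on 2019-07-11.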