Wall (6,006 threads)
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
Most often, when I perform an exact search for a Japanese sentence I have found at Clozemaster, the search doesn't find anything. However, if I set "Is orphan" and "Is unapproved" to Any, the sentence will be found. I find it very irritating that I have to go through that extra step, especially if the first search is lengthy.
Try this one: https://tatoeba.org/eng/sentenc...rom=und&to=und
I can possibly understand that Tatoeba does not want to pollute the search results with dodgy sentences, but when there is an exact match I think an exception should be made.
Another issue is why there are so many Japanese sentences that seem to be mostly forgotten, having no owner and sometimes with very "imaginative" English translations. Might this partly be because they are hidden by the search interface?
Bookmark this URL.
It's a pre-filled advanced search form with what you want.
The pre-filled search form option is something that was added recently.
Thanks, that helps a bit, but not when I issue the search directly from Clozemaster.
I suggest posting a request at Clozemaster for them to change the parameters of the query issued to include at least orphans (unapproved might be more problematic) when searching for a full sentence. Searching for a single word is probably a different story, since you would probably get enough good matches without relaxing the criteria.
Many of our Japanese sentences come from the Tanaka Corpus, and we've had too few Japanese-speaking members to fix all the problems with it.
Are you sure that all of the sentences originate from here?
Or maybe the sentence was rewritten here, but the information on Clozemaster hasn't been updated.
I also haven't been able to find the English counterpart of one of its sentence pairs so far (in the list of the first 100 most-used words).
Judging from my experience, I am quite sure. I don't think Clozemaster uses any other corpus than Tatoeba, and Clozemaster does not rewrite sentences, to my knowledge.
It seems that Clozemaster fetched all its Japanese sentences from Tatoeba some five to eight years ago, so any changes made after that are not reflected in its corpus. On the other hand, not a lot seems to have happened with these sentences here at Tatoeba in the meantime, anyway. But if you don't find a particular sentence, it might have been deleted.
** Stats & Graphs **
Tatoeba Stats, Graphs & Charts have been updated:
51 usernames from sharptoothed's Tatoeba User Activity Chart from last week had both a tcnt/scnt over 1 and over 1,000 native language sentences.
aldar, alexmarcelo, arh, Balamax, bandeirante, bill, brauchinet, bunbuku, CH, CK, CM, CN, danepo, deniko, diegohn, dotheduyet1999, elenacristina260, Esperantostern, felix63, felvideki, gillux, GrizaLeono, Hybrid, lbdx, LeeSooHa, Luiaard, manese, marafon, MarijnKp, martinod, Micsmithel, morbrorper, mraz, Ninja, Nylez, Objectivesea, odexed, ondo, PaulP, Pfirsichbaeumchen, po_slovensky, Ricardo14, sacredceltic, Selena777, sharptoothed, shekitten, Shishir, Silja, small_snow, Tepan, Yorwba
Source URL: https://tatoeba.j-langtools.com...=1&chtype=alla
tcnt/scnt is this guy:
If it's 1.5, this means that 1,000 of your sentences have 1,500 translations directly linked to them (on average).
If it's <1 this means you add a lot of sentences without translations.
This number only tells you how many sentences are connected to your sentences.
"If it's <1 this means you add a lot of sentences without translations."
If I write only one sentence (and do not translate anything), but 8 other contributors translate it, then my tcnt/scnt number is 8.
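To make the arithmetic in this exchange concrete, here is a minimal sketch in Python. It assumes tcnt/scnt is simply the total number of directly linked translations divided by the number of your sentences, as described above. The function name and the input format (one translation count per sentence) are made up for illustration and are not taken from the chart's actual scripts.

```python
def tcnt_scnt(translation_counts):
    """Average number of directly linked translations per sentence.

    translation_counts holds one integer per sentence the member owns
    (hypothetical input format, for illustration only).
    """
    if not translation_counts:
        return 0.0  # no sentences, so there is nothing to average
    return sum(translation_counts) / len(translation_counts)


# One sentence that 8 other contributors translated.
print(tcnt_scnt([8]))
# Two sentences with 3 direct translations between them.
print(tcnt_scnt([3, 0]))
# Four sentences, only one of which has a translation.
print(tcnt_scnt([1, 0, 0, 0]))
```

Under this reading, a value below 1 means there are fewer translation links than sentences, whoever wrote the translations.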
So this is a measure of
1. how popular your language is (as a foreign language)
2. how convenient (easy, short, simple) your sentences are to translate
3. how many sentences you write as translations of others, rather than as original sentences.
I would say that it is not particularly virtuous to aim for a high number here.
I agree with you.
I don't quite understand how tcnt/scnt is determined.
The user below has 3 sentences all of which are linked to a translation, so I would expect their tcnt/scnt to be 1, but it is 0.3 on the chart.
And this user's ratio is 0.8 on the chart, although all of their sentences are linked to a translation.
What am I missing?
It looks like the "Count only native contributions" checkbox is checked. Uncheck it and you'll see what you expected. I can't explain it right now, sorry. Maybe there's a bug in my scripts. I'll try to track it down.
Thanks. I'm sorry to have bothered you.
The user glavsaltulo has 31,220 sentences (31,193 of them are in their native language, Lithuanian).
And only 4 of their sentences are untranslated.
But their tcnt/scnt is 0.50 on the chart when 'Count only native contributions' is checked.
So there must be something that decreases this number other than untranslated and non-native sentences.
Unchecking that checkbox gives a more accurate number as you mentioned.
When that checkbox is checked, only sentences created by natives are counted. That is, if a member has 3 sentences and all of them are translated once, but only one sentence belongs to a native, then the ratio will be 1/3.
"Count only native contributions" is unchecked by default and saved in cookies, as far as I remember. This checkbox is for those who need statistics for native contributors.
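One possible reading of the 1/3 example, sketched in Python. The assumption here (not confirmed by the chart's scripts) is that with the checkbox on, a translation is counted only if it was written by a native speaker, while the member's sentence count stays the same. The data layout and the name `ratio` are hypothetical.

```python
def ratio(sentences, native_only=False):
    """tcnt/scnt for one member.

    sentences is a list with one entry per sentence; each entry is a list
    of booleans, one per direct translation, True if that translation was
    written by a native speaker (assumed data layout).
    """
    scnt = len(sentences)
    if scnt == 0:
        return 0.0
    tcnt = sum(
        1
        for translations in sentences
        for by_native in translations
        if by_native or not native_only
    )
    return tcnt / scnt


# Three sentences, each translated once; only one translation is by a native.
member = [[True], [False], [False]]
print(ratio(member))                    # checkbox unchecked
print(ratio(member, native_only=True))  # checkbox checked
```

If this reading is right, it would also explain how a member with almost no untranslated sentences can still end up with a ratio well below 1 when the checkbox is on.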
They are displayed right-to-left, but I think your question is whether they should be right-justified rather than left-justified, correct?
Yes, indeed. Thank you!
Check out the new "Browse by language" page: https://dev.tatoeba.org/sentences/index (note that it displays differently for guests and logged-in users).
Related issue: https://github.com/Tatoeba/tatoeba2/issues/2157
Feedback is welcome. Also, a lot of code changed under the hood, so I’d be glad if you could check if the rest of the website is working normally.
I really like it. I see a lot of influence from Wikipedia, but there's nothing wrong with that. I particularly like the display of languages by number of sentences (100,000+, 10,000+, etc.). That might be especially helpful in motivating people to add sentences in order to move their language from one group to another (for instance, another 3,000+ sentences will get Polish into the 100,000+ group).
Regarding the text "0+ sentences": Since you have a "1+ sentence" category, it should suffice to show "0 sentences". But I'm not sure why we have so many languages with no sentences. Don't we require sentences to be added for a language before we support the language? I suppose that we could have languages that have a small number of sentences that are then all deleted, for one reason or another. But displaying languages with zero sentences could just lead people to wonder why we don't support every known language, since we apparently don't even require sentences for the language before we display it.
Thanks! I’m glad you like it too. :-)
Right, the "0+ sentences" is weird. It won’t appear on tatoeba.org because all languages there have at least one sentence. It does appear on dev.tatoeba.org, though, because we keep the list of supported languages updated without adding new sentences. I’m going to change it to "0 sentences" nonetheless.
I see. That makes sense.
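For anyone curious how the grouping discussed above could work, here is a rough Python sketch. Only the "100,000+", "10,000+", "1+ sentence" and "0 sentences" labels come from this thread; the other thresholds and the function name are assumptions, and this is not the code behind the actual page.

```python
# Thresholds from largest to smallest; 1,000, 100 and 10 are assumed.
THRESHOLDS = (100_000, 10_000, 1_000, 100, 10, 1)


def sentence_group(count):
    """Return the group label for a language with `count` sentences."""
    for threshold in THRESHOLDS:
        if count >= threshold:
            noun = "sentence" if threshold == 1 else "sentences"
            return f"{threshold:,}+ {noun}"
    return "0 sentences"  # plain "0", not "0+", as suggested above


print(sentence_group(97_000))   # hypothetical count, about 3,000 short of the top group
print(sentence_group(100_000))
print(sentence_group(1))
print(sentence_group(0))
```

Checking the thresholds from largest to smallest means each language lands in the highest group it qualifies for.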
Thanks again for this page gillux, I'm looking forward to having this deployed on prod :)
The issues I've noticed:
1) The "unknown" language has a blank icon and clicking on it leads to an error.
2) The South Levantine Arabic icon is wrongly sized (but you have already noted that).
3) I also thought for a moment that some languages were missing an icon. With "Tahaggart Tamahaq", for instance, if you don't know that it's one language, you could think that Tahaggart is one language and Tamahaq is another, and wonder why the second one doesn't have an icon.
If you could make the space between languages larger, or reduce the line-height of the language names, it would make it clear which words belong to the same language.
I fixed 1) and 2) already.
As for 3), I borrowed this way of displaying text from Wikipedia. It’s less of a problem for them because they localize each language name. I could change the display as you suggested, but this would make the text unaligned with the other columns. I like how straight and organized it looks right now.
I’m not sure how to go about this but I’d rather try to approach the problem differently.
The new "Browse by language" page is now deployed on tatoeba.org.
There are now 777,777 sentences on List 907. 513,553 (66%) of these have audio.
This is the list of good proofread English sentences that I use on my projects. http://www.manythings.org/corpus/tatoeba.html
Bilingual sentence pairs made up of these sentences and sentences by native speakers contributing to the Tatoeba Project can be downloaded from http://www.manythings.org/anki/ .
Screenshot showing the 777,777 number.
For comparison, here are the numbers of sentences for the 2nd- and 3rd-ranked languages on tatoeba.org.
Russian = 798,683
Italian = 767,143
Link to List 907.
Codidact languages site: https://languages.codidact.com/
This is an open-source Q&A community in the style of Stack Exchange, but non-commercial. The particular instance I linked to is about languages; maybe people here will find it of interest.
Cool! Thank you!
I had a very brief look at the list of vocabulary words for which sentences are desired. When restricted to English words only, it nevertheless had a few anomalies.
1. ‹Lautgesetzlich›, clearly a German word, is mislabelled as English.
2. ‹thunderstrock› is misspelled; it should be ‹thunderstruck›.
3. ‹щзхлзщхлщзх›, being written in Cyrillic letters, cannot be English. Perhaps it is Russian or Bulgarian, but my guess is that it may be a nonsense word, as I do not see any vowels. Can it even be pronounced?
I have a similar question: what do we do when we make a mistake in a vocab request? I think I was half asleep when I requested sentences with "though" in Dutch. I used the word "though", which is obviously English and not Dutch :(
Tällä hetkellä virheellisyyksiä ja typeryyksiä ei voi poistaa listasta. Profiilissani on joitakin käytännön ehdotuksia.
A rough English version of @Thanuir's Finnish remarks:
> You can remove words which you yourself have added to the vocabulary.
> At this time, inaccuracies and stupidities cannot be removed from the list.
> There are some practical suggestions on my profile.
Thank you, Thanuir, for your helpful comments and for the very helpful guidance on your profile page, at <https://tatoeba.org/eng/user/profile/Thanuir>, for contributing sentences based on proposed vocabulary items that are currently rare in the Tatoeba sentence database. Kiitos.
You're welcome.
‹щзхлзщхлщзх› That's not a word, just 4 letters that are located side by side on the Russian layout. Can be deleted.
Way too often, grammatical gender or number is lost in indirect translations.
Examples: #2707014 #1612621
Any ideas on how we can fix this?
Adding correct translations as 'direct' does not remove flawed indirect options immediately.
This is the expected behavior of tatoeba.org.
But @CK and all, is there no way to tag the originals as, e.g., "female speaker" or "formal" or "addressing multiple people"? It seems like this might help clear things up.
Tags can be added to sentences. That in no way prevents linking them to translations, nor does it reduce the number of indirect translations.
There is a technical problem going on. When a search result spans multiple pages, every page other than the first shows the "no results" message.
Anybody else see that?
I saw that, too, and I reported it on the bug tracker: https://github.com/Tatoeba/tatoeba2/issues/2547
I was going to ask the same question since I noticed it too.
Yes, I noticed that too.
Also, the random sorting order doesn't work properly. It kind of works, but when you reload the first page of the results (you can't go to the second anyway), you get the same results.
Normally, in Random mode, every time you reload any page you get different sentences (well, random ones, so some of them can be the same, of course).
The problem should be fixed.
Sorry for any inconvenience.
It does work fine now, thanks for fixing it.