Wall (6,291 threads)
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
Happy birthday, Esperanto!
The first Esperanto textbook was published on July 26, 1887, so today is considered the birthday of the "international language Esperanto".
Much joy to everyone who knows and uses the language!
Long live July 26!
I am new to Tatoeba. I have read the FAQ, but I have a question about one of the interface icons.
On some Japanese sentence translations there is a red triangle with an exclamation mark. What does it mean, and what is it for?
See the Google Photos screenshot example:
It means that the reading aid (called 'furigana' in Japanese) has not been verified yet and could contain errors. Welcome to Tatoeba! 🙂
* Tatoeba As A Graph *
Tatoeba internals represented as undirected graphs.
This is cool
I said a few days ago that I wanted to respond to the thread about near-duplicates ( https://tatoeba.org/en/wall/sho...#message_37321 ), but I haven't had a chance to do it until now. Trang brought up many of the things that I would want to, but there are a few more things that I want to say.
(1) Diversity of sentences benefits everyone, beginners as well as advanced learners. Even simple sentences that are designed to demonstrate variation across a single dimension (such as substitution of a pronoun) are more valuable when they contain variety in multiple respects. We are humans, not machines, and we learn best when we're not bored.
(2) Even if an overabundance of near-duplicate sentences had a positive or neutral effect on beginning learners (which, as I say, I don't believe) but disadvantaged advanced learners, then we would need to take it seriously. This is true not only for what one might call ethical reasons (not wanting to frustrate advanced learners), but for practical ones as well (we want the contributions that advanced learners make; we want to serve the needs of a broad community so that people don't feel they need to leave Tatoeba once they achieve a certain level).
(3) Even if an overabundance of near-duplicate sentences had a positive or neutral effect on speakers of language X (which again, I don't believe in general) but disadvantaged English speakers, then we would need to take it seriously. Otherwise, we'd be simply substituting "X-centric" for "English-centric".
(4) There are many kinds of sets of near-duplicate sentences, and they exist on a continuum where the extent to which they reduce the quality of the corpus increases with:
(a) the number of sentences in a particular set
(b) the simplicity of the grammatical transformations required to produce one from another (so "drop-in" substitutions that leave the rest of the sentence untouched, like replacing "everyone" with "everybody", are less valuable than those that involve changing gender or tense)
(c) the lack of word-choice transformations required to produce one from another (where, for instance, changing a noun would also require changing the verb that is associated with it, as in "damage a building" -> "injure a person")
I believe Trang's point is that we should consider how to avoid adding the near-duplicates that are towards the wrong end of the scale of usefulness.
(5) By allowing us to write sentences virtually without restriction, Tatoeba already provides us a huge degree of self-expression. Ultimately, Trang is talking about how we can write sentences so that they achieve the most good. I feel like it's reasonable to ask ourselves that question, rather than reflexively jumping to a defense of what we've always done.
I am also one of those who think that similar sentences generally bring more noise than value to Tatoeba's corpus.
In order to reduce the proportion of similar sentences, it might be useful to ask contributors for confirmation when they add an unlinked sentence whose originality is below a certain threshold.
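To make the idea concrete, here is a minimal sketch of what such a check could look like. It is only an illustration, not how Tatoeba works: the function names, the word-level comparison, and the 0.5 threshold are all assumptions of mine, and comparing against every existing sentence would not scale to the real corpus without some kind of index.

```python
from difflib import SequenceMatcher


def originality(candidate, corpus):
    """Return 1 minus the highest word-level similarity to any existing sentence.

    Hypothetical scoring: 1.0 means no words in common with anything in
    the corpus, 0.0 means an exact (case-insensitive) duplicate.
    """
    cand_words = candidate.lower().split()
    best = 0.0
    for existing in corpus:
        ratio = SequenceMatcher(None, cand_words, existing.lower().split()).ratio()
        best = max(best, ratio)
    return 1.0 - best


def needs_confirmation(candidate, corpus, threshold=0.5):
    """True when the candidate is less original than the (assumed) threshold."""
    return originality(candidate, corpus) < threshold


corpus = ["Everyone likes her.", "I went to the store yesterday."]
print(needs_confirmation("Everybody likes her.", corpus))
print(needs_confirmation("The cat chased a red balloon.", corpus))
```

With this scoring, a drop-in substitution such as "everyone" -> "everybody" falls below the threshold and would trigger the confirmation prompt, while an unrelated sentence would not.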
> it might be useful to ask contributors for confirmation when they add an unlinked sentence whose originality is below a certain threshold.
That would be nice, but I'm sure it's beyond what we can do any time soon, given our limited number of developers and long backlog of requests. I think Trang's comment applies here as well:
"The reality about Tatoeba today is that it doesn't provide a full-fledged set of features for people to sort out what they possibly don't need. For all I know, it could take another ten years till we get there and during this time we cannot operate as if the necessary features were going to be rolled out tomorrow.
That being said, if someone wants to work on a technical solution to help users filter out near-duplicates, I have to remind that Tatoeba is an open source project and we're always more than happy to receive pull requests :)"
In the absence of such a feature, it comes down to contributors making an effort to write sentences that are less likely to be near-duplicates.
> In the absence of such a feature, it comes down to contributors making an effort to write sentences that are less likely to be near-duplicates.
On the "Add sentences" page, we can read "Avoid using the same words, names, topics, or patterns over and over again." Yet some very large contributors have completely ignored this recommendation for years and continue to flood the corpus with low-quality sentences.
I think it is now urgent to recognize that guidelines alone will not be sufficient and that it is necessary to introduce friction if we want to curb this phenomenon.
It is true that the feature I proposed in my previous post is difficult to implement in the short term. A simpler and more radical solution would be to cap the number of unlinked sentences a contributor can add in a given time period. Similarly, it might be useful to cap the total number of unlinked sentences that a user is allowed to own.
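As a rough sketch of the first kind of cap (per time period), a sliding-window counter per contributor would be enough. Everything here is hypothetical: the class name, the limits, and the use of plain numeric timestamps are my own assumptions, not an existing Tatoeba feature.

```python
from collections import defaultdict, deque


class UnlinkedSentenceCap:
    """Allow at most max_sentences unlinked additions per user per window."""

    def __init__(self, max_sentences, window_seconds):
        self.max_sentences = max_sentences
        self.window_seconds = window_seconds
        # Per-user queue of timestamps of recent accepted additions.
        self._timestamps = defaultdict(deque)

    def try_add(self, user, now):
        """Record an addition at time `now` (seconds) if the user is under the cap."""
        stamps = self._timestamps[user]
        # Forget additions that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_sentences:
            return False
        stamps.append(now)
        return True


cap = UnlinkedSentenceCap(max_sentences=2, window_seconds=60)
print(cap.try_add("alice", now=0), cap.try_add("alice", now=10), cap.try_add("alice", now=20))
```

The second kind of cap (total unlinked sentences owned) would be a simple count check and needs no time window.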
I don't think a cap on the number of unlinked sentences is going to work, because many near-duplicates get added by linking them to existing sentences.
E.g. #252252 was adopted by CC https://tatoeba.org/en/user/profile/CC , which helpfully states in the description that it's an account created specifically for "Sentences that I have either adopted or written that have a version with contractions." Then the version with a contraction was added as a new sentence, #10189287 , and all sentences linked to #252252 were also linked to #10189287 , thus creating a near-duplicate with many links.
A cap on unlinked sentences wouldn't prevent this, and even a cap on all sentences would be easy to circumvent with multi-accounting.
I think this kind of behavior is caused by strict adherence to the rule that correct sentences shouldn't be changed ( https://en.wiki.tatoeba.org/art...are-correct%2E ), so CK just adds his preferred variant to a whole lot of sentences. If we change the rule to allow changes to sentences you own or adopt, provided they don't change the meaning, we might get fewer of these kinds of near-duplicates.
> many near-duplicates get added by linking them to existing sentences
We could also cap the number of links per language that a contributor can add to a sentence. This would avoid situations like #8558069 .
> even a cap on all sentences would be easy to circumvent with multi-accounting
The goal is to set clear and sufficiently deterrent limits. Those who still choose to circumvent the rules will have to work harder.
Does the list of "Vocabulary that needs sentences" (https://tatoeba.org/en/vocabula...sentences/eng) update as new sentences are added for those phrases? I like this page a lot and decided to use it to add English sentences a while ago, and I noticed that even when a given vocabulary request already has 9 sentences, the number doesn't change (and the entry doesn't disappear) after I add new sentences.
I noticed while looking around that other requests seem "frozen" based on the time they were added. "Spot on" is on page 24 of the English vocabulary requests (https://tatoeba.org/en/vocabula...s/eng?page=24) with "1 sentence," but when you click on the link to show the existing sentences, there are actually 24 sentences (https://tatoeba.org/en/sentence...&unapproved=). This is similar for a lot of other entries ("spouse" on page 25 doesn't have 1 sentence as is written, but 63). Maybe it would be more useful if the list periodically updated to reflect the number of existing sentences in the corpus. I don't know if someone has brought this up before.
I think this bug has already been reported: https://github.com/Tatoeba/tatoeba2/issues/2239
I wasn't aware. Thanks!
** Stats & Graphs **
Tatoeba Stats, Graphs & Charts have been updated:
** Tatominer **
Thanks to Yorwba, Walentinio, marafon, Objectivesea, AlanF_US, Polgar1, cojiluc, shekitten, ddnktr, small_snow, maaster, Rafik, glavsaltulo, iiujik, Jeigmz, Shishir, aldar, megamanenm and giuliopaci for their 112 contributions that helped move the project forward this week.
Check out the most searched words that lack sentences or translations in your language at https://tatominer.netlify.app.
Hello, I am new here and I have a question: how does one report incorrect sentences? I can see no flag option.
I left a comment containing the correct translation a week ago, but no-one has responded or corrected the sentence.
Thank you for the help: I am looking at the wiki as well but it is not obvious where to find this information.
Welcome to Tatoeba, Elin!
You did the right thing by adding a comment. If you stay around for a while, you will gain the ability to add tags (such as "@check" and "@change") that are periodically checked.
Unfortunately, if you look at this page:
you'll see that Welsh has no admins, corpus maintainers, or advanced contributors, and in fact it has only three contributors in total. This makes you all the more valuable to us :) but it explains why no one has responded to you yet.
Admins and corpus maintainers can modify or delete sentences from languages that they don't know as long as they are given clear, reliable information. So now that we know that you're adding comments, we can periodically look at them.
Thank you Alan - a really helpful reply.
I have been learning Welsh (as an adult) for 6 years and am happy to help with sentences at the beginner's (Mynediad/Sylfaen) end of the spectrum. I learn De Cymraeg (South Walian) but at the Uwch 2 level I have just completed, more and more North Walian is being introduced - which will necessitate more pseudo-duplicate sentences!!
You need to be at least an advanced contributor to add tags.
If you write a comment in the comment field, you can use the @ sign, e.g. @AlanF_US, and in that case Alan gets a message.
Thank you :o)