I agree with lbdx that a sentence limit should be introduced. It's the least that should be done.
Thanks, @lbdx. It's good to have some numbers to back up what should be obvious to anyone who cares to look a bit at the English corpus and who has shaped it. Could you give us some more detail about the sharp drop in the number of active contributors? For example, has it been across all languages and countries?
> I think any imbalance in the corpus can be fixed with improved search features and using lists. There’s room for everyone.
The idea that improved search will sort out the imbalance in the corpus I find overoptimistic, to say the least.
@lbdx Reading your description of the list in your profile, it's interesting to me that you date the imbalance of the English corpus to 2017: that's when I joined. Sharptooth's graphs show a massive increase of English sentences at that time. Until 2020 (or thereabouts), I myself had only added about 1,500 sentences.
Thanks, everyone, for your replies.
> If only one sentence has audio, that one is kept.
This is what I thought, though I wasn't sure.
This would seem to reward any users out there who have the ability to upload audio and who are motivated by numbers to 'capture' existing sentences:
add a corrected version with audio of a sentence that's already in the corpus
= inherit any translations linked to those older, 'original' sentences when they're corrected
= accrue contribution points
= gain a sentence at the expense of someone else (for those with a zero-sum mentality)
It encourages acquisitive behaviour rather than giving help to others to improve their sentences and the quality of the corpus.
I've been on the receiving end of this a few times. It's only because I kicked up a fuss for all to see that the user in question 'generously' released the sentences he'd gained. Over the years I've seen it happen to others, I would say, multiple times: a comment is left on an incorrect sentence, and when it's eventually edited, it then merges with a newer sentence which has audio.
I don't know about anyone else, but it doesn't seem fair to me.
Now, obviously, this can happen entirely unintentionally. But I don't believe that it always does.
I'm interested in what happens when sentences merge.
When two sentences merge – one that has audio with another that hasn't – is ownership always retained by the owner of the sentence with audio, regardless of which sentence was created first?
If both sentences have audio, is ownership always retained by the owner of the sentence that was created first?
Does the number of translations a sentence has play any role? Does anything else play a role, such as sentence ratings?
I'm not a programmer. If I haven't worded these questions clearly, please let me know.
I don't object to sentences about any country, just as I don't object to sentences about any city. As far as I'm concerned, some of the best English contributions here are written by non-native speakers. They put to shame my own attempts to write in other languages. I take my hat off to those users. I'm not one of those who discourage non-native speakers from adding English sentences.
What I do object to is sentences being uploaded to the site on an industrial scale. The main priority seems to be to pump out as many sentences as possible – to what purpose, we can only guess – and let the rest of us find and correct the mistakes (a service we provide voluntarily). We all make mistakes, but in this case the problem is the sheer volume. Whoever you are here – native speaker or non-native speaker, admin, corpus maintainer or whatever – taking this approach is not community-minded.
We all have views. That's stating the obvious.
That said, though, I personally don't like some users' use of this site to, as they see it, push their agenda.
However, my point in supporting a cap is about *volume*. As I said, I wish it had been done years before this latest user started the deluge. Tatoeba seems to be at the mercy of anyone motivated enough to dominate its contents.
Maybe. But you're still mischaracterising people here. We're not all the same.
You're wrong. "Everyone" is not OK with Tom and Mary ad infinitum.
I still remember when a certain member here was uploading tens of thousands of sentences in one go once a month (using scripts), which is partly why we're lumbered now with Tom and Mary. Above all, it's that sort of behaviour, now being exhibited by another user, that I'd like to see some "caps" on. I wish it had been done years ago.
+
I've hardly been able to load a page here for the past three hours.
> I am convinced that the Tatoeba Corpus would grow more harmoniously if the addition rate of original sentences was capped
I agree. Something like this should've been done years ago.
As well as sabretou's suggestions, 'as for' is a possibility.
@Yorwba
It's interesting how perception differs from reality; I would've bet that some of those percentages were higher than they are. So, thanks for that correction.
> I'd like to encourage translators to set their search parameters to ignore the names overly used in the corpus (Tom, Mary,
I translate mainly German sentences here. If I followed your advice, the number of sentences I'd have to translate would be significantly curtailed, because the most prolific authors of German sentences here are enthusiastic users of those names.
> the near-duplicates that CK wants to avoid
He seems to me to have been one of the most active producers of so-called near-duplicates here.
https://tatoeba.org/en/wall/sho...#message_39154
> Short sentences could be added with personal pronouns; longer sentences could be added with different names as well.
Often when I add a sentence with a personal pronoun – usually a translation – more or less the same sentence appears a bit later with the pronoun replaced by "Tom".
Thanks very much, @Cangarejo. Thanks too to @AlanF_US for the help to us non-programmers.