
Could someone tell @amastan to stop adding so many sentences about Algeria?
'Algeria' has 14,031 occurrences in English-language sentences, compared to:
'France', 1,001
'India', 381
'China', 1,296
'America', 1,098
'United States', 1,112
'Japan', 1,734
'Germany', 871
All these countries have bigger populations and are, dare I say, significantly more recognizable than Algeria.
(While "French" does have 13k+ occurrences, almost all of them are only about the language (as opposed to the culture/country), which is spoken by about 100 million people worldwide. "Spanish", with 486 million speakers, yields only 870 results.)

He also seems obsessed with transgender people.

Does being obsessed with something go against Tatoeba's rules?

Excuse me, I was just passing by, but does writing sentences about countries other than the “recognized” ones really go against the rules of this website?

As @Nuel pointed out, Tatoeba's own guideline says: "Avoid using the same words, names, topics or patterns over and over again."
And it's definitely not a coincidence: almost all of these 14,000 sentences are from one single user.
Mere happenstance could never result in that many sentences about Algeria. This project should be representative of the entire world, or at the very least of places most people will be familiar with or actually live in (India, USA, China).
Translating on here gets boring/annoying pretty fast when every other sentence is about Algeria, not to mention the fact that you have to filter out sentences containing "Algeria" each time you download sentences. We could start spamming different countries as a knee-jerk reaction, but that'd be just as bad.
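The filtering chore mentioned above can be scripted. Below is a minimal Python sketch, assuming the download is Tatoeba's tab-separated sentences export with one sentence per line in the order sentence id, language code, text; that layout and the helper names `read_sentences`, `count_sentences` and `filter_out` are assumptions for illustration, not an official tool. It counts the sentences in one language that contain a given word and drops those that contain any unwanted word.

```python
import csv
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed row layout of the export: (sentence id, language code, text)
Row = Tuple[str, str, str]


def read_sentences(lines: Iterable[str]) -> Iterator[Row]:
    """Parse tab-separated lines, skipping any row that does not have 3 fields."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 3:
            yield row[0], row[1], row[2]


def word_pattern(word: str) -> re.Pattern:
    """Match `word` as a whole word, so "Algeria" does not match "Algerians"."""
    return re.compile(rf"\b{re.escape(word)}\b")


def count_sentences(rows: Iterable[Row], lang: str, word: str) -> int:
    """Count sentences in `lang` that contain `word` at least once."""
    pattern = word_pattern(word)
    return sum(1 for _, code, text in rows if code == lang and pattern.search(text))


def filter_out(rows: Iterable[Row], lang: str, unwanted: List[str]) -> List[Row]:
    """Keep sentences in `lang` that contain none of the `unwanted` words."""
    patterns = [word_pattern(w) for w in unwanted]
    return [
        (sid, code, text)
        for sid, code, text in rows
        if code == lang and not any(p.search(text) for p in patterns)
    ]


if __name__ == "__main__":
    # Tiny made-up sample in the assumed format; a real run would read the
    # export file instead, e.g. open("sentences.csv", encoding="utf-8", newline="")
    sample = [
        "1\teng\tAlgeria is a big country.",
        "2\teng\tTom likes Mary.",
        "3\tfra\tTom aime l'Algérie.",
        "4\teng\tAlgerians drink a lot of tea.",
    ]
    rows = list(read_sentences(sample))
    print(count_sentences(rows, "eng", "Algeria"))
    print(filter_out(rows, "eng", ["Algeria"]))
```

On the sample, only sentence 1 is counted. Sentence 4 survives the filter because whole-word matching treats "Algerians" as a different word, so add it to the unwanted list if that is not what you want.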

How is Algeria worse than Tom and Mary?

+1

Everyone here has the same means of contacting Amastan, which is to send a private message. So you can in fact tell him to stop adding so many sentences about Algeria, as can anyone else. Doing so directly in private is also more likely to reach the intended recipient than posting on the wall. (I'll send him a link to this wall post so he has the opportunity to respond.)
But if you're actually hoping for an admin to order Amastan to stop, that's unlikely to happen, since the last time a similar issue came up, the verdict was that
> As a general rule, no action will be taken against a contributor based on the sole fact that they are creating new sentences with a name that has been overused.
https://blog.tatoeba.org/2019/0...h-tom-and.html
If you want to avoid English sentences which overuse certain words, you might want to restrict your searches to lbdx's "Pruned English Corpus" list, i.e. https://tatoeba.org/en/sentence...ny&sort=random
It's also worth pondering what Amastan is supposed to do instead. After all, telling someone to stop doing something they think is the right thing to do is unlikely to be effective, but convincing them that there's something else they could do that would be even better might work.
As far as I know, Amastan is Algerian, so it is indeed not mere happenstance that he added so many sentences about Algeria, but also not terribly surprising. However, it does seem like many of these sentences could be equally said about pretty much any country, which might be why you consider them boring.
But surely Amastan doesn't want us to think Algeria is boring! So maybe there is room for mutually beneficial cooperation here. Do you think some of the sentences are less boring than others? For example, I think that sentences that are less generic and more specifically about Algeria, like #7811221 "Algeria and Morocco are the only North African nations that recognize Berber as an official language." are a bit more interesting. What do you think?
Maybe Amastan would also be up for creating more sentences that are about Algeria without explicitly stating as much. (In keeping with the old adage that writers should "show, not tell.")

If everyone stopped adding nearly useless, empty short sentences with the same words over and over again, perhaps Amastan would also stop doing so.
I think this may be his reaction to the Tom and/or Australia sentences.
(I still think these sentences are the main reason most contributors stop using Tatoeba after translating three (or many more) sentences.)
(It may be just a double standard against Amastan. Am I wrong?)

> It may be just a double standard against Amastan. Am I wrong?
You're right: Amastan isn't the first to stuff his sentences with pervasive words. But he's by far the one who's been producing the most of them lately. Last month, he added 16,000 original English sentences. The other two main English contributors added only 1,000. The figures are available at https://colab.research.google.c...=5&uniqifier=1
Unfortunately, the people who seem to be in charge of the project refuse to see this as a problem. How about a monthly cap on the number of original sentences one user can add in one language? 3,000 seems like a reasonable number to me...
As this issue comes up regularly on the wall, I would like the Tatoeba community to comment on this measure. If you think such a cap is appropriate, please post a plus sign in the comments of this post. If you want to show your disapproval, please post a minus.

+

+

+

-

When the corpus is dominated by the Western names Tom and Mary, everyone is OK with that. But when the non-Western place name Algeria gets used, people want some caps.
This is cultural colonialism, plain and simple.

You're wrong. "Everyone" is not OK with Tom and Mary ad infinitum.
I still remember when a certain member here was uploading tens of thousands of sentences in one go once a month (using scripts), which is partly why we're lumbered now with Tom and Mary. Above all, it's that sort of behaviour, now being exhibited by another user, that I'd like to see some "caps" on. I wish it had been done years ago.

Well, if that gets instituted after Algeria and not after Tom and Mary, that still shows the bias of the community.

Maybe. But you're still mischaracterising people here. We're not all the same.

You're right, I guess.

That said, though, I personally don't like some users' use of this site to, as they see it, push their agenda.
However, my point in supporting a cap is about *volume*. As I said, I wish it had been done years before this latest user started the deluge. Tatoeba seems to be at the mercy of anyone motivated enough to dominate its contents.

> some users' use of this site to, as they see it, push their agenda
I don't believe in "not having an agenda". We all have our views, and the sentences we add necessarily reflect our views. Everyone has an agenda.

We all have views. That's stating the obvious.

There is a difference between "having views" and "having an agenda". Anyway, if you will: my "agenda" is that the only accepted agenda on Tatoeba should be the passion for creating (collecting, in some cases) high-quality linguistic content across the planet.

What it looks like to me is that there are two separate issues, and that they've been conflated. I see these two issues as:
1. A single person is adding far more sentences to the English corpus than the next two people combined.
2. People don't like that a lot of this person's sentences are about Algeria. That's too bad. More people should add sentences about their countries and cultures. He's doing nothing wrong by writing about his own, even if a lot of it amounts to political propaganda.
To address 1, I'll echo those who have suggested caps. I think the caps suggested are more than reasonable and would not prevent most users from adding the number of sentences that they are already adding.

I don't object to sentences about any country, just as I don't object to sentences about any city. As far as I'm concerned, some of the best English contributions here are written by non-native speakers. They put to shame my own attempts to write in other languages. I take my hat off to those users. I'm not one of those who discourage non-native speakers from adding English sentences.
What I do object to is sentences being uploaded to the site on an industrial scale. The main priority seems to be to pump out as many sentences as possible – to what purpose, we can only guess – and let the rest of us find and correct the mistakes (a service we provide voluntarily). We all make mistakes, but in this case it's the sheer volume. Whoever you are here – native speaker or non-native speaker, admin, corpus maintainer or whatever – taking this approach is not community-minded.

Did you still have a tab open from three days ago? I got rid of the native part of that post within an hour of making it (and my original comment said non-native contributions were fine, but that the volume of non-native contributions was the problem).
At any rate, the answer seems to be a cap on daily contributions.

There are currently 17,183 occurrences of the "wildcard" country "Australia" in the corpus, and 14,651 occurrences of "Algeria." Australia is the best thing to compare Algeria to, rather than Tom and Mary, which are names (and Amastan uses many names in his sentences).
If the cap is instituted once the number of occurrences of Algeria comes to roughly equal that of Australia, will that allay your concerns? It will mean both Algeria and Australia have equally benefited from the pre-cap situation. No one could say that Algeria has been disadvantaged; in fact, one could say that instituting this cap at any point (even now) makes it harder for any country to play catch-up to Algeria.
Disclaimer: I am not an administrator and do not have the power to make offers.

It's not the first talk about wildcard words.

The point is that Algeria is already almost as well represented in the corpus as Australia (even better represented, in proportion to population), and that it will soon catch up to Australia in raw numbers.
Before Amastan was adding 16,000 new sentences in a month, someone else apparently used to add a similarly high amount, but their contributions were never capped. If we did the cap after Algeria caught up to Australia, you could say that both Algeria and Australia had benefitted equally from the situation before the cap was instituted. I don't know if this is less prejudicial or not; it's an idea.

We have been talking about a cap for years now. By the time we finally get one, that long list will contain 160 thousand sentences, not just 16 thousand.

A cap is a limit. I'm talking about a limit on the number of daily contributions.
16,000 sentences by one user in a day is too many. It's not natural. I'm disabled and I don't contribute anywhere near that much, nor has anyone contributed anywhere near that much before without using scripts.

Assuming a 16-hour workday, that is 1,000 sentences per hour, or about 17 per minute, or roughly one every 4 seconds. Just for reference. Or roughly one every 2 seconds if only working 8 hours on Tatoeba that day.
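The per-second arithmetic above, and the monthly reading of the same 16,000 figure, can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and treating a month as 30 days is an assumption.

```python
def seconds_per_sentence(sentences: int, hours: float) -> float:
    """Average seconds available per sentence when `sentences` are added in `hours`."""
    return hours * 3600 / sentences


def per_day(sentences: int, days: int = 30) -> float:
    """Average sentences per day, assuming the total is spread evenly over `days`."""
    return sentences / days


if __name__ == "__main__":
    print(seconds_per_sentence(16_000, 16))  # 16,000 in one 16-hour day
    print(seconds_per_sentence(16_000, 8))   # 16,000 in one 8-hour day
    print(per_day(16_000))                   # 16,000 spread over a 30-day month
```

For 16,000 sentences in one day this gives 3.6 seconds per sentence over 16 hours and 1.8 seconds over 8 hours, matching the rounded figures above; spread over a 30-day month, 16,000 comes to roughly 533 a day.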

My apologies. It's 16,000 a month, not 16,000 a day. I misremembered what I read.
Either way, that is 16 times as much as the people who add the second-most and third-most sentences. That's something we have good reason to want to prevent, for the sake of the future of this site.
But having this discussion in the midst of a discussion asking to limit sentences about Algeria specifically taints this.

About 500 a day is much more manageable, even by hand. Thank you for correcting.

What 16 thousand sentences a day? Where? I don't see any such day.

I misremembered - it was 16,000 a month.

+

+

However, I don't like such a great number of sentences with the word "Algeria", or with any other country name, either.

> I still think these sentences are the main reason most contributors stop using Tatoeba after translating three (or many more) sentences.
Is that a presumption or a statement of fact? Do we have the means to talk about worrying tendencies within contributions to Tatoeba? If so, do you have any evidence supporting the idea that trivial sentences, even in excessive amounts, somehow pose a threat to Tatoeba?
My personal impression is that it is *easy* to find useful sentences, and a whole lot of mental cycles are wasted, with little constructive result, over something that doesn't even have consensus within the community. I can easily agree with a sort of "rate limit" on contributions as a broader measure against brute-force takeover.
Last but not least... I have no intention of ranking the reasons why somebody stops using Tatoeba, but you yourself are a dubious example in my opinion, and you should at least think about that before coming up with theories. You seem to fancy far-fetched or downright undecipherable translations, and you are very vocal not only against near-literal translations but also in favor of your personal ad hoc prescriptivism.
Since I was rather involved in Tatoeba via Clozemaster, I definitely see a bigger problem in coming across "maasterisms" or your annoying insistence on changing translations that I consider plausible and practical because they are apparently not idiomatic enough for you.

That isn't just a presumption.
Clozemaster is not my business.
It's a voluntary job. I don't do it for other people's pleasure.
Add many more translations on Tatoeba and read them on Clozemaster.
Perhaps there aren't enough cynical sentences on Tatoeba. You can add some.

If it's not a presumption, then maybe argue for it, or show evidence.
Also, it's kind of a strawman to talk about someone's pleasure when I said you are actively causing DISpleasure, both to users with your trademark undecipherable sentences, and to contributors with the trademark prescriptivist gatekeeping over absolutely non-representative personal fixations. This is something you ought to think about, not to deflect.

Using Clozemaster since the change (as a free user, you can now only do 30 sentences a day per language), my biggest concern is not "maasterisms" or "amastanisms" but the mood the sentences create: there are soooo many negative views of the world, sooo many pessimistic sentences, sooo many about death and dying.
I know, I contributed to those. But in the big picture, there are too many of them. It makes me sad. It's a sad collection of sad sentences.
What we created reflects the sadness and anger that were inside us.
People probably stop using the site because they are not full of hatred or sadness.

I also stopped using Clozemaster after that ridiculous restriction, but honestly, I can't recall the sentences being particularly bitter or depressing.
Anyway, it doesn't literally have to be Clozemaster. If you want to learn a language from sentences and their translations, it's an unwanted challenge when somebody makes up odd artsy translations on a regular basis, while also telling others off for translating too literally or for using certain colloquialisms that go against his peculiar view of language protectionism. I have more confidence in my language use than to fall victim to the latter attempts, but I think even the attempt is completely out of place in a project like Tatoeba. And translations that barely resemble the original sentence are of dubious value, to say the least.

It has become small pockets of sadness. When you read something about death on a daily basis, it feels depressing (not only death, but suicide, suicidal thoughts, marital misbehavior, etc.). When I did 400 sentences a day I didn't notice it, but this way I picked up on it.

I wrote "I think".
Nevertheless, if you had been here in 2016, you could have read what was supposedly freddy1's last comment about "empty" sentences before leaving the project.
Oh, and kiseva33 or jegaevi also wrote about how their opinion on this had changed in the meantime, and then quit altogether.

> Users who participated in the last 200 contributions.
> Amastan: 169
That's 85% of the last 200; he's adding a sentence every 6 seconds, and all of them feature "Algeria" or "Antonio"…

To me the answer to "disproportionate number of sentences about Algeria" is to add more sentences about France, India, China, the U.S., Japan, Germany, Kenya, etc.
Granted, this is hard when someone adds such a volume of sentences that it dwarfs all others, so caps are a good idea.

In the beginning, there were a lot of sentences about Japan on Tatoeba.
At some point the Tom and Mary wave began, as did French.
Likewise Ziri, Mennad, Sami and whatever else there is.
And now Algeria.
On the one hand: this too will pass, and new waves will come.
On the other hand: this apparently annoys people actively.