Dear contributors, dear administrators, Hi,
First of all, I present my best wishes for the new year. 🎊💐🎉
We have already explained why this kind of sentence is useful.
First, it's for geolocation needs, by integrating these sentences into GPS systems (my colleague @Rafik explained it recently), and then for AI and chatbots, among others, because we don't yet have algorithms that can generate these types of sentences by themselves. If you know of any, please let us know.
The Kabyle language, which is a fragile language, should be given more opportunities, just like endangered languages. Especially since the names of Kabyle villages are being replaced by Arabic names because of the policy of Arabization.
Besides, could you tell me why there are thousands of sentences of:
Do you also think that it is more relevant than knowing that:
Tom got married in Boston.
Tom got married in L.A.
Tom got married in N.Y. ... etc.
Then there is Tom, there is Mary, there is Mr. Jackson, Mr. White...etc. With the same verbs each time, yet we translate them with pleasure.
Many sentences are redundant!
And then thousands of sentences with the name Algeria in Berber (Lezzayer = Zzayer)
The usefulness of proselytizing sentences like (Accept Islam)
Hate phrases against the State of Israel
I have only given two examples of languages, English and Berber. There are hundreds of other examples.
As for me, I am constantly correcting even sentences with village names. I also improve them and diversify them. (Please feel free to check them out)
If some contributors are uncomfortable with translating them, I will translate them myself with my colleagues; we speak Arabic, English, French, German, Spanish, etc. No problem. Our team has grown too.
Finally, as an aside: you reassigned all the sentences of a Kabyle contributor (who changed his name to TALWIT and left the project after receiving threats from the same person who stirs up these controversies) to the "Berber language", without consulting the Kabyle contributors, not even the person responsible for the Kabyle corpus. That is almost 100,000 sentences. Do the contributors of Tatoeba know about this sad event? I doubt it.
Talwit's sentences (you can check them out)
What do we do with these Kabyle phrases?
Our flag was also deleted, just because it displeases the same person who creates polemics!
So what do we do with our Kabyle flag?
Finally, we know where these controversies come from, which delay and complicate things for us.
Do you have any suggestions, dear administrators?
I wish you a happy new year, too.
It is unfortunately quite true that several contributors have introduced large quantities of redundant, near-duplicate sentences in a number of languages in the corpus. Please don't follow their example. As I've said, doing so hurts your own language. We could have a far better English corpus, for instance, if it weren't diluted with so many trivially different versions of the same sentences. People who are looking for concrete examples of vocabulary or grammar in a given language, or who want to contribute translations, need diversity.
The fact that you want to improve GPS systems is laudable, but if your approach is going to make things worse for people who want to use Tatoeba for everything else, you should go elsewhere or build your own tools for this purpose. Obviously, given the large number of existing GPS systems built without the help of Tatoeba, this can be done.
It is also true that various members of the community have violated the spirit of respect for people and groups in their sentences, comments, and communication with each other. We try to remedy this as fairly as we can, which is a difficult job for a number of reasons. However, even when you are the target of such behavior, it doesn't justify watering down your corpus by adding near-duplicate sentences on an industrial scale.
I really wish there was a way to upvote messages. Anyway... +1, very much agreed.
While having an upvoting feature would be very social-media-like, I very much agree with the message too.
We’re not here to drive them away, but we want everyone to contribute in the best manner.
I can propose something better, if you allow it: no more orphan sentences will be introduced. Only sentences with at least one link. Until the languages have obtained 100% translations. At least for the first 20 languages.
We will also need to recover the sentences of our Kabyle corpus, the one of the contributor "Talwit" who left this project, unfortunately.
Finally, we need to recover our Kabyle flag with which we registered in Tatoeba. Then we will start on real good bases and a real good year.
> Only sentences with at least one link. Until the languages have obtained 100% translations.
You mean you’ll stop adding original sentences and only add translations until they’re fully translated into at least one language?
Yes, you've got it.
> We will also need to recover the sentences of our Kabyle corpus, the one of the contributor "Talwit" who left this project, unfortunately.
To my knowledge, there’s no way to do a mass action, such as transferring all sentences of a user to another user, or changing the flag of all of them, but one option is to make that user inactive; all the sentences will then be orphaned, and you can adopt them and change the flag.
The point about the flag is not only to attach these sentences to the Kabyle language, but also to restore the original Kabyle flag removed by the admins. Some of the contributors still think that the Berber language is the same as the Kabyle one, which is not the case.
I also agree with Igider regarding orphan sentences, and we can imagine not accepting new sentences without at least one translation; this will push people to improve the corpuses.
Regarding the daily limit, if you want to create such rules, this should be created at the application level, and technically prevent a user from adding more than 1000 sentences (with at least one translation) per day, and not ban the user.
> if you want to create such rules, this should be created at the application level [...] and not ban the user.
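To make the proposal concrete, here is a minimal sketch of a per-user daily cap, written in Python purely for illustration. The class and method names are hypothetical and nothing here reflects how Tatoeba is actually implemented; only the figure of 1000 comes from the proposal itself.

```python
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 1000  # figure taken from the proposal; everything else is assumed


class DailyCap:
    """Hypothetical counter: refuses additions once a user reaches the cap for a day."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self._counts = defaultdict(int)  # (user, day) -> sentences added so far

    def try_add(self, user, day):
        """Return True and count the sentence, or False if the cap is reached."""
        key = (user, day)
        if self._counts[key] >= self.limit:
            return False  # the addition is refused; the account is not banned
        self._counts[key] += 1
        return True
```

In this sketch the counter is keyed by day, so a user who hits the cap can simply continue the next day without any moderator action.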
Coding such a feature would probably take time. The temporary suspension of a member by the moderation team could be introduced more quickly and would allow for exceptions in certain cases.
> we can imagine not accepting new sentences without at least one translation
This additional constraint does not seem necessary to me. Besides, a user should be allowed to contribute even if he knows only one language.
I don't think it is too big a thing to code. Besides, it is not really logical to ban a member for adding sentences, and it also means that the admins need to keep track of the banned accounts in order to reinstate them after a certain period, which would also need some development, I think.
Let's see what the admins think about the propositions.
If something is not hard to code, then why do you do part of your GPS project here?
Maybe one of you loves the site and convinced the others to join, but surely, adding as many city, village, and street names to the sentences as there are H2O molecules in the seven seas won't bring anything great.
Tatoeba is a corpus, not your data centre.
@cabo I think that you didn't get my point at all. When I speak about coding, I'm speaking about adding this to Tatoeba, not to other projects, since we are on Tatoeba here...
Again, this is something that should be discussed with all the admins, to see which rules we should add, even technically, in Tatoeba to push for the improvement of the corpuses.
And yes, Tatoeba is a very nice corpus, and it is used by several projects and companies, for everything including GPS.
I’m not referring to the flag representing the Kabyle language; that’s a separate issue. I’m talking only about all the Talwit sentences. I don’t know of a way to apply an action to all of them, other than making them adoptable by advanced contributors by making “Talwit” inactive.
We are open to any solution that can restore these sentences to the Kabyle (kab) language.
Regarding the flag, yes, this is an issue for us.
> Regarding the daily limit, if you want to create such rules, this should be created at the application level, and technically prevent a user from adding more than 1000 sentences (with at least one translation) per day, and not ban the user.
> Regarding the daily limit, if you want to create such as rules, this should be created at the application level, and prevent technically a user to add more than 1000 sentences (with at least one translation) per day, and not ban the user.
Won’t users just create more accounts?
Indeed. That's one reason why relying on technical restrictions can only get us so far.
What if you create something that doesn't check for a cap, but allows you to write standalone sentences only when you have at least an equal number of translated ones?
Then you have no problem with someone creating other accounts just to write more sentences.
What about users who only speak one language? They wouldn’t be able to translate, and so couldn't contribute.
In the Kabyle case, that doesn't seem to be a problem, since most or all of them speak at least French.
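One possible reading of this rule, sketched in Python for illustration only: a new standalone sentence is allowed as long as, after adding it, the user's standalone sentences do not outnumber their translations. The function name and this exact interpretation are assumptions, not an existing Tatoeba feature.

```python
def may_add_standalone(standalone_count, translation_count):
    """Hypothetical check: True if one more standalone sentence keeps
    the user's standalone count at or below their translation count."""
    return standalone_count + 1 <= translation_count
```

Under this reading, a brand-new account with no translations would have to translate something before adding its first standalone sentence.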
I can't speak German, but I still can translate simple sentences.
I’m referring to other users. We're talking about this Kabyle topic, but any technical limit we put on Tatoeba affects everyone.
And I don't think that someone who "only speaks one language" can't find anything to translate or anything to contribute. (To find this site in the first place, you probably need to know some English. I don't remember how I found this site, but I'm sure it wasn't through a Hungarian search.)
Happy new year to you all!
To the best of my knowledge, Talwit requested the change from Kabyle to Berber of all of his sentences, except the sentences with audio, which were unadopted and left as Kabyle and later on adopted by some of your people (as owner of the sentences, Talwit is the only one whose opinion is relevant in that matter. If there have been threats we have no knowledge about it). Rafik was informed about this process.
About the near-duplicates, if they're added as different translations of a sentence it's fine for me because people learn other words to say the same thing or different meanings of a sentence, which can be useful in some situations. If they're added with different names of streets, cities, people, I find it useless. And it doesn't matter what language we are talking about, actually most people who are against the use of Tatoeba to create that GPS system are also against the use of only the name Tom or Boston or the repetition of the very same sentence with different cities or people.
About the attacks in the sentences, I admit I'm surprised you bring up that topic, if I remember correctly you also added several offensive sentences against Islam and Algeria, which can still be found in the corpus. But as Alan said, it's not an easy matter to take care of.
As for the flag, I think you also remember that the decision to change the Kabyle flag was taken after lengthy research, and, as I don't think the circumstances that led to that decision have changed, I wouldn't expect that flag to come back.
Concerning Talwit, who was previously called "Ubezwi1", the case of his sentences is a little strange. Even if he himself chose to "change language", his sentences still come up today in searches for Kabyle sentences (corpus), but with the BER flag! People who try to translate by choosing random words or sentences don't understand this, and some of them translate without noticing, since the flags are so similar, especially as he added thousands of sentences. I think the job was not done properly somewhere. Since this must have been the only such case on Tatoeba, the admins should look into what went wrong.