Wall (6,756 threads)

Igider, January 21, 2023 at 4:42:27 PM UTC (edited January 21, 2023 at 4:44:10 PM UTC)

Hello,

I have corrected more than 11,000 sentences in just a few days (marked with the green tag). There is more to do until I have finished all of them (≈304,000).
I am changing the bland ones so that they read well and carry meaning.

They will be translated into French or English so that they all have links. That is the goal, after all!

More volunteers will be joining in the coming days to give this a push, because we are seriously short on translations.

Good luck and strength to each and every one of you.

samir_t, January 21, 2023 at 5:09:40 PM UTC

Hello. I see that you have corrected a lot, which is great. We hope that all the sentences added recently will be corrected too, especially where they contain forms that do not work properly in Kabyle, although that may take up a lot of your time. Good luck.

Igider, January 21, 2023 at 5:17:03 PM UTC

Thank you, Samir.

I will work through all of them, every single one. One by one, word by word, letter by letter!
No problem at all!

Good luck and strength to you too.

:-)

Rafik, January 21, 2023 at 5:26:00 PM UTC

Good luck, Igider.
Whatever we say is too little: this is work that has taken a great deal of your time, day and night. I hope your efforts bear fruit. Thank you.

Igider, January 21, 2023 at 5:32:21 PM UTC

Thank you, Rafiq.

I would say the same about your work. However much we thank you, it is too little!

All that matters is that Kabyle (Teqvaylit) moves forward and finds its proper place.

Good luck and strength to you too.

CK, January 16, 2023 at 11:42:08 PM UTC

I've added a number of new English audio files to recently-added sentences.

https://tatoeba.org/en/sentence.../show/4000/und

Perhaps you might want to translate some of these into your own native language.

AlanF_US, January 3, 2023 at 11:53:21 PM UTC

I'm starting a new thread with reference to this one:

https://tatoeba.org/en/wall/sho...#message_39361

in order to focus on my point:

Adding large quantities of near-duplicate sentences degrades the quality of the Tatoeba corpus.

Issues not relevant to that point:

(1) whether those near-duplicate sentences are translated, and by whom (note that translating a large quantity of near-duplicate sentences would produce a large number of near-duplicate translations)
(2) any purpose outside Tatoeba (such as a GPS) for which those sentences are intended
(3) how many people speak the language in question
(4) whether near-duplicate sentences exist in other languages
(5) whether sentences exist that violate our civility guidelines
(6) the chain of events that led to Talwit leaving the project
(7) what has happened to Talwit's sentences so far
(8) what should happen to Talwit's sentences in the future
(9) the flag used for the Kabyle language
(10) whether Berber and Kabyle are separate languages
(11) whether members of Tatoeba think of them as separate languages
(12) whether we should have a cap on the rate of sentences added by a member
(13) what that cap should be
(14) how difficult it would be to code a capping mechanism

If anyone wants to explain how adding large quantities of near-duplicate sentences enhances the Tatoeba corpus, this is your opportunity.

Cangarejo, January 4, 2023 at 12:58:55 PM UTC

What’s your process for coming up with sentences?

AlanF_US, January 4, 2023 at 2:16:21 PM UTC

Personally speaking, I mostly write translations, but when I come up with original sentences, I usually write them around a vocabulary item from either Tatominer ( https://tatominer.netlify.app/eng.html ) or vocabulary requests ( https://tatoeba.org/en/vocabulary/add_sentences ).

brauchinet, January 4, 2023 at 6:57:59 PM UTC (edited January 4, 2023 at 7:00:50 PM UTC)

> Adding large quantities of near-duplicate sentences degrades the quality of the Tatoeba corpus.
(12) "whether we should have a cap on the rate of sentences added by a member" is relevant to that point. It prevents users from adding large quantities of sentences whatsoever.
A limit is just a way of making clear that mass production of sentences is unwanted.
Yes, people can create multiple accounts, but this argument would be presupposing some "criminal energy" on their part.

AlanF_US, January 5, 2023 at 3:50:45 PM UTC (edited January 11, 2023 at 4:18:37 PM UTC)

Caps belong to "solution space" -- the discussion of how to address a problem. I wanted to stick to a more fundamental question: Do people agree that adding large quantities of near-duplicate sentences is bad for the corpus? Discussion of whether we should institute a cap, and what it should be, and how it should work, is irrelevant to answering that basic question. And I think it's important to start with the basics, because if you agree that what you're doing is bad for the corpus, you need to reevaluate what you're doing. It's extremely easy to refuse to face the consequences, or to rationalize them away. And when the discussion turns to technical means of discouraging bad behavior, it's easy for people to slip into the mode of "Well, I'll just keep doing it until the site stops me." The fact is that we have few developers and a lack of infrastructure for deciding on whether and how to implement a change, so chances are that it will occur a long time from now, if ever. And as you and others have pointed out, if such a change were instituted, people could use multiple accounts to get around it, and then all that discussion and development time would have been thrown away.

People don't need "criminal energy" to sabotage the site. Ideology, or self-deception, or self-aggrandizement, or hypocrisy will do just fine. My belief is that it is important for people to confront the consequences of their actions, and the sooner they recognize whether or not they're following their own principles or achieving their goals in the best way, the better. And those aren't just lofty principles, either. For instance, people should realize that while adding near-duplicate sentences to Tatoeba might be a shortcut to collecting the data they need for a GPS, it's not the best way, and meanwhile the mind-numbing repetitiveness of their sentences will make people hope to never see another Kabyle sentence in their lives.

imalaqvayli, January 5, 2023 at 5:30:25 PM UTC

Hi @AlanF_US

I completely agree that near-duplicate sentences are not useful for the Tatoeba corpora. By the way, we have stopped adding near-duplicate sentences, and Igider has also started fixing the ones already added. Fixing the old duplicates for Kabyle (kab) will take some time, but it's worth it.

Beyond this, I have another idea for improving things. A user can add a near-duplicate sentence without knowing it, since he can't check all the existing sentences. Do you know whether it would be possible to run a near-duplicate check when a user adds a new sentence? If the user were shown the possible near-duplicates, he would be able to decide whether his new sentence is worth adding.
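
For readers wondering what such a check might look like, here is a minimal sketch in Python. It is not a Tatoeba feature and not anyone's actual implementation: the function names, the word-level comparison, and the 0.7 threshold are all assumptions made for illustration. It scores a candidate sentence against existing ones with the standard library's `difflib.SequenceMatcher` and returns the closest matches so the contributor can decide.

```python
from difflib import SequenceMatcher


def normalize(sentence: str) -> list[str]:
    """Lowercase and split into words, dropping punctuation at word edges."""
    tokens = (word.strip(".,!?\"'").lower() for word in sentence.split())
    return [t for t in tokens if t]


def similarity(a: str, b: str) -> float:
    """Word-level similarity ratio between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_near_duplicates(candidate: str, existing: list[str], threshold: float = 0.7):
    """Return (score, sentence) pairs at or above the threshold, best first.

    The threshold is a hypothetical value; a real check would need tuning
    per language and an index rather than a full scan of the corpus.
    """
    scored = [(similarity(candidate, s), s) for s in existing]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


if __name__ == "__main__":
    corpus = ["Tom adopted a puppy.", "Everything they learnt was wrong."]
    for score, sentence in find_near_duplicates("Ziri adopted a puppy.", corpus):
        print(f"{score:.2f}  {sentence}")
```

A linear scan like this would be far too slow for a corpus of millions of sentences, so it only illustrates the idea of showing candidates to the contributor before they submit.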

AlanF_US, January 11, 2023 at 4:23:02 PM UTC

I'm glad to hear that you have stopped including near-duplicate sentences.

The rule of thumb is that whatever functionality does not currently exist on Tatoeba will either take a very long time to be implemented or never appear. The number of developers is small, the number of things that need to be fixed is huge, and the number of divergent opinions on the advisability of any particular measure is also large, meaning that discussion will take a long time and often not lead to a result.

The best way to avoid adding near-duplicate sentences is to make an attempt to include variety. I give an example here:

https://tatoeba.org/en/wall/sho...#message_39414

If your sentences are sufficiently varied, and not extremely simple, they probably won't be near-duplicates of existing sentences.

CK, January 13, 2023 at 3:22:06 AM UTC

A lot of us do the following, although several members don't like this idea.

Wildcards Used to Help Avoid Too Many Near Duplicates
http://bit.ly/tatoebawildcards

Here are a few examples of the kinds of near duplicates that could be avoided using this method.

These are limited to a few kinds of proper noun: personal names, city names, country names, and language names.


** Proper Noun: Person's Name

[#9171210] Tom adopted a puppy. (CK)
[#11392303] Ziri adopted a puppy. (Amastan)

[#7785065] Tom became exhausted. (shekitten)
[#8830384] Skura became exhausted. (Amastan)

[#3150589] Tom didn't buy anything. (CK)
[#7268137] Sami didn't buy anything. (OsoHombre)

[#5062121] Tom doesn't think that's a good idea. (AlanF_US)
[#7135177] Sami doesn't think that's a good idea. (OsoHombre)

[#10119304] Tom ate the bug. (ddnktr)
[#7147707] Sami ate the bug. (OsoHombre)

[#2966807] Tom gave up smoking. (AlanF_US)
[#8794262] Skura gave up smoking. (Amastan)

[#8899999] Tom asked to speak to the manager. (shekitten)
[#7287227] Sami asked to speak to the manager. (OsoHombre)

[#7919112] Tom has a criminal history. (AlanF_US)
[#9388153] Yanni has a criminal history. (Amastan)

[#10553053] Tom was a very nice guy. (AlanF_US)
[#8003579] Mennad was a very nice guy. (OsoHombre)

[#1955120] Tom can't wait. (CK)
[#7156636] Sami can't wait. (OsoHombre)

[#2458647] When is Tom's birthday? (Hybrid)
[#7121120] When is Sami's birthday? (OsoHombre)

[#10247773] Tom accidentally set himself on fire. (ddnktr)
[#11190504] Ziri accidentally set himself on fire. (Amastan)

[#4382032] Tom cut the apple in two. (CK)
[#11266646] Ziri cut the apple in two. (Agestur)

[#6945900] Tom broke everything. (shekitten)
[#7245667] Sami broke everything. (OsoHombre)

[#10802492] Tom clung to a branch. (ddnktr)
[#6713011] Sami clung to a branch. (OsoHombre)

[#10109939] Tom didn't fight back. (ddnktr)
[#11177147] Ziri didn't fight back. (Amastan)

[#3448539] Tom works as an announcer on television. (AlanF_US)
[#9023943] Skura works as an announcer on television. (Amastan)

[#8235173] Tom bought something. (shekitten)
[#7199508] Sami bought something. (OsoHombre)
[#8054241] Mennad bought something. (OsoHombre)
[#10234984] Ziri bought something. (Amastan)
[#11422340] Rima bought something. (Agestur)

[#2236196] Tom denied it. (CK)
[#6698344] Sami denied it. (OsoHombre)
[#9797071] Yanni denied it. (Amastan)
[#10222066] Ziri denied it. (Amastan)
[#11385747] Rima denied it. (Agestur)

[#5149973] Tom denied this. (CK)
[#8795379] Skura denied this. (Amastan)
[#10222064] Ziri denied this. (Amastan)
[#11385729] Rima denied this. (Agestur)

[#2863468] Tom deserves it. (Amastan)
[#7148955] Sami deserves it. (OsoHombre)
[#10897449] Ziri deserves it. (Amastan)
[#11429964] Rima deserves it. (Agestur)

[#2236201] Tom did that. (CK)
[#6806928] Sami did that. (OsoHombre)
[#10026274] Yanni did that. (Amastan)
[#10208200] Ziri did that. (Amastan)
[#11287796] Rima did that. (Agestur)

[#2549619] Tom dialed 911. (CK)
[#10207372] Ziri dialed 911. (Amastan)
[#6462067] Sami dialled 911. (OsoHombre)
[#11413579] Rima dialled 911. (Agestur)
[#11413576] He dialled 911. (Agestur)
[#11413577] She dialled 911. (Agestur)
[#11413583] They dialled 911. (Agestur)


** Proper Noun: City Name

[#2806248] Tom was raised in Boston. (AlanF_US)
[#10222042] Ziri was raised in Algiers. (Amastan)

[#2045784] Boston is a beautiful city. (CK)
[#8104935] Algiers is a beautiful city. (Amastan)

[#4811756] Boston is a big city. (CK)
[#8396376] Algiers is a big city. (Amastan)

[#9817056] Boston is a fascinating city. (CK)
[#8580221] Algiers is a fascinating city. (Amastan)


** Proper Noun: Country Name

[#7192180] Do you feel safe in Australia? (CK)
[#8567684] Do you feel safe in Algeria? (Amastan)

[#7137416] Do you like Australia? (CK)
[#8512926] Do you like Algeria? (Amastan)

[#7192158] Do you live in Australia? (CK)
[#8417301] Do you live in Algeria? (Amastan)

[#7192156] Do you miss Australia? (CK)
[#8313843] Do you miss Algeria? (Amastan)


** Proper Noun: Language Name

[#9266614] I want to speak French fluently. (shekitten)
[#10132507] I want to speak Spanish fluently. (Ricardo14)

[#9284403] It's important to study French. (shekitten)
[#9284497] It's important to study Russian. (shekitten)

[#2451464] Do you have a French dictionary? (CK)
[#8314009] Do you have a Berber dictionary? (Amastan)

[#8410238] Does anybody speak French here? (CK)
[#7865035] Does anybody speak Berber here? (Amastan)

[#2451515] Does anyone here speak French? (CK)
[#6317360] Does anyone here speak Russian? (carlosalberto)
[#11066331] Does anyone here speak Portuguese? (sundown)
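
The pairs above differ only in a proper noun, which makes them easy to group mechanically. The following Python sketch is an illustration only, not the method described at the link above: the `PLACEHOLDERS` table is a hypothetical, hand-picked list taken from the examples in this post, and `to_template` and `group_by_template` are made-up names. It masks known names with a placeholder and reports templates shared by more than one sentence.

```python
import re
from collections import defaultdict

# Hypothetical name lists, copied from the examples in this post only.
PLACEHOLDERS = {
    "PERSON": ["Tom", "Sami", "Ziri", "Skura", "Yanni", "Mennad", "Rima"],
    "CITY": ["Boston", "Algiers"],
    "COUNTRY": ["Australia", "Algeria"],
    "LANGUAGE": ["French", "Spanish", "Russian", "Berber", "Portuguese"],
}


def to_template(sentence: str) -> str:
    """Replace each known proper noun with a placeholder such as {PERSON}."""
    for label, names in PLACEHOLDERS.items():
        pattern = r"\b(?:" + "|".join(map(re.escape, names)) + r")\b"
        sentence = re.sub(pattern, "{" + label + "}", sentence)
    return sentence


def group_by_template(sentences: list[str]) -> dict[str, list[str]]:
    """Group sentences by template, keeping only templates seen more than once."""
    groups = defaultdict(list)
    for sentence in sentences:
        groups[to_template(sentence)].append(sentence)
    return {tpl: group for tpl, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    sample = [
        "Tom denied it.",
        "Sami denied it.",
        "Boston is a big city.",
        "Algiers is a big city.",
        "Tom gave up smoking.",
    ]
    for template, group in group_by_template(sample).items():
        print(template, "<-", group)
```

A fixed name list only works for languages where names are not inflected, which is one of the objections raised in the replies.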

Thanuir, January 13, 2023 at 7:35:36 AM UTC

And as mentioned earlier, this is a bad idea: the names of cities, countries, and languages vary from one language to another, different names change in different ways from one alphabet to another, and different names inflect differently in different languages. In addition, it is good for each language to include proper nouns that are typical of it, which also benefits language learners.

Yorwba, January 14, 2023 at 12:06:49 PM UTC

How often do you get the message that the sentence you were trying to add already existed? In other words, how often does always using the same names prevent you from adding a near-duplicate? And what do you do when that happens?

I only get that message when I'm adding a translation to an existing sentence that was already indirectly linked via a few corners to the same translation that I came up with; I don't think I've encountered it with an original sentence I thought up myself, but maybe that's because I haven't added all that many.

shekitten, January 5, 2023 at 5:11:45 PM UTC (edited January 5, 2023 at 5:14:54 PM UTC)

> Adding large quantities of near-duplicate sentences degrades the quality of the Tatoeba corpus.

What standard is being applied here to determine the quality of the Tatoeba corpus? What are you basing this judgment of quality on? Is this just your opinion?

Polgar1, January 9, 2023 at 12:16:25 PM UTC

To my understanding, all of this is up for discussion, preferably based on reasoning that others can agree on, or at least understand.

shekitten, January 11, 2023 at 5:12:48 PM UTC (edited January 11, 2023 at 5:15:39 PM UTC)

I'll take your lack of a response as evidence that this is purely your opinion.

So why do people have to take the opportunity to explain to you that near-duplicate sentences enhance the quality of the corpus, when you have no basis (other than your own beliefs) for claiming the opposite? And when you set preconditions excluding almost every argument for why they enhance the quality?

Selena777, January 14, 2023 at 7:22:08 PM UTC

I agree with you and I would like to support name diversity and free creation of sentences, including so-called "near-duplicates".

Here are the reasons: I use sentences from open-source projects to make audio files and text tables. Listening to and reading them helps you learn languages without digging into grammar rules. When you learn a word, it's important to learn its different forms, not only its main form. Introducing "I learnt Berber", "You learnt Berber", "She learnt Berber", etc. helps you understand the basics of the grammar of your new language. That's natural for a human and helps you learn proper endings and inclination.

As for name diversity: finding just "Tom" and "Ziri" here and there while reading or listening to sentences is just boring.

Yorwba, January 14, 2023 at 10:14:24 PM UTC

> Introducing "I learnt Berber", "You learnt Berber", "She learnt Berber", etc. helps you understand the basics of grammar of your new language.

Do you prefer a series of sentences that differ in only one word, or would

I learnt Berber.
You learnt that at school, I hope.
She learnt how to spell her name.

also be acceptable?

Selena777, January 15, 2023 at 3:26:34 PM UTC

> Do you prefer a series of sentences that differ in only one word

It depends. For a completely new language with difficult grammar, mixing up sentences like "I learnt Berber", "I learnt English", "I learnt French", "He learnt Berber", "He learnt French", "She learnt German", "Ivan learnt Berber", would be my choice to understand the inclination and learn languages' names.

Of course, your sentences are also valuable and useful. Tatoeba is a collection, so it may consist of sentences of different sorts serving different aims. It's not a dictionary, whose main purpose is illustrating different meanings of words.

Thanuir, January 5, 2023 at 7:30:42 PM UTC

The translations of a single sentence have value even when they resemble one another.

1. They show at a glance how many different ways a given sentence can be translated.
2. They support diversity, for example in the gender and the singular or plural number of personal pronouns.

In addition, they have both a positive and a negative effect on linking. Negative in that if someone translates one sentence, they will not necessarily link similar sentences, since the user interface does not offer a good way to do so. Positive in that if, say, one person has translated a Swedish sentence into Finnish and another person a French sentence, and the two could share a common translation, that translation is more likely to be found if both sentences have several translations.

Cangarejo, January 6, 2023 at 8:35:59 PM UTC

@Thanuir, the sentences being added at a large scale don’t have any translations.

Thanuir, January 7, 2023 at 8:16:29 AM UTC

I know. However, I understood the question as a general one, not as one concerning only those sentences.

Polgar1, January 9, 2023 at 12:26:27 PM UTC

I think this is agreeable, once we clarify two terms: "large quantity" and "near-duplicate".

If by "near-duplicate", we mean sentences that have basically the same meaning with basically the same grammar (i.e., you replace a name in an English sentence), I'd say basically any quantity of such "near-duplicates" is just noise. Even a third sentence that just substitutes one undeclined proper name for another is too much.

However, if it's a translatable word that is being substituted in, or there is some grammatical diversity between the words substituted in, I would be more lenient and therefore would rather focus on the "large quantity" part. It's okay if there is a sentence that illustrates the meaning and usage of several words in a certain context. The important thing is that 1. it shouldn't try to cover "all words" in the given context (because then the corpus is mixed with a boring dictionary) and 2. if something can reasonably be deduced from an example, don't create analogous examples for other sentences.

Loosely, I could say that for me "large amount" means generating sentences based on a logic, either manually or programmatically.

lbdx, January 9, 2023 at 1:13:07 PM UTC

> generating sentences based on a logic, either manually or programmatically.

For my part, I call the "original" sentences added using this method "patterned sentences". Here are some of the many examples added today by Amastan/Agestur:

They learnt some Berber.
We learnt some Berber.
Rima and Skura learnt some Berber.
Ziri and Rima learnt some Berber.
Rima learnt some Berber.
She learnt some Berber.
He learnt some Berber.
I learnt some Berber.

Unfortunately, this prolific contributor seems to refuse to communicate with us on this Wall 😞

AlanF_US, January 11, 2023 at 3:54:26 PM UTC (edited January 11, 2023 at 4:11:53 PM UTC)

My hope is that rather than adding these sentences:

They learnt some Berber.
We learnt some Berber.
Rima and Skura learnt some Berber.

a contributor would add these:

They learnt some Greek while they were on vacation.
We learnt some cooking skills by watching YouTube videos.
Rima and Skura learnt valuable life lessons the hard way.
Everything they learnt was wrong.
Our mother, we learnt later, had been saving money in a separate account.

Regarding your last remark: If you have trouble getting a contributor to communicate on the Wall, try sending them a private message. In many situations, private messages are better in the first place.

morbrorper, January 12, 2023 at 10:11:06 AM UTC

I think the opposition to near-duplicate sentences can be explained by the term "noise-to-signal ratio"; I hope we can agree that we want sentences that increase the signal level and decrease the level of "noise".

Using this criterion, I would say that the "some Berber" sentences are increasing the noise level, as opposed to AlanF_US's suggestions. Imagine a learning tool that takes its corpus from Tatoeba; it will have to filter out all these repetitive sentences as "noise" in order to be useful.

lbdx, January 10, 2023 at 8:00:04 PM UTC (edited January 11, 2023 at 4:28:59 PM UTC)

** off-topic post deleted by the author **

Cabo, January 10, 2023 at 8:15:58 PM UTC

Ask it again, but before that ask it to define the difference between data and information.
And then also ask it to tell you what those near-duplicates are: data or information?

lbdx, January 10, 2023 at 8:48:11 PM UTC (edited January 11, 2023 at 4:29:12 PM UTC)

** off-topic post deleted by the author **

AlanF_US, January 11, 2023 at 3:47:34 PM UTC (edited January 11, 2023 at 3:48:08 PM UTC)

Entertaining though it might be to see what AI-generated text has to say, I hope this is not the wave of the future. It's hard enough to find time to participate in a conversation with humans. Sorting through AI-generated text makes things even more difficult. Using an AI engine to come up with ideas on your side is one thing but asking a reader to guess which parts of an automatically generated wall of text you actually stand behind, or have written yourself, is another.

Often, AI-generated text doesn't apply to the specifics of the situation. For instance, item 2 ("Variety of context: Even though the sentences are similar, they could have different context and this way, it can give users a better understanding of how the language is used in different situations and situations") doesn't apply to Tatoeba. The whole point is that sentences here appear without external context, and a set of near-duplicate sentences fails to provide any more internal context beyond what exists in any one of those sentences. Thus, people are unable to reason about what variety of contexts might be valid for a particular usage.

lbdx, January 11, 2023 at 4:04:04 PM UTC

@Alanf_US Sorry, but I was very impressed with the result and thought other members would be curious as well. I will delete everything.

AlanF_US, January 11, 2023 at 4:13:59 PM UTC (edited January 11, 2023 at 4:14:59 PM UTC)

I didn't expect you to delete your posts, but I appreciate your openness to and respect for my viewpoint.

Cangarejo, January 12, 2023 at 1:58:26 PM UTC

How is Tatoeba doing in terms of storage space?

DJ_Saidez, January 13, 2023 at 5:54:29 AM UTC

I wonder too. Possibly this might have an impact on how Tatoeba has been crashing a lot recently for me.

Yorwba, January 14, 2023 at 12:28:54 PM UTC

We actually ran out of disk space on 2022-12-10 and had to delete some files that were no longer needed, but that was due to inefficient usage of the space we did have available. We're currently using about 80 GiB of storage, so there's lots of headroom for growth before we even get close to the limit of what can fit into a single server.

When you see the "Tatoeba is currently unavailable" page intermittently, that's more likely due to rate limiting (if you request more than one page per second, the server will try to slow you down) or heavy load on the server when many users try to access it simultaneously.
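
For anyone running scripts against the site, a client can stay under the one-page-per-second figure mentioned above by spacing out its own requests. The Python sketch below is an assumption-laden illustration: the `Throttle` class is hypothetical, it says nothing about how the server's rate limiting is actually implemented, and the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed: 1.0)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was allowed

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=1.0)
    for page in range(3):
        waited = throttle.wait()
        # A real script would fetch a page here; this sketch only prints.
        print(f"page {page}: waited {waited:.2f}s")
```

Calling `wait()` before each request keeps a script at or below one request per second regardless of how fast the rest of the loop runs.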

January 12, 2023 at 7:05:04 AM UTC (edited January 12, 2023 at 7:55:46 AM UTC)

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

January 12, 2023 at 4:22:57 PM UTC (edited January 12, 2023 at 7:25:34 PM UTC)

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

January 12, 2023 at 4:33:56 PM UTC (edited January 12, 2023 at 4:43:45 PM UTC)

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

January 12, 2023 at 8:01:31 PM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

sharptoothed, January 8, 2023 at 7:06:31 AM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Igider, January 4, 2023 at 9:23:46 PM UTC (edited January 4, 2023 at 9:36:58 PM UTC)

@AlanF_US
@Shishir

Thank you for your answer.

Rafik is/was not the person responsible for the Kabyle corpus. And we were not informed of this event at all. It was expeditious! And do you find it normal that a contributor of a language wants to add almost 100,000 sentences to another language?

Would you accept that a contributor of your respective languages, after 3 years of contribution, decides overnight to add tens of thousands of sentences to another language? And this behind closed doors? Would you really do it? Is that moral?

I have not insulted Islam; I have stated facts, and the same goes for Algeria. I have posted sentences found in independent newspapers. I have a total of 150 sentences on these two words, unlike those who spread proselytizing and Judeophobic speech in thousands of sentences. You have the links.

The Kabyle flag is the flag that truly represents us. The Berber flag represents all Berbers in the world. Would you accept changing your own flag to another one just because it matters to a person who is not affected by it?

For the repetitive sentences, I have already talked about them, as well as the solutions.

By the way, I translate every day.

I'll respect whatever you decide.

Best regards.

Rafik, January 4, 2023 at 10:23:42 PM UTC

Hello to all, and best wishes. My wish for 2023 is that all these controversies stop and that Tatoeba gets better.

After Talwit's sentences were transferred to the Berber corpus, I did ask the admins to let us keep those with an audio recording in our (Kabyle) corpus. I found it abnormal and unfair that users who had spent hours making their recordings saw their efforts serving another corpus. The admins contacted Talwit, who agreed to release the sentences in question.
As for the flag issue, it's like showing the European corpora under a single flag, that of the European Union, with the labels Fra, Ita, Spa, Ger, Por... to tell them apart! That does not make any sense.

Shishir, January 4, 2023 at 10:45:37 PM UTC

Hi Igider,

Rafik was the person who complained to us and mentioned the sentences with audio; that is why he was informed that the Kabyle sentences that already had audio remained Kabyle. It's not that we went to inform him as leader of the Kabyle project. I myself do not know how your project is run, or whether there is a particular leader or people just bring their complaints directly to us like the contributors of any other language.

If someone or even all the other Spanish speakers wanted to change their sentences to another language I admit I would wonder, but it would be their decision after all and I would respect it, since it is their work and their sentences we are talking about. If I had reasons to ask for such a change of my own sentences I would also not feel compelled to tell the other Spanish speakers about it, but again we might be dealing with a different way of leadership since your Kabyle project seems to have some kind of hierarchy that the other languages lack.

About not insulting Islam and Algeria but exposing facts... you added these sentences, which I would consider closer to personal opinions than proven facts:

#8111667 The Algerian Arabs are happy when the Kabyle are killed by their Arab-Islamist army.

#8111685 The minority of Kabyle who support the criminal and corrupt Algerian state are Islamist Kabyle.

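#10698511 Les islamistes algériens, tout comme le pouvoir algérien dictatorial, soutiennent Poutine mais préfèrent vivre en Occident. (= Algerian Islamists, just like the dictatorial Algerian regime, support Putin but prefer to live in the West.)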

#8126416 Muslims who are tolerant are no longer Muslims because they no longer apply the precepts of Islam that are liberticidal.

#9635667 The majority of Muslims are ready to kill anyone who criticizes Islam which is not a religion but a dangerous sect. Don't we say that "religion is a successful sect"?

And I know there are other users who also add offensive sentences, more than you do. But complaining about them when you also write offensive sentences is kind of shocking.

sharptoothed, December 25, 2022 at 3:20:20 PM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Cabo, December 27, 2022 at 8:47:47 AM UTC, edited December 27, 2022 at 8:48:32 AM UTC

13k sentences a week? Please, just shut down that bot for the night.

lbdx, December 28, 2022 at 11:01:16 AM UTC

I agree with Cabo. If he were flooding the English corpus with tens of thousands of barely varied original sentences every month, this member would certainly have been suspended (at least temporarily). Unfortunately, some corpora don't get enough care from the moderation team.

Aiji, December 28, 2022 at 1:07:07 PM UTC

I also agree with Cabo. Maybe we could hear what the Kabyle moderation team, or the user involved, thinks about the matter.

DJ_Saidez, December 28, 2022 at 4:40:52 PM UTC

@samir_t Would you say that all of his sentences bring any meaningful value to the corpus, other than bulk?

I’m aware of the intent to make audio for OpenGPS but I don’t know if this is even the case here.

Rafik, December 28, 2022 at 6:41:34 PM UTC

Even if the volume is indeed enormous, the sentences added by the user in question are all significant and useful (even more so once they are translated). The corpora are not equal, and what seems banal for a powerful language such as English, German, Russian or French is of paramount importance for fragile languages such as Kabyle. PS: you can pick dozens of the added sentences at random, and I would be happy to translate them into French for you, just to prove that they are varied and useful.

(translated from French with Google Translate)

DJ_Saidez, December 28, 2022 at 7:07:35 PM UTC, edited December 29, 2022 at 2:10:40 AM UTC

https://tatoeba.org/en/sentences_lists/show/170869

This is taken from a random search.

You can write in French if you like. I don't speak it well, but I understand it.

Rafik, December 28, 2022 at 7:51:41 PM UTC

#11365801 Water, please!
Give me water, please!
#9632897 I ran to Gerruma.
#9100073 I think this is an idiomatic expression; literally: I carried them on my back for you.
#9121136 She will visit the city of Béjaïa.
#7535051 Is it true that it is for the Kabyle language that you are fighting?
#9486175 Is there another way to write "astrology" in Icelandic?
#9111757 I shared them all with you.
#11329469 I tasted some meat a moment ago.
#9459560 You protected my grandmother from everything.
#11335969 Sell him some butter.
#9576035 You took a baklava.

This is just a sample of the sentences you submitted. I don't have enough time to translate them all, but if you insist I will do it for you later.

AlanF_US, January 2, 2023 at 6:46:50 PM UTC, edited January 2, 2023 at 6:48:30 PM UTC

Accessing the sentences via random search hides the patterns that are used to create them. Here are a number of examples of those patterns:

(1) "Ad telḥu ɣer" (= She will walk to) + placename (83 results)
https://tatoeba.org/en/sentence...roved=no&user=
Examples:
Ad telḥu ɣer Brazil.
Ad telḥu ɣer Budwaw.
Ad telḥu ɣer Timyawin.

"Ad terzu ɣer" (= She will visit) + placename (108 results)
https://tatoeba.org/en/sentence...roved=no&user=
Examples:
Ad terzu ɣer Brazil.
Ad terzu ɣer Rwiba.
Ad terzu ɣer Tubiret.

"Ad tnehṛemt ɣer" (=You will drive to) + placename (88 results)
https://tatoeba.org/en/sentence...C9%A3er%22&to=
Ad tnehremt ɣer Budwaw.
Ad tnehremt ɣer Bgayet.
Ad tnehremt ɣer Tubiret.

These patterns cast serious doubt on the assertion that the sentences added by the user in question are all significant, varied, and useful. Mass-producing sentences like this results in a banal collection of sentences for any language. Anyone who does this is actually doing their language a disservice. Having to search through a large number of nearly identical sentences discourages potential translators and language learners alike.
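
As a rough illustration of how such patterns could be surfaced in bulk rather than by hand, here is a minimal Python sketch. It assumes the sentences are available as plain strings; the function name `prefix_counts` and the three-word prefix length are hypothetical choices for the example, not anything Tatoeba itself provides.

```python
from collections import Counter


def prefix_counts(sentences, n_words=3):
    """Count how many sentences share the same first n_words words.

    Sentences with n_words words or fewer are skipped, since they have
    no varying part after the prefix.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if len(words) > n_words:
            counts[" ".join(words[:n_words])] += 1
    return counts


# Sample taken from the examples quoted above.
sample = [
    "Ad telḥu ɣer Brazil.",
    "Ad telḥu ɣer Budwaw.",
    "Ad telḥu ɣer Timyawin.",
    "Ad terzu ɣer Brazil.",
    "Ad terzu ɣer Rwiba.",
]

# Most frequent prefixes first.
for prefix, count in prefix_counts(sample).most_common():
    print(count, prefix)
```

A prefix shared by a large number of sentences is only a hint that a template may be in use; a human would still need to look at the results.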

lbdx, December 31, 2022 at 9:10:42 AM UTC

I am convinced that the Tatoeba Corpus would grow more harmoniously if the addition rate of original sentences was capped (e.g. to 3,000 sentences per month). This would rebalance the corpus lexically and encourage more contributors to participate as their sentences would gain more visibility.

Of course, a member who would exceed the limit would not be permanently banned but temporarily suspended depending on the extent of the excess. Besides, exceptions could be made in some special cases.

sundown, January 2, 2023 at 7:15:07 PM UTC, edited January 2, 2023 at 8:37:48 PM UTC

> I am convinced that the Tatoeba Corpus would grow more harmoniously if the addition rate of original sentences was capped

I agree. Something like this should've been done years ago.

AlanF_US, January 3, 2023 at 12:53:38 PM UTC

I also agree, though I would favor a lower cap on the rate of addition of sentences.

lbdx, January 3, 2023 at 5:37:51 PM UTC

> I would favor a lower cap on the rate of addition of sentences

How about 1,500 original sentences per month (i.e. an average of 50 additions per day for a month)? If this cap had been in effect last month, only 4 contributors would have exceeded it. Source: https://colab.research.google.c...10&uniqifier=1
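
To make the arithmetic concrete: 1,500 sentences over a 30-day month is 50 per day. Below is a minimal Python sketch of how one could count who exceeds such a cap. It assumes a log with one (username, month) pair per original sentence added; the usernames, the data and the function name `over_cap` are hypothetical, and this is not the code of the notebook linked above.

```python
from collections import Counter

MONTHLY_CAP = 1500  # cap proposed in this thread
DAILY_AVERAGE = MONTHLY_CAP / 30  # 50.0 additions per day over a 30-day month


def over_cap(additions, cap=MONTHLY_CAP):
    """Return {(username, month): count} for every pair strictly above the cap.

    `additions` is an iterable with one (username, "YYYY-MM") tuple per
    original sentence added.
    """
    counts = Counter(additions)
    return {key: n for key, n in counts.items() if n > cap}


# Hypothetical log: user_a added 1,600 originals in a month, user_b added 40.
log = [("user_a", "2022-12")] * 1600 + [("user_b", "2022-12")] * 40
flagged = over_cap(log)
print(flagged)
```

Counting per account says nothing about one person using several accounts, so a figure like this can only be a starting point for moderators.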

imalaqvayli, January 4, 2023 at 9:37:32 PM UTC

Capping the number of new sentences per month is good, yes, but 1,500 is too low; 3,000 is a good number, I think.

Beyond this, I also think we could make it mandatory to add at least one translation per created sentence; this would improve the quality of the corpora. 99% of users are good in at least 2 languages. What do you think @AlanF_US?

Cangarejo, December 29, 2022 at 11:15:00 AM UTC

What’s OpenGPS?

DJ_Saidez, December 29, 2022 at 5:59:26 PM UTC

A GPS app that the Kabyles are developing, covering all the villages in the region. They want sentences for every village/place name so they can record audio for them and use it in the app, instead of machine audio (for which there is not enough data). I wonder whether a different platform would be a better host for sentences with this purpose, though, if they otherwise provide little linguistic value.

Objectivesea, December 30, 2022 at 4:32:45 AM UTC

Yes, @DJ_Saidez, if all you ultimately want are pronounced versions of individual proper nouns rather than full sentences, one might suggest a site like Forvo.com for this task, since Tatoeba's very raison d'être is to have complete sentences, and someone would then have to extract the village names by editing sound files for the sentences.

samir_t, January 3, 2023 at 5:33:11 PM UTC, edited January 3, 2023 at 5:40:29 PM UTC

Except for the sentences where place names are used and where repetitions are admittedly visible, I confirm that the others are useful and add value to the corpus. I also see that @Rafik, in his reply, translated a good sample that is roughly representative of all these sentences. The fact that there are so many of them does not necessarily mean that they are of poor quality.

Igider, January 3, 2023 at 6:39:21 AM UTC, edited January 3, 2023 at 9:37:41 AM UTC

Dear contributors, dear administrators, Hi,

First of all, I present my best wishes for the new year. 🎊💐🎉

We have already explained why this kind of sentence is of interest.

First, it's for geolocation needs, by integrating these sentences into GPS systems (my colleague @Rafik explained this recently), and then for AI, chatbots and so on, because we don't yet have algorithms that can generate these types of sentences by themselves. If you know of any, please let us know.

The Kabyle language, which is a fragile language, must have more opportunities just like the endangered languages. Especially since the names of Kabyle villages are replaced by Arabic names because of the policy of Arabization.

Besides, could you tell me why there are thousands of sentences about:

Boston
https://tatoeba.org/fr/sentence...ery=Boston&to=

New York
https://tatoeba.org/fr/sentence...y=New+york&to=

Los Angeles
https://tatoeba.org/fr/sentence...&query=L.A&to=

Do you also think that it is more relevant to know that:

Tom got married in Boston.
Tom got married in L.A.
Tom got married in N.Y. ... etc.

Then there is Tom, there is Mary, there is Mr. Jackson, Mr. White...etc. With the same verbs each time, yet we translate them with pleasure.

Many sentences are redundant!

And then thousands of sentences with the name Algeria in Berber (Lezzayer = Zzayer)

Zzayer
https://tatoeba.org/fr/sentence...ery=Zzayer&to=

Lezzayer
https://tatoeba.org/fr/sentence...y=Lezzayer&to=

The usefulness of proselytizing sentences like (Accept Islam)
https://tatoeba.org/fr/sentence...uery=islam&to=

Hate phrases against the State of Israel
https://tatoeba.org/fr/sentence...o&user=Amastan

...etc.

I have only given two examples of languages, English and Berber. There are hundreds of other examples.

As for me, I am constantly correcting even sentences with village names. I also improve them and diversify them. (Please feel free to check them out)

If some contributors are uncomfortable with translating them, I will translate them myself with my colleagues, we speak Arabic, English, French, German, Spanish...etc. No problem. Our team has grown too.

Finally, as an aside, you moved all the sentences of a Kabyle contributor (who changed his name to TALWIT and left the project after receiving threats from the same person who stirs up the controversies) to the "Berber language", without consulting the Kabyle contributors, not even the person responsible for the Kabyle corpus. Almost 100,000 sentences. Do the contributors of Tatoeba know about this sad event? I doubt it.

Talwit's sentences (You can check out)
https://tatoeba.org/fr/sentence.../Talwit?page=5

What do we do with these Kabyle phrases?

Our flag was also deleted, just because it displeases the same person who stirs up the controversies!

So what do we do with our Kabyle flag?

Finally, we know where these controversies come from, which delay and complicate things for us.

Do you have any suggestions, dear administrators?


Best regards.

@Admins
@AlanF_US

AlanF_US, January 3, 2023 at 12:49:40 PM UTC, edited January 3, 2023 at 12:51:16 PM UTC

I wish you a happy new year, too.

It is unfortunately quite true that several contributors have introduced large quantities of redundant, near-duplicate sentences in a number of languages in the corpus. Please don't follow their example. As I've said, doing so hurts your own language. We could have a far better English corpus, for instance, if it weren't diluted with so many trivially different versions of the same sentences. People who are looking for concrete examples of vocabulary or grammar in a given language, or who want to contribute translations, need diversity.

The fact that you want to improve GPS systems is laudable, but if your approach is going to make things worse for people who want to use Tatoeba for everything else, you should go elsewhere or build your own tools for this purpose. Obviously, given the large number of existing GPS systems built without the help of Tatoeba, this can be done.

It is also true that various members of the community have violated the spirit of respect for people and groups in their sentences, comments, and communication with each other. We try to remedy this as fairly as we can, which is a difficult job for a number of reasons. However, even when you are the target of such behavior, it doesn't justify watering down your corpus by adding near-duplicate sentences on an industrial scale.

Polgar1, January 3, 2023 at 3:08:23 PM UTC

I really wish there was a way to upvote messages. Anyway... +1, very much agreed.

DJ_Saidez, January 3, 2023 at 3:40:00 PM UTC, edited January 3, 2023 at 3:40:41 PM UTC

While having an upvoting feature would be very social media-like, I very much agree with the message too.
We’re not here to drive them away, but we want everyone to contribute in the best manner.

Igider, January 3, 2023 at 3:37:21 PM UTC

I propose something better, if you allow it: no more orphan sentences would be introduced, only sentences with at least one link, until the languages have reached 100% translation. At least for the first 20 languages.

We will also need to recover the sentences of our Kabyle corpus, the one of the contributor "Talwit" who left this project, unfortunately.

Finally, we need to recover our Kabyle flag with which we registered in Tatoeba. Then we will start on real good bases and a real good year.

DJ_Saidez, January 3, 2023 at 3:42:15 PM UTC

>Only sentences with at least one link. Until the languages have obtained 100% translations.

You mean you’ll stop adding original sentences and only add translations until they’re fully translated into at least one language?

Igider, January 3, 2023 at 3:51:30 PM UTC

Yes, you've got it.

DJ_Saidez, January 3, 2023 at 3:54:52 PM UTC

> We will also need to recover the sentences of our Kabyle corpus, the one of the contributor "Talwit" who left this project, unfortunately.

To my knowledge, there’s no way to do a mass action, such as transferring all sentences of a user to another user, or changing the flag of all of them, but an option is to make that user inactive, and then all the sentences will be orphan, then you can adopt and change the flag.

imalaqvayli, January 3, 2023 at 5:06:37 PM UTC

The point about the flag is not only to attach these sentences to the Kabyle language, but also to restore the original Kabyle flag removed by the admins. Some of the contributors still think that the Berber language is the same as the Kabyle one, which is not the case.

And I agree with Igider regarding orphan sentences; we can imagine not accepting new sentences without at least one translation, which would push people to improve the corpora.

Regarding the daily limit, if you want to create such rules, this should be done at the application level, technically preventing a user from adding more than 1,000 sentences (with at least one translation) per day, rather than banning the user.

lbdx, January 3, 2023 at 5:51:53 PM UTC, edited January 3, 2023 at 5:58:22 PM UTC

> if you want to create such rules, this should be done at the application level [...] rather than banning the user.

Coding such a feature would probably take time. The temporary suspension of a member by the moderation team could be introduced more quickly and would allow for exceptions in certain cases.


> we can imagine not accepting new sentences without at least one translation

This additional constraint does not seem necessary to me. Besides, a user should be allowed to contribute even if he knows only one language.

imalaqvayli, January 3, 2023 at 9:04:18 PM UTC

I don't think it is too big a thing to code. Besides, it is not really logical to ban a member because he is adding sentences, and it also means the admins would need to keep track of the banned accounts in order to reinstate them after a certain period, which I think also needs some development.
Let's see what the admins think about the proposals.

Cabo, January 3, 2023 at 10:07:21 PM UTC

If something is not hard to code, then why do you do part of your GPS project here?
Maybe one of you loves the site and convinced the others to join, but surely, filling sentences with as many city, village and street names as there are H2O molecules in the seven seas won't bring anything great.
Tatoeba is a corpus, not your data centre.

imalaqvayli, January 3, 2023 at 10:17:41 PM UTC

@cabo I think that you didn't get my point at all; when I speak about coding, I mean adding this to Tatoeba, not to other projects, since we are on Tatoeba here...
Again, this is something that should be discussed with all the admins, to see which rules we should add, even technically, in Tatoeba to push for the improvement of the corpora.
And yes, Tatoeba is a very nice corpus and it is used by several projects and companies, for all sorts of things including GPS.

DJ_Saidez, January 3, 2023 at 10:11:18 PM UTC

I’m not referring to the flag representing the Kabyle language; that’s a separate issue. I’m talking only about all the Talwit sentences. I don’t know of a way to apply an action to all of them, other than making them adoptable by advanced contributors by making “Talwit” inactive.

imalaqvayli, January 3, 2023 at 10:19:46 PM UTC

We are open to any solution that gets these sentences back into the Kabyle (kab) language.
Regarding the flag, yes, this is an issue for us.

Cangarejo, January 4, 2023 at 2:22:37 PM UTC

> Regarding the daily limit, if you want to create such rules, this should be done at the application level, technically preventing a user from adding more than 1,000 sentences (with at least one translation) per day, rather than banning the user.

Won’t users just create more accounts?

AlanF_US, January 4, 2023 at 6:12:53 PM UTC

Indeed. That's one reason why relying on technical restrictions can only get us so far.

Cabo, January 4, 2023 at 6:23:39 PM UTC, edited January 4, 2023 at 7:01:02 PM UTC

What if you create something that doesn't check for a cap, but lets you write standalone sentences only when you have at least an equal number of translated ones?

Then you don't have the problem of someone creating other accounts just to write more sentences.
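
One possible reading of this rule can be sketched in a few lines of Python. This is only an illustration under an assumed interpretation: a new standalone sentence is allowed while the user's translations are at least as numerous as the standalone sentences already added. The function name and parameters are hypothetical, not an existing Tatoeba feature.

```python
def may_add_original(originals_added: int, translations_added: int) -> bool:
    """Return True if the user may add one more standalone (original) sentence.

    Assumed rule: the number of translations contributed so far must be at
    least the number of standalone sentences contributed so far.
    """
    return translations_added >= originals_added


# A new user (0 originals, 0 translations) may add a first sentence,
# then has to translate before adding another one.
print(may_add_original(0, 0))  # True
print(may_add_original(1, 0))  # False
print(may_add_original(3, 5))  # True
```

Under this reading the balance is tracked per account, so every additional account would still have to earn its own standalone sentences through translations.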

DJ_Saidez, January 4, 2023 at 9:04:27 PM UTC, edited January 4, 2023 at 9:05:14 PM UTC

What about users who only speak one language? They wouldn’t be able to translate, and so couldn't contribute.
In the Kabyle case this doesn't seem to be a problem, since most/all of them speak at least French.

Cabo, January 4, 2023 at 9:11:30 PM UTC

I can't speak German, but I still can translate simple sentences.

DJ_Saidez, January 4, 2023 at 9:15:09 PM UTC

I’m referring to other users. We're talking about this Kabyle topic, but any technical limit we put on Tatoeba affects everyone.

Cabo, January 4, 2023 at 9:17:50 PM UTC, edited January 4, 2023 at 9:18:06 PM UTC

And I don't think that someone who "only speaks one language" would be unable to find anything to translate or contribute. (To begin with, to find this site you probably know English; I don't remember how I found it, but I'm sure it wasn't through a Hungarian search.)

Shishir, January 4, 2023 at 6:46:28 PM UTC

Happy new year to you all!

To the best of my knowledge, Talwit requested the change from Kabyle to Berber of all of his sentences, except the sentences with audio, which were unadopted and left as Kabyle and later on adopted by some of your people (as owner of the sentences, Talwit is the only one whose opinion is relevant in that matter. If there have been threats we have no knowledge about it). Rafik was informed about this process.

About the near-duplicates, if they're added as different translations of a sentence it's fine for me because people learn other words to say the same thing or different meanings of a sentence, which can be useful in some situations. If they're added with different names of streets, cities, people, I find it useless. And it doesn't matter what language we are talking about, actually most people who are against the use of Tatoeba to create that GPS system are also against the use of only the name Tom or Boston or the repetition of the very same sentence with different cities or people.

About the attacks in the sentences, I admit I'm surprised you bring up that topic, if I remember correctly you also added several offensive sentences against Islam and Algeria, which can still be found in the corpus. But as Alan said, it's not an easy matter to take care of.

As for the flag, I think you also remember that the decision to change the Kabyle flag was taken after lengthy research, and, as I don't think the circumstances that led to that decision have changed, I wouldn't expect that flag to come back.

samir_t, January 4, 2023 at 9:22:54 PM UTC, edited January 4, 2023 at 9:34:13 PM UTC

Concerning Talwit, who was previously called "Ubezwi1", the case of his sentences is a little strange. Even if he himself chose to "change language", his sentences still come up today in searches of the Kabyle corpus but... with the BER flag! People who try to translate by choosing random words or sentences don't understand this, and some of them translate without noticing, since the flags are so similar and he added thousands of sentences. I think the job was not done properly somewhere; since this must have been the only such case on Tatoeba, the admins should look into what's wrong.

felix63, January 3, 2023 at 7:05:11 AM UTC

Much happiness and good health, those are my wishes for the new year. May joys prevail over everything else, for you and your loved ones. Very warmly. 🎁

lbdx, January 4, 2023 at 8:19:56 AM UTC

@felix63 thank you for your wishes, and thank you for looking after the French corpus so well. Happy new year and good health to you too.

CK, January 4, 2023 at 12:39:35 AM UTC, edited January 4, 2023 at 12:50:29 AM UTC

** Stats **

2023-01-04 compared to 2020-01-04

https://postimg.cc/JDcmtV2d

Screenshots of this page.

"Stats per Language" (/stats/sentences_by_language)
https://tatoeba.org/en/stats/sentences_by_language

I didn't have internet access at this time of year for the last couple of years, so I don't have screenshots for 2021-01-04 and 2022-01-04.