✹✹ Stats & Graphs ✹✹
Tatoeba Stats, Graphs & Charts have been updated:
13k sentences a week? Please, just shut down that bot for the night.
I agree with Cabo. If he were flooding the English corpus with tens of thousands of barely varied original sentences every month, he would certainly have been suspended (at least temporarily). Unfortunately, some corpora don't get enough care from the moderation team.
I also agree with Cabo. Maybe we could hear what the Kabyle moderation team, or the user involved, thinks about the matter.
@samir_t Would you say that all of his sentences bring meaningful value to the corpus, other than bulk?
I’m aware of the intent to make audio for OpenGPS, but I don’t know whether that is even the case here.
Although the volume is indeed enormous, the sentences added by the user in question are all meaningful and useful (even more so when they are translated). The corpora are not equal: what seems banal for a powerful language such as English, German, Russian or French is of paramount importance for fragile languages such as Kabyle. PS: pick dozens of the added sentences at random, and I would be happy to translate them into French for you, just to prove that they are varied and useful.
(translated from French with Google Translate)
This is taken from a random search.
You can write in French if you like. I don't speak it well, but I understand it.
#11365801 Water, please!
Give me some water, please!
#9632897 I rushed to Gerruma.
#9100073 I think this is an idiomatic expression; literally: "I carried them on my back for you."
#9121136 She will visit the city of Béjaïa.
#7535051 Is it true that it's the Kabyle language you are fighting for?
#9486175 Is there another way to write "astrology" in Icelandic?
#9111757 I shared them all with you.
#11329469 I tasted some meat a little while ago.
#9459560 You protected my grandmother from everything.
#11335969 Sell him some butter.
#9576035 You took a baklava.
This is just a sample of the sentences you submitted. I don't have enough time to translate them all, but if you insist, I will do it for you later.
Accessing the sentences via random search hides the patterns that are used to create them. Here are a number of examples of those patterns:
(1) "Ad telḥu ɣer" (= She will walk to) + placename (83 results)
Ad telḥu ɣer Brazil.
Ad telḥu ɣer Budwaw.
Ad telḥu ɣer Timyawin.
(2) "Ad terzu ɣer" (= She will visit) + placename (108 results)
Ad terzu ɣer Brazil.
Ad terzu ɣer Rwiba.
Ad terzu ɣer Tubiret.
(3) "Ad tnehṛemt ɣer" (= You will drive to) + placename (88 results)
Ad tnehremt ɣer Budwaw.
Ad tnehremt ɣer Bgayet.
Ad tnehremt ɣer Tubiret.
These patterns cast serious doubt on the assertion that the sentences added by the user in question are all significant, varied, and useful. Mass-producing sentences like this results in a banal collection of sentences for any language. Anyone who does this is actually doing their language a disservice. Having to search through a large number of nearly identical sentences discourages potential translators and language learners alike.
I am convinced that the Tatoeba Corpus would grow more harmoniously if the addition rate of original sentences was capped (e.g. at 3,000 sentences per month). This would rebalance the corpus lexically and encourage more contributors to participate, as their sentences would gain more visibility.
Of course, a member who exceeded the limit would not be permanently banned but temporarily suspended, depending on the extent of the excess. Besides, exceptions could be made in special cases.
> I am convinced that the Tatoeba Corpus would grow more harmoniously if the addition rate of original sentences was capped
I agree. Something like this should've been done years ago.
I also agree, though I would favor a lower cap on the rate of addition of sentences.
> I would favor a lower cap on the rate of addition of sentences
How about 1,500 original sentences per month (i.e. an average of 50 additions per day for a month)? If this cap had been in effect last month, only 4 contributors would have exceeded it. Source: https://colab.research.google.c...10&uniqifier=1
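For anyone who wants to check figures like these themselves, here is a minimal Python sketch of the counting step. It is not the code from the linked notebook: it assumes a hypothetical list of (username, date added) pairs, one per original sentence, and the function name `contributors_over_cap` and the sample users are made up for illustration. The default cap is the proposed 1,500 per month, i.e. 1,500 / 30 days = 50 additions per day.

```python
from collections import Counter
from datetime import date

# Proposed cap from the discussion: 1,500 originals per month (~50/day over 30 days).
MONTHLY_CAP = 1500


def contributors_over_cap(additions, year, month, cap=MONTHLY_CAP):
    """Return {username: count} for users whose original sentences
    added in the given month exceed the cap.

    `additions` is a hypothetical iterable of (username, date_added) pairs,
    one per original sentence; the real data source may differ.
    """
    # Count only the sentences added in the requested month.
    counts = Counter(
        user
        for user, added in additions
        if added.year == year and added.month == month
    )
    # Keep only the users strictly above the cap.
    return {user: n for user, n in counts.items() if n > cap}


# Made-up sample data: one user above the cap, one well below it.
sample = [("alice", date(2023, 5, 1))] * 1600 + [("bob", date(2023, 5, 2))] * 40
print(contributors_over_cap(sample, 2023, 5))  # {'alice': 1600}
```

Changing the `cap` argument (e.g. to 3,000) shows how many contributors each proposed limit would affect.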
Capping the number of new sentences per month is good, yes, but 1,500 is too low; 3,000 is a good number, I think.
Beyond this, I also think we could make it mandatory to add at least one translation for each new sentence; this would improve the quality of the corpora. 99% of users are proficient in at least two languages. What do you think, @AlanF_US?
A GPS app covering all the villages in the region, which the Kabyles are developing. They want a sentence for every village/place name so that they can record audio for it and use it in the app, instead of machine-generated audio (for which there is not enough data). I wonder whether a different platform would be a better host for sentences with this purpose, though, if they otherwise provide little linguistic value.
Yes, @DJ_Saidez: if all you ultimately want are pronunciations of individual proper nouns rather than full sentences, a site like Forvo.com might be better suited to this task. Tatoeba's very raison d'être is complete sentences, and with recorded sentences someone would still have to edit the sound files to extract the village names.
Except for the sentences that use place names, where repetition is admittedly visible, I can confirm that the others are useful and add value to the corpus. I also see that @Rafik, in his reply, translated a good sample that is fairly representative of all these sentences. The fact that there are so many of them does not necessarily mean that they are of poor quality.