Tatoeba
sharptoothed, December 25, 2022 at 3:20:20 PM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Cabo, December 27, 2022 at 8:47:47 AM UTC (edited December 27, 2022 at 8:48:32 AM UTC)

13k sentences a week? Please, just shut down that bot for the night.

lbdx, December 28, 2022 at 11:01:16 AM UTC

I agree with Cabo. If he were flooding the English corpus with tens of thousands of barely varied original sentences every month, this member would certainly have been suspended (at least temporarily). Unfortunately, some corpora don't get enough care from the moderation team.

Aiji, December 28, 2022 at 1:07:07 PM UTC

I also agree with Cabo. Maybe we could hear what the Kabyle moderation team, or the user involved, thinks about the matter.

DJ_Saidez, December 28, 2022 at 4:40:52 PM UTC

@samir_t Would you say that all of his sentences bring any meaningful value to the corpus, other than bulk?

I’m aware of the intent to make audio for OpenGPS but I don’t know if this is even the case here.

Rafik, December 28, 2022 at 6:41:34 PM UTC

Even if the volume is indeed enormous, the sentences added by the user in question are all significant and useful (even more so when they are translated). The corpora are not equal, and what seems banal for a powerful language such as English, German, Russian or French is of paramount importance for fragile languages such as Kabyle. PS: you can choose dozens of the added sentences at random, and I would be happy to translate them into French for you, just to prove that they are varied and useful.

(translated from French with Google Translate)

DJ_Saidez, December 28, 2022 at 7:07:35 PM UTC (edited December 29, 2022 at 2:10:40 AM UTC)

https://tatoeba.org/en/sentences_lists/show/170869

This is taken from a random search.

You can speak French if you wish. I don't speak it well, but I understand it.

Rafik, December 28, 2022 at 7:51:41 PM UTC

#11365801 Water, please!
Give me water, please!
#9632897 I ran to Gerruma.
#9100073 I think this is an idiomatic expression; literally: I carried them on my back for you.
#9121136 She will visit the city of Béjaïa.
#7535051 Is it true that it is for the Kabyle language that you are fighting?
#9486175 Is there another way to write "astrology" in Icelandic?
#9111757 I shared them all with you.
#11329469 I tasted some meat just now.
#9459560 You protected my grandmother from everything.
#11335969 Sell him some butter.
#9576035 You took a baklava.

This is just a sample of the sentences you submitted. I don't have enough time to translate them all, but if you insist I will do it for you later.

AlanF_US, January 2, 2023 at 6:46:50 PM UTC (edited January 2, 2023 at 6:48:30 PM UTC)

Accessing the sentences via random search hides the patterns that are used to create them. Here are a number of examples of those patterns:

(1) "Ad telḥu ɣer" (= She will walk to) + placename (83 results)
https://tatoeba.org/en/sentence...roved=no&user=
Examples:
Ad telḥu ɣer Brazil.
Ad telḥu ɣer Budwaw.
Ad telḥu ɣer Timyawin.

(2) "Ad terzu ɣer" (= She will visit) + placename (108 results)
https://tatoeba.org/en/sentence...roved=no&user=
Examples:
Ad terzu ɣer Brazil.
Ad terzu ɣer Rwiba.
Ad terzu ɣer Tubiret.

(3) "Ad tnehṛemt ɣer" (= You will drive to) + placename (88 results)
https://tatoeba.org/en/sentence...C9%A3er%22&to=
Examples:
Ad tnehremt ɣer Budwaw.
Ad tnehremt ɣer Bgayet.
Ad tnehremt ɣer Tubiret.

These patterns cast serious doubt on the assertion that the sentences added by the user in question are all significant, varied, and useful. Mass-producing sentences like this results in a banal collection of sentences for any language. Anyone who does this is actually doing their language a disservice. Having to search through a large number of nearly identical sentences discourages potential translators and language learners alike.
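
For anyone who wants to look for such patterns without running each search by hand, here is a rough sketch of one possible approach: group sentences by their first few words and count how many share each opening. The function name `prefix_counts` and the three-word prefix length are assumptions made for illustration, and the sample is just the handful of examples quoted above, not an export of the corpus.

```python
from collections import Counter


def prefix_counts(sentences, n_words=3):
    """Count how many sentences share the same first n_words words.

    Hypothetical helper: a shared opening is only a hint that a
    template was used, not proof of it.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        # Only count sentences that have something after the prefix.
        if len(words) > n_words:
            counts[" ".join(words[:n_words])] += 1
    return counts


# Sample taken from the examples quoted in this post.
sample = [
    "Ad telḥu ɣer Brazil.",
    "Ad telḥu ɣer Budwaw.",
    "Ad telḥu ɣer Timyawin.",
    "Ad terzu ɣer Brazil.",
    "Ad terzu ɣer Rwiba.",
    "Ad terzu ɣer Tubiret.",
]

for prefix, count in prefix_counts(sample).most_common():
    print(prefix, count)
```

On a real export, openings with unusually high counts would be the ones worth inspecting manually.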

lbdx, December 31, 2022 at 9:10:42 AM UTC

I am convinced that the Tatoeba Corpus would grow more harmoniously if the addition rate of original sentences was capped (e.g. to 3,000 sentences per month). This would rebalance the corpus lexically and encourage more contributors to participate as their sentences would gain more visibility.

Of course, a member who would exceed the limit would not be permanently banned but temporarily suspended depending on the extent of the excess. Besides, exceptions could be made in some special cases.

sundown, January 2, 2023 at 7:15:07 PM UTC (edited January 2, 2023 at 8:37:48 PM UTC)

> I am convinced that the Tatoeba Corpus would grow more harmoniously if the addition rate of original sentences was capped

I agree. Something like this should've been done years ago.

AlanF_US, January 3, 2023 at 12:53:38 PM UTC

I also agree, though I would favor a lower cap on the rate of addition of sentences.

lbdx, January 3, 2023 at 5:37:51 PM UTC

> I would favor a lower cap on the rate of addition of sentences

How about 1,500 original sentences per month (i.e. an average of 50 additions per day for a month)? If this cap had been in effect last month, only 4 contributors would have exceeded it. Source: https://colab.research.google.c...10&uniqifier=1
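
A check like this can be sketched in a few lines. What follows is only an illustration under stated assumptions, not the notebook linked above: it assumes the additions are available as `(username, date)` pairs, and the names `users_over_cap` and `additions`, as well as the data itself, are made up.

```python
from collections import Counter
from datetime import date


def users_over_cap(additions, year, month, cap=1500):
    """Return {username: count} for users above the cap in one month.

    `additions` is assumed to be an iterable of (username, date) pairs,
    one pair per original sentence added. A user who lands exactly on
    the cap is not reported.
    """
    counts = Counter(
        user for user, day in additions
        if day.year == year and day.month == month
    )
    return {user: n for user, n in counts.items() if n > cap}


# Made-up data: user_a is over the cap, user_b is exactly on it.
additions = (
    [("user_a", date(2022, 12, 1))] * 1600
    + [("user_b", date(2022, 12, 15))] * 1500
    + [("user_a", date(2022, 11, 30))] * 50
)

print(users_over_cap(additions, 2022, 12))
```

With the made-up data above, only `user_a` is reported for December 2022. Whether the comparison should be strict is a policy choice; this sketch treats the cap as the last allowed value.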

imalaqvayli, January 4, 2023 at 9:37:32 PM UTC

Capping the number of new sentences per month is a good idea, yes, but 1,500 is too low; 3,000 is a good number, I think.

Beyond this, I also think we could make it mandatory to add at least one translation per created sentence, which would improve the quality of the corpora. 99% of users are proficient in at least two languages. What do you think, @AlanF_US?

Cangarejo, December 29, 2022 at 11:15:00 AM UTC

What’s OpenGPS?

DJ_Saidez, December 29, 2022 at 5:59:26 PM UTC

A GPS app that the Kabyles are developing, covering all the villages in the region. They want sentences for every village/place name so they can record audio for them and use it in the app, instead of machine audio (for which there is not enough data). I wonder, though, whether a different platform would be a better host for sentences with this purpose, if they otherwise provide little linguistic value.

Objectivesea, December 30, 2022 at 4:32:45 AM UTC

Yes, @DJ_Saidez, if all you ultimately want are pronounced versions of individual proper nouns rather than full sentences, one might suggest a site like Forvo.com for this task, since Tatoeba's very raison d'être is to have complete sentences, and someone would then have to extract the village names by editing sound files for the sentences.

samir_t, January 3, 2023 at 5:33:11 PM UTC (edited January 3, 2023 at 5:40:29 PM UTC)

Except for the sentences where place names are used and where repetition is admittedly visible, I confirm that the others are useful and add value to the corpus. I also see that @Rafik, in his reply, translated a good sample that is roughly representative of all these sentences. The fact that there are so many of them does not necessarily mean that they are of poor quality.