Wall (4147 threads)

Hybrid
3 days ago
I see that Horus still doesn't have an avatar. Maybe we could just use this temporarily: http://en.wikipedia.org/static/...g/hiero_G5.png

It means "Horus" in hieroglyphs.
CK
3 days ago
It's already an issue on GitHub.
https://github.com/Tatoeba/tatoeba2/issues/811
Sometimes we just need to be patient. Even though this wouldn't take long to do, TRANG likely has other priorities.
Hybrid
3 days ago
Thanks for your answer.
gillux
20 hours ago
I think it’s a bit too small. It works for messages like those on the Wall, but not for the profile page.
Hybrid
2 hours ago
How about this one: http://upload.wikimedia.org/wik...ion_Detail.jpg

If you can remove the other parts.
gillux
2 hours ago
I set it already.
Hybrid
2 hours ago - edited an hour ago
Thanks, Gillux. You're awesome! It's perfect.

link: http://tatoeba.org/eng/user/profile/Horus

Edit: Now, you just have to set his country of origin to Egypt :)
Amastan
15 hours ago
Good morning Tatoeba,

This month, Amazigh (or Berber) has been recognized as the second official language of Algeria alongside Arabic. To all Amazigh-language activists in Algeria and elsewhere, this is a dream come true.

http://www.bbc.com/news/world-africa-35515769

Amazigh (as we prefer to call it) will at last have the right to appear in official documents and be taught in every single school of our national territory.

Tatoeba's corpus (which contains more than 100,000 Amazigh sentences) will certainly help learners, authors and teachers of Amazigh in their various language activities. With a corpus of this size, Tatoeba is one of the few rich resources of Amazigh available online.

I would like to thank all those who contribute to making this website a good and valuable resource for language promotion, learning and preservation and I think that Tatoeba will continue to be a good tool for the promotion of Amazigh online for many years to come.



Good morning, Tatoeba,

This month, Amazigh was recognized as an official language alongside Arabic in Algeria. This is a dream come true for all activists of the Amazigh language in Algeria.

http://www.aps.dz/tamazight/tal...un%E1%B9%A3ibt


Amazigh will now have the right to appear in official documents and to be taught in every school on our national territory.

Tatoeba's corpus (which contains more than 100,000 sentences in Amazigh) will no doubt help learners, authors and teachers of Amazigh in their various activities involving this language. With a corpus of this size, Tatoeba has become one of the rich resources for the Amazigh language available on the Internet.


For this reason, I would like to thank all those who work to make this site a good and valuable resource for promoting, learning and preserving languages, and I believe that Tatoeba will continue to be, for many years, a tool for promoting Amazigh on the Internet.





sabretou
14 hours ago
Congratulations, Amastan! :)
odexed
11 hours ago
What wonderful news!
I congratulate you on your success, my brother.
sacredceltic
10 hours ago
At last an injustice has been put right, and with it ends the ridiculous myth of monolingual "Arab" countries, in which neither the Arab ethnic group nor the Arabic language has ever actually been in the majority.

And once again, well done for the work you have carried out, which is remarkable!
al_ex_an_der
8 hours ago
This is joyful news. We wish you all the best. :)
TRANG
4 hours ago
Thank you for sharing the news, Amastan.

I am very grateful that we have contributors like you in our community. Tatoeba is only able to provide a valuable resource for Amazigh (or Berber) thanks to your hard work and dedication, so you are the one who should be thanked the most :)
Hybrid
2 hours ago
Congratulations!
sacredceltic
9 hours ago
*** Sentence statistics by the contributor's declared native language ***

I analysed the raw data provided by CK on the number of sentences per contributor and per language.

Tatoeba has 2722 contributors who have declared a native language (in various debatable ways) and who together account for 3,766,000 sentences in living, non-constructed languages (I excluded dead languages and constructed languages).

Analysing how these contributors are distributed according to their number of contributions per living, non-constructed language and the share of contributions in their declared native language, I get the following results:

Maximum number of sentences created in one language / % of contributors who made more than 50% of their contributions in a language other than their declared language

fewer than 50 sentences => 9.83%
fewer than 100 sentences => 8.40%
fewer than 500 sentences => 7.64%
more than 500 sentences => 2.88% (only 12 contributors out of 416)
more than 1000 sentences => 2.32% (only 6 contributors out of 259)

This tells us that the fewer sentences a contributor creates, the more likely they are to create most of them in a language that is not their native language.
Conversely, the more active a contributor is, the more they contribute mainly in their native language.
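
The two percentages given with counts above can be checked from those counts, and the per-contributor measure can be sketched in a few lines of Python. This is only an illustration: the function names and the sample contributor are hypothetical, and CK's raw data is not reproduced here.

```python
def mostly_non_native(counts, native):
    """True if more than 50% of a contributor's sentences are outside
    their declared native language.

    counts: dict mapping language code -> number of sentences
    native: declared native language code
    """
    total = sum(counts.values())
    if total == 0:
        return False
    return (total - counts.get(native, 0)) / total > 0.5


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Counts quoted in the post
over_500 = percent(12, 416)    # contributors with more than 500 sentences
over_1000 = percent(6, 259)    # contributors with more than 1000 sentences

# Made-up contributor, for illustration only: 10 native, 30 non-native
example = mostly_non_native({"fra": 10, "eng": 30}, native="fra")
```

Both results agree with the 2.88% and 2.32% in the table.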

How I interpret this:
- Newcomers are more inclined to play with the service, experimenting with creating translations in various languages.
- Heavier contributors, who are therefore more seasoned, have a better grasp of what is at stake for the service in terms of quality.

If we want to display better quality, I therefore suggest that the home page, which is the first contact newcomers have with the site, should not only stop showing unadopted sentences but also stop showing sentences by newcomers, which are more likely not to be in their native language and therefore carry a greater risk of error. A kind of quarantine.
This should put off fewer newcomers, who are discouraged by mistakes they notice at first contact.
This should apply not only to the randomly displayed sentence but of course also to its translations.

For the record, the all-category "champion" has created sentences in 130 different languages (I'll let you imagine the quality), and the runner-up in 54.
60 contributors have contributed in more than 10 living, non-constructed languages...
gillux
6 hours ago
Thank you for this interesting analysis.

> If we want to display better quality, I therefore suggest that the home page, which is the first contact newcomers have with the site, should not only stop showing unadopted sentences but also stop showing sentences by newcomers, which are more likely not to be in their native language and therefore carry a greater risk of error. A kind of quarantine.

It’s a good idea, but what criterion do we use to decide whether a contributor is or is not a "newcomer"?
sacredceltic
6 hours ago - edited 6 hours ago
To be decided. I imagine a combination of number of sentences (more than 50...) and length of registration (more than a month...). We would need to introduce a "newcomer" status that automatically turns into "normal contributor" status once these criteria have been met.

The problem I see, as far as quality is concerned, is that very few contributors tag sentences OK. I am one of the few who do so in French, and I admit it is not my main activity, given that it is not reciprocated. The only reason there are many in English is that CK self-assigns this tag to his own sentences, which, as far as I know, he is the only one entitled to do.

If OK-tagging were more widespread (but advanced contributors and corpus maintainers would then need to be mobilised more...), we could also rely on this criterion (more than 10 OK sentences...), although that is complicated, because I have already given more than 10 OK tags to contributors whose native language is not French and who otherwise make plenty of mistakes...

Incidentally, promotion to advanced contributor could be made conditional on OK-tagging (or not-OK-tagging...) a certain number of sentences in one's native language. That would advance this classification along with the quality of the sentences (4 eyes are better than 2).
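
A rough Python sketch of the newcomer criterion and the home-page quarantine discussed in this thread. Everything here is an assumption made for illustration: "more than a month" is taken as 30 days, and the function names and data layout are hypothetical, not Tatoeba's actual code or schema.

```python
from datetime import date, timedelta

MIN_SENTENCES = 50                   # "more than 50" sentences
MIN_MEMBERSHIP = timedelta(days=30)  # "more than a month", assumed to be 30 days


def is_newcomer(sentence_count, registered_on, today):
    """A member stops being a newcomer once BOTH thresholds are passed."""
    enough_sentences = sentence_count > MIN_SENTENCES
    enough_time = (today - registered_on) > MIN_MEMBERSHIP
    return not (enough_sentences and enough_time)


def homepage_candidates(sentences, members, today):
    """Keep only adopted sentences whose owner is not a newcomer.

    sentences: list of dicts with "text" and "owner" (None if unadopted)
    members: dict mapping username -> (sentence_count, registered_on)
    """
    kept = []
    for s in sentences:
        owner = s["owner"]
        if owner is None:
            continue  # unadopted sentence: never shown on the home page
        count, registered_on = members[owner]
        if is_newcomer(count, registered_on, today):
            continue  # quarantined: owner is still a newcomer
        kept.append(s)
    return kept
```

The same filter would have to be applied to the translations displayed under the random sentence, not only to the sentence itself.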
sacredceltic
6 hours ago
If you want inspiration on the development side, it is worth looking at the site http://stackoverflow.com/
What I like about that site is its across-the-board "credits" system.
Each action earns a number of credits, which does 2 things:
1) it gives the contributor credibility by displaying their credits
2) it unlocks certain actions that are subject to thresholds

We could consider, for example, a system like the following (which is already partly in place, with the notion of "contributions" displayed on profiles, but I don't know whether it really works, and nothing depends on it...)

creating a sentence = 2 points
commenting on a sentence = 1 point
translating a sentence (creating a sentence + 1 translation link) = 3 points
linking 2 translations = 1 point
receiving an OK tag = 3 points
applying an OK tag = 1 point

To move from "newcomer" to "contributor", you would need to have accumulated 100 points over at least 1 month.
To move to "advanced contributor", you would need to have accumulated 10,000 points over at least 1 year.
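
As a minimal sketch, the point scale and the two thresholds above could look like this in Python. The action names and function names are hypothetical, "1 month" and "1 year" are assumed to mean 30 and 365 days of membership, and "accumulated over at least 1 month" is read here as "has the points and has been registered that long", which is one possible interpretation.

```python
from datetime import date

POINTS = {
    "create_sentence": 2,
    "comment": 1,
    "translate": 3,      # creating a sentence + 1 translation link
    "link": 1,           # linking 2 translations
    "receive_ok": 3,
    "apply_ok": 1,
}


def total_points(actions):
    """actions: dict mapping an action name from POINTS -> number of times."""
    return sum(POINTS[name] * n for name, n in actions.items())


def status(points, registered_on, today):
    """Return the member status implied by points and membership length."""
    days = (today - registered_on).days
    if points >= 10_000 and days >= 365:
        return "advanced contributor"
    if points >= 100 and days >= 30:
        return "contributor"
    return "newcomer"
```

With this scale, 20 created sentences, 20 translations and 5 comments give 105 points, which is enough for "contributor" only once the member has also been registered for a month.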
SuperNicky
6 hours ago
Thank you guys! :) #al_ex_an_der, #maaster
CK
20 hours ago
Sentence Counts - Sorted by Username - Then by Language

(Usernames with Declared Native Languages)
http://bit.ly/sentencecounts1

(Usernames with No Declared Native Languages)
http://bit.ly/sentencecounts2

Someone sent me a private message about this. Perhaps others will find it interesting, too.
Shishir
10 hours ago
Indeed, thanks!
SuperNicky
yesterday
Hello! I am new here! :) I am from Hungary. :)
al_ex_an_der
yesterday
Welcome to Tatoeba! Tatoeba gives you the opportunity to try your hand at the art of translation. The Japanese word "Tatoeba" means "for example".
maaster
yesterday
Great, Nicky! Welcome to Tatoeba!
pullnosemans
6 days ago
**deleting unowned english sentences**
**introducing a feature to keep unowned translations from being displayed**


to extract one central point from the recent big discussion about improving the quality of the tatoeba corpus, I would like to again address separately the issue of deleting the unowned english sentences (and possibly, the japanese and/or french ones).

ck has stated that they have checked the entire english corpus and adopted all sentences they deemed worthwhile, so that the percentage of good unadopted english sentences right now should be "very low", if not zero.


*since the discussion about deleting the unowned sentences has not yielded any real results, I hereby request that they be deleted, and ask anyone interested to state whether they are for or against this.*


I want them to be deleted because it often happens that I search for a japanese word, and most if not all hits have only one direct translation in one of the languages tatoeba displays to me, english. these translations are frequently unowned sentences from the tanaka corpus, and they feel very unnatural to me. they also include many prime examples of my recently mentioned problem of tatoeba sentences lacking context (e.g. "he shuddered at the sight.": who? what sight?).
this generally makes me abandon the japanese sentence they are linked to as well because my japanese is not good enough to judge whether a sentence is good or not.
this process is always frustrating, because I have to read through sentence after sentence only to find that the english translation is rubbish, or at least untrustworthy. I would much prefer not having these translations displayed at all and speeding up my search by being able to only pay attention to sentences that are usable.


the situation is more complicated with japanese, where deleting all unowned sentences would mean losing 60-70% of the corpus, so I cannot really say anything on this one.
with french, I have only read one clearly stated opinion, which was sacredceltic's, saying that he would rather have them deleted right away because they are "a huge smear on the french corpus".


alternatively, I call for the feature of not displaying orphan sentences by default to be extended to direct and indirect translations of owned sentences. this way, even if the bad sentences remain, they are easy to avoid entirely.
another alternative would be not deleting the sentences, but simply hiding them from public display until a decision is made on how to treat them.

the only option I think we should definitely NOT go for is simply leaving things as they are.
odexed
6 days ago
> so that the percentage of good unadopted english sentences right now should be "very low", if not zero.

If so, I wonder why there are still unowned English sentences tagged 'OK'

https://tatoeba.org/eng/sentenc...amp;sort=words

The same for Japanese
https://tatoeba.org/eng/sentenc...amp;sort=words
CK
6 days ago - edited 6 days ago
2 points concerning what Odexed wrote....

I was misquoted. I never said "if not zero". You can see what I wrote in the 2nd paragraph of this post. https://tatoeba.org/eng/wall/sh...#message_25153

I wouldn't want to adopt those English sentences (https://tatoeba.org/eng/sentenc...amp;sort=words).
That doesn't necessarily mean that someone else won't adopt them.
Nowadays, I prefer to only adopt sentences that I think are the kind of good example sentences that I would have personally contributed.

I didn't check to see who tagged them OK, but perhaps they were tagged OK because there weren't any obvious grammar errors. I personally don't think just "lack of grammar errors" is enough for an OK tag.

sacredceltic
5 days ago
>I personally don't think just "lack of grammar errors" is enough for an OK tag.

It's funny how a simple tag and judgment such as "OK", or a behaviour such as adopting a sentence can mean so many different things, since they've never been clearly defined.

Personally, my policy is the following: in the absence of a clear definition (which I might not accept once defined...), I adopt only sentences that "I would say". The exception is when I create a sentence in Tatoeba that I wouldn't say myself but which, in my view, needs to be documented on the Internet because it is living but rare or interesting, and/or local, and I have confirmed its existence with local natives. (For instance, I wouldn't say a lot of the sentences that I tagged "Belgian-French" myself, but I am sure they exist: I not only heard them but cross-checked them with locals, asking them to provide examples of use in context.)

But I OK-tag sentences that I know are OK, even though I wouldn't say them myself.
The nuance might seem minor, but that's actually a hell of a difference, because there are lots of sentences in my native language that I know are OK but which I don't use myself.
I don't tag as "OK", though, sentences that I disapprove of, although they might represent some kind of "valid" language...

So I apply a different policy when it comes to adoption and OK-tagging.
CK
5 days ago - edited 5 days ago
I scanned the original post. I didn't read the original post carefully, since I find it tedious to read so much text that doesn't have proper capitalization, so I may have missed a few things.

Here are my opinions on the 2 things that have been suggested.


**deleting unowned english sentences**

This might be a bad idea.

Eventually, I do agree it would be a good idea to get rid of these sentences. However, I think it should be done carefully.


**introducing a feature to keep unowned translations from being displayed**

This might be a good idea.

This would somewhat solve the problem. However, there would need to be a way for members to find these, so they could be reviewed, adopted and edited if needed.


Another idea would be to color code all unowned sentences, and possibly include an icon, too. We already do that with sentences that are possible copyright infringements or that are problematic in some other way.

If we chose another color for unowned sentences, then members would immediately know that the sentence may not be a good one to use for language study and probably shouldn't be translated. Perhaps it would be possible to make such sentences untranslatable, or at least not by members who haven't been registered for at least a few months. We already have it so that the "red" sentences can't have translations added by anyone.

One thing to note is that we still have many incoming English sentences that are as bad as a lot of the unadopted sentences. The same is true for Japanese, and I would assume for other languages as well.

If we could get a color-coding system working, it might be a good idea to color-code all non-native sentences, too. This would help warn members about sentences that may more likely contain errors or not sound natural.


If you want to see some color-coding ideas, see these pages. These are pages that were already online and not specifically made as examples for this discussion.

English Sentences from Tatoeba.org with Audio
http://study.aitech.ac.jp/audio/14.html
Read the message at the top of the page to understand the color-coding.

Browsing Japanese Sentences, Showing Ratings
http://www.manythings.org/corpus/browse/919.html
Jump to the first page to see sentences rated OK and tagged OK.
Jump to the last page to see sentences rated "not OK".


TRANG
5 days ago
> since the discussion about deleting the unowned sentences has not yielded any
> real results, I hereby request that they be deleted, and ask anyone interested
> to state whether they are for or against this.

I think deleting them can actually affect Jim Breen as well, since he relies on the Japanese-English pairs, and not the Japanese sentences alone.
But it's not only a problem for Jim Breen. It can affect many users; it can even affect you. It all depends on what situation you are in.

The reason why you want them to be deleted, if I understood correctly, is because:
- You search Japanese sentences.
- You often find that the English translations are bad.
- As a result, you don't trust the Japanese sentences and don't want to use them.

If we delete the bad English sentences, there are 2 situations.

1. You search Japanese sentences **translated into English**.
==> The Japanese sentences that were linked to the bad English sentences won't appear anymore.
==> As a result, your search results are not polluted by sentences you wouldn't trust.
==> But if the words you are searching for were only illustrated by these sentences that had bad English translations, you would get no result at all. Maybe you are fine with this trade-off, but maybe not everyone else is.

2. You search Japanese sentences **without specifying the target language**.
==> The Japanese sentences that were linked to the bad English will still appear.
==> But they won't have an English translation anymore, therefore whoever finds them and doesn't speak Japanese wouldn't even have an idea of what they mean.

I think what you actually need is a better ranking of the search results: sentences whose translations are all unadopted sentences should be displayed lower in the results, and sentences whose translations are all owned by trusted contributors should be displayed higher in the results.
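
To make the ranking idea concrete, here is a small Python sketch. It is only an illustration under assumptions: the data layout, the "trusted" flag and the three tiers are hypothetical, and how a contributor comes to count as trusted is left open.

```python
def rank_key(result):
    """Sort key: lower values are shown first.

    result["translations"] is a list of dicts with:
      "owner"   -> username, or None for an unadopted sentence
      "trusted" -> True if the owner counts as a trusted contributor
    """
    translations = result["translations"]
    if translations and all(
        t["owner"] is not None and t["trusted"] for t in translations
    ):
        return 0   # every translation is owned by a trusted contributor
    if translations and all(t["owner"] is None for t in translations):
        return 2   # every translation is unadopted
    return 1       # mixed, or no translations at all


def rank(results):
    # sorted() is stable, so results keep their original order within a tier
    return sorted(results, key=rank_key)
```

Nothing is deleted or hidden in this approach: a sentence whose only translations are unadopted still appears, just further down the list.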

Another thing we can consider doing as well is to mark all English unadopted sentences as unapproved. This joins CK's idea of displaying unadopted sentences in another color and would allow you to detect more easily sentences you want to trust and sentences you don't want to trust. I'm not entirely sure about this though. Even though the consequences are not as significant as deleting the sentences, there are still some consequences we need to consider.
odexed
5 days ago
I think it could be some kind of solution if pullnosemans used 'Advanced search' where he can set the option 'Is orphan' to 'No' on the right side (concerning Translations) so that he wouldn't see any unadopted English translations in the search results.

Also, I would like to remind everyone that some people don't adopt quite good sentences just because they see some rude words or expressions. They may not use this kind of language in real life, but it would still be a good example of real colloquial speech.
pullnosemans
4 days ago - edited 4 days ago
**to CK**
"I was misquoted. I never said "if not zero"."
"I didn't read the original post carefully, since I find it tedious to read so much text that doesn't have proper capitalization, so I may have missed a few things."

One of the things you missed is that I only quoted you saying the percentage was "very low"; "if not zero" was outside of the quotation marks.
I'm sorry that my disuse of capitalisation is hard for you to read. I'll try to remember this when I post something in the future.


"[opinions on deleting unowned sentences or modifying the non-display feature]"

I like the idea of using colour coding. If I remember correctly, the last time we talked about this we already addressed the issue of colourblind people and agreed to use colour coding in combination with icons. I think this could be implemented very well.




**to everyone**

Unfortunately, I wasn't successful in my request to have people clearly state "for deleting" or "against deleting", but as far as I read the answers correctly, for now everyone is against it because they think it needs to be done carefully, even if right now no one seems to have a specific idea on what this would mean.

I thank everyone who suggested other ways to solve my personal problem with the unadopted sentences in English and Japanese, via commenting on this thread or sending a pm, and I will try all the suggestions out and see what they can do for me.


However, and this is very important:
I have not primarily started this thread because I have a problem with my search results working with the Japanese corpus.
I have started it because Tatoeba right now has a significant problem that is frequently being addressed, and yet even after the last rather big discussion about the problem, we arrived at no result at all. The discussion just sort of ended.


I may be reading too much into this right now, but I am getting the impression that another important thing Tatoeba lacks right now is a sense of determined companionship, teamwork, call it whatever. To me, subjectively, most people on here appear to be doing their thing, trying to work around things they don't like, and through everyone's individual interest or belief in the project, things eventually do change, but much less than they could.
I can think of some prime examples of people just contributing their own stuff without a lot of context or interaction with others and then simply leaving, as well as of some examples of people who do work together to achieve something that just makes Tatoeba a better site as a whole, but the point here is not to scold or praise any individual people.
The point is to tell you that I think with all the competent people that we have on here, that are constantly involved each on their own, we could gain much more momentum if the completely open landscape of the site would develop a more intimate, closely-knit core. Right now, it appears to me like it's simply TRANG's site, and everyone else is just getting involved where they please, some more and some less, but generally unable to really get together to get something moving on a larger scale, and often times even against one another instead of together. A lot of energy simply evaporates, people shout their thoughts into the prairie and then mostly go on doing their own thing again. This is also the case with this thread: Everyone states their opinions, but there is no certainty at all that in the end *anything will be implemented*. The only concrete proposal was made by CK, and so far, no one has said anything about it.

What could be happening instead is having a thread started in a forum after choosing a single problem to work on (in this case, "What should we do with the large number of unowned sentences right now?") with a 100% goal of arriving at a decision on what to do with them within a week. This thread would not simply be started, but the problem would be chosen to be the next one on which a thread is started by the community.

I will try to find the time to create the poll I recently spoke of, and see if it finds any response and can actually start a change to this to a certain degree. But the paradoxical thing about this is that if I am the only one who thinks this should happen, I myself, as an individual, will not be able to make it happen.
I will therefore now open another thread with a link to this one (thanks for the suggestion, CK) where I will ask whether people are interested in this concept, and whether they think making such a poll would be worthwhile.
TRANG
2 days ago
> I have started it because Tatoeba right now has a significant problem that is
> frequently being addressed, and yet even after the last rather big discussion
> about the problem, we arrived at no result at all.

I wouldn't say "no result at all" :)

First, you got suggestions that could concretely solve your current frustration with the unadopted sentences. I would by the way be interested if using the advanced search works out for you.

And the discussion itself brought further confirmation regarding the question: why does it matter so much to improve the quality of the corpus and what can we do about it?

The main reasons that people have expressed so far, about why quality matters, are:

- Because bad sentences make the project look bad. They make us look not serious, and we won't attract more contributors if we don't look serious.
- Because bad sentences bring a bad user experience. When there are too many of them, it takes too much effort for users to find sentences they can rely on.

Your instincts are naturally telling you that solving the problem is a matter of removing these sentences. No more bad sentences, no more problem. It is indeed a possible solution, but you have to be aware that it is neither a sustainable nor a scalable solution. It's like taking a painkiller. It doesn't solve the root problem, but makes the pain go away for a short time. And it may have side effects that are worse than the initial pain.

If we want to address this issue in the long term, the questions we should actually try to answer are:

1. What are our criteria for good and bad quality?
2. How do we teach people to contribute sentences with better quality?
3. How do we make Tatoeba more resilient to bad quality?

These are to me quite difficult questions, but they are the questions we need to work on if we want to seriously solve the problem of quality.
CK
yesterday - edited yesterday
>The main reasons that people have expressed so far, about why quality matters, are: ...

One additional reason is that we aren't meeting one of the aims of the Tatoeba Project.

In a blog entry on 2009-11-28:
So the concept is : we gather a lot of data, try to organize it, ensure it is of good quality and make it freely accessible, downloadable and redistributable, so that anyone who has a great idea for a language learning application (or a language tool) can just focus on coding the application and rely on us to provide data of excellent quality.


> 1. What are our criteria for good and bad quality?

While there might be a gray area between what people consider good and bad in some cases, I think that many of us could likely agree that some of the items in the corpus are definitely wrong and should be eliminated.

There may always be disagreement on some points. For example: Is a sentence considered "good" if it's not what a native speaker would say, uses the wrong vocabulary choice or has grammar errors, but communicates the intended idea? Is a sentence considered "good" if it is utter nonsense, but is grammatically correct?


> 2. How do we teach people to contribute sentences with better quality?

One obvious way is to really encourage people to contribute in their native languages. It's very easy to sound natural in your own native language, and very easy to sound unnatural in your non-native language.

Even if some sentences by non-native speakers are good, it's really hard to trust that they are good, so members would be helping us much more by limiting their contributions to sentences in their own native languages.


> 3. How do we make Tatoeba more resilient to bad quality?

This is somewhat related to Number 2. If we increase the percentage of good quality sentences, then the bad sentences become more obvious, so members are less likely to just ignore them. If most members, or all members, resisted the urge to contribute in their non-native languages, and also kept encouraging others to do the same, we would have fewer incoming bad contributions.

We also need to make it very clear to new members that this is not a site similar to websites such as www.lang-8.com where the purpose is to have others correct what you have written in a language you are learning.

By allowing so many bad sentences to remain in the Tatoeba Corpus, things will likely get worse. The Broken Window Theory somewhat applies, I think. (https://en.wikipedia.org/wiki/B...windows_theory)



sacredceltic
yesterday
>We also need to make it very clear to new members that this is not a site similar to websites such as www.lang-8.com where the purpose is to have others correct what you have written in a language you are learning.

It's true that this has always been the big ambiguity of Tatoeba, which, at first, may look like a playground for learners and is often perceived as such by newcomers, as a result.

>The Broken Window Theory somewhat applies, I think.

It does very much.
TRANG
yesterday
> so that anyone who has a great idea for a language learning application
> (or a language tool) can just focus on coding the application and rely
> on us to provide data of excellent quality.

On that topic, it seems to me that quality is currently not a big issue for other projects that want to reuse our data. The quality of our content is good enough that third parties can start developing something while having tangible data to work with. Their main issue is that they need to do a lot of work on processing our data in order to tailor it to their needs (i.e. extracting only the sentences they need, restructuring it to fit into their system).
Therefore for this goal, the main priority would definitely not be improving the quality of the sentences, but rather providing tools to make it easier for other projects to reuse our data.


> While there might be a gray area between what people consider good and bad in
> some cases, I think that many of us could likely agree that some of the items
> in the corpus are definitely wrong and should be eliminated.

The gray area is the biggest issue though, isn't it? It seems to me that most of the unadopted English sentences are part of this gray area.


> One obvious way is to really encourage people to contribute in their native
> languages.

My current impression is that most people already contribute mostly in their native languages and that people not contributing in their native languages are not significantly dragging down the quality of the corpus. I could be wrong, but it doesn't feel like we need to invest much more effort into this than we already do, since people generally understand already that they should contribute primarily in their native languages.

What I really meant when I asked "How do we teach people to contribute sentences with better quality" was how do we teach people to improve (the improvement is what I want to focus on), regardless of which language they are contributing in.
For example: what process can someone go through to check the quality of their sentences, and what tools can they use to help them with that?


> If we increase the percentage of good quality sentences, then the bad sentences
> become more obvious, so members are less likely to just ignore them.

I don't think the percentage of good quality sentences has anything to do with the obviousness of bad sentences. Bad sentences become more obvious only when we have a clear and agreed definition for them.

Having a higher percentage of good quality sentences would rather trick people into believing that bad sentences are actually good, I think. If their impression is that 99.99% of the sentences are good, then when they stumble upon a sentence that is in the remaining 0.01%, they would be less likely to question it.


> By allowing so many bad sentences to remain in the Tatoeba Corpus, things will
> likely get worse. The Broken Window Theory somewhat applies, I think.

I have a very different view on this. You're going with the assumption that more bad sentences means things are getting worse, and fewer bad sentences means things are getting better.

My assumption is that bad sentences are part of the deal. There will always be bad sentences being added and we can never stop it. So rather than spending effort on figuring out how to reduce the number of bad sentences, I would rather spend it on trying to design a system which, no matter how many bad sentences you pour into it, will still manage to deliver a good experience and a good service.
CK
2 days ago
We have a new Japanese voice on tatoeba.org. Her username is huizi99.

You can hear her voice on sentences at the top of this list.

https://tatoeba.org/sentences/s...nly-with-audio
sharptoothed
2 days ago
Really nice recordings. Thanks, huizi99-san! We want more! :-)
deyta
2 days ago - edited 2 days ago
I still can't believe it. What a beautiful voice.
It's like she stepped right out of a Japanese cartoon.

Keep up the good work, huizi99.
CK
3 days ago - edited 3 days ago
Though I know it might be confusing to other members, I wonder if anyone would object too much if I used the current "rating system" (called "collections") in the following manner.


* The "OK" Rating (https://tatoeba.org/eng/collect.../ok/page:99999)

1. English sentences I recommend translating first.
(Currently this list: https://tatoeba.org/eng/sentences_lists/show/4000)

2. All English sentences I use in my projects.
(Currently this list: https://tatoeba.org/eng/sentences_lists/show/907)


* The "Unsure" Rating

3. Sentences I've chosen not to use for now and am unlikely to ever use, but I may come back and review them again for possible use.

4. Sentences I've ignored and don't want to rate. Some of these are just automatically filtered out because they are too long, contain certain common errors, or for some other reason.


* The "Not OK" Rating

5. Sentences I'm very, very unlikely to ever use. I don't plan to go back and review these again.


For all the lists used for 3 through 5, see http://bit.ly/tatoebafiltering if you are interested.



** Why? **

This would make it a lot easier for me to filter in and filter out sentences that I use in my own projects, since when viewing sentences I can easily see which of the 3 groups a sentence is in, and whether I've "rated" it already or not. (http://prntscr.com/a1ou23)

The problem is, of course, that the current "OK", "Unsure", and "Not OK" words don't really represent what I would mean by my "ratings."

This is still not ideal for what I would like to do, but would be a way that I could more efficiently use tatoeba.org as it is.

sacredceltic
3 days ago
Does it mean that you're going to change the tags on the sentences?
Otherwise, I can't see why anybody would object to anybody handling lists... or did I miss something?
al_ex_an_der
3 days ago - edited 3 days ago
I think these categories "OK", "Unsure", "Not OK"
should always have the same basic meaning.
As far as I understand, their meanings are approximately as follows.

OK
Conforms to the standards commonly respected in the language concerned.

Unsure
Something seems (to me) not quite right.
Somebody (more competent than me) should check whether this sentence is OK or not.

Not OK
Does not conform to the standards commonly respected in the language concerned.
In a comment beneath, I specified why and/or proposed necessary changes.

You may have better definitions, and every definition permits some interpretation (what is OK for one person may be hardly acceptable to another). But at any rate, I'm in favor of finding and sticking to common definitions. Otherwise the whole classification would become pointless, wouldn't it?
pullnosemans
4 days ago - edited 4 days ago
**creating a team of coordinated core members on Tatoeba**

I think that now that Tatoeba has grown into a community of decent size, with some fairly constantly committed members, it is time to think of creating a team of core contributors.

Many ideas are presented on this wall, but they are often so uncoordinated that they get lost in the chaos: everyone just says what they think should be done, with no possibility of actually doing it, because in the end they have no say in what is done with the site. There are no clearly assigned roles as to who is able to decide what.

To make better use of people's ideas, I think we need to have some kind of interface where a discussion can be started with the fixed goal that, at the end of the discussion, all suggestions are taken into account and a decision is made.

To be able to do this, I suggest we form a "family" of experienced and trustworthy Tatoebans who are familiar with each other's competences, willing to pick issues to work on in an organised manner, and then willing to work together to make it happen.

For a more detailed explanation of why I think we need this, see https://tatoeba.org/fra/wall/sh...#message_25456

I am thinking of taking the time to create a poll where people can say they would be willing to be members of this core community.

Do you think this would be a good idea?
If yes, what else do you think should be included in the poll?

Please let me know. Thank you.
gillux
4 days ago
I think your analysis is right, and I have to say it is the same on the development side. Even though there are very few active developers (only me and Trang at the moment), we’re unable to coordinate and decide upon what to do. I remember I had trouble with this in the past [1], but I eventually stopped caring and kept focusing on what matters to me (furigana for Japanese, advanced search…), while I sometimes fix bugs or add features at people’s request when I’m in the mood.

> To make better use of the people's ideas, I think we need to have some kind of interface where a discussion can be started with the fixed goal that at the end of the discussion, all suggestions are taken into account and a decision is made.

I agree. This reminds me of the forum idea: https://tatoeba.org/wall/show_m...#message_19996

However, I think the tool is not the problem. If we’re unable to decide upon what to do after discussing a topic on the Wall, what difference would using another tool make? And like Trang said [1], how do we prioritize tasks? How do we gather people’s opinions in an efficient and relevant way? Since everyone has their own personal interests, I think it’s rather a political issue.

[1] https://tatoeba.org/wall/show_message/22454
sacredceltic
4 days ago
> I think it’s rather a political issue.

Everything is political.