Here you can ask general questions, such as how to use Tatoeba or how to report bugs and strange behaviour, or simply socialize with the rest of the community.
Before asking a question, make sure you have read the FAQ.
Latest messages
Wall (3688 threads)
Does Tatoeba have an automatic duplicate-checker, such that it can directly prevent users from posting a sentence which already exists in the corpus? If not, could that be implemented? I think it would be very useful.
P.S. How do I search this wall? I don't want to post things that have already been discussed, but I don't see any way to check previous topics without reading through all 3686 of them, which I'm not too keen on doing.
This is actually more useful, I think, when you consider, for example, the scenario where a new member adds common sentences and translates them into an underrepresented language that they hadn't been translated into before. If this member couldn't do so and was instead forced to use the search function every time to find the existing common sentence they wish to add and translate that one, their willingness would wear out much faster. This way, there is no need to bother, and the script will quietly take care of it later.
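As a rough illustration of the kind of after-the-fact duplicate merge mentioned here, a minimal sketch follows. It assumes that a "duplicate" means the same language and identical text after trimming whitespace; the function name and the tuple layout are hypothetical and are not taken from Tatoeba's actual script.

```python
from collections import defaultdict


def find_duplicates(sentences):
    """Group sentence ids that share the same language and text.

    `sentences` is an iterable of (sentence_id, lang, text) tuples.
    Returns a dict mapping the lowest id of each duplicate group to the
    list of higher ids that would be merged into it.
    """
    groups = defaultdict(list)
    for sentence_id, lang, text in sentences:
        # Assumption: only exact matches (after trimming) count as duplicates.
        groups[(lang, text.strip())].append(sentence_id)

    merges = {}
    for ids in groups.values():
        if len(ids) > 1:
            ids.sort()
            merges[ids[0]] = ids[1:]
    return merges


sample = [
    (1, "eng", "Hello."),
    (2, "tur", "Merhaba."),
    (3, "eng", "Hello."),
    (4, "eng", "Hello. "),
]
merges = find_duplicates(sample)
print(merges)
```

Running the merge later, rather than blocking the submission, matches the point made above: the contributor is never interrupted, and the clean-up happens in the background.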
1) The stats for sentences per language.
https://dev.tatoeba.org/eng/sta...es_by_language (edit: corrected link)
You can access them from the homepage, and then click on "show all language" below the list of top 5 languages.
2) The stats for the languages of the members, which is a new page.
On the dev, this page now shows up when you click in the menu "Members" > "Languages of members".
First, I'd like to know how you understand these stats. For instance, what do you think these numbers mean?
Second, I'd like you to add and remove languages in your profile, check whether the stats update as expected, and report any problems to me.
> 1) The stats for sentences per language.
You mean https://dev.tatoeba.org/eng/sta...s_by_language, right?
I personally like the colored bust icons that represent “admin”, “corpus maintainers” etc. in the column header, but I guess they are a no-go for colorblind users.
It would make more sense to move the link to show_all_in/<lang> to the sentence number cell, so that you’re brought to the list of sentences when clicking on the sentences number rather than the language name.
Sortable columns would be super useful.
> 2) The stats for the languages of the members, which is a new page.
The column titles “5, 4, 3, 2, 1, ?” are *very* cryptic; I think we should change them completely. The meaning of a number itself differs among languages (for instance, Japanese uses 1 for the highest level, while others use 1 for the lowest level). It’s also confusing because the table mixes cardinal numbers for sentences and ordinal numbers for levels, which are two totally different things yet are represented by the same symbols.
Again, sortable columns would be super useful.
At least, it's clear to me what these numbers mean. I find the background colors also nice and intuitive.
Perhaps you could add a heading above like "Self-assessed levels" and mouseover texts "Native", "Fluent", etc. That would be enough in my opinion.
I doubt that Japanese uses 1 for the highest level any more than the rest of the world does.
In the Olympics, Number 1 is higher than Number 2.
On the Top 20 chart for music, Number 1 is higher than Number 2.
"I came in first" is better than "I came in second."
This should definitely be added.
I would assume that the "native" numbers might be pretty accurate, since people will (likely) know what their native language is and (likely) be honest about it.
However, I would also assume that many people's self-assigned "Level 4" might be higher than other people's self-assigned "Level 3" or even "Level 2."
A clear title on the page, a short explanation, and a "key" explaining what numbers and icons mean would clear up a lot of confusion. (These should all be above the table.)
The colored icons should perhaps have numbers in the icons, so those who are color blind can figure things out.
I, too, would like to see the table sortable.
For me, at least, it would be great if the default sort was on "native" rather than "total."
Guess you meant lower?
Anyway, it's true there is a huge discrepancy, as some people are remarkably balder (bolder?) about their language proficiency than others. Perhaps we should have a rough guideline concerning what each level is supposed to be like.
I have updated the dev website to address the main issues that I felt were necessary to fix before the weekend.
I moved out the stats about the admins, corpus maintainers, etc to a dedicated page.
http://dev.tatoeba.org/eng/stats/native_speakers ("Members" > "Native speakers")
I hope things are clearer now.
I won't have time to implement the other suggestions, but I'm keeping them in mind. I don't want to spend too much time on these stats.
Just a note about the "progress bars" on the sentences stats page. They are not progress bars; they are a visual representation of the distribution of languages in Tatoeba. You should look at them as a graph rather than as progress bars. Languages that show no bar are languages that have barely any content compared to other languages.
So I'd suggest using the previous version, changing the title to "Languages on Tatoeba" and adding the captions "(Number of) Sentences" and "(Number of) Native Speakers".
The list of languages sorted by the number of native speakers is of some interest, to be sure, but I don't think it's worth dedicating a whole page to it. It would be enough if the table were sortable.
Yes, it's that page.
> I liked it because it would give visitors a clear idea about the corpus and the community.
That was also my idea.
Initially I wanted to simply add a column that displays the number of corpus maintainers that we have for each language, because I wanted to see clearly how well we can take care of each language, which languages have enough corpus maintainers and which languages don't have any. And perhaps having such stats would encourage more people to try to become corpus maintainers, or to search for new members who could become corpus maintainers.
Then I figured, why not display the number of contributors we have for each language, based on their status (admin, corpus maintainer, advanced contributor, contributor). At first I included all language levels, not just the native speakers. But I realised it would make more sense to only include native speakers, since it's not really relevant to see (for instance) that there is one corpus maintainer in Japanese if the level of the corpus maintainer is "beginner".
Then I wanted to add some warning icon, for languages where we lack corpus maintainers. Perhaps display each language with a different level of warning: big warning when we lack corpus maintainers, smaller warning if we have advanced contributors but no corpus maintainers. Perhaps also green "check" if we have more than 2 corpus maintainers.
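One possible reading of these icon rules, written out as a small sketch. The function name and the returned labels are hypothetical, and the handling of the in-between case (one or two corpus maintainers, shown here without any icon) is an assumption, since the post does not spell it out.

```python
def maintainer_icon(corpus_maintainers, advanced_contributors):
    """Pick an icon for a language from its native-speaker staff counts.

    Assumed interpretation of the rules described in the post:
    - more than 2 corpus maintainers: green check
    - 1 or 2 corpus maintainers: no icon (assumption, not stated)
    - no corpus maintainers but some advanced contributors: small warning
    - neither: big warning
    """
    if corpus_maintainers > 2:
        return "check"
    if corpus_maintainers > 0:
        return "none"
    if advanced_contributors > 0:
        return "small_warning"
    return "big_warning"


print(maintainer_icon(0, 0))
print(maintainer_icon(0, 3))
print(maintainer_icon(3, 1))
```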
Then I had plenty of other ideas in mind, but this is usually the point where I tell myself I'm getting carried away and I stop.
> So I'd suggest using the previous version, changing the title to "Languages on Tatoeba"
> and adding the captions "(Number of) Sentences" and "(Number of) Native Speakers".
I also felt that changing the title might be a solution to prevent the confusion, but then it led me to ask myself some questions.
Perhaps dividing the native speakers by status is too detailed, and perhaps it would be enough to have either the total number of native speakers or just the number of admins + corpus maintainers.
Perhaps there is a way to add some information about the level of expertise we have in each language, which could be interesting information as well (some languages have many members, but none of them are native, while other languages have only a few members but most of them are native speakers).
In the end I didn't want to spend time deciding on these things, since they are not part of my initial goal, which is why I decided to create a separate page for the native speakers stats rather than tweaking the current sentences_by_language page. I prefer to give myself more time to think of a new page that would synthesize all the information we have about each language, in a way that is satisfying enough for veteran users and not too cryptic for new visitors.
But I completely share your vision that it would be much better if the link "all languages" would show more than the number of sentences in each language.
They have barely any content compared to English and Esperanto, but not compared to most other languages.
Since very few languages are in the higher part of the list, and too many in the lower, it makes sense to make the graph useful for the latter. If you don't like square roots, you could just use a 'broken line' for the languages in the upper part of the list (as is often done on charts, http://www.andypope.info/charts/brokencol.gif ) to show that English and Esperanto have way more sentences than the others. In fact, the place where the bar is broken could be used to show how many sentences these first languages have.
For instance, to me, each bar was never meant to be useful on its own. The important information is carried by the whole graph, which is there to give a global view of how the languages are spread within the corpus. If we take the square root or logarithm or whatever, it would give the wrong idea about the weight of each language in the corpus.
If the information that you are looking for is to know "how many sentences do we need to add in a certain language so that it exceeds the language above", then obviously it is not best represented by the current graph. But this kind of information should be implemented as an extra feature, not by replacing the current graph, which in my opinion is not really useless but simply has another purpose.
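To make the trade-off discussed here concrete, the sketch below compares bar widths under linear and square-root scaling. The language labels and sentence counts are invented purely for illustration and are not real Tatoeba statistics; the function name is hypothetical.

```python
import math


def bar_widths(counts, scale="linear", max_width=100.0):
    """Return a bar width per language, with the largest bar at max_width.

    scale="linear" keeps widths proportional to the sentence counts;
    scale="sqrt" compresses large values so that small languages stay visible.
    """
    transform = math.sqrt if scale == "sqrt" else (lambda x: float(x))
    largest = max(transform(c) for c in counts.values())
    return {lang: max_width * transform(c) / largest for lang, c in counts.items()}


# Made-up counts: one very large language, one medium, one tiny.
counts = {"A": 400_000, "B": 40_000, "C": 400}

linear = bar_widths(counts, "linear")
compressed = bar_widths(counts, "sqrt")
for lang in counts:
    print(lang, round(linear[lang], 2), round(compressed[lang], 2))
```

With linear scaling, language B's bar is one tenth as wide as A's, which reflects its true weight in the corpus; with square-root scaling it appears roughly a third as wide, which makes small languages easier to compare with each other but overstates their share.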
In any case, for other visualisations of Tatoeba's data, I encourage anyone who has a bit of programming knowledge to use the files that we export and make your own visualisation, just like someone else did for the translations some time ago: https://tatoeba.org/eng/wall/sh...#message_21926
I personally loooove data visualisation but I can't afford to spend time implementing such things in Tatoeba, so I'm counting on other people to do it and share their work with us on the Wall :)
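For anyone who wants to start from the exported files, here is a minimal sketch that counts sentences per language. It assumes the export is a tab-separated text file whose rows hold a sentence id, a language code, and the sentence text, in that order; check the actual export before relying on this layout. The function name is hypothetical.

```python
import csv
import io
from collections import Counter


def sentences_per_language(lines):
    """Count rows per language code in a tab-separated sentence export.

    Assumed column order: id, language code, text.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        if len(row) >= 3:  # skip blank or malformed rows
            counts[row[1]] += 1
    return counts


# A tiny in-memory stand-in for the real export file.
sample = io.StringIO("1\teng\tHello.\n2\ttur\tMerhaba.\n3\teng\tGood morning.\n")
counts = sentences_per_language(sample)
for lang, n in counts.most_common():
    print(lang, n)
```

With a real file, the same function can be fed a file object opened with `open(path, encoding="utf-8", newline="")`, and the resulting counts can be passed to any charting library.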
Edit: It could also be interesting to have some comparisons between the languages present in Tatoeba and those in the world. I mean, what is the most common language in Tatoeba (most speakers) vs. the most spoken languages in the world, etc. http://www.washingtonpost.com/b...rts/?tid=sm_fb
(according to) sentences not translated into Turkish.
But in between there are many sentences that have already been translated into Turkish.
The page speed is also very, very low.
Most of the sentences that come up have been translated into Turkish.
Again, most of the sentences have been translated into Turkish.
Again, there are lots of translated sentences.
Search by tag.
The situation is the same again.
And the page speed is a disaster.
6-There is complete anarchy on Tatoeba when it comes to English sentences.
To overcome this anarchy, sentences should be classified by their patterns (sentence pattern).
In addition, sentences that have been translated into Turkish and those that have not should be clearly separated.
If this is done, translation work will be much faster and more successful.
Otherwise, alienation from Tatoeba is inevitable.
* You can now choose to restrict the list of languages for the "From" and "To" fields in a search query to the ones that you've listed in your profile. To enable this restriction, go to your settings and select the appropriate checkbox.
* The default level for a language that you add to your profile is now zero rather than unspecified.
* Unapproved sentences are displayed with a warning icon. Sidebar text about unapproved sentences has been removed.
* Hovering with the mouse over a tag will now show the name of the user who added it.
* Dynamic button labels that were not localized are localized now.
Two long-awaited changes. Thanks to all who worked on them!
just one thing: I see you've been trying to strengthen the new "indicating your languages on your profile" feature lately, but I personally think it would be even cooler to be able to limit the list of languages manually [edit: or simply use the same list that limits the languages in which translations are shown]. that way, people could also include languages which they are interested in, but which they don't necessarily think are worth indicating on their profile.
still pretty cool, though. that said, what are dynamic button labels?
https://tatoeba.org/ita/users/for_language (on the tab name there is just "Users" even if the interface isn't in English)
https://tatoeba.org/ita/contrib...meline/2015/04 (on the column on the right-side month names are localized, while on the top of the stats the month name isn't localized)
EDIT = when editing the language skill of a given language in my own profile, i see on the tab name just "User", and I'm using the UI in Italian
►New applications for the statuses of advanced contributor and corpus maintainer
[ENG] Max would like to become an advanced contributor, Silja would like to become a corpus maintainer. They both speak Finnish as their first language. You can learn more by clicking on the links to their profiles above. As always, please feel free to share your opinion by sending us a private message using the link below.
Advanced contributors can link and tag sentences. Corpus maintainers help when the author of a sentence that needs to be corrected is inactive and does not respond to comments.
Previously, you only had to drag the mouse over a sentence to select it in order to copy it.
Now, with the layout overhaul, you can no longer select a sentence without going through contortions or without picking up the flag along with it (which turns into a language code when you copy and paste).
Would it be possible to restore the ability to select a sentence without too many contortions or, better still, as I have already requested several times without success, to provide a button that copies it directly in a single click?
I create many sentence variants, to the great benefit of learners, and I would create many more if I had a simple way to copy and paste, so that I would not have to retype everything each time.
1) The sentence belongs to me (and I don't want to enter edit mode when I just want to copy it: that is very irritating).
2) The sentence does not belong to me.
When you test the changes, please take care to test both cases fully and not just one of the two.
Your comment gave me an idea. What if we prefilled the translation field when someone adds several translations to the same sentence? Concretely, it would go like this:
1. I click the translate button, enter a translation, submit it, done.
2. I click the translate button of the same sentence again, and the translation field contains my previous translation, from which I can easily make a variant.
3. I click the translate button of another sentence, and the translation field is empty, as in step 1.
That would help your use case, but might it annoy other people in other cases?
Besides, there would be a risk of creating unwanted duplicates.
Or else it should be a user setting, so that people can choose to enable or disable it.
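The prefill behaviour proposed in steps 1 to 3, together with the user setting suggested afterwards, can be sketched as follows. This is only a model of the idea; the class and method names are hypothetical and do not correspond to Tatoeba's code.

```python
class TranslationForm:
    """Remembers the last translation submitted for each sentence."""

    def __init__(self, prefill_enabled=True):
        # Hypothetical user setting: lets members switch the prefill off.
        self.prefill_enabled = prefill_enabled
        self._last = {}  # sentence id -> last submitted translation

    def open(self, sentence_id):
        """Return the initial content of the translation field."""
        if self.prefill_enabled:
            return self._last.get(sentence_id, "")
        return ""

    def submit(self, sentence_id, text):
        """Record a submitted translation for later prefilling."""
        self._last[sentence_id] = text


form = TranslationForm()
print(repr(form.open(1)))      # step 1: empty field
form.submit(1, "Bonjour.")
print(repr(form.open(1)))      # step 2: previous translation is prefilled
print(repr(form.open(2)))      # step 3: another sentence, empty field
```

The duplicate risk raised above is visible in this model: submitting the prefilled text unchanged would add the same translation twice, so a real implementation would probably need to reject unmodified resubmissions.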
Triple-click (I never think of that one, but it's a good idea...) picks up the whole line, including a carriage return at the end. Can you do anything about that?
When I switch this page into Russian, I see a different statistic. https://tatoeba.org/rus/users/for_language/rus
The problematic string can be fixed here (provided you have an account on Transifex and have been accepted into the Russian translators team): https://www.transifex.com/proje...ource/42114924
Follow this guide to translate on Transifex: http://en.wiki.tatoeba.org/arti...ce-translation and see especially the “Plural strings” section.
You can test how the translation renders by waiting a few minutes and going to http://dev.tatoeba.org/.
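As background on why plural strings need special care in Russian, here is a small sketch of the three-form plural rule commonly used in gettext catalogs for Russian. It is offered as an illustration only: Transifex may present the plural categories differently, and the example word forms are not taken from Tatoeba's interface strings.

```python
def russian_plural_index(n):
    """Return 0, 1 or 2 according to the usual gettext rule for Russian."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... (but not 11)
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1  # 2-4, 22-24, ... (but not 12-14)
    return 2      # everything else: 0, 5-20, 25-30, ...


# Illustrative forms of the word for "member".
forms = ("участник", "участника", "участников")

for n in (1, 2, 5, 11, 21, 22):
    print(n, forms[russian_plural_index(n)])
```

If a translation supplies only one of these forms, or puts them in the wrong order, the count and the noun will disagree for some numbers.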
> * Hovering with the mouse over a tag will now show the name of the user who added it.
> * Dynamic button labels that were not localized are localized now.
> Two long-awaited changes. Thanks to all who worked on them!
You’re welcome. I didn’t know the localization of the dynamic buttons was such a long-awaited change. I fixed it because Guybrush88 brought the topic up yet again, and it took me less than an hour. To be honest, as a developer I tend to work on things that bug me personally, or that I feel are important, but I have very little idea about the areas where regular members would like progress to be made. For instance, I’ve recently been working on improving the autogenerated furigana and making them editable (it’s still incomplete), but only because I personally care about that.
I’d like to work more on things that matter to members, but I don’t really know what they are. So my question is: how could I find out?
For those who don’t know, we developers use GitHub to keep track of issues and discuss the technical side of ideas. At the moment, we have 170 open issues, and this number has been stable for the past ten months, although we closed a lot of issues (412), which means we opened roughly as many issues as we closed. Many of these recorded issues are too vague or only half relevant, however. Some of us tried to label priorities, but they are often quite subjective.