Tatoeba Wall

TRANG, May 3, 2015 22:06:54 UTC (edited May 4, 2015 08:08:14 UTC)

I've implemented more detailed stats and I'd like some people to test these changes on the dev website (http://dev.tatoeba.org).

1) The stats for sentences per language.
https://dev.tatoeba.org/eng/sta...es_by_language (edit: corrected link)
You can access them from the homepage, and then click on "show all language" below the list of top 5 languages.

2) The stats for the languages of the members, which is a new page.
http://dev.tatoeba.org/eng/stats/users_languages
On the dev site, this page is now reachable from the menu via "Members" > "Languages of members".

First, I'd like to know how you understand these stats. For instance, what do you think these numbers mean?
- http://prntscr.com/717tto
- http://prntscr.com/717ucp

Second, I'd like you to add and remove languages in your profile, check whether the stats update as expected, and report any problems to me.

Thank you.

gillux, May 4, 2015 03:50:47 UTC

Nice improvement!

> 1) The stats for sentences per language.
> http://dev.tatoeba.org/eng/stats/users_languages

You mean https://dev.tatoeba.org/eng/sta...s_by_language, right?
I personally like the colored busts representing "admin", "corpus maintainers", etc. in the column headers, but I guess it's a no-go for colorblind people.
It would make more sense to move the link to show_all_in/<lang> to the sentence number cell, so that you’re brought to the list of sentences when clicking on the sentences number rather than the language name.
Sortable columns would be super useful.

> 2) The stats for the languages of the members, which is a new page.
The column titles "5, 4, 3, 2, 1, ?" are *very* cryptic; I think we should change them completely. The meaning of the numbers themselves differs between languages (for instance, Japanese uses 1 for the highest level, while others use 1 for the lowest). It's also confusing because the table mixes cardinal numbers for sentences and ordinal numbers for levels, which are two totally different things represented by the same symbols.

Again, sortable columns would be super useful.
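Sortable columns, requested here and again later in the thread, come down to re-ordering the table rows by whichever column header was clicked. A minimal sketch, assuming a hypothetical `LanguageRow` record (not Tatoeba's actual data model) and made-up placeholder numbers:

```python
from typing import List, NamedTuple


class LanguageRow(NamedTuple):
    # Hypothetical row of a stats table; field names are illustrative only.
    code: str
    sentences: int
    native_speakers: int


def sort_rows(rows: List[LanguageRow], column: str, descending: bool = True) -> List[LanguageRow]:
    """Return a new list of rows ordered by the named column (ties keep their original order)."""
    return sorted(rows, key=lambda row: getattr(row, column), reverse=descending)


# Made-up placeholder data, not real Tatoeba statistics.
rows = [
    LanguageRow("lang_a", 500, 40),
    LanguageRow("lang_b", 200, 60),
    LanguageRow("lang_c", 50, 5),
]

print([row.code for row in sort_rows(rows, "native_speakers")])
# ['lang_b', 'lang_a', 'lang_c']
```

On a web page this would more likely happen in the browser, but the ordering logic is the same.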

tommy_san, May 4, 2015 04:14:22 UTC

> Japanese uses 1 for the highest level
Oh, is that so?

gillux, May 4, 2015 09:56:35 UTC

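Huh? Since 1-kyū and 1-dan are the highest ranks, I thought Japanese people would think of it that way. Isn't it the same for language proficiency tests?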

tommy_san, May 4, 2015 12:34:43 UTC

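It's true that for kyū, 1-kyū is the highest, but isn't it different for dan?
http://en.wikipedia.org/wiki/Rank_in_Judo
Besides, I think most people would feel that "Level 2" is higher than "Level 1".
I don't know about other cultures, though.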

At least, it's clear to me what these numbers mean. I also find the background colors nice and intuitive.
Perhaps you could add a heading above them, such as "Self-assessed levels", and mouseover texts such as "Native", "Fluent", etc. That would be enough in my opinion.

CK, May 4, 2015 13:24:27 UTC (edited October 30, 2019 07:52:19 UTC)

[not needed anymore- removed by CK]

Ooneykcall, May 5, 2015 23:22:20 UTC

> However, I would also assume that many people's self-assigned "Level 4" might be higher than other people's self-assigned "Level 3" or even "Level 2."
Guess you meant lower?
Anyway, it's true there is a huge discrepancy, as some people are remarkably bolder about their language proficiency than others. Perhaps we should have a rough guideline describing what each level is supposed to mean.

pullnosemans, May 4, 2015 09:47:39 UTC

I'm kind of wondering what relevance the number of admins, corpus maintainers, etc. has to the number of sentences in the respective languages. Wouldn't that statistic fit better on the new "languages of members" page than on the "number of sentences" page?

AlanF_US, May 4, 2015 12:51:53 UTC

I absolutely agree. It took me about 30 seconds to figure out what was going on. My first guess was that the figures in those columns listed the number of sentences added by admins, corpus maintainers, etc. since you started collecting statistics on the dev site.

PaulP, May 4, 2015 10:15:53 UTC

> First, I'd like to know how you understand these stats. For instance, what do you think these numbers mean?
> - http://prntscr.com/717tto

I have no idea. For my language Dutch it shows 1 0 0 0. That is double Dutch to me :P

User55521, May 4, 2015 13:02:48 UTC (edited May 4, 2015 13:03:17 UTC)

The 'progress bars' are completely useless for most languages in their current form (only about a quarter of the languages have a bar that actually shows something). Maybe you could use the cube root of the number of sentences for the bar length instead of the exact number, or tweak it some other way, to make it more useful?
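A minimal sketch of this suggestion: the same bar-width calculation with and without a cube-root scale. The function names and the counts are made up for illustration; this is not how the dev site draws its bars. With these numbers, the smallest language goes from 0.1% of the full width to 10%.

```python
from typing import Callable, Dict


def bar_widths(counts: Dict[str, int], full_width: float = 100.0,
               scale: Callable[[float], float] = lambda n: n) -> Dict[str, float]:
    """Bar width for each language, relative to the largest scaled count."""
    scaled = {lang: scale(n) for lang, n in counts.items()}
    top = max(scaled.values())
    return {lang: full_width * value / top for lang, value in scaled.items()}


def cube_root(n: float) -> float:
    return n ** (1 / 3)


# Made-up placeholder counts, not real Tatoeba statistics.
counts = {"lang_a": 1_000_000, "lang_b": 8_000, "lang_c": 1_000}

linear = bar_widths(counts)
compressed = bar_widths(counts, scale=cube_root)
for lang in counts:
    print(f"{lang}: linear {linear[lang]:.1f}%, cube root {compressed[lang]:.1f}%")
```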

Lepotdeterre, May 4, 2015 16:29:24 UTC

I also agree that the progress bars are absolutely impractical. However, cube-root progress bars would be even worse, because each subsequent sentence added for a particular language would contribute less. So for the first 64 sentences one would have 4 points on the progress bar, and after the next 64, only about 5.04? That doesn't make any sense. Everything would be distorted and comparisons would be meaningless. The only thing I can suggest is making the bars much wider, and there is certainly enough space for that (at least in my browser).
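The arithmetic behind this objection, written out for a scale that maps $n$ sentences to $n^{1/3}$ points:

```latex
\sqrt[3]{64} = 4, \qquad
\sqrt[3]{128} = 4\sqrt[3]{2} \approx 5.04, \qquad
\frac{d}{dn}\, n^{1/3} = \frac{1}{3}\, n^{-2/3}
```

Doubling from 64 to 128 sentences adds only about 1.04 points, and the derivative shows that the contribution of each extra sentence keeps shrinking as $n$ grows.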

tommy_san, May 4, 2015 13:10:04 UTC

Why are there -10 members who speak an unknown language? Who are they? ☺

TRANG, May 4, 2015 21:41:11 UTC (edited May 4, 2015 22:05:00 UTC)

Thank you for your feedback, everyone.

I have updated the dev website to address the main issues that I felt were necessary to fix before the weekend.

I moved the stats about admins, corpus maintainers, etc. to a dedicated page.
http://dev.tatoeba.org/eng/stats/native_speakers ("Members" > "Native speakers")
I hope things are clearer now.

I won't have time to implement the other suggestions, but I'm keeping them in mind. I don't want to spend too much time on these stats.

Just a note about the "progress bars" on the sentence stats page. They are not progress bars; they are a visual representation of how the sentences in Tatoeba are distributed among the languages. You should look at them as a graph rather than as progress bars. Languages with no bar are languages that have barely any content compared to the others.
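To see why some languages end up with no visible bar at all, here is a minimal sketch of a linear graph drawn in whole pixels. It assumes a hypothetical 200-pixel maximum width and made-up counts; it is not the code the dev site uses.

```python
from typing import Dict


def pixel_widths(counts: Dict[str, int], max_px: int = 200) -> Dict[str, int]:
    """Linear bar widths in whole pixels, scaled so the largest language fills max_px."""
    top = max(counts.values())
    return {lang: round(max_px * n / top) for lang, n in counts.items()}


# Made-up placeholder counts, not real Tatoeba statistics.
counts = {"lang_a": 500_000, "lang_b": 20_000, "lang_c": 1_000}
print(pixel_widths(counts))
# {'lang_a': 200, 'lang_b': 8, 'lang_c': 0}
```

With these numbers, any language with about 1,250 sentences or fewer (half a pixel or less) gets no bar, even though the graph is still an accurate picture of each language's weight in the corpus.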

tommy_san, May 4, 2015 23:36:24 UTC

I thought https://dev.tatoeba.org/stats/s...es_by_language was almost OK the way it was. It's the page linked from the top page, right? I liked it because it would give visitors a clear idea about the corpus and the community. The only problem was that the page was titled "Numbers of Sentences".

So I'd suggest using the previous version, changing the title to "Languages on Tatoeba" and adding the captions "(Number of) Sentences" and "(Number of) Native Speakers".

The list of languages sorted by the number of native speakers is of some interest, to be sure, but I don't think it's worth dedicating a whole page to it. It would be enough if the table were sortable.

TRANG, May 5, 2015 14:56:29 UTC

> I thought https://dev.tatoeba.org/stats/s...es_by_language was almost OK the way it was. It's the page linked from the top page, right?

Yes, it's that page.

> I liked it because it would give visitors a clear idea about the corpus and the community.

That was also my idea.

Initially I wanted to simply add a column that displays the number of corpus maintainers we have for each language, because I wanted to see clearly how well we can take care of each language: which languages have enough corpus maintainers and which don't have any. And perhaps having such stats would encourage more people to try to become corpus maintainers, or encourage us to look for new members who could become corpus maintainers.

Then I figured, why not display the number of contributors we have for each language, based on their status (admin, corpus maintainer, advanced contributor, contributor). At first I included all language levels, not just the native speakers. But I realised it would make more sense to only include native speakers, since it's not really relevant to see (for instance) that there is one corpus maintainer in Japanese if the level of the corpus maintainer is "beginner".
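The counting described here (native speakers only, split by status) could look like the following sketch. The `ProfileLanguage` record and its `is_native` flag are hypothetical stand-ins for however profile languages are actually stored; only the four statuses come from the post.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple

# Statuses named in the post above.
STATUSES = ("admin", "corpus maintainer", "advanced contributor", "contributor")


class ProfileLanguage(NamedTuple):
    # Hypothetical record: one language listed in one member's profile.
    username: str
    language: str
    is_native: bool
    status: str


def native_speakers_by_status(entries: Iterable[ProfileLanguage]) -> Dict[str, Dict[str, int]]:
    """For each language, count its native speakers per member status; non-native entries are skipped."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for entry in entries:
        if entry.is_native:
            table[entry.language][entry.status] += 1
    # Report every status, including those with zero members.
    return {lang: {status: counts[status] for status in STATUSES}
            for lang, counts in table.items()}


# Made-up placeholder members, not real Tatoeba data.
entries = [
    ProfileLanguage("member1", "lang_a", True, "corpus maintainer"),
    ProfileLanguage("member2", "lang_a", True, "contributor"),
    ProfileLanguage("member3", "lang_a", False, "corpus maintainer"),  # not native: not counted
    ProfileLanguage("member3", "lang_b", True, "corpus maintainer"),
]
print(native_speakers_by_status(entries))
```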

Then I wanted to add warning icons for languages where we lack corpus maintainers, perhaps displaying each language with a different level of warning: a big warning when we lack corpus maintainers, and a smaller warning if we have advanced contributors but no corpus maintainers. Perhaps also a green "check" if we have more than 2 corpus maintainers.
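One possible reading of these warning rules as code. It assumes that the "big warning" means neither corpus maintainers nor advanced contributors, that 1 or 2 corpus maintainers get no icon, and that admins are not counted; the function name and return labels are hypothetical.

```python
def warning_level(corpus_maintainers: int, advanced_contributors: int) -> str:
    """Pick an indicator for a language from its native-speaker counts (one reading of the idea above)."""
    if corpus_maintainers > 2:
        return "check"  # more than 2 corpus maintainers
    if corpus_maintainers == 0:
        if advanced_contributors > 0:
            return "small warning"  # advanced contributors but no corpus maintainer
        return "big warning"  # no corpus maintainer and no advanced contributor
    return "none"  # 1 or 2 corpus maintainers: no icon


for cm, adv in [(0, 0), (0, 3), (2, 0), (3, 1)]:
    print(cm, adv, warning_level(cm, adv))
```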

Then I had plenty of other ideas in mind, but this is usually the point where I tell myself I'm getting carried away and I stop.


> So I'd suggest using the previous version, changing the title to "Languages on Tatoeba"
> and adding the captions "(Number of) Sentences" and "(Number of) Native Speakers".

I also felt that changing the title might be a solution to prevent the confusion, but then it led me to ask myself some questions.
Perhaps dividing the native speakers by status is too detailed, and it would be enough to show either the total number of native speakers or just the number of admins + corpus maintainers.
Perhaps there is a way to add some information about the level of expertise we have in each language, which would be interesting as well (some languages have many members but none of them are native, while other languages have only a few members, most of them native speakers).

In the end I didn't want to spend time deciding on these things, since they are not part of my initial goal, which is why I decided to create a separate page for the native speaker stats rather than tweaking the current sentences_by_language page. I prefer to give myself more time to think about a new page that would synthesize all the information we have about each language, in a way that is satisfying enough for veteran users and not too cryptic for new visitors.

But I completely share your view that it would be much better if the "all languages" link showed more than the number of sentences in each language.

User55521, May 5, 2015 07:19:59 UTC (edited May 5, 2015 07:21:49 UTC)

> that have barely any content compared to other languages

They have barely any content compared to English and Esperanto, but not compared to most other languages.

Since very few languages are in the upper part of the list and many are in the lower part, it makes sense to make the graph useful for the latter. If you don't like roots, you could use a 'broken' bar for the languages at the top of the list (as is often done in charts, http://www.andypope.info/charts/brokencol.gif) to show that English and Esperanto have far more sentences than the others. In fact, the place where the bar is broken could be used to show how many sentences these top languages have.
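The broken-bar idea, sketched as a plain-text chart. The `cap` parameter and the function name are assumptions for illustration: any language above `cap` is drawn at full width with its real count written at the break, and everything else is scaled linearly against the cap, so smaller languages get room to show their differences.

```python
from typing import Dict, List


def render_bars(counts: Dict[str, int], cap: int, width: int = 40) -> List[str]:
    """Text bar chart where bars above `cap` are 'broken' and show their real count at the break."""
    half = width // 2
    lines = []
    for lang, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        if n > cap:
            # Broken bar: full width, with the actual count written where the bar is cut.
            lines.append(f"{lang:<8}{'#' * half}/{n}/{'#' * half}")
        else:
            # Regular bar: linear scale where `cap` sentences fill the full width.
            lines.append(f"{lang:<8}{'#' * round(n * width / cap)}")
    return lines


# Made-up placeholder counts, not real Tatoeba statistics.
counts = {"lang_a": 300_000, "lang_b": 30_000, "lang_c": 3_000}
print("\n".join(render_bars(counts, cap=40_000)))
```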

TRANG, May 5, 2015 12:41:39 UTC

You say that the bars are not useful for languages in the lower part of the graph, but what information do you expect from these bars?

To me, for instance, each bar was never meant to be useful on its own. The important information is carried by the whole graph, which is there to give an overall view of how the languages are spread within the corpus. If we take the square root or logarithm or whatever, it would give the wrong idea about the weight of each language in the corpus.

If the information that you are looking for is to know "how many sentences do we need to add in a certain language so that it exceeds the language above", then obviously it is not best represented by the current graph. But this kind of information should be implemented as an extra feature, not by replacing the current graph, which in my opinion is not really useless but simply has another purpose.

In any case, for other visualisations of Tatoeba's data, I encourage anyone with a bit of programming knowledge to use the files that we export[1] and make their own visualisation, just like someone else did for the translations some time ago: https://tatoeba.org/eng/wall/sh...#message_21926
I personally loooove data visualisation but I can't afford to spend time implementing such things in Tatoeba, so I'm counting on other people to do it and share their work with us on the Wall :)

[1] http://downloads.tatoeba.org/exports/
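As a starting point for such a visualisation, here is a minimal sketch that counts sentences per language from the exported sentences file. The file layout is an assumption to check against the exports page: one sentence per line, with a sentence id, a language code and the text separated by tabs.

```python
import csv
from collections import Counter
from typing import Iterable


def sentences_per_language(lines: Iterable[str]) -> Counter:
    """Count sentences per language code.

    Assumes each line looks like: sentence id <TAB> language code <TAB> text.
    """
    # QUOTE_NONE: sentence text may contain quote characters that are not CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[1] for row in reader if len(row) >= 2)


# Made-up sample lines in the assumed layout.
sample = [
    "1\tlang_a\tFirst sentence.\n",
    "2\tlang_b\tSecond sentence.\n",
    "3\tlang_a\tA \"quoted\" word.\n",
]
print(sentences_per_language(sample).most_common())
```

To run it on a real export, pass an open file instead of the sample list, e.g. `open("sentences.csv", encoding="utf-8", newline="")`; the file name here is an assumption as well.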

Silja, May 5, 2015 16:48:25 UTC (edited May 5, 2015 17:32:03 UTC)

How about showing what percentage of the whole database each language accounts for (its percentage of the total number of sentences)? That would give a holistic view to those who prefer numbers over graphs as well.

Edit: It could also be interesting to compare the languages present in Tatoeba with the languages of the world, i.e. the most common languages in Tatoeba (most speakers) vs. the most spoken languages in the world, etc. http://www.washingtonpost.com/b...rts/?tid=sm_fb
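A minimal sketch of the percentage idea, written out as numbers rather than bars. The function name and the counts are made up for illustration.

```python
from typing import Dict


def percentage_shares(counts: Dict[str, int], digits: int = 2) -> Dict[str, float]:
    """Each language's share of all sentences, as a percentage rounded to `digits` places."""
    total = sum(counts.values())
    return {lang: round(100 * n / total, digits) for lang, n in counts.items()}


# Made-up placeholder counts, not real Tatoeba statistics.
counts = {"lang_a": 600, "lang_b": 300, "lang_c": 100}
for lang, share in percentage_shares(counts).items():
    print(f"{lang}: {share:.2f} %")
```

Because each share is rounded separately, the displayed percentages may not add up to exactly 100. Drawn as bars, these shares would look exactly like the current graph, since they are the same proportions.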

Lepotdeterre, May 6, 2015 09:20:34 UTC

I think the percentage idea is good, as long as the numerical percentage is written out. If it's just another progress bar, each bar will be the same length as it is now, because of proportionality, and the less-represented languages will still have tiny bars.