Aiji, 20 January 2020, 02:22:42 UTC

** Introducing Tatoeba playground **

For a while, I had been thinking of sharing all the scripts I use to play with Tatoeba data. My goal was to provide clear, customizable tools that work out of the box and do not require any programming knowledge. It's finally done and available online. Many thanks to Alan and Ricardo for their help and valuable feedback.

So what is the playground? It is a collection of notebooks providing fully customizable functions to do pretty much anything you want. For now, the following are available out of the box, and more should come later if people are interested:

- All sentences. Filtered by language. All sentences containing a word.
- Sentences owned by a user. Sentences owned by anyone other than specific users.
- Corpus analysis: word counts (beta version; incorrect for most languages)
- Sentences that do not have final punctuation. Sentences that do not start with a capital letter
- Audio contributions (sentences with / without audio)
- Languages of a user. List of speakers of a language. List of natives of a language. List of natives of X speaking Y
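
To give a rough idea of what the first kind of filter in the list above involves, here is a minimal sketch. It is not the playground's actual code: the function names are hypothetical, and it assumes sentence data laid out as tab-separated "id, language code, text" lines, with a tiny made-up sample standing in for a real export.

```python
import csv
import io

# Made-up sample in the ASSUMED "id<TAB>language<TAB>text" layout.
SAMPLE = (
    "1\teng\tI like tea.\n"
    "2\tfra\tJ'aime le thé.\n"
    "3\teng\tshe drinks coffee\n"
)


def read_sentences(handle):
    """Yield (id, lang, text) tuples from a tab-separated file object."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 3:  # skip malformed lines
            yield int(row[0]), row[1], row[2]


def filter_sentences(sentences, lang=None, word=None):
    """Keep sentences matching a language code and/or containing a substring.

    The word test is a plain case-insensitive substring match, so "tea"
    would also match "teacher"; a real tool may want proper tokenization.
    """
    result = []
    for sentence_id, sentence_lang, text in sentences:
        if lang is not None and sentence_lang != lang:
            continue
        if word is not None and word.lower() not in text.lower():
            continue
        result.append((sentence_id, sentence_lang, text))
    return result


sentences = list(read_sentences(io.StringIO(SAMPLE)))
english = filter_sentences(sentences, lang="eng")
english_tea = filter_sentences(sentences, lang="eng", word="tea")
```

With the sample above, `english` holds sentences 1 and 3, and `english_tea` holds only sentence 1. For a real file, the `io.StringIO(SAMPLE)` part would be replaced by an opened file object.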


It does not require any installation, since it is available online here: https://mybinder.org/v2/gh/agro...yground/master
When you click on that link, the technology provided by Binder (https://mybinder.org) will build the playground environment for you. It may take a while, but once it's done, you can access the playground in a closed and safe environment.
For more information on how all that works, you can check the README available in this GitHub repository: https://github.com/agrodet/Tatoeba-playground

Some functions are useful for corpus maintenance, some are there just for fun. I'll let you judge :)

If you have any questions, feel free to ask here or directly on the GitHub repository: https://github.com/agrodet/Tatoeba-playground. Note that it has nothing to do with the Tatoeba development repository.

PS: I'm aware it might be difficult to use for non-English speakers.

Yorwba, 20 January 2020, 12:42:34 UTC

Nice work! Requiring users to change values directly in the code is a daring choice of interface, but I think you explained everything well enough.

When filtering for sentences without final punctuation, your examples are all single characters, but I'd like to point out that it's also possible to define longer sequences. For example, quotation marks at the end of a sentence should be preceded by the punctuation of the quoted sentence: '."', '!"', '?"'.
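
As a small sketch of that idea (not taken from the playground; the names `FINAL_PUNCTUATION` and `lacks_final_punctuation` are made up for illustration), Python's `str.endswith` accepts a tuple of endings, and those endings can be longer than one character:

```python
# Accepted sentence endings: single characters plus the two-character
# sequences for a quoted sentence, e.g. '."' at the end of: He said "Go."
FINAL_PUNCTUATION = (".", "!", "?", '."', '!"', '?"')


def lacks_final_punctuation(text, endings=FINAL_PUNCTUATION):
    """Return True if the text does not end with any accepted ending."""
    # Trailing whitespace is ignored; str.endswith tries each tuple entry.
    return not text.rstrip().endswith(tuple(endings))


checks = {
    "Hello.": lacks_final_punctuation("Hello."),
    "Hello": lacks_final_punctuation("Hello"),
    'He said "Stop!"': lacks_final_punctuation('He said "Stop!"'),
    'He said "stop"': lacks_final_punctuation('He said "stop"'),
}
```

Here a bare closing quotation mark is not an accepted ending on its own, so 'He said "stop"' is flagged while 'He said "Stop!"' is not. Other quotation styles would need their own entries in the tuple.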