Aiji, 20 January 2020, 02:22:42 UTC

** Introducing Tatoeba playground **

I had been thinking for a while about sharing all the scripts I use to play with Tatoeba data. My goal was to provide clear, customizable tools that work out of the box and do not require any programming knowledge. It's finally done and available online. Many thanks to Alan and Ricardo for their help and valuable feedback.

So what is the playground? It is a collection of notebooks providing fully customizable functions to do pretty much anything you want. For now, the following are available out of the box, and more should come later if people are interested:

- All sentences. Filtered by language. All sentences containing a word.
- Sentences owned by a user. Sentences owned by anyone other than specific users.
- Corpus analysis: word counts (beta version; incorrect for most languages).
- Sentences that do not have final punctuation. Sentences that do not start with a capital letter.
- Audio contributions (sentences with / without audio)
- Languages of a user. List of speakers of a language. List of natives of a language. List of natives of X speaking Y
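
To give a rough idea of what a few of these filters involve, here is a minimal Python sketch. It is not the playground's actual code: the function names and the sample data are hypothetical, and it assumes sentences come as tab-separated rows of id, language code and text.

```python
import csv
import io
import re

# Hypothetical sample data, assumed to be tab-separated: id, language code, text.
SAMPLE = (
    "1\teng\tHello world.\n"
    "2\tfra\tBonjour le monde.\n"
    "3\teng\tno capital here.\n"
    "4\teng\tMissing punctuation\n"
)

def load_sentences(handle):
    """Read (id, lang, text) tuples from a tab-separated file-like object."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(int(row[0]), row[1], row[2]) for row in reader if len(row) >= 3]

def filter_by_language(sentences, lang):
    """Keep only the sentences whose language code equals `lang`."""
    return [s for s in sentences if s[1] == lang]

def containing_word(sentences, word):
    """Keep sentences containing `word` as a whole word, ignoring case."""
    needle = word.casefold()
    return [s for s in sentences if needle in re.findall(r"\w+", s[2].casefold())]

def missing_capital(sentences):
    """Keep sentences whose first character is a lowercase letter."""
    return [s for s in sentences if s[2][:1].islower()]

sentences = load_sentences(io.StringIO(SAMPLE))
english = filter_by_language(sentences, "eng")
print(containing_word(english, "world"))
print(missing_capital(english))
```

Splitting on `\w+` is a simplification that only suits languages that separate words with spaces, which is in line with the caveat above about word counts being incorrect for most languages.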


It does not require any installation, as it is available online, here: https://mybinder.org/v2/gh/agro...yground/master
When you click on that link, the technology provided by Binder (https://mybinder.org) will build the playground environment for you. It may take a while, but when it's done, you can access the playground in a closed and safe environment.
For more information on how all that works, you can check the README available in this GitHub repository: https://github.com/agrodet/Tatoeba-playground

Some functions are useful for corpus maintenance, some are there just for fun. I'll let you judge :)

If you have any questions, feel free to ask here or directly on the GitHub repository: https://github.com/agrodet/Tatoeba-playground. Note that it has nothing to do with the Tatoeba development repository.

PS: I'm aware it might be difficult to use for non-English speakers.

Yorwba, 20 January 2020, 12:42:34 UTC

Nice work! Requiring users to change values directly in the code is a daring choice of interface, but I think you explained everything well enough.

When filtering for sentences without final punctuation, your examples are all single characters, but I'd like to point out that it's also possible to define longer sequences. For example, quotation marks at the end of a sentence should be preceded by the punctuation of the quoted sentence: '."', '!"', '?"'.
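
As a rough illustration of that idea (not the playground's actual code; the function name and the default list of endings are made up for this example), Python's `str.endswith` accepts a tuple of suffixes, so single characters and longer sequences can be mixed freely:

```python
# Hypothetical default endings: single characters plus the longer
# quotation-mark sequences mentioned above.
DEFAULT_ENDINGS = (".", "!", "?", '."', '!"', '?"')

def lacks_final_punctuation(text, endings=DEFAULT_ENDINGS):
    """Return True if `text` does not end with any of the accepted endings."""
    # Trailing whitespace is ignored; endswith takes a tuple of candidates.
    return not text.rstrip().endswith(tuple(endings))

for sentence in ['He said "Go."', 'He said "Go"', "Is it done?", "Not finished"]:
    print(repr(sentence), lacks_final_punctuation(sentence))
```

With only the single-character endings, a sentence such as `He said "Go."` would be flagged, because its last character is the closing quotation mark.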