Aiji, 20 January 2020, 2:22:42 UTC

** Introducing Tatoeba playground **

For a while I had been thinking of sharing all the scripts I use to play with Tatoeba data. My goal was to provide clear, customizable tools that work out of the box and do not require any programming knowledge. It is finally done and available online. Many thanks to Alan and Ricardo for their help and valuable feedback.

So what is the playground? It is a collection of notebooks providing fully customizable functions to do pretty much anything you want. For now, the following are available out of the box, and more should come later if people are interested:

- All sentences. Filtered by language. All sentences containing a word.
- Sentences owned by a user. Sentences owned by anyone other than specific users.
- Corpus analysis: word counts (beta version; incorrect for most languages)
- Sentences that do not have final punctuation. Sentences that do not start with a capital letter
- Audio contributions (sentences with / without audio)
- Languages of a user. List of speakers of a language. List of natives of a language. List of natives of X speaking Y
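
To give a rough idea of what filters like these amount to, here is a minimal sketch in Python. It is not the playground's actual code: the `Sentence` record, its field layout, the sample data, and all function names are assumptions made up for illustration.

```python
import re
from typing import Iterable, List, NamedTuple


class Sentence(NamedTuple):
    """One sentence record; this field layout is an assumption for the sketch."""
    id: int
    lang: str   # language code, e.g. "eng"
    text: str
    owner: str  # username of the sentence owner


def by_language(sentences: Iterable[Sentence], lang: str) -> List[Sentence]:
    """Keep only the sentences written in the given language."""
    return [s for s in sentences if s.lang == lang]


def containing_word(sentences: Iterable[Sentence], word: str) -> List[Sentence]:
    """Keep sentences that contain `word` as a whole word, ignoring case."""
    target = word.lower()
    return [s for s in sentences if target in re.findall(r"\w+", s.text.lower())]


def not_owned_by(sentences: Iterable[Sentence], excluded: Iterable[str]) -> List[Sentence]:
    """Keep sentences whose owner is not one of the excluded usernames."""
    excluded_set = set(excluded)
    return [s for s in sentences if s.owner not in excluded_set]


def missing_capital(sentences: Iterable[Sentence]) -> List[Sentence]:
    """Keep sentences whose first character is a lowercase letter."""
    return [s for s in sentences if s.text[:1].islower()]


# Hypothetical sample data, not taken from the Tatoeba corpus.
SAMPLE = [
    Sentence(1, "eng", "Hello world.", "alice"),
    Sentence(2, "fra", "Bonjour le monde.", "bob"),
    Sentence(3, "eng", "this has no capital", "bob"),
]

print([s.id for s in by_language(SAMPLE, "eng")])        # [1, 3]
print([s.id for s in containing_word(SAMPLE, "WORLD")])  # [1]
print([s.id for s in not_owned_by(SAMPLE, ["bob"])])     # [1]
print([s.id for s in missing_capital(SAMPLE)])           # [3]
```

Splitting on `\w+` only makes sense for languages that separate words with spaces, which is in line with the word counts being marked as incorrect for most languages. The capital-letter check only flags sentences that start with a lowercase letter, so sentences starting with a digit or a quotation mark are left alone.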


It does not require any installation, as it is available online, here: https://mybinder.org/v2/gh/agro...yground/master
When you click on that link, Binder (https://mybinder.org) will build the playground environment for you. It may take a while, but when it's done, you can access the playground in a closed and safe environment.
For more information on how all that works, you can check the README in this GitHub repository: https://github.com/agrodet/Tatoeba-playground

Some functions are useful for corpus maintenance, some are there just for fun. I'll let you judge :)

If you have any questions, feel free to ask here or directly on the GitHub repository (https://github.com/agrodet/Tatoeba-playground). Note that it has nothing to do with the Tatoeba development repository.

PS: I'm aware it might be difficult to use for non-English speakers.

Yorwba, 20 January 2020, 12:42:34 UTC

Nice work! Requiring users to change values directly in the code is a daring choice of interface, but I think you explained everything well enough.

When filtering for sentences without final punctuation, your examples are all single characters, but I'd like to point out that it's also possible to define longer sequences. For example, quotation marks at the end of a sentence should be preceded by the punctuation of the quoted sentence: '."', '!"', '?"'.
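
As a small sketch of that idea (not the playground's own function; the name `lacks_final_punctuation` and the default list of endings are assumptions for illustration), Python's `str.endswith` accepts a tuple of suffixes of any length, so single characters and longer sequences can be mixed freely:

```python
from typing import Iterable

# Assumed default endings: single marks plus the quoted variants mentioned above.
DEFAULT_ENDINGS = (".", "!", "?", '."', '!"', '?"')


def lacks_final_punctuation(text: str, endings: Iterable[str] = DEFAULT_ENDINGS) -> bool:
    """Return True if `text` does not end with any of the accepted endings."""
    # Trailing whitespace is ignored before checking the ending.
    return not text.rstrip().endswith(tuple(endings))


print(lacks_final_punctuation("Done."))             # False
print(lacks_final_punctuation('He said, "Hi!"'))    # False
print(lacks_final_punctuation('He said "hi"'))      # True
print(lacks_final_punctuation("no end"))            # True
```

With this list, a closing quotation mark only counts as a valid ending when the quoted sentence carries its own punctuation.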