** Introducing Tatoeba playground **
For a while, I had been thinking of sharing all the scripts I use to play with Tatoeba data. My goal was to provide clear, customizable tools that work out of the box and do not require any programming knowledge. They are finally done and available online. Many thanks to Alan and Ricardo for their help and valuable feedback.
So what is the playground? It is a collection of notebooks providing fully customizable functions to do pretty much anything you want. For now, the following are available out of the box, and more should come later if people are interested:
- All sentences. Filtered by language. All sentences containing a word.
- Sentences owned by a user. Sentences owned by anyone other than specific users.
- Corpus analysis: word counts (beta version; incorrect for most languages)
- Sentences that do not have final punctuation. Sentences that do not start with a capital letter.
- Audio contributions (sentences with / without audio)
- Languages of a user. List of speakers of a language. List of natives of a language. List of natives of X speaking Y
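To give an idea of what the first kind of filter does, here is a minimal sketch. It is not the playground's actual code: the function name `filter_sentences` and the sample data are made up for illustration, and it assumes the tab-separated layout of the Tatoeba sentence export (id, language code, text). The word match relies on word boundaries, so it will not work well for languages written without spaces.

```python
import csv
import io
import re

# Hypothetical sample in the assumed export layout: id <TAB> lang <TAB> text
SAMPLE = (
    "1\teng\tI like cats.\n"
    "2\tfra\tJ'aime les chats.\n"
    "3\teng\tCats sleep a lot.\n"
    "4\teng\tThe dog barks.\n"
)

def filter_sentences(rows, lang=None, word=None):
    """Yield (id, lang, text) tuples matching the optional language and word."""
    pattern = None
    if word is not None:
        # Whole-word, case-insensitive match.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for sentence_id, sentence_lang, text in rows:
        if lang is not None and sentence_lang != lang:
            continue
        if pattern is not None and not pattern.search(text):
            continue
        yield sentence_id, sentence_lang, text

# QUOTE_NONE keeps quotation marks inside sentences untouched.
rows = list(csv.reader(io.StringIO(SAMPLE), delimiter="\t", quoting=csv.QUOTE_NONE))
english_cats = list(filter_sentences(rows, lang="eng", word="cats"))
for sentence_id, sentence_lang, text in english_cats:
    print(sentence_id, sentence_lang, text)
```

Leaving `lang` or `word` as `None` skips that filter, so the same function covers "all sentences", "filtered by language", and "containing a word".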
It does not require any installation, as it is available online, here: https://mybinder.org/v2/gh/agro...yground/master
When you click on that link, Binder (https://mybinder.org) builds the playground environment for you. It may take a while, but once it is done, you can access the playground in a closed and safe environment.
For more information on how all that works, you can check the README available in this GitHub repository: https://github.com/agrodet/Tatoeba-playground
Some functions are useful for corpus maintenance, some are there just for fun. I'll let you judge :)
If you have any questions, feel free to ask here or directly on the GitHub repository: https://github.com/agrodet/Tatoeba-playground. Note that it has nothing to do with the Tatoeba development repository.
PS: I'm aware it might be difficult to use for non-English speakers.
Nice work! Requiring users to change values directly in the code is a daring choice of interface, but I think you explained everything well enough.
When filtering for sentences without final punctuation, your examples are all single characters, but I'd like to point out that it's also possible to define longer sequences. For example, quotation marks at the end of a sentence should be preceded by the punctuation of the quoted sentence: '."', '!"', '?"'.
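To illustrate, here is a rough sketch of how multi-character endings could be checked. The function name `lacks_final_punctuation` and the default list of endings are my own assumptions, not the playground's actual code. It relies on the fact that Python's `str.endswith` accepts a tuple of suffixes of any length.

```python
def lacks_final_punctuation(sentence, endings=(".", "!", "?", '."', '!"', '?"')):
    """Return True if the sentence ends with none of the accepted endings.

    Endings may be longer than one character, e.g. '."' for a quoted
    sentence whose own punctuation sits inside the quotation marks.
    """
    # Trailing whitespace is ignored before checking the ending.
    return not sentence.rstrip().endswith(tuple(endings))

# Hypothetical sample sentences, for illustration only.
sentences = [
    "I am here.",
    'She said, "Come in."',
    'She said, "Come in"',
    "No ending",
]
flagged = [s for s in sentences if lacks_final_punctuation(s)]
print(flagged)
```

With these endings, a sentence that closes on a bare quotation mark is flagged, while one ending in '."' is accepted.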