Aiji, January 20, 2020 at 2:22:42 AM UTC

** Introducing Tatoeba playground **

For a while, I had been thinking of sharing all the scripts I use to play with Tatoeba data. My goal was to provide clear, customizable tools that work out of the box and do not require any programming knowledge. It's finally done and available online. Many thanks to Alan and Ricardo for their help and valuable feedback.

So what is the playground? It is a collection of notebooks providing fully customizable functions to do pretty much anything you want. For now, the following are available out of the box, and more should come later if people are interested:

- All sentences; sentences filtered by language; all sentences containing a word
- Sentences owned by a user; sentences not owned by specific users
- Corpus analysis: word counts (beta version, incorrect for most languages)
- Sentences that do not have final punctuation; sentences that do not start with a capital letter
- Audio contributions (sentences with / without audio)
- Languages of a user; list of speakers of a language; list of natives of a language; list of natives of X speaking Y
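
To give an idea of what the first kind of filter in the list above does, here is a minimal Python sketch. It is not the playground's actual code: the function names `read_rows` and `filter_sentences` are made up for illustration, and it assumes the sentences are stored as tab-separated rows of sentence id, language code, and text. Word matching with `\b` only makes sense for languages that separate words with spaces.

```python
import csv
import io
import re

# Hypothetical sample data; assumes tab-separated rows of (id, language code, text).
SAMPLE = "1\teng\tI have a cat.\n2\tfra\tJ'ai un chat.\n3\teng\tthe dog sleeps\n"


def read_rows(handle):
    """Read tab-separated sentence rows into a list of (id, lang, text) tuples."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [tuple(row) for row in reader]


def filter_sentences(rows, lang=None, word=None):
    """Yield rows in the given language (if set) that contain the given word (if set)."""
    pattern = None
    if word is not None:
        # Whole-word, case-insensitive match; unreliable for languages without spaces.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for sentence_id, sentence_lang, text in rows:
        if lang is not None and sentence_lang != lang:
            continue
        if pattern is not None and not pattern.search(text):
            continue
        yield sentence_id, sentence_lang, text


rows = read_rows(io.StringIO(SAMPLE))
english = list(filter_sentences(rows, lang="eng"))
with_cat = list(filter_sentences(rows, lang="eng", word="cat"))
```

Leaving `lang` or `word` as `None` simply skips that filter, so the same function covers "all sentences", "filtered by language", and "containing a word".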

It does not require any installation, as it is available online, here:
When you click on that link, the technology provided by Binder will build the playground environment for you. It may take a while, but when it's done, you can access the playground in a closed and safe environment.
For more information on how all that works, you can check the README available on this GitHub repository:

Some functions are useful for corpus maintenance, some are there just for fun. I'll let you judge :)

If you have any questions, feel free to ask here or directly on the GitHub repository. Note that this repository has nothing to do with the Tatoeba development repository.

PS: I'm aware it might be difficult to use for non-English speakers.

Yorwba, January 20, 2020 at 12:42:34 PM UTC

Nice work! Requiring users to change values directly in the code is a daring choice of interface, but I think you explained everything well enough.

When filtering for sentences without final punctuation, your examples are all single characters, but I'd like to point out that it's also possible to define longer sequences. For example, quotation marks at the end of a sentence should be preceded by the punctuation of the quoted sentence: '."', '!"', '?"'.
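
A minimal sketch of that idea in Python, assuming the check is a plain suffix test (the playground's own implementation may differ; `DEFAULT_ENDINGS` and `lacks_final_punctuation` are names invented here for illustration):

```python
# Accepted sentence endings; multi-character sequences are allowed.
DEFAULT_ENDINGS = (".", "!", "?", '."', '!"', '?"')


def lacks_final_punctuation(text, endings=DEFAULT_ENDINGS):
    """Return True if the text does not end with any of the accepted endings."""
    # str.endswith accepts a tuple and checks each suffix in turn.
    return not text.rstrip().endswith(tuple(endings))


sentences = [
    'She said "Stop!"',   # ends with '!"', accepted
    'She said "Stop"',    # closing quote without punctuation, flagged
    "It works.",          # ends with '.', accepted
    "No ending here",     # flagged
]
flagged = [s for s in sentences if lacks_final_punctuation(s)]
```

Since a bare `"` is not in the list, a quotation that closes without its own punctuation still gets flagged, while `!"` and `?"` pass.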