lbdx, 20 June 2020 at 09:48:52 UTC (edited 20 June 2020 at 09:49:19 UTC)

A new version of tatominer (https://tatominer.imfast.io) is available.

The identification of words that are often searched for but poorly covered by Tatoeba has been improved. You can now also access sentences containing popular words that are rarely translated into the target language of your choice.
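
To make the idea of "often searched but little covered" concrete, here is a minimal sketch in Python. It is not tatominer's actual script: the function name `under_covered`, the thresholds, and the sample data are all assumptions chosen for illustration, and the tokenizer is deliberately naive.

```python
from collections import Counter
import re


def tokenize(text):
    """Split text into lowercase word tokens (naive, illustration only)."""
    return re.findall(r"\w+", text.lower())


def under_covered(search_queries, sentences, min_searches=2, max_occurrences=1):
    """Return (word, searches, occurrences) for words that are searched
    at least `min_searches` times but occur at most `max_occurrences`
    times in the corpus. Both thresholds are hypothetical defaults."""
    searched = Counter(w for q in search_queries for w in tokenize(q))
    covered = Counter(w for s in sentences for w in tokenize(s))
    rows = [
        (word, n, covered[word])
        for word, n in searched.items()
        if n >= min_searches and covered[word] <= max_occurrences
    ]
    # Most-searched first, then least-covered, then alphabetical.
    rows.sort(key=lambda r: (-r[1], r[2], r[0]))
    return rows


# Made-up sample data standing in for a search log and a sentence export.
queries = ["kettle", "kettle", "kettle", "cat", "cat", "umbrella", "umbrella"]
sentences = ["The cat sat on the mat.", "I have a cat.", "Take an umbrella."]

for word, searches, occurrences in under_covered(queries, sentences):
    print(word, searches, occurrences)
```

With this sample data, "cat" is left out because it already occurs twice in the corpus, while "kettle" and "umbrella" are reported.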

I hope that this tool will be useful for those of you who would like to expand the vocabulary available on Tatoeba. 20 languages (and thus 380 language pairs) are already supported. Feel free to ask me to add a language you are interested in.

AlanF_US, 20 June 2020 at 17:33:33 UTC (edited 20 June 2020 at 17:37:16 UTC)

I like the language drop-down. It makes the list of phrases easier to work with.

I thought I had added sentences for all the English phrases with fewer than two occurrences, but when I went back to the list, I saw the terms in bold, which I hadn't seen before. I'm not sure why.

lbdx, 20 June 2020 at 18:15:38 UTC

That's expected.

Following feedback from users who regretted the use of exact matches during searches, I finally decided to drop this constraint for the languages for which I have a stemmer.

As you have noticed, this greatly changes the results.
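
A small sketch of why dropping exact matching changes the counts so much. The suffix-stripping `toy_stem` below is a made-up stand-in, not the stemmer tatominer uses, and the sentences are invented examples.

```python
import re

# Toy suffix list for illustration only; a real stemmer is far more careful.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common English suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_occurrences(term, sentences, use_stemmer):
    """Count tokens matching `term`, either exactly or after stemming."""
    normalize = toy_stem if use_stemmer else (lambda w: w)
    target = normalize(term.lower())
    return sum(
        1 for s in sentences for w in tokenize(s) if normalize(w) == target
    )


sentences = ["She walks to school.", "We walked home.", "I like walking."]

exact = count_occurrences("walk", sentences, use_stemmer=False)
stemmed = count_occurrences("walk", sentences, use_stemmer=True)
print(exact, stemmed)
```

Under exact matching, "walk" has no occurrences in these sentences; once inflected forms are reduced to the same stem, all three sentences count.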

AlanF_US, 20 June 2020 at 19:31:18 UTC

What I'm saying is that first I went through the first two pages of the list, namely the words with 0 occurrences, then the words with 1 occurrence. But when I went back to the first page, I saw items listed in bold, with 0 occurrences, that I hadn't seen just 10 minutes before, when I was on that page. I don't see how stemming would cause that change in results unless stem-matches were first excluded, then included, during the same run, and I don't see why you would do that.

Another thing: your "occurrences" values seem not to change until a new batch of words is downloaded (which might happen weekly or so). But when I add a sentence for a phrase, I'd like to see the number of occurrences go up by one. This would help me keep track of where I am, and it would also give me a feeling of accomplishment. Although I realize it would be considerably harder to implement, it would be nice if clicking on a phrase would take me to a place (perhaps not on Tatoeba) where I could add a sentence, after which the occurrences of each word in my sentence would be stored in a Tatominer-local database. Then, whenever the "occurrences" value was queried for a word, it would be calculated as the sum of (a) whatever it was when you did the last download and (b) the number of times it occurred within the sentences that people added from inside your app since the last download.
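
The counting scheme described above, (a) the value at the last download plus (b) occurrences in sentences added since, could be sketched roughly as follows. The class name `OccurrenceTracker` and its methods are hypothetical, and the local store is just an in-memory counter rather than a real database.

```python
from collections import Counter
import re


class OccurrenceTracker:
    """Occurrences = baseline from the last export + local additions."""

    def __init__(self, baseline):
        # (a) counts as of the last weekly download
        self.baseline = Counter(baseline)
        # (b) counts from sentences added since that download
        self.local = Counter()

    def add_sentence(self, sentence):
        """Record every word of a newly added sentence locally."""
        self.local.update(re.findall(r"\w+", sentence.lower()))

    def occurrences(self, word):
        """Return the baseline count plus the locally recorded count."""
        key = word.lower()
        return self.baseline[key] + self.local[key]

    def reset(self, new_baseline):
        """Load a fresh export; it is assumed to already include the
        sentences added locally, so the local counts are cleared."""
        self.baseline = Counter(new_baseline)
        self.local.clear()


tracker = OccurrenceTracker({"kettle": 0, "tea": 4})
tracker.add_sentence("Put the kettle on for tea.")
print(tracker.occurrences("kettle"), tracker.occurrences("tea"))
```

Clearing the local counts on each new download avoids counting the same sentence twice, provided the sentences added through the app really do end up in the next export.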

lbdx, 21 June 2020 at 07:55:51 UTC

Thank you for your interest in Tatominer. Originally the goal of this project was to build a script that analyzes Tatoeba's search log to extract useful information for the community. The script is working quite well now (in the languages I know anyway), and could potentially be used for all languages supported by Tatoeba.

The site I put online is only intended to share these results. It simply consists of static pages that I regenerate once a week from the weekly exports. The new features you propose would be very useful, but unfortunately, keeping the data of the two sites in sync is much too difficult for me as long as Tatoeba doesn't offer an API.

The functionality you ask for is very similar to what https://tatoeba.org/eng/vocabulary/add_sentences offers. I think the word lists I generate could usefully replace those currently online. A similar page for words that need translations could even be added. If developers are interested in working on these features, I would be happy to help them integrate my script.