Tatoeba
robot_fury, May 11, 2019 at 2:44:31 PM UTC

Allow adding sentences to lists automatically through advanced search.

I might be missing something, but I scanned all the issues about lists and did not see this feature. If I am missing the obvious, feel free to direct message me.

I would like to make lists of specific language pairs to use to train NMT models.

Perhaps developers are expected to build such lists from the large file of sentences, but it would be more convenient for my students (high school) if they could build lists directly from the advanced search, instead of having to select the list icon, find the list, and then click OK.

Making a list several thousand sentences long this way takes hours when it could take seconds.

gillux, May 13, 2019 at 5:05:22 AM UTC

I’m not familiar with natural language processing. What are NMT models? Can you elaborate on what you’re trying to achieve, as opposed to the features you’d like to have? Do you want to download the search results? If so, what filtering criteria are you using?

Note that because the advanced search is limited to 1000 results, it will always give you a partial view of the corpus, so I don’t think it’s a good way to go.

Thanuir, May 13, 2019 at 5:41:03 AM UTC

Presumably: https://en.wikipedia.org/wiki/N...ne_translation

TRANG, May 13, 2019 at 6:05:20 PM UTC

As you've realized yourself in your other post[1], you do indeed have to parse the sentences.csv file that we provide on the Downloads page.

I'm just wondering though: are you trying to extract *all* sentences and translations in a specific pair of languages, or do you have any additional criteria? For instance, instead of all English sentences and their French translations, only English sentences containing the word "robot" and their French translations.
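For what it's worth, here is a minimal sketch of that kind of extraction. It is not an official Tatoeba script. It assumes the exports are tab-separated, that each row of sentences.csv is "id, language code, text", and that a separate links file (called links.csv here) holds "sentence id, translation id" pairs; please check the Downloads page for the actual layout. The function names are made up for illustration.

```python
def load_sentences(lines, langs):
    """Map sentence id -> (lang, text), keeping only the wanted languages.

    Assumes each line is tab-separated: id, language code, text.
    """
    sentences = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        sent_id, lang, text = parts
        if lang in langs:
            sentences[sent_id] = (lang, text)
    return sentences


def extract_pairs(sentences, link_lines, src, tgt, keyword=None):
    """Return (source text, target text) pairs for one language pair.

    Assumes each link line is tab-separated: sentence id, translation id.
    If keyword is given, keep only pairs whose source text contains it
    (case-insensitive substring match).
    """
    pairs = []
    for line in link_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue
        a, b = parts[0], parts[1]
        if a not in sentences or b not in sentences:
            continue  # one side is in a language we did not load
        lang_a, text_a = sentences[a]
        lang_b, text_b = sentences[b]
        if lang_a != src or lang_b != tgt:
            continue
        if keyword is not None and keyword.lower() not in text_a.lower():
            continue
        pairs.append((text_a, text_b))
    return pairs


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the real export files.
    sample_sentences = [
        "1\teng\tThe robot is angry.\n",
        "2\tfra\tLe robot est en colère.\n",
        "3\teng\tI like tea.\n",
        "4\tfra\tJ'aime le thé.\n",
    ]
    sample_links = ["1\t2\n", "2\t1\n", "3\t4\n", "4\t3\n"]
    loaded = load_sentences(sample_sentences, {"eng", "fra"})
    for en, fr in extract_pairs(loaded, sample_links, "eng", "fra", keyword="robot"):
        print(en, "|", fr)
```

With the real files you would pass open file handles (opened with encoding="utf-8") instead of the sample lists. Leaving out keyword gives every pair for the two languages, which is not capped the way the advanced search results are.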

In case you want *all* sentences, you may want to check our FAQ[2]. Maybe you'll find a script someone else has written, which would spare you the time of making your own.

If you have additional criteria, I'd be curious to know which ones.

---

[1] https://tatoeba.org/eng/wall/sh...#message_31824
[2] https://en.wiki.tatoeba.org/art...-translations-