Warning

The data provided here will NOT be useful unless you are building a language tool or processing the data programmatically.

If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.

Creative commons

These files are released under CC-BY.

If you wonder why we do not release the data into the public domain, an explanation is available here.

Questions?

If you have questions or requests, feel free to contact us. In general, we answer quickly.

Downloads

Attention: As of 2014-08-16, the URL to download the latest files has changed and the new export files are provided in a compressed format. The old URL is still available, but will not contain the latest data.
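
Since the exports are compressed, they can be read directly from the archive without unpacking it to disk. The following is a minimal sketch, assuming that each archive contains a single UTF-8 text file; the function name iter_export_lines and the local path are hypothetical and not part of the export.

```python
import io
import tarfile
from typing import Iterator


def iter_export_lines(archive_path: str) -> Iterator[str]:
    """Yield the lines of the first regular file found in a .tar.bz2 archive.

    Assumption: the archive holds one UTF-8 encoded text file.
    """
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other special entries
            raw = tar.extractfile(member)
            if raw is None:
                continue
            with io.TextIOWrapper(raw, encoding="utf-8") as text:
                for line in text:
                    yield line.rstrip("\n")
            return  # only the first regular file is read
```

For example, iter_export_lines("links.tar.bz2") would yield one tab-separated line per link, assuming the archive has already been downloaded to the current directory.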

Sentences

Download
1. http://downloads.tatoeba.org/exports/sentences.tar.bz2
2. http://downloads.tatoeba.org/exports/sentences_detailed.tar.bz2
Fields and structure
1. id [tab] lang [tab] text
2. id [tab] lang [tab] text [tab] username [tab] date_added [tab] date_last_modified
File description
Contains all the sentences. Each sentence is associated with a unique id and an ISO 639-3 language code. The first file (sentences.tar.bz2) contains this information alone. The second file (sentences_detailed.tar.bz2) contains additional fields for those who would like to filter the sentences based on the contributor who owns the sentence, or the date when it was added or last modified.
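
As an illustration, both layouts can be parsed with the same routine, because the detailed file only appends three fields to the basic one. This is a sketch rather than an official parser: the Sentence type and the parse_sentences function are hypothetical names, and lines that have neither 3 nor 6 fields are simply skipped.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Sentence(NamedTuple):
    id: int
    lang: str  # ISO 639-3 language code
    text: str
    username: Optional[str] = None            # detailed file only
    date_added: Optional[str] = None          # detailed file only
    date_last_modified: Optional[str] = None  # detailed file only


def parse_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Parse tab-separated lines from either sentences export."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) not in (3, 6):
            continue  # assumption: ignore lines that match neither layout
        yield Sentence(int(fields[0]), *fields[1:])
```

Dates are kept as plain strings here because the page does not specify their format.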

Links

Download
http://downloads.tatoeba.org/exports/links.tar.bz2
Fields and structure
sentence_id [tab] translation_id
File description
Contains the links between the sentences. 1 [tab] 77 means that sentence #77 is a translation of sentence #1. The reciprocal link is also present, so the file will also contain a line that says 77 [tab] 1.
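
Because every link is stored in both directions, a lookup table can be built by reading each line once, with no need to add the reverse entry manually. A minimal sketch, in which build_translation_map is a hypothetical helper name:

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set


def build_translation_map(lines: Iterable[str]) -> DefaultDict[int, Set[int]]:
    """Map each sentence id to the set of ids of its translations."""
    translations: DefaultDict[int, Set[int]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # assumption: skip malformed lines
        sentence_id, translation_id = int(fields[0]), int(fields[1])
        translations[sentence_id].add(translation_id)
    return translations
```

With the two example lines above, the map would contain 77 under key 1 and 1 under key 77.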

Tags

Download
http://downloads.tatoeba.org/exports/tags.tar.bz2
Fields and structure
sentence_id [tab] tag_name
File description
Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence #381279 has been assigned the "proverb" tag.
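
To find all sentences that carry a given tag, the file can be grouped by tag name. The sketch below splits each line on the first tab only, on the assumption that a tag name may itself contain spaces or other characters; sentences_by_tag is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set


def sentences_by_tag(lines: Iterable[str]) -> DefaultDict[str, Set[int]]:
    """Group sentence ids by tag name."""
    by_tag: DefaultDict[str, Set[int]] = defaultdict(set)
    for line in lines:
        sentence_id, sep, tag_name = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # no tab found: not a valid entry
        by_tag[tag_name].add(int(sentence_id))
    return by_tag
```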

Lists

Download
http://downloads.tatoeba.org/exports/user_lists.tar.bz2
Fields and structure
id [tab] username [tab] date_created [tab] date_last_modified [tab] list_name
File description
Contains one entry per sentence list: its id, the username of its owner, its creation and last modification dates, and its name.

Sentences in lists

Download
http://downloads.tatoeba.org/exports/sentences_in_lists.tar.bz2
Fields and structure
list_id [tab] sentence_id
File description
Indicates which sentences belong to which lists. 13 [tab] 381279 means that sentence #381279 belongs to the list whose id is 13.
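
The list_id field refers to the id field of the lists file, so the two files can be joined to recover the sentences of a named list. A minimal sketch with hypothetical function names; it splits the lists file into at most five fields so that the list name is kept whole:

```python
from typing import Dict, Iterable, Set


def read_list_names(lines: Iterable[str]) -> Dict[int, str]:
    """Map list id to list name, from lines of the lists file."""
    names: Dict[int, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t", 4)
        if len(fields) != 5:
            continue  # assumption: skip malformed lines
        names[int(fields[0])] = fields[4]
    return names


def read_list_members(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map list id to the ids of the sentences it contains."""
    members: Dict[int, Set[int]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue
        members.setdefault(int(fields[0]), set()).add(int(fields[1]))
    return members
```

Looking up the same id in both dictionaries gives the name of a list together with its sentence ids.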

Japanese indices

Download
http://downloads.tatoeba.org/exports/jpn_indices.tar.bz2
Fields and structure
sentence_id [tab] meaning_id [tab] text
File description
Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. sentence_id refers to the id of the Japanese sentence. meaning_id refers to the id of the English sentence.

Sentences with audio

Download
http://downloads.tatoeba.org/exports/sentences_with_audio.tar.bz2
Fields and structure
sentence_id
File description
Contains the ids of the sentences, in all languages, for which audio is available.
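
A common use of this file is to keep only the sentences that have audio. The sketch below loads the ids into a set and filters lines of the sentences file against it; both function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def load_audio_ids(lines: Iterable[str]) -> Set[int]:
    """Read one sentence id per line, ignoring blank lines."""
    return {int(line) for line in lines if line.strip()}


def filter_with_audio(
    sentence_lines: Iterable[str], audio_ids: Set[int]
) -> Iterator[Tuple[int, str, str]]:
    """Yield (id, lang, text) for sentences whose id is in audio_ids."""
    for line in sentence_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # assumption: skip malformed lines
        sentence_id = int(fields[0])
        if sentence_id in audio_ids:
            yield sentence_id, fields[1], fields[2]
```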

General information about the files

The files provided here are updated every Saturday at 9 a.m., France time.

Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.