Downloads
Sentences
- Download
- 1. http://tatoeba.org/files/downloads/sentences.csv
- 2. http://tatoeba.org/files/downloads/sentences_detailed.csv
- Fields and structure
- 1. id [tab] lang [tab] text
- 2. id [tab] lang [tab] text [tab] username [tab] date_added [tab] date_last_modified
- Description
- Contains all the sentences. Each sentence is associated with a unique id and a language code (ISO 639-3).
We provide two files. The first file (sentences.csv) contains only the id, language code and text of each sentence. The second file (sentences_detailed.csv) adds the contributor and date fields, for those who would like to filter the sentences based, for instance, on the contributor who owns the sentence or on the date when it was added.
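As a rough illustration of the field layout above, the following Python sketch splits each line on tabs and keeps only the sentences in one language. The helper names (`parse_line`, `by_language`) and the sample rows are invented for this example; they are not part of the files or of any Tatoeba tool.

```python
from typing import Dict, Iterable, Iterator, List

# Field names copied from the "Fields and structure" entries above.
SENTENCE_FIELDS = ["id", "lang", "text"]
DETAILED_FIELDS = SENTENCE_FIELDS + ["username", "date_added", "date_last_modified"]


def parse_line(line: str, fields: List[str]) -> Dict[str, str]:
    """Split one tab-separated line and label each value with its field name."""
    values = line.rstrip("\r\n").split("\t")
    return dict(zip(fields, values))


def by_language(
    lines: Iterable[str], lang: str, fields: List[str] = SENTENCE_FIELDS
) -> Iterator[Dict[str, str]]:
    """Yield the parsed rows whose language code equals `lang`."""
    for line in lines:
        row = parse_line(line, fields)
        if row.get("lang") == lang:
            yield row


# Invented sample rows, for illustration only (not real file content).
sample_rows = ["1\teng\tHello.\n", "77\tfra\tBonjour.\n"]
french = list(by_language(sample_rows, "fra"))
print(french)
```

For a downloaded file, an open file object can be passed in place of `sample_rows`, and `DETAILED_FIELDS` can be passed as `fields` for sentences_detailed.csv. Opening the file as UTF-8 is an assumption here, since the encoding is not stated on this page.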
Links
- Download
- http://tatoeba.org/files/downloads/links.csv
- Fields and structure
- sentence_id [tab] translation_id
- Description
- Contains the links between the sentences. 1 [tab] 77 means that sentence nº77 is a translation of sentence nº1. The reciprocal link is also present. In other words, the file also has a line that says 77 [tab] 1.
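One possible way to use the links is to load them into a lookup table from each sentence id to the ids of its translations. This is a minimal sketch; the function name `load_links` is hypothetical. Because the reciprocal link is already in the file, the code does not add reverse entries itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_links(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each sentence_id to the set of its translation_ids."""
    translations: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        sentence_id, translation_id = int(parts[0]), int(parts[1])
        translations[sentence_id].add(translation_id)
    return translations


# The pair from the description above, in both directions.
sample_rows = ["1\t77\n", "77\t1\n"]
links = load_links(sample_rows)
print(links[1], links[77])
```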
Tags
- Download
- http://tatoeba.org/files/downloads/tags.csv
- Fields and structure
- sentence_id [tab] tag_name
- Description
- Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence nº381279 has been tagged with "proverb".
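A short sketch of how the tag file could be grouped by tag name follows. The function name `group_by_tag` is made up for this example, and every sample row except the first is invented. The line is split only at the first tab, on the assumption that a tag name may contain spaces.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_tag(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each tag_name to the list of sentence ids carrying that tag."""
    tagged: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        sentence_id, sep, tag_name = line.rstrip("\r\n").partition("\t")
        if not sep:
            continue  # no tab found: skip blank or malformed lines
        tagged[tag_name].append(int(sentence_id))
    return tagged


# First row is the example from the description; the others are invented.
sample_rows = ["381279\tproverb\n", "42\tproverb\n", "42\tsome other tag\n"]
tags_index = group_by_tag(sample_rows)
print(tags_index["proverb"])
```

Since lists.csv has the same two-column shape (sentence_id [tab] list_name), the same approach should carry over to it.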
Lists
- Download
- http://tatoeba.org/files/downloads/lists.csv
- Fields and structure
- sentence_id [tab] list_name
- Description
- Contains the sentence lists that each sentence belongs to. 381279 [tab] My own list means that sentence nº381279 is part of "My own list".
Japanese indices
- Download
- http://tatoeba.org/files/downloads/jpn_indices.csv
- Fields and structure
- sentence_id [tab] meaning_id [tab] text
- Description
- Contains the equivalent of the "B lines" in the file of the Tanaka Corpus distributed by Jim Breen. See this page to learn the format. Each entry is associated with a pair of Japanese/English sentences. sentence_id refers to the id of the Japanese sentence. meaning_id refers to the id of the English sentence.
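The three columns can be separated as in the sketch below. It deliberately leaves the text field uninterpreted, because the "B line" format is documented elsewhere and not on this page. The names `IndexEntry` and `parse_index_line` and the sample row are assumptions made for illustration.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    sentence_id: int  # id of the Japanese sentence
    meaning_id: int   # id of the English sentence
    text: str         # index text, kept as-is (format not parsed here)


def parse_index_line(line: str) -> IndexEntry:
    """Split one line into sentence_id, meaning_id and the raw text."""
    # maxsplit=2 keeps any further tabs inside the text field.
    sentence_id, meaning_id, text = line.rstrip("\r\n").split("\t", 2)
    return IndexEntry(int(sentence_id), int(meaning_id), text)


# Invented sample row with placeholder text, for illustration only.
entry = parse_index_line("1000\t2000\tplaceholder index text\n")
print(entry.sentence_id, entry.meaning_id)
```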
General information about the files
The files provided here are updated every Saturday at 9 AM, France time.
Most of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.