Downloads
Sentences
- Download
- http://tatoeba.org/files/downloads/sentences.csv
- Fields and structure
- id [tab] lang [tab] text
- Description
- Contains all the sentences. Each sentence is associated with a unique id and a language code (ISO 639-3).
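As an illustration of the id [tab] lang [tab] text layout, here is a minimal Python sketch that parses one line at a time. The function name parse_sentence_line and the sample lines are made up for this example and are not taken from the actual file.

```python
def parse_sentence_line(line):
    """Split one line of sentences.csv into (id, lang, text)."""
    # Limit the split to 2 so any tab inside the text stays in the text field.
    sentence_id, lang, text = line.rstrip("\r\n").split("\t", 2)
    return int(sentence_id), lang, text


# Made-up sample lines in the documented layout (not real file content).
sample_lines = ["1\teng\tHello.\n", "77\tfra\tBonjour.\n"]
sentences = [parse_sentence_line(line) for line in sample_lines]
```

To read the real file, the same function could be applied to each line of sentences.csv opened as text. UTF-8 is an assumption here, since the encoding is not stated on this page.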
Links
- Download
- http://tatoeba.org/files/downloads/links.csv
- Fields and structure
- sentence_id [tab] translation_id
- Description
- Contains the links between sentences. A line reading 1 [tab] 77 means that sentence nº77 is a translation of sentence nº1. The reciprocal link is also present, so the file also contains the line 77 [tab] 1.
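One way to use the sentence_id [tab] translation_id pairs is to group them into a lookup table from each sentence id to the ids of its translations. The sketch below is only an illustration: build_translation_map is a hypothetical helper, and the sample lines simply reuse the 1 / 77 example from the description.

```python
from collections import defaultdict


def build_translation_map(lines):
    """Map each sentence id to the set of ids it is linked to."""
    translations = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        sentence_id, translation_id = line.split("\t")
        translations[int(sentence_id)].add(int(translation_id))
    return translations


# The example pair from the description, with its reciprocal link.
sample_links = ["1\t77\n", "77\t1\n"]
links = build_translation_map(sample_links)
```

Because the reciprocal link is stored in the file, a lookup works in either direction without adding the reverse entries by hand.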
Japanese indices
- Download
- http://tatoeba.org/files/downloads/jpn_indices.csv
- Fields and structure
- sentence_id [tab] meaning_id [tab] text
- Description
- Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page to learn the format. Each entry is associated with a Japanese/English sentence pair: sentence_id is the id of the Japanese sentence, and meaning_id is the id of the English sentence.
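The sketch below shows one possible way to load a line of this file into a small record. It makes no attempt to interpret the "B line" text, which is kept as an opaque string. JapaneseIndex and parse_index_line are names invented for this example, and the sample line is a placeholder, not real data.

```python
from typing import NamedTuple


class JapaneseIndex(NamedTuple):
    sentence_id: int  # id of the Japanese sentence
    meaning_id: int   # id of the English sentence
    text: str         # index text, left unparsed


def parse_index_line(line):
    """Split one line of jpn_indices.csv into a JapaneseIndex record."""
    sentence_id, meaning_id, text = line.rstrip("\r\n").split("\t", 2)
    return JapaneseIndex(int(sentence_id), int(meaning_id), text)


# Placeholder line in the documented layout (not real file content).
sample_index = parse_index_line("100\t200\tplaceholder index text\n")
```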
General information about the files
The files provided here are updated every Saturday at 9 AM, France time.
Most of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.