Note

The data you will find here will NOT be useful unless you are coding a language tool or processing data.

If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.

General information about the files

Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.

Creative Commons

These files are released under CC BY 2.0 FR.

Some of our sentences are also available under CC0 1.0.

Licenses covering audio

The license covering an audio file is chosen by its contributor and is indicated on the page that lists the audio files that contributor has uploaded.

Questions?

If you have questions or requests, feel free to contact us. In general, we answer quickly.

Downloads

Use this tool to generate and download customized exports on demand.

Sentence pairs

Download all sentences in language A with translations in language B

Download all sentences in language A that are translated into language B, along with the translations.

Sentence language:

Translation language:

The files provided below are updated every Saturday at 6:30 a.m. (UTC).
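One way to use this schedule is to check whether a local copy predates the most recent weekly export. The sketch below only does the date arithmetic for "Saturday at 6:30 UTC"; the function name `last_scheduled_update` is hypothetical, and the actual publication time of a given file may lag behind the scheduled time.

```python
from datetime import datetime, timedelta, timezone


def last_scheduled_update(now: datetime) -> datetime:
    """Return the most recent Saturday 06:30 UTC at or before `now`.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    # datetime.weekday(): Monday is 0, Saturday is 5.
    days_since_saturday = (now.weekday() - 5) % 7
    candidate = (now - timedelta(days=days_since_saturday)).replace(
        hour=6, minute=30, second=0, microsecond=0
    )
    # On a Saturday before 06:30 the candidate is still in the future,
    # so step back one week.
    if candidate > now:
        candidate -= timedelta(days=7)
    return candidate
```

A local file whose modification time is earlier than the returned value is a candidate for re-downloading.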

Sentences

Filename: {{sentences | filename}}
All languages
Only sentences in: Abxaz tili adigey Afrihili afrikaans Aklanon alban Algerian Arabic amxar Ancient Hebrew arab aragon assam Assyrian Neo-Aramaic asturiy avadxi Avar tili aymara aynu bali Baluj tili bambara Banjar bask Bavarian Baybayanon belarus Bengal tili Berber Berom birman bislama bodo bolgar bosniy boshqird breton Brithenig Buryat tili bxojpuri Central Bikol Central Dusun Central Huasteca Nahuatl Central Kanuri Central Kurdish (Soranî) Central Mnong Coastal Kadazan Cuyonon CycL Dan tili Dhivehi Drents Dungan tili Dutton World Speedwords Eastern Armenian Egyptian Arabic Emilian Erromintxela erzya esperanto estoncha eve Evenk tili Extremaduran farercha fiji Fiji Hindi Finikiy tili fincha fors fransuzcha Frisian friul ga gagauz gaityan galisiy gan Garhwali gavaycha Gheg Albanian gilbert Got tili Greenlandic grek Gronings gruzincha Guadeloupean Creole French guarani Guerrero Nahuatl gujarot Gulf Arabic Hakka Chinese hayda hiligaynon Hill Mari hind Hitchiti Hmong Daw (White) Hmong Njua (Green) Ho Hunsrik iban idish ie igbo Ilocano indonez inglizcha Ingrian Interglossa interlingva inuktitut io Iraqi Arabic irland Isan island ispancha italyan ivrit Jamaican Patois janubiy hayda janubiy kurd janubiy oltoy janubiy saam janubiy soto Jewish Babylonian Aramaic Jewish Palestinian Aramaic Jin Chinese Juhuri (Judeo-Tat) K'iche' kabardin kabil kamba kannada kanton Kapampangan karel katalan kayuga kashmircha Kashubian Kekchi (Q'eqchi') Kelantan-Pattani Malay Keningau Murut kechua kinyaruanda Kirundi klingon Kölsch komi-permyak Komi-Zyrian Konkani (Goan) koreyscha korn korsikan Kotava Kven Finnish kxasi kxosa Láadan Ladin ladino lakota Langua Franca Nova laos Latgalian Latish tili Laz Levantine Arabic Libyan Arabic Ligurian limburg lingala Literary Chinese litva Livonian lojban Lombard lotincha Low German (Low Saxon) Luganda luiziana kreol Lushootseed lyuksemburgcha madur Mahasu Pahari makedon malagasiy malay Malay (Vernacular) malayalam maltiy Mambae Manipur tili Manchu maori 
mapuche maratxi marshall maythili Meadow Mari men Middle English Middle French Middle Persian (Pahlavi) mikmak Min Nan Chinese minangkabau Mingrelian miranda mohauk moksha Mon mongol Mono (USA) morisyen Moroccan Arabic Muskogee (Creek) Naga (Tangshang) Nahuatl Nande Nauruan navaxo neapolitan nemis (Shveytsariya) nemischa nepal nevar Ngeq niderland Nigerian Fulfulde niue North Frisian North Moluccan Malay Northern Kurdish (Kurmancî) Northern Zaza (Kirmanjki) norveg-bokmal norveg-nyunorsk Novial nuer Nuosu Nyungar no‘g‘ay Odia (Oriya) Ojibwe Okinawan oksitan Old Aramaic Old East Slavic Old English Old French Old Frisian Old Norse Old Saxon Old Spanish Old Tupi Old Turkish Orizaba Nahuatl Osetin tili ozarbayjon Palatine German palau Pali pangasinan papiyamento Pennsylvania German Picard Piedmontese Pipil polyakcha portugalcha Prus tili Pulaar Punjabi (Eastern) Punjabi (Western) pushtu Qadimgi yunon tili qalmoq Qashqay tili qirgʻizcha Qoraqalpoq tili Qoraxoniylar tili qorachoy-bolqor qozoqcha Qrimtatar tili Quenya quyi sorb qo‘miq Rapa Nui Rendille rohinja Romani romansh rumincha Rusyn ruscha samoa Samogitian sango sanskrit santali Saraiki sardin Saterland Frisian saxa sebuan serbcha Setswana Seychellois Creole Silez tili Silot tili Sindarin sindhi singal sitsiliya slovakcha slovencha somalicha South Levantine Arabic Southern Subanen Southern Zaza (Dimli) sranan-tongo suaxili suaxili (Kongo) sundan Suryoniy tili Swabian Swazi Tagal Murut Tahaggart Tamahaq taiti Talossan tamazigxt tamil Tarifit tatar tay Tashelhit Tachawit tekislik kri telugu Temuan Tetun tibet tigre tigrinya tl tojik tok-piksin Tokelauan tokipona Tolish tili Tonga (Zambezi) tongan tsonga tumbuka turk turkman tuva Tuvaluan Uab Meto udmurt ukrain umbundu urdu Urhobo Usmonli turk tili uyg‘ur valliy vallon varay Venetian venger Veps tili volapyuk volof Võro vyetnam Wayuu Western Armenian Xakas tili Xalaj tili xausa Xiang Chinese xitoy xmer xorvat yapon yavan yoruba Yucatec Maya yuqori sorb zaza Zeelandic 
zulu O'odham o‘zbek Shanghainese shimoliy saam shona shotland shotland-gel Shumer tili Shuswap shved chamorro Chavacano cheroki chex chechen Chinese Pidgin English Chinook Jargon Chinyanja Chigʻatoy tili choktav Chukcha tili chuvash Unknown language
File description: Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
Fields and structure: Sentence id [tab] Lang [tab] Text
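A minimal sketch of reading one line of this file, assuming the three tab-separated fields described above and UTF-8 text. The `Sentence` type, the function name, and the sample row are illustrative and not part of the export.

```python
from typing import NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 language code
    text: str


def parse_sentence_line(line: str) -> Sentence:
    """Parse 'Sentence id [tab] Lang [tab] Text' into a Sentence."""
    # maxsplit=2 keeps the text intact even if it were to contain a tab.
    sentence_id, lang, text = line.rstrip("\n").split("\t", 2)
    return Sentence(int(sentence_id), lang, text)


# Made-up sample row, for illustration only.
example = parse_sentence_line("101\teng\tThis is an example.\n")
```

For a full file, the same function can be applied to each line of a file opened with `encoding="utf-8"`.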

Detailed Sentences

Filename: {{sentencesDetailed | filename}}
All languages, or only sentences in a single language (same language choices as listed under Sentences).
File description: Contains additional fields for each sentence (owner name, date created/modified).
Fields and structure: Sentence id [tab] Lang [tab] Text [tab] Username [tab] Date added [tab] Date last modified

Original and Translated Sentences

Filename: sentences_base.tar.bz2
File description: Each sentence is listed as original or a translation of another. The "base" field can have the following values:
zero: The sentence is original, not a translation of another.
greater than zero: The id of the sentence from which it was translated.
\N: Unknown (rare).
Fields and structure: Sentence id [tab] Base field
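The three possible values of the base field can be handled as in the sketch below. The `BaseEntry` type and the status labels are hypothetical names chosen for illustration; the only assumption about the file is that each line holds the two tab-separated fields listed above.

```python
from typing import NamedTuple, Optional

UNKNOWN = r"\N"  # a literal backslash followed by N, as written in the file


class BaseEntry(NamedTuple):
    sentence_id: int
    base_id: Optional[int]  # id of the source sentence, only for translations
    status: str             # "original", "translation", or "unknown"


def parse_base_line(line: str) -> BaseEntry:
    """Parse 'Sentence id [tab] Base field' and interpret the base value."""
    sentence_id, base = line.rstrip("\n").split("\t")
    if base == UNKNOWN:
        return BaseEntry(int(sentence_id), None, "unknown")
    base_id = int(base)
    if base_id < 0:
        raise ValueError(f"unexpected negative base value: {base!r}")
    if base_id == 0:
        return BaseEntry(int(sentence_id), None, "original")
    return BaseEntry(int(sentence_id), base_id, "translation")
```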

Sentences (CC0)

Filename: {{sentencesCC0 | filename}}
All languages
Only sentences in: Algerian Arabic, Ancient Hebrew, Arabic, Belarusian, Bengali, Berber, Danish, Esperanto, Phoenician, Finnish, French, Hindi, Ho, Yiddish, English, Interlingua, Ido, Spanish, Italian, Hebrew, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic, Kabyle, Cantonese, Karelian, Catalan, Klingon, Konkani (Goan), Kven Finnish, Láadan, Ladino, Ligurian, Literary Chinese, Latin, Middle English, German, Dutch, Norwegian Bokmål, Nyungar, Old Aramaic, Old Frisian, Old Norse, Polish, Portuguese, Ancient Greek, Russian, Santali, Sylheti, Tamazight, Tachawit, Toki Pona, Ukrainian, Welsh, Hungarian, Volapük, Chinese, Japanese, Swedish, Czech, Unknown language
File description: Contains all the sentences available under CC0.
Fields and structure: Sentence id [tab] Lang [tab] Text [tab] Date last modified

Lists

Filename: user_lists.tar.bz2
File description: Contains the list of sentence lists.
Fields and structure: List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by

Sentences in lists

Filename: sentences_in_lists.tar.bz2
File description: Indicates which sentences belong to which lists. For example, 13 [tab] 381279 means that sentence #381279 belongs to the list whose id is 13.
Fields and structure: List id [tab] Sentence id
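Because a list can hold many sentences and a sentence can appear in several lists, it is often convenient to group the rows by list id. A small sketch, assuming the two-field layout above; the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def group_sentences_by_list(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each list id to the set of sentence ids it contains."""
    lists: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        list_id, sentence_id = line.rstrip("\n").split("\t")
        lists[int(list_id)].add(int(sentence_id))
    return dict(lists)


# The first row is the example from the file description; the others are made up.
rows = ["13\t381279\n", "13\t5\n", "20\t5\n"]
grouped = group_sentences_by_list(rows)
```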

Japanese indices

Filename: jpn_indices.tar.bz2
File description: Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
Fields and structure: Sentence id [tab] Meaning id [tab] Text

Sentences with audio

Filename: sentences_with_audio.tar.bz2
File description: Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
Downloading audio: A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to compute the download URL. For example, to download the recording with the id 1234, the URL is https://tatoeba.org/audio/download/1234.
Fields and structure: Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
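The URL rule and the license restriction can be combined as in the following sketch. It only builds URLs and does not perform any download. Treating `\N` as an empty license is an assumption (the description only says "empty"), and the function names and sample rows are illustrative.

```python
from typing import Iterable, Iterator, Tuple

AUDIO_DOWNLOAD_BASE = "https://tatoeba.org/audio/download/"


def audio_download_url(audio_id: int) -> str:
    """Build the download URL for an audio id, following the rule above."""
    return f"{AUDIO_DOWNLOAD_BASE}{audio_id}"


def reusable_audio(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (sentence id, download URL, license) for reusable recordings."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip rows that do not match the documented layout
        sentence_id, audio_id, _username, license_name, _attribution = fields
        # Empty license: not reusable outside the project.
        # Assumption: a literal \N is treated the same as an empty field.
        if license_name in ("", r"\N"):
            continue
        yield int(sentence_id), audio_download_url(int(audio_id)), license_name


# Made-up rows for illustration.
rows = [
    "10\t1234\talice\tCC BY 4.0\thttps://example.org/alice\n",
    "11\t1235\tbob\t\t\n",
]
reusable = list(reusable_audio(rows))
```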

User skill level per language

Filename: user_languages.tar.bz2
File description: Indicates the self-reported skill levels of members in individual languages.
Fields and structure: Lang [tab] Skill level [tab] Username [tab] Details

Users' sentence reviews

Filename: users_sentences.csv
File description: Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
Fields and structure: Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
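A sketch of tallying reviews per sentence using the three review values described above. The label strings, function name, and sample rows (including the date format) are made up for illustration; the dates are kept as opaque strings and not parsed.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

REVIEW_LABELS = {-1: "not OK", 0: "unsure", 1: "OK"}


def tally_reviews(lines: Iterable[str]) -> Dict[int, Counter]:
    """Count reviews of each kind for every sentence id."""
    tallies: Dict[int, Counter] = defaultdict(Counter)
    for line in lines:
        # Fields: Username, Sentence id, Review, Date added, Date last modified
        _username, sentence_id, review, *_dates = line.rstrip("\n").split("\t")
        tallies[int(sentence_id)][REVIEW_LABELS[int(review)]] += 1
    return dict(tallies)


# Made-up rows for illustration.
rows = [
    "alice\t42\t1\t2020-01-01 00:00:00\t2020-01-01 00:00:00\n",
    "bob\t42\t-1\t2020-01-02 00:00:00\t2020-01-02 00:00:00\n",
    "carol\t42\t1\t2020-01-03 00:00:00\t2020-01-03 00:00:00\n",
    "alice\t43\t0\t2020-01-04 00:00:00\t2020-01-04 00:00:00\n",
]
tallies = tally_reviews(rows)
```

Since the data is flagged as experimental, any threshold built on these counts should be treated as provisional.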

Transcriptions

Filename: {{transcriptions | filename}}
All languages
Only sentences in: Cantonese, Chinese, Japanese, Uzbek
File description: Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
Fields and structure: Sentence id [tab] Lang [tab] Script name [tab] Username [tab] Transcription
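The rule that a transcription without a username has not been reviewed can be applied while parsing, as in this sketch. The `Transcription` type and sample rows are illustrative, and treating a literal `\N` like an empty username is an assumption.

```python
from typing import NamedTuple


class Transcription(NamedTuple):
    sentence_id: int
    lang: str
    script: str     # ISO 15924 script code
    username: str   # empty if no user has reviewed the transcription
    text: str
    reviewed: bool


def parse_transcription_line(line: str) -> Transcription:
    """Parse one row and flag whether it has been marked as reviewed."""
    sentence_id, lang, script, username, text = line.rstrip("\n").split("\t", 4)
    # Assumption: a literal \N is treated the same as an empty username.
    if username == r"\N":
        username = ""
    return Transcription(
        int(sentence_id), lang, script, username, text, reviewed=bool(username)
    )


# Made-up rows for illustration.
unreviewed = parse_transcription_line("42\tjpn\tHrkt\t\tこんにちは\n")
reviewed = parse_transcription_line("43\tjpn\tHrkt\talice\tさようなら\n")
```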
