Note
The data you will find here will NOT be useful unless you are coding a language tool or processing data.
If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.
General information about the files
Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.
Creative Commons
These files are released under CC BY 2.0 FR.

Some of our sentences are also available under CC0 1.0.

Licenses covering audio
The license covering an audio file is chosen by its contributor and is indicated on the page that lists the audio files that contributor has uploaded.
Questions?
If you have questions or requests, feel free to contact us. In general, we answer quickly.
Downloads
Use this tool to generate and download customized exports on demand.
Download all sentences in language A that are translated into language B, along with the translations.
Sentences
- Filename
- Chosen on the download page: all languages, or only the sentences in one selected language.
- File description
- Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
- Fields and structure
- Sentence id [tab] Lang [tab] Text
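The three-column layout above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the names `Sentence` and `read_sentences` are hypothetical, the sample rows are made up, and it assumes the extracted file is UTF-8 text with one sentence per line.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code, e.g. "eng"
    text: str


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one Sentence per line of 'Sentence id [tab] Lang [tab] Text'."""
    for line in lines:
        # Split on the first two tabs only, so the text column is kept whole.
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) != 3:
            continue  # skip malformed lines rather than guess at their meaning
        yield Sentence(int(fields[0]), fields[1], fields[2])


# Made-up sample rows in the documented layout.
sample = io.StringIO('1\teng\tHello.\n2\tfra\tIl a dit "bonjour".\n')
sentences = list(read_sentences(sample))
```

For a real export, the same function can be fed an open file object, for example `open(path, encoding="utf-8")`, where `path` points at the file extracted from the downloaded archive.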
Detailed Sentences
- Filename
- Chosen on the download page: all languages, or only the sentences in one selected language.
- File description
- Contains additional fields for each sentence (owner name, date created/modified).
- Fields and structure
- Sentence id [tab] Lang [tab] Text [tab] Username [tab] Date added [tab] Date last modified
Original and Translated Sentences
- Filename
- sentences_base.tar.bz2
- File description
- Each sentence is listed as original or a translation of another. The "base" field can have the following values:
- zero: The sentence is original, not a translation of another.
- greater than zero: The id of the sentence from which it was translated.
- \N: Unknown (rare).
- Fields and structure
- Sentence id [tab] Base field
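The three possible values of the "base" field map naturally onto a small classifier. The sketch below is illustrative only: `parse_base_line` and the labels "original", "translation" and "unknown" are hypothetical names, not part of the export.

```python
from typing import Optional, Tuple


def parse_base_line(line: str) -> Tuple[int, str, Optional[int]]:
    """Return (sentence_id, kind, base_id) for one 'Sentence id [tab] Base field' line."""
    sentence_id, base = line.rstrip("\n").split("\t")
    if base == r"\N":
        # Unknown origin (rare, according to the file description).
        return int(sentence_id), "unknown", None
    base_id = int(base)
    if base_id == 0:
        # Zero: the sentence is original.
        return int(sentence_id), "original", None
    # Greater than zero: id of the sentence it was translated from.
    return int(sentence_id), "translation", base_id


# Made-up rows covering the three documented cases.
rows = ["5\t0\n", "6\t5\n", "7\t\\N\n"]
parsed = [parse_base_line(row) for row in rows]
```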
Sentences (CC0)
- Filename
- Chosen on the download page: all languages, or only the sentences in one of: German, Ancient Hebrew, Algerian Arabic, Arabic, Bengali, Berber, Belarusian, Kabyle, Cantonese, Karelian, Spanish, Catalan, Czech, Chinese, Classical Chinese, Danish, Esperanto, Finnish, French, Welsh, Ancient Greek, Hebrew, Hindi, Ho, Dutch, Hungarian, Ido, ie, English, Middle English, Interlingua, Italian, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic, Klingon, Konkani (Goan), Kven Finnish, Láadan, Ladino, Latin, Phoenician, Ligurian, Sylheti, Old Norse, Norwegian Bokmål, Nyungar, Odia (Oriya), Old Aramaic, Old Frisian, Polish, Portuguese, Russian, Santali, Swedish, Tachawit, Standard Moroccan Tamazight, Toki Pona, Ukrainian, Volapük, Japanese, Yiddish, Unknown language.
- File description
- Contains all the sentences available under CC0.
- Fields and structure
- Sentence id [tab] Lang [tab] Text [tab] Date last modified
Links
- Filename
- links.tar.bz2
- File description
- Contains the links between the sentences. 1 [tab] 77 means that sentence #77 is the translation of sentence #1. The reciprocal link is also present, so the file will also contain a line that says 77 [tab] 1.
- Fields and structure
- Sentence id [tab] Translation id
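Because each link is stored in both directions, a lookup table of translations can be built in a single pass without adding reverse entries by hand. A minimal sketch, with `build_translation_map` as a hypothetical helper name and made-up sample rows:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_translation_map(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each sentence id to the set of ids it is linked to."""
    translations: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        sentence_id, translation_id = line.rstrip("\n").split("\t")
        # Only the stated direction is stored; the file itself is expected
        # to contain the reciprocal line as well.
        translations[int(sentence_id)].add(int(translation_id))
    return translations


# Made-up rows: sentence 1 is linked to 77 and 80, with reciprocal lines.
sample = ["1\t77\n", "77\t1\n", "1\t80\n", "80\t1\n"]
tmap = build_translation_map(sample)
```

Joining `tmap` with the sentences file (keyed by sentence id) then gives the text of each translation.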
Tags
- Filename
- tags.tar.bz2
- File description
- Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence #381279 has been assigned the "proverb" tag.
- Fields and structure
- Sentence id [tab] Tag name
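A common use of this file is to collect every sentence that carries a given tag. The sketch below inverts the file into a tag-to-sentences index; `sentences_by_tag` is a hypothetical name, and the tag "needs review" in the sample is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sentences_by_tag(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group sentence ids by tag name, in file order."""
    by_tag: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        # Split on the first tab only, in case a tag name contains spaces.
        sentence_id, tag = line.rstrip("\n").split("\t", 1)
        by_tag[tag].append(int(sentence_id))
    return by_tag


# Made-up rows; only the "proverb" example comes from the description above.
sample = ["381279\tproverb\n", "12\tproverb\n", "12\tneeds review\n"]
index = sentences_by_tag(sample)
```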
Lists
- Filename
- user_lists.tar.bz2
- File description
- Contains the list of sentence lists.
- Fields and structure
- List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by
Sentences in lists
- Filename
- sentences_in_lists.tar.bz2
- File description
- Indicates which sentences belong to which lists. 13 [tab] 381279 means that sentence #381279 is in the list whose id is 13.
- Fields and structure
- List id [tab] Sentence id
Japanese indices
- Filename
- jpn_indices.tar.bz2
- File description
- Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
- Fields and structure
- Sentence id [tab] Meaning id [tab] Text
Sentences with audio
- Filename
- sentences_with_audio.tar.bz2
- File description
- Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
- Downloading audio
- A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to compute the download URL. For example, to download the recording with the id 1234, the URL is https://tatoeba.org/audio/download/1234.
- Fields and structure
- Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
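The URL rule and the license caveat described above can be combined into one small filter. This is a sketch under assumptions: the helper names are hypothetical, the sample rows (usernames, license string, attribution URL) are made up, and treating `\N` as an empty license is a guess rather than documented behavior.

```python
from typing import Iterable, Iterator, Tuple

# URL pattern taken from the example in the description above.
AUDIO_URL = "https://tatoeba.org/audio/download/{audio_id}"


def audio_download_url(audio_id: int) -> str:
    """Compute the download URL for one audio id."""
    return AUDIO_URL.format(audio_id=audio_id)


def reusable_audio_urls(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (sentence_id, url) only for rows whose license field is filled in."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed lines
        sentence_id, audio_id, _username, license_name, _attribution = fields
        # Assumption: a missing license shows up as "" or as the \N marker.
        if license_name in ("", r"\N"):
            continue  # no license: not reusable outside the Tatoeba project
        yield int(sentence_id), audio_download_url(int(audio_id))


# Made-up rows: the second one has an empty license and is filtered out.
sample = [
    "10\t1234\talice\tCC BY 4.0\thttps://example.com/alice\n",
    "11\t1235\tbob\t\t\n",
]
urls = list(reusable_audio_urls(sample))
```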
User skill level per language
- Filename
- user_languages.tar.bz2
- File description
- Indicates the self-reported skill levels of members in individual languages.
- Fields and structure
- Lang [tab] Skill level [tab] Username [tab] Details
Users' sentence reviews
- Filename
- users_sentences.csv
- File description
- Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
- Fields and structure
- Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
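One way to use the review values is to sum them per sentence, giving a rough net score. The sketch below is only one possible aggregation, not something the export defines: `review_scores` is a hypothetical name, and the usernames and date strings in the sample are made up (the date columns are ignored).

```python
from collections import Counter
from typing import Iterable


def review_scores(lines: Iterable[str]) -> Counter:
    """Sum the -1 / 0 / 1 review values for each sentence id."""
    scores: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        value = int(fields[2])
        if value not in (-1, 0, 1):
            continue  # ignore values outside the documented range
        scores[int(fields[1])] += value
    return scores


# Made-up rows: sentence 1 gets +1, +1, -1; sentence 2 gets -1.
sample = [
    "alice\t1\t1\t2020-01-01 00:00:00\t2020-01-01 00:00:00\n",
    "bob\t1\t1\t2020-01-02 00:00:00\t2020-01-02 00:00:00\n",
    "carol\t1\t-1\t2020-01-03 00:00:00\t2020-01-03 00:00:00\n",
    "alice\t2\t-1\t2020-01-04 00:00:00\t2020-01-04 00:00:00\n",
]
scores = review_scores(sample)
```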
Transcriptions
- Filename
- Chosen on the download page: all languages, or only the sentences in one of: Cantonese, Chinese, Uzbek, Japanese.
- File description
- Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
- Fields and structure
- Sentence id [tab] Lang [tab] Script name [tab] Username [tab] Transcription
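Since a missing username means the transcription has not been marked as reviewed, a parser can expose that as a boolean. A minimal sketch: `parse_transcription` is a hypothetical name, the sample rows are made up, and treating `\N` like an empty username is an assumption.

```python
from typing import Dict, Union


def parse_transcription(line: str) -> Dict[str, Union[int, str, bool]]:
    """Parse one transcription line and flag whether it has been reviewed."""
    sentence_id, lang, script, username, text = line.rstrip("\n").split("\t", 4)
    return {
        "sentence_id": int(sentence_id),
        "lang": lang,
        "script": script,  # ISO 15924 code, e.g. "Latn"
        # Assumption: an unreviewed row has "" or the \N marker as username.
        "reviewed": username not in ("", r"\N"),
        "transcription": text,
    }


# Made-up rows: one reviewed by a user, one without a username.
reviewed = parse_transcription("4\tjpn\tLatn\talice\tkonnichiwa\n")
unreviewed = parse_transcription("4\tjpn\tHrkt\t\tこんにちは\n")
```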