Note

The data you will find here will NOT be useful unless you are coding a language tool or processing data.

If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.

General information about the files

Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.

Creative Commons

These files are released under CC BY 2.0 FR.


Some of our sentences are also available under CC0 1.0.


Licenses covering audio

The license covering an audio file is chosen by the contributor, and is indicated on the page that lists the audio files that he or she has contributed.

Questions?

If you have questions or requests, feel free to contact us. In general, we answer quickly.

Downloads


Custom exports

Use this tool to generate and download customized exports on demand.

Sentence pairs

Download all sentences in language A that are translated into language B, along with the translations.

Weekly exports

The files provided below are updated every Saturday at 6:30 a.m. (UTC).

Sentences

Filename

{{sentences | filename}}

All languages
Only sentences in: Abaza, Abkhaz, Adyghe, Afrihili, Afrikaans, Ainu, Aklanon, Albanian, German, Algerian Arabic, Amharic, Ancient Greek, Ancient Hebrew, English, Arabic, Egyptian Arabic, Iraqi Arabic, Aragonese, Western Armenian, Assamese, Assyrian Neo-Aramaic, Asturian, Avar, Awadhi, Aymara, Azerbaijani, Balinese, Baluchi, Bambara, Banjar, Basque, Bashkir, Bavarian, Baybayanon, Bengali, Berber, Berom, Bhojpuri, Belarusian, Bislama, Bodo, Bosnian, Breton, Brithenig, Bulgarian, Burmese, Buryat, Kabardian, Cantonese, Catalan, Cayuga, Kazakh, Cebuano, Central Bikol, Central Dusun, Central Huasteca Nahuatl, Central Kanuri, Central Kurdish (Soranî), Central Mnong, Chagatai, Chamorro, Chavacano, Czech, Chechen, Cherokee, Classical Chinese, Chinese Pidgin English, Chinook Jargon, Chinyanja, Choctaw, Chukchi, Chuvash, Coastal Kadazan, Korean, Cornish, Corsican, Croatian, Cuyonon, CycL, Danish, Divehi, Drents, Dungan, Dutton World Speedwords, Eastern Armenian, Hebrew, Emilian, Erromintxela, Erzya, Old East Slavic, Slovak, Slovenian, Spanish, Esperanto, Estonian, Evenki, Ewe, Extremaduran, Phoenician, Faroese, Fiji Hindi, Fijian, Finnish, French, Frisian, Friulian, Ga, Scottish Gaelic, Gagauz, Galician, Gan Chinese, Garhwali, Georgian, Gheg Albanian, Gilbertese, Gulf Arabic, Gothic, Greek, Greenlandic, Gronings, Guadeloupean Creole French, Guarani, Guerrero Nahuatl, Gujarati, Gun, Haitian Creole, Hakka Chinese, Hausa, Hawaiian, Hiligaynon, Hill Mari, Hitchiti, Hmong Daw (White), Hmong Njua (Green), Ho, Hunsrik, Yakut, Iban, Ido, Igbo, Ilocano, Hindi, Indonesian, Ingrian, Interglossa, Interlingua, Interlingue, Interslavic, Inuktitut, Irish, Isan, Icelandic, Italian, Jamaican, Japanese, Javanese, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic, Jin Chinese, Juhuri (Judeo-Tat), K'iche', Kabyle, Kalmyk, Kamba, Kannada, Kapampangan, Karachay-Balkar, Karakalpak, Karakhanid, Karelian, Kashmiri, Kashubian, Kekchi (Q'eqchi'), Kelantan-Pattani Malay, Keningau Murut, Khakas, Khalaj, Khasi, Khmer, Kinyarwanda, Kirundi, Klingon, Kölsch, Komi-Permyak, Komi-Zyrian, Konkani (Goan), Kotava, Kumyk, Kven Finnish, Kyrgyz, Láadan, Ladin, Ladino, Lakota, Lao, Latgalian, Latin, Laz, Lezgi, Libyan Arabic, Ligurian, Limburgish, Lingala, Lingua Franca Nova, Lithuanian, Livonian, Lojban, Lombard, Louisiana Creole, Low German (Low Saxon), Lower Sorbian, Luganda, Lushootseed, Luxembourgish, Macedonian, Madurese, Mahasu Pahari, Maithili, Malagasy, Malayalam, Malay (Vernacular), Malay, Maltese, Mambae, Manchu, Mandarin, Manx, Maori, Mapuche, Marathi, Marshallese, Meadow Mari, Meitei, Mi'kmaq, Middle English, Middle French, Middle Persian (Pahlavi), Min Nan Chinese, Minangkabau, Mingrelian, Mirandese, Mohawk, Moksha, Mon, Mongolian, Mono (USA), Morisyen, Moroccan Arabic, Muskogee (Creek), Naga (Tangshang), Nahuatl, Nande, Neapolitan, Nauruan, Navajo, Dutch, Nepali, Newari, Ngeq, Nigerian Fulfulde, Niuean, Nogai, North Frisian, North Levantine Arabic, North Moluccan Malay, Northern Haida, Northern Kurdish (Kurmancî), Northern Sami, Northern Zaza (Kirmanjki), Norwegian Bokmål, Norwegian Nynorsk, Novial, Nuer, Nuosu, Nyungar, O'odham, Occitan, Odia (Oriya), Uyghur, Ojibwe, Old Aramaic, Old English, Old French, Old Frisian, Old Norse, Old Saxon, Old Spanish, Old Turkish, Hungarian, Okinawan, Urdu, Orizaba Nahuatl, Ossetian, Ottoman Turkish, Uzbek, Palatine German, Palauan, Pali, Pangasinan, Papiamento, Pashto, Pennsylvania German, Persian, Picard, Piedmontese, Pipil, Plains Cree, Polish, Portuguese, Old Prussian, Pulaar, Punjabi (Eastern), Punjabi (Western), Qashqai, Quenya, Quechua, Rapa Nui, Rendille, Riffian, Rohingya, Romansh, Romanian, Romani, Russian, Rusyn, Samoan, Samogitian, Sango, Sanskrit, Santali, Saraiki, Sardinian, Saterland Frisian, Scots, Serbian, Setswana, Seychellois Creole, Shilha, Shona, Shuswap, Sicilian, Silesian, Sindarin, Sindhi, Sinhala, Somali, South Levantine Arabic, Southern Altai, Southern Haida, Southern Kurdish, Southern Sami, Southern Sotho, Southern Subanen, Southern Zaza (Dimli), Sranan Tongo, Standard Moroccan Tamazight, Swedish, Sumerian, Sundanese, Swabian, Swahili, Congo Swahili, Swazi, Swiss German, Sylheti, Syriac, Tachawit, Tagal Murut, Tagalog, Tahaggart Tamahaq, Tahitian, Thai, Tajik, Talossan, Talysh, Tamil, Tatar, Crimean Tatar, Telugu, Temuan, Tetun, Tibetan, Tigre, Tigrinya, Tok Pisin, Tokelauan, Toki Pona, Tonga (Zambezi), Tongan, Tsonga, Tumbuka, Tupinambá, Turkish, Turkmen, Tuvaluan, Tuvinian, Uab Meto, Udmurt, Umbundu, Upper Sorbian, Ukrainian, Urhobo, Venetian, Veps, Vietnamese, Volapük, Võro, Walloon, Waray, Wayuu, Welsh, West-Central Oromo, Wolof, Wu, Xhosa, Xiang Chinese, Yiddish, Yoruba, Yucatec Maya, Zaza, Zeelandic, Zulu, Unknown language
File description
Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
Fields and structure
Sentence id [tab] Lang [tab] Text
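
As an illustration of the layout above, here is a minimal Python sketch that splits each line into its three fields and filters by language code. The names `Sentence`, `parse_sentence_line` and `sentences_in` are hypothetical helpers, and the sample rows are made up for the example; it assumes the archive has already been decompressed to plain tab-separated text.

```python
from typing import Iterable, Iterator, NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 language code
    text: str


def parse_sentence_line(line: str) -> Sentence:
    """Split one 'Sentence id [tab] Lang [tab] Text' line into typed fields."""
    # maxsplit=2 keeps the text in one piece even if it contained a tab
    # (a defensive assumption, not something the file description states).
    sentence_id, lang, text = line.rstrip("\n").split("\t", 2)
    return Sentence(int(sentence_id), lang, text)


def sentences_in(lines: Iterable[str], lang: str) -> Iterator[Sentence]:
    """Yield only the sentences whose language code equals `lang`."""
    for line in lines:
        sentence = parse_sentence_line(line)
        if sentence.lang == lang:
            yield sentence


# Made-up rows in the documented layout (not real corpus data).
sample_lines = ["1\teng\tHello.\n", "2\tfra\tBonjour.\n", "3\teng\tGood night.\n"]
english = list(sentences_in(sample_lines, "eng"))
print([s.sentence_id for s in english])  # prints [1, 3]
```

Because the helpers take any iterable of lines, the same code should work on an open file object as well as on an in-memory list.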

Detailed Sentences

Filename

{{sentencesDetailed | filename}}

All languages
Only sentences in: any one of the languages listed for the Sentences file above.
File description
Contains additional fields for each sentence (owner name, date created/modified).
Fields and structure
Sentence id [tab] Lang [tab] Text [tab] Username [tab] Date added [tab] Date last modified

Original and Translated Sentences

Filename
sentences_base.tar.bz2
File description
Each sentence is listed as original or a translation of another. The "base" field can have the following values:
  • zero: The sentence is original, not a translation of another.
  • greater than zero: The id of the sentence from which it was translated.
  • \N: Unknown (rare).
Fields and structure
Sentence id [tab] Base field
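
The three possible values of the base field can be handled as in the following sketch. `parse_base_line` and `describe_base` are hypothetical names, and the sample rows are invented; the only facts taken from the description above are the tab-separated layout and the meaning of zero, a positive id, and `\N`.

```python
from typing import Optional, Tuple


def parse_base_line(line: str) -> Tuple[int, Optional[int]]:
    """Return (sentence id, base), with base set to None when it is unknown."""
    sentence_id, base = line.rstrip("\n").split("\t")
    if base == r"\N":  # literal backslash followed by N marks an unknown base
        return int(sentence_id), None
    return int(sentence_id), int(base)


def describe_base(base: Optional[int]) -> str:
    """Turn a parsed base value into a human-readable label."""
    if base is None:
        return "unknown"
    if base == 0:
        return "original"
    return f"translation of #{base}"


# Invented rows covering the three documented cases.
for row in ["10\t0\n", "11\t10\n", "12\t\\N\n"]:
    sentence_id, base = parse_base_line(row)
    print(sentence_id, describe_base(base))
```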

Sentences (CC0)

Filename

{{sentencesCC0 | filename}}

All languages
Only sentences in: German, Algerian Arabic, Ancient Greek, Ancient Hebrew, English, Arabic, Bengali, Berber, Belarusian, Cantonese, Catalan, Czech, Classical Chinese, Danish, Hebrew, Spanish, Esperanto, Phoenician, Finnish, French, Ho, Ido, Hindi, Interlingua, Interlingue, Italian, Japanese, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic, Kabyle, Karelian, Klingon, Konkani (Goan), Kven Finnish, Láadan, Ladino, Latin, Ligurian, Mandarin, Middle English, Dutch, Norwegian Bokmål, Nyungar, Odia (Oriya), Old Aramaic, Old Frisian, Old Norse, Hungarian, Polish, Portuguese, Russian, Santali, Standard Moroccan Tamazight, Swedish, Sylheti, Tachawit, Toki Pona, Ukrainian, Volapük, Welsh, Yiddish, Unknown language
File description
Contains all the sentences available under CC0.
Fields and structure
Sentence id [tab] Lang [tab] Text [tab] Date last modified

Links

Filename
links.tar.bz2
File description
Contains the links between the sentences. 1 [tab] 77 means that sentence #77 is the translation of sentence #1. The reciprocal link is also present, so the file will also contain a line that says 77 [tab] 1.
Fields and structure
Sentence id [tab] Translation id
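
Since every link is stored in both directions, a consumer that wants each translation pair only once has to deduplicate. The sketch below shows one possible way to do that; `build_translation_index` is a hypothetical helper and the two sample rows simply restate the 1/77 example from the description.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_translation_index(link_lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each sentence id to the set of ids it is linked to."""
    index: Dict[int, Set[int]] = defaultdict(set)
    for line in link_lines:
        sentence_id, translation_id = line.rstrip("\n").split("\t")
        index[int(sentence_id)].add(int(translation_id))
    return index


# The example from the description: both directions are present in the file.
links = ["1\t77\n", "77\t1\n"]
index = build_translation_index(links)

# Sorting each pair makes (1, 77) and (77, 1) collapse into a single entry.
unique_pairs = {
    tuple(sorted((source, target)))
    for source, targets in index.items()
    for target in targets
}
print(unique_pairs)  # prints {(1, 77)}
```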

Tags

Filename
tags.tar.bz2
File description
Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence #381279 has been assigned the "proverb" tag.
Fields and structure
Sentence id [tab] Tag name
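
To look sentences up by tag, the file can be inverted into a mapping from tag name to sentence ids. This is only a sketch: `sentences_by_tag` is a hypothetical helper, the second sample row is invented, and splitting on the first tab only is a precaution in case a tag name contains unusual characters.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sentences_by_tag(tag_lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group sentence ids under the tag names assigned to them."""
    by_tag: Dict[str, List[int]] = defaultdict(list)
    for line in tag_lines:
        # Split once so that the whole remainder of the line is the tag name.
        sentence_id, tag_name = line.rstrip("\n").split("\t", 1)
        by_tag[tag_name].append(int(sentence_id))
    return by_tag


# First row is the example from the description; the second is invented.
tag_rows = ["381279\tproverb\n", "5\tproverb\n"]
tagged = sentences_by_tag(tag_rows)
print(tagged["proverb"])  # prints [381279, 5]
```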

Lists

Filename
user_lists.tar.bz2
File description
Contains the list of sentence lists.
Fields and structure
List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by

Sentences in lists

Filename
sentences_in_lists.tar.bz2
File description
Indicates which sentences each list contains. 13 [tab] 381279 means that sentence #381279 belongs to the list whose id is 13.
Fields and structure
List id [tab] Sentence id

Japanese indices

Filename
jpn_indices.tar.bz2
File description
Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
Fields and structure
Sentence id [tab] Meaning id [tab] Text

Sentences with audio

Filename
sentences_with_audio.tar.bz2
File description
Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
Downloading audio
A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to build the download URL. For example, the audio with the id 1234 can be downloaded from https://tatoeba.org/audio/download/1234.
Fields and structure
Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
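
The following sketch combines the two rules above: building the download URL from the audio id, and checking the license field before reusing a recording. `AudioEntry`, `parse_audio_line`, `download_url` and `reusable_outside_tatoeba` are hypothetical names and the sample rows are invented. Treating `\N` as an empty license in addition to a truly empty field is an assumption, borrowed from the marker used for unknown values in the base file.

```python
from typing import NamedTuple

AUDIO_URL_TEMPLATE = "https://tatoeba.org/audio/download/{audio_id}"
EMPTY_MARKERS = {"", r"\N"}  # assumption: \N may also stand for "no license"


class AudioEntry(NamedTuple):
    sentence_id: int
    audio_id: int
    username: str
    license: str
    attribution_url: str


def parse_audio_line(line: str) -> AudioEntry:
    """Parse one line of the sentences-with-audio file into an AudioEntry."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (5 - len(fields))  # pad if trailing fields are missing
    sentence_id, audio_id, username, license_, attribution_url = fields[:5]
    return AudioEntry(int(sentence_id), int(audio_id), username, license_, attribution_url)


def download_url(entry: AudioEntry) -> str:
    """Build the download URL from the audio id, as described above."""
    return AUDIO_URL_TEMPLATE.format(audio_id=entry.audio_id)


def reusable_outside_tatoeba(entry: AudioEntry) -> bool:
    """An empty license field means the audio may not be reused elsewhere."""
    return entry.license not in EMPTY_MARKERS


# Invented rows: one with a license and attribution URL, one without.
licensed = parse_audio_line("5\t1234\texample_user\tCC BY 4.0\thttps://example.org/me\n")
unlicensed = parse_audio_line("6\t1235\tother_user\t\t\n")
print(download_url(licensed))  # prints https://tatoeba.org/audio/download/1234
```

Keying everything on the audio id rather than the sentence id matters because one sentence can have several recordings.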

User skill level per language

Filename
user_languages.tar.bz2
File description
Indicates the self-reported skill levels of members in individual languages.
Fields and structure
Lang [tab] Skill level [tab] Username [tab] Details

Users' sentence reviews

Filename
users_sentences.csv
File description
Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
Fields and structure
Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
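
One way to use the review values is to tally them per sentence, as sketched below. `tally_reviews` is a hypothetical helper, and the usernames and dates in the sample rows are invented (the exact date format is not specified above, so the sketch ignores the date fields).

```python
from collections import Counter
from typing import Dict, Iterable


def tally_reviews(review_lines: Iterable[str]) -> Dict[int, Counter]:
    """Count how many -1, 0 and 1 reviews each sentence has received."""
    tallies: Dict[int, Counter] = {}
    for line in review_lines:
        # Only the first three fields are needed; the dates are left unparsed.
        username, sentence_id, review = line.rstrip("\n").split("\t")[:3]
        counts = tallies.setdefault(int(sentence_id), Counter())
        counts[int(review)] += 1
    return tallies


# Invented rows in the documented field order.
review_rows = [
    "alice\t42\t1\t2020-01-01\t2020-01-01\n",
    "bob\t42\t1\t2020-01-02\t2020-01-02\n",
    "carol\t42\t-1\t2020-01-03\t2020-01-03\n",
    "alice\t43\t0\t2020-01-04\t2020-01-04\n",
]
tallies = tally_reviews(review_rows)
print(tallies[42][1], tallies[42][-1])  # prints 2 1
```

Given the warning that this data is still experimental, any threshold built on these counts should be treated as a heuristic.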

Transcriptions

Filename

{{transcriptions | filename}}

All languages
Only sentences in: Cantonese, Japanese, Mandarin, Uzbek
File description
Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
Fields and structure
Sentence id [tab] Lang [tab] Script name [tab] Username [tab] Transcription