Note
The data you will find here will NOT be useful unless you are coding a language tool or processing data.
If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.
General information about the files
Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.
Creative Commons
These files are released under CC BY 2.0 FR.
Some of our sentences are also available under CC0 1.0.
Licenses covering audio
The license covering an audio file is chosen by the contributor and is indicated on the page that lists the audio files that the contributor has uploaded.
Questions?
If you have questions or requests, feel free to contact us. In general, we answer quickly.
Downloads
Use this tool to generate and download customized exports on demand.
Download all sentences in language A that are translated into language B, along with the translations.
Sentences
- Filename
- Available for all languages, or only for sentences in a single selected language.
- File description
- Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
- Fields and structure
- Sentence id [tab] Lang [tab] Text
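As a minimal sketch of reading this three-field layout, the Python below parses tab-separated lines into records. The names `Sentence` and `read_sentences` are hypothetical, the sample rows are made up for illustration, and the sketch assumes the archive has already been extracted to plain text.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code
    text: str


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one Sentence per well-formed tab-separated line."""
    # QUOTE_NONE keeps quote characters inside sentence text untouched.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed rows rather than guess at their meaning
        sentence_id, lang, text = row
        yield Sentence(int(sentence_id), lang, text)


# Made-up sample rows, not real corpus data.
sample = io.StringIO('1\teng\tShe said "hello".\n2\tfra\tBonjour.\n')
sentences = list(read_sentences(sample))
```

For a real file, an open text file object (opened with `encoding="utf-8"` and `newline=""`) can be passed in place of the `io.StringIO` sample.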
Detailed Sentences
- Filename
- Available for all languages, or only for sentences in a single selected language.
- File description
- Contains additional fields for each sentence (owner name, date created/modified).
- Fields and structure
- Sentence id [tab] Lang [tab] Text [tab] Username [tab] Date added [tab] Date last modified
Original and Translated Sentences
- Filename
- sentences_base.tar.bz2
- File description
- Each sentence is listed as original or as a translation of another. The "base" field can have the following values:
- zero: The sentence is original, not a translation of another.
- greater than zero: The id of the sentence from which it was translated.
- \N: Unknown (rare).
- Fields and structure
- Sentence id [tab] Base field
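The three possible values of the base field can be told apart with a few lines of Python. This is a sketch only: `parse_base` is a hypothetical helper, and it assumes each line holds exactly two tab-separated fields.

```python
from typing import Optional, Tuple


def parse_base(line: str) -> Tuple[int, Optional[int], bool]:
    """Return (sentence_id, base_id, is_known).

    base_id is None both for original sentences and for unknown bases;
    is_known is False only for the unknown (\\N) case.
    """
    sentence_id, base = line.rstrip("\n").split("\t")
    if base == r"\N":
        return int(sentence_id), None, False  # base unknown
    base_id = int(base)
    if base_id == 0:
        return int(sentence_id), None, True  # original sentence
    return int(sentence_id), base_id, True  # translated from base_id
```

Returning a separate `is_known` flag keeps "original" and "unknown" distinct, which a single optional id could not do.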
Sentences (CC0)
- Filename
- Available for all languages, or only for sentences in a single selected language.
- File description
- Contains all the sentences available under CC0.
- Fields and structure
- Sentence id [tab] Lang [tab] Text [tab] Date last modified
Links
- Filename
- links.tar.bz2
- File description
- Contains the links between the sentences. 1 [tab] 77 means that sentence #77 is the translation of sentence #1. The reciprocal link is also present, so the file will also contain a line that says 77 [tab] 1.
- Fields and structure
- Sentence id [tab] Translation id
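Because every link appears in both directions, code that wants each translation pair only once has to deduplicate. The sketch below (function names are hypothetical, sample rows are made up) builds a lookup table and then collapses reciprocal rows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def load_links(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each sentence id to the set of ids it is linked to."""
    translations: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        source, target = line.split()  # two integer fields separated by a tab
        translations[int(source)].add(int(target))
    return translations


def unique_pairs(translations: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Collapse reciprocal links so that each pair is listed once."""
    pairs = {
        (min(a, b), max(a, b))
        for a, targets in translations.items()
        for b in targets
    }
    return sorted(pairs)


# Made-up sample containing both directions of two links.
sample = ["1\t77\n", "77\t1\n", "1\t2\n", "2\t1\n"]
links = load_links(sample)
pairs = unique_pairs(links)
```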
Tags
- Filename
- tags.tar.bz2
- File description
- Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence #381279 has been assigned the "proverb" tag.
- Fields and structure
- Sentence id [tab] Tag name
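A small sketch of grouping sentence ids by tag follows. It assumes a tag name may contain spaces, so it splits only on the first tab; `load_tags` and the sample rows other than the "proverb" example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_tags(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Map each tag name to the set of sentence ids that carry it."""
    tagged: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        # Split once so that everything after the first tab is the tag name.
        sentence_id, tag = line.rstrip("\n").split("\t", 1)
        tagged[tag].add(int(sentence_id))
    return tagged


sample = ["381279\tproverb\n", "12\tproverb\n", "12\tneeds native check\n"]
tags = load_tags(sample)
```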
Lists
- Filename
- user_lists.tar.bz2
- File description
- Contains the list of sentence lists.
- Fields and structure
- List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by
Sentences in lists
- Filename
- sentences_in_lists.tar.bz2
- File description
- Indicates which sentences belong to which lists. 13 [tab] 381279 means that sentence #381279 belongs to the list whose id is 13.
- Fields and structure
- List id [tab] Sentence id
Japanese indices
- Filename
- jpn_indices.tar.bz2
- File description
- Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
- Fields and structure
- Sentence id [tab] Meaning id [tab] Text
Sentences with audio
- Filename
- sentences_with_audio.tar.bz2
- File description
- Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
- Downloading audio
- A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to compute the download URL. For example, to download the recording with the id 1234, the URL is https://tatoeba.org/audio/download/1234.
- Fields and structure
- Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
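The URL rule and the license check described above can be sketched as follows. The function names and sample rows are hypothetical. Treating `\N` as an empty license is an assumption borrowed from the "unknown" marker used in the base file, not something this section states.

```python
from typing import Iterable, Iterator, Tuple

AUDIO_URL = "https://tatoeba.org/audio/download/{audio_id}"


def audio_url(audio_id: int) -> str:
    """Build the download URL for one audio id."""
    return AUDIO_URL.format(audio_id=audio_id)


def reusable_audio(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (sentence_id, audio_id, url) for rows that state a license."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed rows
        sentence_id, audio_id, _username, license_name, _attribution = fields
        # Assumption: "\N" is treated like an empty license field.
        if license_name.strip() in ("", r"\N"):
            continue  # no license given: not reusable outside the project
        yield int(sentence_id), int(audio_id), audio_url(int(audio_id))


# Made-up sample rows; the second one has an empty license field.
sample = [
    "10\t1234\talice\tCC BY 4.0\thttps://example.org/alice\n",
    "11\t1235\tbob\t\t\n",
]
reusable = list(reusable_audio(sample))
```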
User skill level per language
- Filename
- user_languages.tar.bz2
- File description
- Indicates the self-reported skill levels of members in individual languages.
- Fields and structure
- Lang [tab] Skill level [tab] Username [tab] Details
Users' sentence reviews
- Filename
- users_sentences.csv
- File description
- Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
- Fields and structure
- Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
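One way to use the review values is to tally them per sentence. The sketch below assumes five tab-separated fields per row, as listed above. `tally_reviews`, the usernames, and the dates are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def tally_reviews(lines: Iterable[str]) -> Dict[int, Counter]:
    """Count how many -1, 0 and 1 reviews each sentence received."""
    tallies: Dict[int, Counter] = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed rows
        _username, sentence_id, review, _added, _modified = fields
        value = int(review)
        if value not in (-1, 0, 1):
            continue  # ignore values outside the documented range
        tallies[int(sentence_id)][value] += 1
    return tallies


sample = [
    "alice\t381279\t1\t2020-01-01 00:00:00\t2020-01-01 00:00:00\n",
    "bob\t381279\t1\t2020-01-02 00:00:00\t2020-01-02 00:00:00\n",
    "carol\t381279\t-1\t2020-01-03 00:00:00\t2020-01-03 00:00:00\n",
]
reviews = tally_reviews(sample)
```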
Transcriptions
- Filename
- Available for all languages, or only for sentences in Cantonese, Japanese, Mandarin Chinese, or Uzbek.
- File description
- Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
- Fields and structure
- Sentence id [tab] Lang [tab] Script name [tab] Username [tab] Transcription
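Since an empty username means that a transcription has not been marked as reviewed, a parser can expose that directly. This is a sketch under the assumption that the username field is an empty string in that case; `parse_transcription` and the sample rows are hypothetical.

```python
from typing import NamedTuple, Optional


class Transcription(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code
    script: str  # ISO 15924 script code, e.g. "Latn"
    reviewed_by: Optional[str]  # None when not marked as reviewed
    text: str


def parse_transcription(line: str) -> Transcription:
    """Parse one tab-separated transcription row."""
    # Split at most four times so tabs inside the transcription are preserved.
    sentence_id, lang, script, username, text = line.rstrip("\n").split("\t", 4)
    return Transcription(int(sentence_id), lang, script, username or None, text)


# Made-up sample rows.
unreviewed = parse_transcription("5\tcmn\tLatn\t\tni3 hao3\n")
reviewed = parse_transcription("6\tcmn\tLatn\talice\tzai4 jian4\n")
```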