Note

The data you will find here will NOT be useful unless you are coding a language tool or processing data.

If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.

General information about the files

The files provided here are updated every Saturday at 6:30 a.m. (UTC).

Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.

Creative Commons

These files are released under CC BY 2.0 FR.

Some of our sentences are also available under CC0 1.0.

Licenses covering audio

The license covering an audio file is chosen by its contributor and is indicated on the page that lists the audio files that contributor has provided.

Questions?

If you have questions or requests, feel free to contact us. In general, we answer quickly.

Downloads

Sentences

Filename

{{sentences | filename}}

All languages
Only sentences in: Abkhaz Adyghe Afrihili Afrikaans Ainu Aklanon Albanian Algerian Arabic Amharic Ancient Greek Arabic Aragonese Armenian Assamese Assyrian Neo-Aramaic Asturian Awadhi Aymara Azerbaijani Balinese Bambara Banjar Bashkir Basque Bavarian Baybayanon Belarusian Bengali Berber Bhojpuri Bodo Bosnian Breton Brithenig Bulgarian Burmese Buryat Cantonese Catalan Cayuga Cebuano Central Bikol Central Dusun Central Huasteca Nahuatl Central Mnong Chagatai Chamorro Chavacano Chechen Cherokee Chinese (Jin) Chinese Pidgin English Chinyanja Choctaw Chukchi Chuvash Coastal Kadazan Cornish Corsican Crimean Tatar Croatian Cuyonon CycL Czech Danish Dhivehi Dungan Dutch Dutton World Speedwords Egyptian Arabic Emilian English Erromintxela Erzya Esperanto Estonian Evenki Ewe Extremaduran Faroese Fiji Hindi Fijian Finnish French Frisian Friulian Ga Gagauz Galician Gan Chinese Garhwali Georgian German Gheg Albanian Gilbertese Gothic Greek Greenlandic Gronings Guadeloupean Creole French Guarani Guerrero Nahuatl Gujarati Gulf Arabic Haitian Creole Hakka Chinese Hausa Hawaiian Hebrew Hiligaynon Hill Mari Hindi Hmong Daw (White) Hmong Njua (Green) Ho Hungarian Hunsrik Iban Icelandic Ido Igbo Ilocano Indonesian Ingrian Interlingua Interlingue Inuktitut Iraqi Arabic Irish Italian Jamaican Patois Japanese Javanese Jewish Babylonian Aramaic Juhuri (Judeo-Tat) K'iche' Kabyle Kalmyk Kamba Kannada Kapampangan Karachay-Balkar Karakalpak Karelian Kashmiri Kashubian Kazakh Kekchi (Q'eqchi') Keningau Murut Khakas Khasi Khmer Kinyarwanda Kirundi Klingon Kölsch Komi-Permyak Komi-Zyrian Konkani (Goan) Korean Kotava Kumyk Kurdish Kven Finnish Kyrgyz Láadan Ladin Ladino Lakota Lao Latgalian Latin Latvian Laz Ligurian Lingala Lingua Franca Nova Literary Chinese Lithuanian Livonian Lojban Lombard Louisiana Creole Low German (Low Saxon) Lower Sorbian Luganda Luxembourgish Macedonian Madurese Maithili Malagasy Malay Malay (Vernacular) Malayalam Maltese Mambae Manchu Mandarin Chinese Manx Maori 
Marathi Marshallese Meadow Mari Mi'kmaq Middle English Middle French Min Nan Chinese Minangkabau Mingrelian Mirandese Mohawk Moksha Mon Mongolian Morisyen Moroccan Arabic Naga (Tangshang) Nahuatl Nauruan Navajo Nepali Ngeq Nigerian Fulfulde Niuean Nogai North Levantine Arabic North Moluccan Malay Northern Sami Norwegian Bokmål Norwegian Nynorsk Novial Occitan Odia (Oriya) Ojibwe Okinawan Old Aramaic Old East Slavic Old English Old Norse Old Prussian Old Saxon Old Spanish Old Tupi Old Turkish Orizaba Nahuatl Ossetian Ottoman Turkish Palatine German Palauan Pangasinan Papiamento Pashto Pennsylvania German Persian Picard Piedmontese Pipil Polish Portuguese Pulaar Punjabi (Eastern) Punjabi (Western) Quechua Quenya Rapa Nui Romani Romanian Romansh Russian Rusyn Samoan Samogitian Sango Sanskrit Sardinian Scots Scottish Gaelic Serbian Setswana Seychellois Creole Shanghainese Shona Shuswap Sicilian Sindarin Sindhi Sinhala Slovak Slovenian Somali Southern Sami Southern Sotho Spanish Sumerian Sundanese Swabian Swahili Swazi Swedish Swiss German Tachawit Tagal Murut Tagalog Tahaggart Tamahaq Tahitian Tajik Talossan Talysh Tamil Tarifit Tatar Telugu Temuan Tetun Thai Tibetan Tigre Tigrinya Tok Pisin Tokelauan Toki Pona Tongan Tsonga Turkish Turkmen Tuvaluan Tuvinian Uab Meto Udmurt Ukrainian Umbundu Upper Sorbian Urdu Urhobo Uyghur Uzbek Venetian Veps Vietnamese Volapük Võro Walloon Waray Welsh Wolof Xhosa Xiang Chinese Yakut Yiddish Yoruba Zaza Zulu Unknown language
File description
Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
Fields and structure
Sentence id[tab]Lang[tab]Text
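
As a rough illustration of the layout above, the following Python sketch reads tab-separated lines into records. The `Sentence` type, the `read_sentences` helper and the sample rows are hypothetical and are not part of the export. The sketch assumes one sentence per line.

```python
from typing import Iterable, Iterator, NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code, as described above
    text: str


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one Sentence per line laid out as id[tab]lang[tab]text."""
    for line in lines:
        # Split at most twice so the text field is kept whole.
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) != 3:
            continue  # skip lines that do not match the documented layout
        sentence_id, lang, text = fields
        yield Sentence(int(sentence_id), lang, text)


# Made-up sample rows, for illustration only.
sample_lines = ["1\teng\tHello.\n", "2\tfra\tBonjour.\n", "malformed line\n"]
sentences = list(read_sentences(sample_lines))
english = [s.text for s in sentences if s.lang == "eng"]
print(english)
```

With a real file, the same function can be fed an open text file handle (for example one opened with `encoding="utf-8"`) instead of a list.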

Detailed Sentences

Filename

{{sentencesDetailed | filename}}

All languages
Only sentences in: any of the languages listed under Sentences above.
File description
Contains additional fields for each sentence (owner name, date created/modified).
Fields and structure
Sentence id[tab]Lang[tab]Text[tab]Username[tab]Date added[tab]Date last modified

Sentences (CC0)

Filename

{{sentencesCC0 | filename}}

All languages
Only sentences in: Arabic Belarusian Berber Cantonese Catalan Czech Danish Dutch English Esperanto Finnish French German Hebrew Hungarian Icelandic Ingrian Interlingua Italian Japanese Kabyle Karelian Klingon Kven Finnish Latin Ligurian Literary Chinese Mandarin Chinese Norwegian Bokmål Old Aramaic Polish Portuguese Quenya Russian Spanish Tachawit Turkish Ukrainian Volapük Yiddish
File description
Contains all the sentences available under CC0.
Fields and structure
Sentence id[tab]Lang[tab]Text[tab]Date last modified

Links

Filename
links.tar.bz2
File description
Contains the links between the sentences. 1[tab]77 means that sentence #77 is a translation of sentence #1. The reciprocal link is also present, so the file also contains the line 77[tab]1.
Fields and structure
Sentence id[tab]Translation id
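
Because every link is stored in both directions, a simple lookup table from each sentence id to its translation ids is enough to navigate the data. The sketch below is one possible way to build it. The function names are hypothetical, and the sample pairs are made up apart from the 1/77 pair quoted above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_translation_map(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each sentence id to the set of ids it is linked to."""
    translations: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip lines that are not id[tab]id
        sentence_id, translation_id = map(int, fields)
        translations[sentence_id].add(translation_id)
    return translations


def is_symmetric(translations: Dict[int, Set[int]]) -> bool:
    """Check that every link a -> b has its reciprocal b -> a."""
    return all(
        source in translations.get(target, set())
        for source, targets in list(translations.items())
        for target in targets
    )


# Sample rows: the 1/77 pair comes from the description, the rest is made up.
sample_links = ["1\t77\n", "77\t1\n", "1\t1276\n", "1276\t1\n"]
links = build_translation_map(sample_links)
print(sorted(links[1]))     # [77, 1276]
print(is_symmetric(links))  # True
```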

Tags

Filename
tags.tar.bz2
File description
Contains the list of tags associated with each sentence. 381279[tab]proverb means that sentence #381279 has been assigned the "proverb" tag.
Fields and structure
Sentence id[tab]Tag name
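
A minimal sketch of grouping sentence ids by tag is shown below. It assumes that a tag name may contain spaces, so it splits on the first tab only. The helper name and every sample row other than the "proverb" example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sentences_by_tag(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group sentence ids under each tag name."""
    by_tag: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        # Split on the first tab only, in case the tag name has spaces.
        fields = line.rstrip("\n").split("\t", 1)
        if len(fields) != 2:
            continue  # skip lines that are not id[tab]tag
        sentence_id, tag_name = fields
        by_tag[tag_name].append(int(sentence_id))
    return by_tag


# The first row is the example from the description; the others are made up.
sample_tags = ["381279\tproverb\n", "42\tproverb\n", "42\tneeds review\n"]
tags = sentences_by_tag(sample_tags)
print(tags["proverb"])  # [381279, 42]
```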

Lists

Filename
user_lists.tar.bz2
File description
Contains one entry per sentence list, giving its id, owner, creation and modification dates, name, and who may edit it.
Fields and structure
List id[tab]Username[tab]Date created[tab]Date last modified[tab]List name[tab]Editable by

Sentences in lists

Filename
sentences_in_lists.tar.bz2
File description
Indicates which sentences belong to which lists. 13[tab]381279 means that sentence #381279 is contained in the list whose id is 13.
Fields and structure
List id[tab]Sentence id

Japanese indices

Filename
jpn_indices.tar.bz2
File description
Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
Fields and structure
Sentence id[tab]Meaning id[tab]Text

Sentences with audio

Filename
sentences_with_audio.tar.bz2
File description
Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
Fields and structure
Sentence id[tab]Username[tab]License[tab]Attribution URL
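
Since an empty license field means the audio may not be reused outside the project, a consumer of this file will probably want to filter on that field. The sketch below shows one way to do so. How an empty field is encoded is an assumption here: the code treats both an empty string and a `\N` placeholder as empty. The names, usernames, license strings and URLs in the sample are all made up.

```python
from typing import Iterable, Iterator, NamedTuple

# Assumption: an "empty" license is an empty string or a \N placeholder.
EMPTY_MARKERS = {"", r"\N"}


class AudioRecord(NamedTuple):
    sentence_id: int
    username: str
    license_name: str
    attribution_url: str


def reusable_audio(lines: Iterable[str]) -> Iterator[AudioRecord]:
    """Yield only the records whose license field is not empty."""
    for line in lines:
        # Strip only the newline so empty trailing fields are preserved.
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip lines that do not match the documented layout
        sentence_id, username, license_name, attribution_url = fields
        if license_name.strip() in EMPTY_MARKERS:
            continue  # no license given: not reusable outside the project
        yield AudioRecord(int(sentence_id), username, license_name, attribution_url)


# Made-up sample rows, for illustration only.
sample_audio = [
    "10\talice\tCC BY 4.0\thttps://example.com/alice\n",
    "11\tbob\t\t\n",
    "12\tcarol\t\\N\t\\N\n",
]
reusable = list(reusable_audio(sample_audio))
print([record.sentence_id for record in reusable])  # [10]
```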

User skill level per language

Filename
user_languages.tar.bz2
File description
Indicates the self-reported skill levels of members in individual languages.
Fields and structure
Lang[tab]Skill level[tab]Username[tab]Details

Users' sentence reviews

Filename
users_sentences.csv
File description
Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
Fields and structure
Username[tab]Lang[tab]Sentence id[tab]Review[tab]Date added[tab]Date last modified
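
One simple way to use the review values is to sum them per sentence, so that a positive total suggests the sentence is mostly considered OK. This is only an illustrative aggregation, not something the project prescribes. The helper name, usernames and dates in the sample are hypothetical, and the date fields are left unparsed because their format is not specified here.

```python
from collections import Counter
from typing import Iterable

VALID_REVIEWS = {-1, 0, 1}


def review_scores(lines: Iterable[str]) -> Counter:
    """Sum the review values (-1, 0 or 1) given to each sentence id."""
    scores: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip lines that do not match the documented layout
        sentence_id, review = int(fields[2]), int(fields[3])
        if review not in VALID_REVIEWS:
            continue  # ignore values outside the documented range
        scores[sentence_id] += review
    return scores


# Made-up sample rows; the date strings are placeholders.
sample_reviews = [
    "alice\teng\t10\t1\t2020-01-01\t2020-01-01\n",
    "bob\teng\t10\t1\t2020-01-02\t2020-01-02\n",
    "carol\teng\t11\t-1\t2020-01-03\t2020-01-03\n",
    "alice\teng\t11\t0\t2020-01-04\t2020-01-04\n",
]
scores = review_scores(sample_reviews)
print(scores[10], scores[11])  # 2 -1
```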