Note
The data you will find here will NOT be useful unless you are coding a language tool or processing data.
If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.
General information about the files
Many of the Japanese and English sentences are from the Tanaka Corpus, which belongs to the public domain.
Creative Commons
These files are released under CC BY 2.0 FR.
Some of our sentences are also available under CC0 1.0.
Licenses covering audio
The license covering an audio file is chosen by the contributor, and is indicated on the page that lists the audio files that he or she has contributed.
Questions?
If you have questions or requests, feel free to contact us. In general, we answer quickly.
Downloads
Use this tool to generate and download customized exports on demand.
Download all sentences in language A that are translated into language B, along with the translations.
Sentences
- Filename
-
All languages, or only the sentences in a single selected language.
- File description
- Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
- Fields and structure
- Sentence id [tab] Language [tab] Text
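As an illustration of the three-field layout above, here is a minimal Python sketch that reads such lines into records. The names `Sentence` and `read_sentences` are hypothetical, and the sketch assumes the file has already been decompressed to UTF-8 text with one sentence per line; malformed lines are skipped rather than guessed at.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 language code, e.g. "eng"
    text: str


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one Sentence per well-formed tab-separated line."""
    for line in lines:
        # Split on the first two tabs only, so the text field is kept whole.
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) != 3:
            continue  # skip malformed lines
        sentence_id, lang, text = fields
        yield Sentence(int(sentence_id), lang, text)


# Made-up sample rows, for illustration only.
sample = io.StringIO("1\teng\tHello.\n2\tjpn\tこんにちは。\n")
sentences = list(read_sentences(sample))
```

Because `read_sentences` accepts any iterable of lines, the same function could be fed an open file object instead of the in-memory sample.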
Detailed Sentences
- Filename
-
All languages, or only the sentences in a single selected language.
- File description
- Contains additional fields for each sentence (owner name, date created/modified).
- Fields and structure
- Sentence id [tab] Language [tab] Text [tab] Username [tab] Date added [tab] Date last modified
Original and Translated Sentences
- Filename
- sentences_base.tar.bz2
- File description
-
Each sentence is listed as original or a translation of another. The "base" field can have the following values:
- zero: The sentence is original, not a translation of another.
- greater than zero: The id of the sentence from which it was translated.
- \N: Unknown (rare).
- Fields and structure
- Sentence id [tab] Base field
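The three possible values of the base field can be handled as in the following sketch. The function names `parse_base` and `describe` are hypothetical, and the input is assumed to be one decompressed, tab-separated line.

```python
from typing import Optional


def parse_base(field: str) -> Optional[int]:
    """Return 0 for an original sentence, the source sentence id for a
    translation, or None when the origin is unknown (the literal \\N)."""
    if field == r"\N":
        return None
    return int(field)


def describe(line: str) -> str:
    """Turn one 'sentence id [tab] base' line into a readable summary."""
    sentence_id, base = line.rstrip("\n").split("\t")
    value = parse_base(base)
    if value is None:
        return f"{sentence_id}: origin unknown"
    if value == 0:
        return f"{sentence_id}: original"
    return f"{sentence_id}: translated from {value}"
```

Returning `None` for `\N` keeps the unknown case distinct from zero, so callers cannot mistake a sentence of unknown origin for an original one.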
Sentences (CC0)
- Filename
-
All languages, or only sentences in: German, Algerian Arabic, Ancient Hebrew, Arabic, Belarusian, Bengali, Berber, Cantonese, Catalan, Danish, Hebrew, Esperanto, Finnish, Hungarian, Japanese, Hindi, Ido, English, Interlingua, Italian, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic, Kabyle, Karelian, Spanish, Klingon, Kven Finnish, Láadan, Ladino, Latin, Literary Chinese, Middle English, Norwegian Bokmål, Nyungar, Dutch, Old Aramaic, Old Frisian, Old Norse, Phoenician, Polish, Portuguese, French, Russian, Standard Moroccan Tamazight, Swedish, Tachawit, Toki Pona, Czech, Mandarin Chinese, Ukrainian, Volapük, Welsh, Ho, Ligurian, Santali, Ancient Greek, Sylheti, Yiddish, unspecified language
- File description
- Contains all the sentences available under CC0.
- Fields and structure
- Sentence id [tab] Language [tab] Text [tab] Date last modified
Links
- Filename
- links.tar.bz2
- File description
- Contains the links between the sentences. 1 [tab] 77 means that sentence #77 is the translation of sentence #1. The reciprocal link is also present, so the file will also contain a line that says 77 [tab] 1.
- Fields and structure
- Sentence id [tab] Translation id
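Since every link appears in both directions, a consumer may want either a lookup table of translations per sentence or a list of pairs with the reciprocal duplicates removed. The sketch below shows both; `load_links` and `unique_pairs` are hypothetical names, and the sample ids other than 1 and 77 are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def load_links(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each sentence id to the set of ids it is linked to."""
    translations: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        source, target = line.split()  # both fields are integer ids
        translations[int(source)].add(int(target))
    return translations


def unique_pairs(translations: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Collapse reciprocal links (1, 77) and (77, 1) into one sorted pair."""
    pairs = {
        (min(a, b), max(a, b))
        for a, targets in translations.items()
        for b in targets
    }
    return sorted(pairs)


links = load_links(["1\t77\n", "77\t1\n", "1\t2481\n", "2481\t1\n"])
```

With this sample, `links[1]` holds both translations of sentence #1, while `unique_pairs(links)` lists each linked pair only once.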
Tags
- Filename
- tags.tar.bz2
- File description
- Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence #381279 has been assigned the "proverb" tag.
- Fields and structure
- Sentence id [tab] Tag name
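To look up all sentences that carry a given tag, the file can be inverted into an index keyed by tag name. This is a minimal sketch; `index_by_tag` is a hypothetical name, and the sketch assumes a tag name may contain spaces but not tabs, so each line is split on the first tab only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def index_by_tag(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group sentence ids under each tag name, in file order."""
    by_tag: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        sentence_id, tag = line.rstrip("\n").split("\t", 1)
        by_tag[tag].append(int(sentence_id))
    return by_tag


# The first row matches the example above; the others are made up.
tag_index = index_by_tag([
    "381279\tproverb\n",
    "42\tproverb\n",
    "42\told saying\n",
])
```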
Lists
- Filename
- user_lists.tar.bz2
- File description
- Contains the list of sentence lists.
- Fields and structure
- List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by
Sentences in lists
- Filename
- sentences_in_lists.tar.bz2
- File description
- Indicates which sentences belong to which lists. 13 [tab] 381279 means that sentence #381279 is in the list with id 13.
- Fields and structure
- List id [tab] Sentence id
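The list ids in this file can be resolved to list names by joining against the lists file. The following sketch assumes both files are decompressed and tab-separated, with the list name in the fifth field of the lists file; the function names are hypothetical, and the sample row's username, dates, list name, and "editable by" value are invented placeholders.

```python
from typing import Dict, Iterable, List


def load_list_names(list_lines: Iterable[str]) -> Dict[int, str]:
    """Map list id -> list name from the lists file (six fields per line)."""
    names: Dict[int, str] = {}
    for line in list_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed lines
        names[int(fields[0])] = fields[4]
    return names


def sentences_by_list_name(
    list_lines: Iterable[str], membership_lines: Iterable[str]
) -> Dict[str, List[int]]:
    """Group sentence ids under the name of the list that contains them."""
    names = load_list_names(list_lines)
    result: Dict[str, List[int]] = {}
    for line in membership_lines:
        list_id, sentence_id = (int(x) for x in line.split("\t"))
        # Fall back to a generic label when the list id is not in the lists file.
        name = names.get(list_id, f"list {list_id}")
        result.setdefault(name, []).append(sentence_id)
    return result


lists = ["13\talice\t2020-01-01 00:00:00\t2020-01-02 00:00:00\tProverbs\tcreator\n"]
membership = ["13\t381279\n", "13\t42\n", "99\t7\n"]
grouped = sentences_by_list_name(lists, membership)
```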
Japanese indices
- Filename
- jpn_indices.tar.bz2
- File description
- Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
- Fields and structure
- Sentence id [tab] Meaning id [tab] Text
Sentences with audio
- Filename
- sentences_with_audio.tar.bz2
- File description
- Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
- Downloading audio
- A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to compute the download URL. For example, to download the audio with the id 1234, the URL is https://tatoeba.org/audio/download/1234.
- Fields and structure
- Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
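The download URL rule and the license restriction above can be combined into a small filter that keeps only recordings with a stated license. This is a sketch under assumptions: the function names are hypothetical, the sample rows are invented, and an "empty" license is assumed to appear either as an empty string or as the literal \N.

```python
from typing import Iterable, Iterator, Tuple

AUDIO_URL_TEMPLATE = "https://tatoeba.org/audio/download/{audio_id}"


def audio_download_url(audio_id: int) -> str:
    """Build the download URL for one audio id."""
    return AUDIO_URL_TEMPLATE.format(audio_id=audio_id)


def reusable_audio(lines: Iterable[str]) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (sentence id, download URL, license, attribution URL) for
    recordings whose license field is not empty."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed lines
        sentence_id, audio_id, _username, license_name, attribution_url = fields
        # Assumption: an empty license is either "" or the literal \N.
        if license_name in ("", r"\N"):
            continue
        yield (
            int(sentence_id),
            audio_download_url(int(audio_id)),
            license_name,
            attribution_url,
        )


rows = [
    "42\t1234\talice\tCC BY 4.0\thttps://example.org/alice\n",
    "43\t1235\tbob\t\t\n",
]
reusable = list(reusable_audio(rows))
```

Only the first sample row survives the filter, because the second has no license and so may not be reused outside the Tatoeba project.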
User skill level per language
- Filename
- user_languages.tar.bz2
- File description
- Indicates the self-reported skill levels of members in individual languages.
- Fields and structure
- Language [tab] Skill level [tab] Username [tab] Details
Users' sentence reviews
- Filename
- users_sentences.csv
- File description
- Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
- Fields and structure
- Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
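One way to use the review values is to tally, per sentence, how many reviewers marked it -1, 0, or 1. The sketch below assumes tab-separated fields in the order listed above; `tally_reviews` is a hypothetical name, and the usernames and timestamps in the sample rows are placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def tally_reviews(lines: Iterable[str]) -> Dict[int, Counter]:
    """Count reviews per sentence, keyed by review value (-1, 0, or 1)."""
    tallies: Dict[int, Counter] = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        sentence_id, review = int(fields[1]), int(fields[2])
        if review in (-1, 0, 1):
            tallies[sentence_id][review] += 1
    return tallies


reviews = tally_reviews([
    "alice\t381279\t1\t2020-01-01 00:00:00\t2020-01-01 00:00:00\n",
    "bob\t381279\t1\t2020-01-02 00:00:00\t2020-01-02 00:00:00\n",
    "carol\t381279\t-1\t2020-01-03 00:00:00\t2020-01-03 00:00:00\n",
])
```

Given the warning that this data is still experimental, such tallies are best treated as a rough signal rather than a verdict on sentence quality.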
Transcriptions
- Filename
-
All languages, or only sentences in: Cantonese, Japanese, Mandarin Chinese, Uzbek
- File description
- Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
- Fields and structure
- Sentence id [tab] Language [tab] Script name [tab] Username [tab] Transcription
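The rule that a transcription without a username has not been reviewed can be expressed as a small record type. This is a sketch under assumptions: `Transcription` and `parse_transcription` are hypothetical names, the sample row is invented, and a missing username is assumed to appear either as an empty field or as the literal \N.

```python
from typing import NamedTuple


class Transcription(NamedTuple):
    sentence_id: int
    lang: str      # ISO 639-3 language code
    script: str    # ISO 15924 script code, e.g. "Latn"
    username: str  # empty when nobody has reviewed the transcription
    text: str

    @property
    def reviewed(self) -> bool:
        # Assumption: a missing username is either "" or the literal \N.
        return self.username not in ("", r"\N")


def parse_transcription(line: str) -> Transcription:
    """Parse one five-field, tab-separated transcription line."""
    sentence_id, lang, script, username, text = line.rstrip("\n").split("\t", 4)
    return Transcription(int(sentence_id), lang, script, username, text)


unreviewed = parse_transcription("1\tcmn\tLatn\t\tnǐ hǎo\n")
checked = parse_transcription("1\tcmn\tLatn\talice\tnǐ hǎo\n")
```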