Tatoeba

Note

The data you will find here will NOT be useful unless you are coding a language tool or processing data.

If you would like to see sentences in order to learn a language, take a look at the sentence lists. You can create your own list or view one created by others. You can also download and print the lists.

General information about the files

Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.

Creative Commons

These files are released under CC BY 2.0 FR.

Some of our sentences are also available under CC0 1.0.

Licenses covering audio

The license covering an audio file is chosen by the contributor, and is indicated on the page that lists the audio files that he or she has contributed.

Questions?

If you have questions or requests, feel free to contact us. In general, we answer quickly.

Downloads

Custom exports

Sentence pairs

Use this tool to generate and download customized exports on demand.

Download all sentences in language A that are translated into language B, along with the translations.

Weekly exports

The files provided below are updated every Saturday at 6:30 a.m. (UTC).

Sentences

Filename

{{sentences | filename}}

Available for all languages, or for a single selected language only.
File description
Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
Fields and structure
Sentence id [tab] Lang [tab] Text
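
As a rough illustration of the three-field layout above, the following Python sketch reads tab-separated lines into small records and filters them by language code. The `Sentence` type, the `read_sentences` helper, and the sample rows are all hypothetical names and data made up for this example; it also assumes the extracted file is UTF-8 text with one sentence per line, which the page does not state.

```python
from typing import Iterable, Iterator, NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code, e.g. "eng"
    text: str


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one Sentence per well-formed tab-separated line."""
    for line in lines:
        # Split on the first two tabs only, so the text field is kept whole.
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue  # skip malformed lines rather than guess at their meaning
        yield Sentence(int(parts[0]), parts[1], parts[2])


# Made-up sample rows for illustration; not real Tatoeba data.
sample_lines = ["1\teng\tHello.\n", "2\tcym\tHelo.\n", '3\teng\tShe said "hi".\n']
sentences = list(read_sentences(sample_lines))
english = [s for s in sentences if s.lang == "eng"]
```

Any iterable of lines works here, so an open file object can be passed in place of `sample_lines`.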

Detailed sentences

Filename

{{sentencesDetailed | filename}}

Available for all languages, or for a single selected language only.
File description
Contains additional fields for each sentence (owner name, date created/modified).
Fields and structure
Sentence id [tab] Lang [tab] Text [tab] Username [tab] Date added [tab] Date last modified

Original and translated sentences

Filename
sentences_base.tar.bz2
File description
Each sentence is listed as original or a translation of another. The "base" field can have the following values:
  • zero: The sentence is original, not a translation of another.
  • greater than zero: The id of the sentence from which it was translated.
  • \N: Unknown (rare).
Fields and structure
Sentence id [tab] Base field
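
The three possible values of the base field can be handled along these lines. This is only a sketch: the `classify_base` function and its string labels are hypothetical, and the sample lines in the usage note are invented.

```python
from typing import Optional, Tuple


def classify_base(line: str) -> Tuple[int, str, Optional[int]]:
    """Return (sentence_id, kind, base_id) for one line of the base file.

    kind is "original", "translation", or "unknown"; base_id is set only
    for translations.
    """
    sentence_id, base = line.rstrip("\n").split("\t")
    if base == r"\N":
        # Literal backslash-N marks an unknown origin.
        return int(sentence_id), "unknown", None
    base_id = int(base)
    if base_id == 0:
        return int(sentence_id), "original", None
    return int(sentence_id), "translation", base_id
```

For example, `classify_base("6\t5\n")` describes sentence 6 as a translation of sentence 5, while a line whose second field is `0` is reported as an original.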

Sentences (CC0)

Filename

{{sentencesCC0 | filename}}

All languages
Only sentences in: German, Ancient Hebrew, Arabic, Algerian Arabic, Bengali, Belarusian, Berber, Kabyle, Cantonese, Karelian, Catalan, Welsh, Danish, Italian, Esperanto, Finnish, French, Hebrew, Ancient Greek, Old Norse, Hindi, Ho, Hungarian, Sylheti, Ido, Yiddish, Interlingua, Dutch, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic, Konkani (Goan), Kven Finnish, Láadan, Ladino, Ligurian, Latin, Norwegian Bokmål, Nyungar, Old Aramaic, Old Frisian, Portuguese, Polish, Phoenician, Russian, English, Middle English, Santali, Spanish, Japanese, Swedish, Tachawit, Standard Moroccan Tamazight, Klingon, Toki Pona, Czech, Mandarin Chinese, Literary Chinese, Volapük, Ukrainian, Unknown language
File description
Contains all the sentences available under CC0.
Fields and structure
Sentence id [tab] Lang [tab] Text [tab] Date last modified

Links

Filename
links.tar.bz2
File description
Contains the links between sentences. 1 [tab] 77 means that sentence #77 is a translation of sentence #1. The reciprocal link is also present, so the file also contains the line 77 [tab] 1.
Fields and structure
Sentence id [tab] Translation id
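
Because every link appears in both directions, a consumer may want either a lookup table or a deduplicated set of pairs. The sketch below shows one possible way to build both; `read_links` and `unique_pairs` are hypothetical helpers, and apart from the 1/77 pair quoted above the sample ids are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def read_links(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each sentence id to the set of ids it is linked to."""
    translations: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        sentence_id, translation_id = line.rstrip("\n").split("\t")
        translations[int(sentence_id)].add(int(translation_id))
    return translations


def unique_pairs(translations: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Collapse reciprocal links (a, b) and (b, a) into one sorted pair."""
    pairs = {
        (min(a, b), max(a, b))
        for a, targets in translations.items()
        for b in targets
    }
    return sorted(pairs)


sample_links = ["1\t77\n", "77\t1\n", "1\t80\n", "80\t1\n"]
translations = read_links(sample_links)
pairs = unique_pairs(translations)
```

Here `translations[1]` holds both 77 and 80, while `pairs` lists each link once.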

Tags

Filename
tags.tar.bz2
File description
Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence #381279 has been assigned the "proverb" tag.
Fields and structure
Sentence id [tab] Tag name
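
A tag file in this shape can be inverted into a tag-to-sentences index, sketched below. The `read_tags` helper is hypothetical, and every sample row other than the 381279/proverb example above is invented. The split is limited to the first tab on the assumption that tag names may contain spaces.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def read_tags(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Map each tag name to the set of sentence ids carrying it."""
    by_tag: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        # Split once: everything after the first tab is the tag name.
        sentence_id, tag = line.rstrip("\n").split("\t", 1)
        by_tag[tag].add(int(sentence_id))
    return by_tag


sample_tags = [
    "381279\tproverb\n",
    "42\tproverb\n",               # made-up row
    "42\tidiomatic expression\n",  # made-up row
]
by_tag = read_tags(sample_tags)
```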

Lists

Filename
user_lists.tar.bz2
File description
Contains one entry for each sentence list.
Fields and structure
List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by

Sentences in lists

Filename
sentences_in_lists.tar.bz2
File description
Indicates which sentences belong to which lists. 13 [tab] 381279 means that sentence #381279 is contained in the list whose id is 13.
Fields and structure
List id [tab] Sentence id

Japanese indices

Filename
jpn_indices.tar.bz2
File description
Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
Fields and structure
Sentence id [tab] Meaning id [tab] Text

Sentences with audio

Filename
sentences_with_audio.tar.bz2
File description
Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
Downloading audio
A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to compute the download URL. For example, to download the audio with the id 1234, the URL is https://tatoeba.org/audio/download/1234.
Fields and structure
Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
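
The URL rule and the licence condition described above can be combined roughly as follows. This is a sketch under assumptions: `audio_download_url` and `reusable_audio_urls` are hypothetical helpers, the sample rows are invented, and treating a literal `\N` as an empty licence is a guess by analogy with the base-field file rather than something this page states.

```python
from typing import Iterable, Iterator, Tuple

AUDIO_URL = "https://tatoeba.org/audio/download/{audio_id}"


def audio_download_url(audio_id: int) -> str:
    """Build the download URL for one audio id."""
    return AUDIO_URL.format(audio_id=audio_id)


def reusable_audio_urls(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (sentence_id, url) only for audio that has a licence."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields += [""] * (5 - len(fields))  # pad short rows with empty fields
        sentence_id, audio_id, _username, license_name, _attribution = fields[:5]
        # Empty licence: not reusable outside Tatoeba. "\N" is an assumption.
        if license_name in ("", r"\N"):
            continue
        yield int(sentence_id), audio_download_url(int(audio_id))


# Made-up sample rows for illustration.
sample_audio = [
    "10\t1234\talice\tCC BY 4.0\thttps://example.org/alice\n",
    "11\t1235\tbob\t\t\n",
]
urls = list(reusable_audio_urls(sample_audio))
```

Only the first sample row survives the filter, since the second has no licence.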

User skill level per language

Filename
user_languages.tar.bz2
File description
Indicates the self-reported skill levels of members in individual languages.
Fields and structure
Lang [tab] Skill level [tab] Username [tab] Details

Users' sentence reviews

Filename
users_sentences.csv
File description
Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
Fields and structure
Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
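
One simple way to use the review values is to sum them per sentence, so that a positive total suggests more "OK" than "not OK" votes. The sketch below is not an official scoring method: the `review_totals` helper, the usernames, and the date strings in the sample rows are all made up, and the date format in the real file may differ.

```python
from collections import Counter
from typing import Iterable


def review_totals(lines: Iterable[str]) -> Counter:
    """Sum the review values (-1, 0, 1) for each sentence id."""
    totals: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        sentence_id, review = int(fields[1]), int(fields[2])
        if review not in (-1, 0, 1):
            continue  # ignore values outside the documented range
        totals[sentence_id] += review
    return totals


# Made-up sample rows for illustration.
sample_reviews = [
    "alice\t7\t1\t2020-01-01 00:00:00\t2020-01-01 00:00:00\n",
    "bob\t7\t1\t2020-01-02 00:00:00\t2020-01-02 00:00:00\n",
    "carol\t7\t-1\t2020-01-03 00:00:00\t2020-01-03 00:00:00\n",
    "alice\t8\t0\t2020-01-04 00:00:00\t2020-01-04 00:00:00\n",
    "bob\t9\t-1\t2020-01-05 00:00:00\t2020-01-05 00:00:00\n",
]
totals = review_totals(sample_reviews)
```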

Transcriptions

Filename

{{transcriptions | filename}}

All languages
Only sentences in: Cantonese, Japanese, Mandarin Chinese, Uzbek
File description
Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
Fields and structure
Sentence id [tab] Lang [tab] Script name [tab] Username [tab] Transcription