Wall (7,000 threads)

July 26, 2024 at 9:28:21 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

July 23, 2024 at 9:03:12 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

coinxee, July 17, 2024 at 2:52:34 AM UTC

Is there an open-source English sentence database similar to Tatoeba?

Augustus, July 18, 2024 at 8:53:16 PM UTC

Mozilla's Common Voice is similar in collecting sentences and recordings thereof. It does not have the translation aspect of Tatoeba.

See https://commonvoice.mozilla.org/

urro, July 20, 2024 at 1:23:22 AM UTC (edited July 20, 2024 at 1:27:38 AM UTC)

If you just need English sentences, there are a few. However, I have looked myself, and found Tatoeba to be of the best quality, especially for English.

English-only:
• English Penn Treebank (University of Pennsylvania)
... is not something I know much about.
• English Web Treebank (Universal Dependencies)
... is mostly composed of biased sentence picks, but each has a grammatical breakdown. Stanford's NLP project Stanza uses it.
• Common Voice (Mozilla Foundation)
... as Augustus said!

With translation:
• OpenSubtitles2018 Corpus (OpenSubtitles)
... isn't very good for high-fidelity translation, but is rather natural, apart from its dramatizations.

Honorable mentions:
• Google Books Ngram Dataset (Google)
... only has a few languages. For example, their Japanese dataset is old and can only be accessed via purchase in yen.
• Wikipedia and Wiktionary (Wikimedia Foundation)

• Any other English (meta)corpora out there

https://www.google.com/search?q...s"%7C"dataset"

It really depends on your intentions and usage, as all corpora have their biases, unfortunately.

sharptoothed, July 14, 2024 at 6:13:38 PM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

changkuoth, July 11, 2024 at 2:08:14 PM UTC

I recently discovered that the Nuer language has been added to Google Translate, and I was thrilled to contribute, as I have always looked forward to this astonishing news. I have a passion for the Nuer language and have spoken it since birth. I can write, read, and speak Nuer fluently.

LanguageExpert, July 11, 2024 at 8:55:16 PM UTC

Yes, I've noticed that too! Also, I appreciate all the contributions that you've made on Tatoeba. I love languages, and I just thought I'd show you my appreciation for contributing sentences in Nuer. :) I enjoyed reading your story about your passion for Nuer.

Igider, July 11, 2024 at 12:26:39 PM UTC (edited July 11, 2024 at 7:10:54 PM UTC)

Azul, Hi,

(Kab) Wikigzawal - (Eng) Wikitionary*

Maca

(Kab) Wikipedia - (Eng) Wikipedia

Ilaq awalen-a n teglizit ad ten-nerr ar teqbaylit s unamek d uqaleb n teqbaylit.

*Takadimit taqvaylit (Kabyle academy)

doemaar14, July 6, 2024 at 11:18:58 PM UTC

I recently discovered that Tamazight has been added to Google Translate. I'm wondering: did they use the incredible number of Tamazight sentences here on Tatoeba?
If that's the case, I congratulate all of the Tamazight contributors on here.
If not, I wonder what data they trained their algorithm on.

Yettutlay, July 7, 2024 at 7:31:10 AM UTC

Thank you ❤️ Tanemmirt ❤️

samir_t, July 7, 2024 at 7:50:46 AM UTC (edited July 7, 2024 at 11:34:29 AM UTC)

Yes, Google Translate added what it calls "Tamazight" and used both Tatoeba corpora (ber and kab), but in reality the result is 99% Kabyle, which makes it problematic to use. Moreover, Moroccans have already complained to Google, because the word Tamazight normally covers several North African languages, including Algerian and Moroccan ones, and not just Kabyle.

doemaar14, July 7, 2024 at 8:32:26 PM UTC

@samir_t Interesting. If it's not too much work, could you give me an example where Google Translate gives you a Kabyle sentence instead of "standard Tamazight"?

samir_t, July 7, 2024 at 8:44:21 PM UTC (edited July 7, 2024 at 8:52:36 PM UTC)

Yes, here are examples:

https://tatoeba.org/fr/sentences/show/12559102

https://tatoeba.org/fr/sentences/show/12559103

Look at the Kabyle translation of each English sentence, then enter the English sentence into Google Translate for translation into "Tamazight", and compare: it gives the same Kabyle sentence. It actually translates only into Kabyle, even though the language is named "Tamazight".

Warwari, July 7, 2024 at 10:29:21 AM UTC

What Google has just done is a revolution for our Amazigh language. The project is open to everyone, including Moroccans. In fact, several Moroccan Amazigh speakers have taken part in it (and participate here on Tatoeba), and everyone is welcome to help develop Amazigh machine translation, on Tatoeba as well. Tanemmirt (thank you), doemaar14, for your message of congratulations.

July 5, 2024 at 12:48:00 PM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

CK, July 4, 2024 at 12:35:41 AM UTC (edited July 4, 2024 at 12:54:29 AM UTC)

🍎 Screenshots 2014 to 2024

All times are in GMT+9 (Japan Time)

Most of these are from July 4th.

Most of these are of the "Number of sentences per language" page.

► Number of sentences per language at 2024-07-04 08:58
https://imgur.com/a/1gv0Zt2
12,125,173

► Number of sentences per language 2023-07-05 at 8:31
https://imgur.com/a/zTbcNPJ
11,484,149

► Number of sentences per language 2022-07-04 16:16
https://imgur.com/a/AAAeYHH
10,542,864

► Number of sentences per language 2021-07-04 at 10:59
https://imgur.com/a/sVQtM2S
9,733,562

► Number of sentences per language 2020-07-04 at 12:14
https://imgur.com/a/FQ2naCf
8,487,237

► Number of sentences per language 2019-07-11 at 11:28
https://imgur.com/a/JylbPlA
7,653,357

► Number of sentences per language 2018-07-04 at 9:52
https://imgur.com/a/suukPes
6,603,517

► Number of sentences per language 2017-10-12 at 12:01
https://imgur.com/a/FpJT9CN
6,021,256

► Number of sentences with audio by language 2016-10-20 at 11:09
https://imgur.com/a/thn6WPj
The only screenshot I have from 2016 is for the audio.

► Number of sentences with audio by language 2015-01-22 at 20:18
https://imgur.com/a/bfL97Fm
The only screenshot I have from 2015 is for the audio.

► Number of sentences per language 2014-07-11 at 22:56
https://imgur.com/a/7amGeSw
3,244,250

July 2, 2024 at 12:38:13 PM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.