Wall (7,160 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.


May 15, 2025 at 12:45:39 PM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

frpzzd, May 12, 2025 at 10:53:24 PM UTC (edited May 12, 2025 at 11:02:57 PM UTC)

Just for funsies, I ran a script to list the languages that are least well represented on Tatoeba relative to their estimated speaker populations. (Specifically, I restricted the list to languages with >= 1mil speakers and sorted them by the ratio of the number of sentences on Tatoeba to the number of speakers.)

As you might expect, many of the worst-represented languages by this metric are variants of Chinese. Aside from those, the 10 worst-represented languages are:

1. Sindhi (snd, 6 sentences vs. ~38.4mil speakers)
2. Sesotho (sot, 2 sentences vs. ~6.4mil speakers)
3. Maithili (mai, 8 sentences vs. ~19.3mil speakers)
4. Madurese (mad, 8 sentences vs. ~17.0mil speakers)
5. Libyan Arabic (ayl, 3 sentences vs. ~5.6mil speakers)
6. Western Punjabi (pnb, 72 sentences vs. ~113mil speakers)
7. Aymara (aym, 2 sentences vs. ~2.8mil speakers)
8. Pashto (pus, 47 sentences vs. ~53.0mil speakers)
9. Igbo (ibo, 35 sentences vs. ~28.0mil speakers)
10. Sundanese (sun, 40 sentences vs. ~32.0mil speakers)

If we restrict instead to languages with an estimated number of speakers >= 50mil, then here are the top 5 (excluding Chinese variants):

1. Western Punjabi (pnb, 72 sentences vs. ~113mil speakers)
2. Pashto (pus, 47 sentences vs. ~53mil speakers)
3. Punjabi (pan, 204 sentences vs. ~200mil speakers)
4. Gujarati (guj, 168 sentences vs. ~60mil speakers)
5. Telugu (tel, 271 sentences vs. ~95mil speakers)

On a more cheery note, here are the 5 *best* represented languages (that are not conlangs) with >= 1mil speakers, by the same metric:

1. Kabyle (kab, ~765k sentences vs. ~3.4mil speakers)
2. Macedonian (mkd, ~78k sentences vs. ~1.4mil speakers)
3. Lithuanian (lit, ~123k sentences vs. ~2.3mil speakers)
4. Hungarian (hun, ~420k sentences vs. ~11.8mil speakers)
5. Finnish (fin, ~151k sentences vs. ~5.2mil speakers)

And those with >= 50mil speakers:

1. Italian (ita, ~918k sentences vs. ~65mil speakers)
2. Turkish (tur, ~739k sentences vs. ~76mil speakers)
3. German (deu, ~721k sentences vs. ~92mil speakers)
4. Russian (rus, ~1.1mil sentences vs. ~170mil speakers)
5. French (fra, ~665k sentences vs. ~203mil speakers)
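
The script itself isn't shown in this post, so here is only a minimal sketch of how such a ranking could be computed, assuming the metric is simply sentences divided by speakers. The `Lang` type, the `rank_by_representation` function and its parameters are hypothetical names, and the data is a handful of the rounded figures from the lists above, not a real export.

```python
from typing import NamedTuple


class Lang(NamedTuple):
    code: str
    name: str
    sentences: int
    speakers: int


# A few rounded figures copied from the lists above (illustrative subset only).
LANGS = [
    Lang("snd", "Sindhi", 6, 38_400_000),
    Lang("sot", "Sesotho", 2, 6_400_000),
    Lang("pnb", "Western Punjabi", 72, 113_000_000),
    Lang("pus", "Pashto", 47, 53_000_000),
    Lang("kab", "Kabyle", 765_000, 3_400_000),
    Lang("ita", "Italian", 918_000, 65_000_000),
    Lang("fra", "French", 665_000, 203_000_000),
]


def rank_by_representation(langs, min_speakers=1_000_000, best_first=False):
    """Sort languages by sentences per speaker.

    Languages below `min_speakers` are dropped first. By default the
    worst-represented language (lowest ratio) comes first.
    """
    eligible = [lang for lang in langs if lang.speakers >= min_speakers]
    return sorted(
        eligible,
        key=lambda lang: lang.sentences / lang.speakers,
        reverse=best_first,
    )


if __name__ == "__main__":
    for lang in rank_by_representation(LANGS, min_speakers=50_000_000):
        ratio = lang.sentences / lang.speakers
        print(f"{lang.name} ({lang.code}): {ratio:.2e} sentences per speaker")
```

With this subset, the >= 50mil filter puts Western Punjabi first and Pashto second, which matches the order in the list above. Passing `best_first=True` flips the sort to give the best-represented languages instead.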

lbdx, May 13, 2025 at 4:13:08 PM UTC (edited May 14, 2025 at 10:18:31 AM UTC)

Thanks Franklin. It's interesting to see how Eurocentric the Tatoeba corpus still is.

Based on the 2025 edition of Ethnologue 200, I found that some of the world's 100 most widely spoken languages are still completely unavailable on Tatoeba:

- Nigerian Pidgin [pcm] → 120.7M speakers
- Dari [prs] → 33.4M speakers
- Magahi [mag] → 21.0M speakers
- Chhattisgarhi [hne] → 16.3M speakers
- Pedi [nso] → 13.7M speakers
- Chittagonian [ctg] → 13.0M speakers
- Dyula [dyu] → 12.8M speakers


All 7 of these languages are spoken either in Africa or South Asia.

May 12, 2025 at 3:38:40 PM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

sharptoothed, May 11, 2025 at 7:12:12 AM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

May 8, 2025 at 5:22:50 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

May 6, 2025 at 11:54:49 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

atitarev, May 6, 2025 at 5:16:06 AM UTC (edited May 6, 2025 at 7:33:28 AM UTC)

Hi,

Please help me unlink the Korean sentence https://tatoeba.org/en/sentences/show/13203796
("이 문장을 설명해 주십시오.", i munjang-eul seolmyeonghae jusipsio.)
from https://tatoeba.org/en/sentences/show/2213956 ("Please translate this.")

It should only link to
https://tatoeba.org/en/sentences/show/60278 ("Please explain this sentence to me.")

I don't have the privilege to link/unlink sentences.

araneo, May 6, 2025 at 6:57:17 AM UTC

I have unlinked it :]

atitarev, May 6, 2025 at 7:34:07 AM UTC

Thank you, @araneo!

May 4, 2025 at 9:45:06 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

May 4, 2025 at 9:41:09 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

May 2, 2025 at 11:25:40 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.