Wall (5,570 threads)

MisterTrouser MisterTrouser 7 hours ago 2019-11-14 14:05:23 link permalink

Topic: More structure for Tatoeba - your opinion.


I'd like to hear your opinion on the following idea.
I added headings to make it easier to jump straight to the points of interest.

The problem as I see it:
Tatoeba is a very good source of example sentences, and its audio coverage is growing, if slowly. But it is more a huge database with a frontend than a source for structured learning. From my perspective, this makes it quite difficult for beginners to use.

My Idea:
Give Tatoeba more structure. Give learners the opportunity to say "I am level 0, show me appropriate sentences" or "I am at an intermediate level, show me advanced sentences".

Things to consider:
- At the moment I'm thinking of doing this with lists.
- It's important that this "common thread" does not end up as a list of random sentences. Reusing words and expressions from previous lists is important for making sure the learning steps are small enough.
- A list of example sentences must not be so long that it tires the user. There should be just enough different versions of a sentence or expression to ensure the concept is understood.
- It's important to find the borderline between "sentences that focus on grammatical aspects" and "sentences that focus on expressive aspects" (feelings, complex expressions for complex situations).
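
The point about reusing words from earlier lists could be checked mechanically. Below is a minimal sketch, assuming each list is just a Python list of sentence strings and that a naive letters-only tokenizer is good enough; the function names and the sample sentences are made up for illustration and are not the contents of the actual lists.

```python
import re

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (naive tokenizer)."""
    return re.findall(r"[^\W\d_]+", sentence.lower())

def new_words_per_list(lists):
    """For each list, in order, return the words not seen in any earlier list."""
    seen = set()
    result = []
    for sentences in lists:
        words = {w for s in sentences for w in tokenize(s)}
        result.append(words - seen)
        seen |= words
    return result

# Hypothetical sample data, not taken from the real lists.
lists = [
    ["Das ist ein Hund.", "Das ist eine Katze."],
    ["Der Hund ist groß.", "Die Katze ist klein."],
]
for number, fresh in enumerate(new_words_per_list(lists), start=1):
    print(number, sorted(fresh))
```

A list that introduces many new words at once would then be a candidate for splitting into smaller steps.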

Could I have your opinion on this subject?
Not along the lines of "I'm in" or "let's do this!", but rather "this is a good idea, but don't forget that ..." or "this is a bad idea, because ...".

First steps I took:

As a first step I created four lists for people who know absolutely nothing of a language (here: German):

Deutsch / German: Basic Sentences ( 1 ): Just started a language.
Deutsch / German: Basic Sentences ( 2 ): adjectives
Deutsch / German: Basic Sentences ( 3 ): More nouns
Deutsch / German: Basic Sentences ( 4 ): I my, he his, ...


My idea for these lists would be:
(1) Translate the sentences in them to create the same basic sentences in other languages. Ultimately, though, it might be worth having different lists for different languages where other examples make more sense in a given language.
(2) Create more lists like these to provide the user with a common thread. I know that giving good direction here is very difficult.

Features needed:
Sortable lists would be suuuch a nice-to-have for the structuring.


Thanuir Thanuir 5 hours ago, edited 5 hours ago 2019-11-14 15:17:02, edited 2019-11-14 15:18:16 link permalink

(The links do not work. You have only copied the shortened link displayed somewhere, rather than the actual link.)

* Maybe @CK has some insight, given the curation he has done with English sentences. He has avoided some kinds of near-duplicates in his curation.

* There are methods of sorting sentences by difficulty, and doing it automatically. The method of this Anki deck sorts sentences by frequency of words, for example; not perfect, but reasonably good:

Duolingo has made an AI tool for some languages: ; some of it can be tested here, but it is probably not available as such to us mere mortals, Duolingo being a company that aims for profit:

Similar automatic methods of sorting sentences by difficulty might be worth investigating. If there exists a database of sentences and their difficulties, then it might be a reasonably straightforward machine learning task to have a neural network learn to classify more sentences. (Reasonably straightforward for an expert.)

Maybe clozemaster also does something like this?

* Curated lists of sentences can be useful regardless.

* I use Tatoeba as a dictionary and as a collection of sentences I can translate. I learn a bit by translating sentences here. If there were a curated list of, say, beginner sentences in French (a language I know poorly but am learning at the moment), then I could try to translate sentences from that list. That would be useful.

AlanF_US AlanF_US 4 hours ago 2019-11-14 16:13:47 link permalink

> Maybe clozemaster also does something like this?

Yes, Clozemaster sorts sentences by frequency of the word that it wants you to guess within a sentence.

My opinion regarding MisterTrouser's proposal is that this is the kind of project that can use Tatoeba data, with our blessing, but should be implemented and managed outside the Tatoeba Project.

Yorwba Yorwba an hour ago 2019-11-14 19:45:40 link permalink

I think Clozemaster also blanks out only the least frequent word in a given sentence, so it ends up putting first the sentences whose least frequent word has the highest frequency.

I've found that kind of minimax approach to be quite good compared to other variants I've tried in my own practice system. Taking the average frequency puts long sentences with a few rare words too early, and estimating the frequency of a sentence by treating every word as independent and identically distributed puts too much weight on one-word sentences (or even non-sentences).
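
To make the three variants concrete, here is a rough sketch of how they could be compared. It is not the code of my practice system or of Clozemaster: the whitespace tokenizer, the use of the corpus itself as the frequency source, the function names and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def word_frequencies(sentences):
    """Relative frequency of each word across the whole corpus."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def minimax_score(words, freq):
    """Frequency of the rarest word in the sentence (higher = easier)."""
    return min(freq[w] for w in words)

def average_score(words, freq):
    """Mean word frequency of the sentence."""
    return sum(freq[w] for w in words) / len(words)

def independent_score(words, freq):
    """Log-probability of the sentence if every word were an i.i.d. draw."""
    return sum(math.log(freq[w]) for w in words)

def rank(sentences, score):
    """Sort sentences from easiest to hardest under the given score."""
    freq = word_frequencies(sentences)
    return sorted(sentences, key=lambda s: score(s.lower().split(), freq),
                  reverse=True)

# Hypothetical toy corpus, already lowercased and without punctuation.
corpus = [
    "my cat sleeps",
    "my dog sleeps",
    "the cat and the dog chase the quokka",
    "the end",
    "yes",
]
for score in (minimax_score, average_score, independent_score):
    print(score.__name__, rank(corpus, score))
```

On this toy corpus the minimax ranking starts with the two "sleeps" sentences, the average ranking moves the long quokka sentence ahead of them, and the i.i.d. ranking puts the one-word "yes" first.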

I agree that it's better for that kind of project to simply use Tatoeba data, with Tatoeba staying a huge database with frontend. If specific improvements to features like collaborative lists etc. are required to support that use, I'm sure they can be implemented.

I can see one advantage of adding features for learners directly to Tatoeba, which is that it would make it more obvious how to correct mistakes one spots while learning. In theory the attribution required by copyright should be enough for that, but in practice it doesn't seem to be presented as a way to contribute improvements to the underlying data.

CK CK 9 hours ago 2019-11-14 11:17:55 link permalink

** Sentences with Audio Linked to Other Sentences with Audio **

49,934 sentences

17,763 sentences

14,979 sentences

9,847 sentences

7,207 sentences

6,168 sentences

4,482 sentences

4,044 sentences

3,527 sentences

3,403 sentences

1,375 sentences

Mandarin Chinese
1,350 sentences

1,225 sentences

884 sentences

Note that not all of these are bilingual audio pairs, since some of the sentences with audio are linked to other sentences in the same language with audio.

I only checked some of the languages that had a lot of audio files. You can check other languages you may be interested in by changing the language at the top of the "More search criteria" on the right side of the page of any of the above searches.

Thanuir Thanuir 7 hours ago 2019-11-14 13:31:53 link permalink

119,065 sentences

deniko deniko 3 hours ago 2019-11-14 17:19:32 link permalink

Most of them seem to be linked to other English sentences with audio.

Pfirsichbaeumchen Pfirsichbaeumchen 6 days ago, edited 6 days ago 2019-11-08 07:06:29, edited 2019-11-08 07:06:46 link permalink


[ENG] Corpus Maintainer Candidate for English.

Shekitten has applied to become a corpus maintainer for English to help make necessary corrections in sentences owned by inactive members. As usual, we invite you all to give us your comments in a private message (simply click on the link below).

[EPO] Kandidato por iĝi bontenanto de la angla frazaro.

Shekitten kandidatas por iĝi bontenanto de la angla frazaro kaj do por helpi la korektadon de eraraj frazoj de anoj ne plu aktivaj. Laŭ la kutimo ni invitas ĉiun komenti pri tio en privata mesaĝo al ni (simple alklaku la suban ligilon).

[DEU] Korpuspflegerkandidatin für Englische.

Shekitten hat sich als Korpuspflegerin für das Englische beworben, um bei der Korrektur fehlerhafter Sätze nicht mehr aktiver Mitglieder zu helfen. Wie immer ist jeder eingeladen, sich hierzu in einer Privatnachricht an uns äußern (einfach auf die folgende Verknüpfung klicken).


Pfirsichbaeumchen Pfirsichbaeumchen 9 hours ago, edited 9 hours ago 2019-11-14 11:18:21, edited 2019-11-14 11:22:08 link permalink


[ENG] Shekitten is now a corpus maintainer for English.

[DEU] Shekitten ist jetzt Korpuspflegerin für das Englische.

[EPO] Shekitten nun estas bontenanto por la angla frazaro.

AlanF_US AlanF_US 8 hours ago 2019-11-14 13:01:43 link permalink


CK CK 14 hours ago 2019-11-14 06:29:22 link permalink

** New German Voice **

MisterTrouser has contributed 585 audio files.

Impersonator Impersonator 2 days ago, edited 2 days ago 2019-11-12 14:03:06, edited 2019-11-12 16:32:01 link permalink


First, the policy to encourage 'native'-language contributions is only useful for a handful of big languages. For smaller languages, where people have limited access to education and media in their parents' language, being a native doesn't mean your language is better.

Second, it often just makes no sense! Native language can be defined as:

* language of one's parents,
* language of one's ethnic group,
* first language learnt,
* language one considers 'native',
* language whose community accepts you as a 'native speaker'
* language one knows best,
* language one uses most, etc.

For speakers of big languages, those things often overlap, so it's easy to establish who is 'native'. For speakers of smaller languages, those criteria usually *don't* overlap. Finding a 'fully' 'native' speaker of smaller languages is usually next to impossible.

Third, it's unfair. Power dynamics between language speakers often reflect historical injustice: conquered people were forced to speak the language of the conquerors. What native speaker policy does is that it suggests that we need to imitate the conquerors.

Fourth, it's not even useful. Many native speakers only speak their own dialect and might not have a strong grasp of the standard language — which puts them in a position that is similar to second-language learners.


I think promoting native speaker policy is very harmful to Tatoeba and the world in general.


Here's a suggestion.

If you ever find yourself asking 'is X a native speaker?', ask instead 'is X proficient enough for what they are doing?'.

You don't need to speak a language since childhood to get a sentence 'The apple is red' right.

And speaking a language since childhood won't help you with writing highly technical sentences with scientific terms.

So, instead of asking 'have you been speaking Y since childhood?', ask 'do you know enough Y to get this right?'.

shekitten shekitten 2 days ago 2019-11-12 15:47:50 link permalink

Agreed! It's telling that the "native language" policy, which, it needs to be stated, *does not* exist on this site, is promoted mostly by native speakers of English.

GuidoW GuidoW 2 days ago 2019-11-12 19:17:53 link permalink

This made me laugh. :) Forget this more-than-naive idea - if you look at the behavior of some admins here, you can only facepalm.

There are non-natives with basic knowledge evaluating the sentences of bilingually educated people, and they are the same ones who later point out that you are only supposed to translate into your native language… total nonsense.

One of the most encouraged corpus maintainers for German, for example, is a native English speaker (and the fun fact is that he does his job clearly better than lots of the so-called German natives :)

Thanuir Thanuir 2 days ago 2019-11-12 19:52:55 link permalink

I think people should be encouraged to post sentences they are confident about, or ones that would not otherwise be added. When someone is new to the website, it is usually a good idea for them to add sentences in their strongest language.

I think people should be hesitant to add sentences in languages they are learning, and only add ones they are confident about. If the language has sentences or active contributors, it can be learned by translating their sentences to one's stronger languages.

MisterTrouser MisterTrouser yesterday 2019-11-13 03:28:27 link permalink

I very much agree with Thanuir.

In general I think the wording / phrase "native speaker" should be thought of ( / defined as ) "proficient enough", in the sense that Impersonator used the latter. Or perhaps "educated speaker" in the sense of a teacher.

One should not forget that *wrong information is worse than no information*. And it seems that we have a couple of language learners adding sentences that are simply wrong. This can, of course, also happen with people who have spoken the language since childhood, but it is less likely.

AlanF_US AlanF_US yesterday 2019-11-13 15:26:53 link permalink

I agree with most of what has been posted in this thread. Basically, Tatoeba wants to (1) create a high-quality corpus of sentences and (2) avoid overwhelming the community with excessive effort required to achieve it. That means that we need people to be very proficient at the languages in which they write sentences. Since we need to be concise, both in the user interface and in our documentation, we need to settle on a term for the required skill level, and "native" is just about the best we can find. Clearly, it won't cover everyone's situation, but on the whole, it's better than the alternatives.

If you do contribute in a language that is not your strongest, please follow the guidelines here:

Thanuir Thanuir yesterday 2019-11-13 15:54:35 link permalink

What would the bad effects be if "native language" was replaced by "strongest language" everywhere?

Impersonator Impersonator 23 hours ago, edited 23 hours ago 2019-11-13 21:43:06, edited 2019-11-13 21:43:19 link permalink

This would discourage bilingual speakers from contributing in smaller languages.

Speakers of smaller languages usually grow up in a culture dominated by other languages. E.g. in Belarus, most people speak Russian, and most institutions and media are in Russian. People are much more likely to receive education in Russian, and have many more opportunities to practice Russian than Belarusian.

If you force people to use 'strongest language', people would end up contributing in Russian, leaving almost no one to contribute in Belarusian.

AlanF_US AlanF_US 22 hours ago 2019-11-13 22:21:59 link permalink

> If you force people to use 'strongest language'...

We do not force people to use their strongest language. We recommend that they do, but if they have a solid reason to do otherwise, then they can. They do need to understand, however, that if they are weak in the language in which they are contributing, they might add bad sentences to the corpus, which will either cause work for someone else who needs to correct their work, or go unnoticed. They should not try to contribute complicated sentences, and they should avoid contributing large numbers of sentences.

The only place in the user interface where the concept of native or strongest language comes up is in the advanced search dialog, which allows you to restrict your search to sentences by self-identified natives. I think that's fair. When setting up profiles, users are also asked to rank their ability in the languages in which they want to contribute, but that is done only by choosing a number of stars (one to five).

Thanuir Thanuir 15 hours ago 2019-11-14 05:41:07 link permalink

Good point, thanks.

AlanF_US AlanF_US 23 hours ago, edited 23 hours ago 2019-11-13 22:07:43, edited 2019-11-13 22:10:35 link permalink

"Native" isn't always used to refer to a language. Sometimes it's used to refer to a person. For instance, the advanced search dialog contains a checkbox with the caption "Owned by a self-identified native". I can't think of a good concise way to say "Owned by someone whose strongest language is the one in which the sentence was written".

Orava Orava 2 days ago 2019-11-12 16:25:49 link permalink

Any good lists of funny/witty/interesting sentences? Although I enjoy many simple sentences too, for example "Do you have any pets?". But yeah sometimes... When I get that feeling, I need unusual sentences.

Thanuir Thanuir 2 days ago, edited 2 days ago 2019-11-12 16:41:38, edited 2019-11-12 17:52:36 link permalink

Some suggestions:

1. Search starting from the longest sentence.

2. Don't translate from English. If you do, at least don't restrict your searches to sentences with recordings. Sentences by non-natives are also often more varied. Orphan sentences sometimes contain interesting figures of speech, some of which are correct.

3. When you run into an interesting word, search for that word and translate sentences until you run into a new interesting word.

4. If a particular topic grabs you, see whether there is a tag referring to it.

5. Start translating into Finnish from the short sentences. They often contain exclamations, insults, slang, etc. that are rarely found in longer sentences. And easy sentences are quick to translate...

Thanuir Thanuir 2 days ago 2019-11-12 16:48:08 link permalink

"Kengurut Itävallassa" ("Kangaroos in Austria") is quite a fun tag.

Orava Orava yesterday 2019-11-13 15:40:17 link permalink

Thanks to everyone who replied.

soliloquist soliloquist 2 days ago 2019-11-12 18:36:16 link permalink (Turkish)

cojiluc cojiluc 27 days ago, edited 27 days ago 2019-10-18 12:56:52, edited 2019-10-18 12:57:48 link permalink

feature request: button for copying an entire sentence

Sometimes for diverse reasons one would like to quickly copy a sentence. If there exists a button (like the one for play sound) to quickly copy the sentence, it would be very handy.

Especially, when one would like to copy a sentence from devices without mouse (like many phones or tablets) the copying task is not very easy.

Guybrush88 Guybrush88 27 days ago 2019-10-18 13:51:40 link permalink

this feature already exists. You have to activate it from the settings page

Thanuir Thanuir 27 days ago 2019-10-18 14:21:28 link permalink

Yes, settings -> experimental options -> button for copying.

cojiluc cojiluc 27 days ago 2019-10-18 14:33:32 link permalink

Thanks Guybrush, Thanuir. That was exactly what I was looking for.

cojiluc cojiluc 14 days ago 2019-10-31 20:06:47 link permalink

Unfortunately, this button does not work with the default browser on the iPad.

rumpelstilzchen rumpelstilzchen 13 days ago 2019-11-01 06:03:44 link permalink

What's the version of your browser on the iPad?

cojiluc cojiluc 13 days ago 2019-11-01 08:36:43 link permalink

Safari 13

rumpelstilzchen rumpelstilzchen 13 days ago, edited 13 days ago 2019-11-01 09:40:51, edited 2019-11-01 16:22:45 link permalink

We use a rather old library version for that feature. There's a demo with the latest version here:

I've also set up a simple test page which simulates the way the code currently works on Tatoeba but uses the latest version:

Does any of these work with an iPad/iPhone? If not I'm afraid we need to look for an alternative.

PaulP PaulP 13 days ago 2019-11-01 14:53:09 link permalink

> Does any of these work with an iPad/iPhone?

Both work on iPhone!

cojiluc cojiluc 13 days ago 2019-11-01 15:43:42 link permalink

Copying and cutting text work in the boxes in the first link.

But in the second link copying does not work (Safari/ iPad). No matter what I put in the text box, the only thing copied is "Sample text".

rumpelstilzchen rumpelstilzchen 13 days ago 2019-11-01 16:28:48 link permalink

Sorry for the confusion but the copy button just copies "Sample text" to the clipboard. I've just added the text box to that page to provide a place for pasting the clipboard content.

Thanks to both you and PaulP for testing. It's clear now that we need to update the library code. :-)

rumpelstilzchen rumpelstilzchen 3 days ago 2019-11-11 16:31:06 link permalink

The code for the clipboard feature was updated. Does it now work on iOS?

PaulP PaulP 3 days ago 2019-11-11 17:18:06 link permalink

> The code for the clipboard feature was updated. Does it now work on iOS?

No. Doesn’t work here.

rumpelstilzchen rumpelstilzchen 2 days ago 2019-11-12 04:06:14 link permalink

What iPhone version do you use?

PaulP PaulP 2 days ago 2019-11-12 06:05:05 link permalink

8 Plus. iOS 13.1.2 (the latest)

rumpelstilzchen rumpelstilzchen 2 days ago 2019-11-12 07:33:41 link permalink

Interesting. Since cojiluc says it works on an iPad with Safari 13 I'm afraid I'm out of ideas at the moment. :-(

I may have a chance to get access to an iPhone in the next few days for further investigation.

PaulP PaulP 2 days ago 2019-11-12 09:09:12 link permalink

So sorry for the confusion. Now it works on Safari and on Firefox. I don’t know why it didn’t work an hour ago. Maybe I forgot to reload the page? Or I was still sleepy?? Thanks for the solution!

cojiluc cojiluc 3 days ago 2019-11-11 18:50:04 link permalink

For me it works.

sharptoothed sharptoothed 3 days ago 2019-11-11 12:02:34 link permalink

** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:

Guybrush88 Guybrush88 2 days ago 2019-11-11 21:38:16 link permalink

thanks :)

MisterTrouser MisterTrouser 3 days ago, edited 2 days ago 2019-11-10 22:57:02, edited 2019-11-12 00:36:23 link permalink


Bug report: The option "has audio = yes" is reset after clicking the (advanced) "search" button.
Effect: It's not possible to filter for "has audio = yes".

Tested on: Firefox, Chrome

CK CK 3 days ago, edited 2 days ago 2019-11-10 23:36:37, edited 2019-11-12 04:06:57 link permalink

[not needed anymore - removed by CK]

MisterTrouser MisterTrouser 3 days ago, edited 3 days ago 2019-11-11 00:14:04, edited 2019-11-11 00:14:40 link permalink

I'd like to test, but since I wrote that wall entry, all results I get are:

"Search error

An error occurred while performing the search. If the problem persists, please let us know."

(also with the two search forms on your page)

Guybrush88 Guybrush88 3 days ago 2019-11-11 06:58:06 link permalink

I get the same bug also with audio set to "any"

GuidoW GuidoW 3 days ago 2019-11-11 08:04:05 link permalink

Yup, can confirm it (Chrome & Safari)

TRANG TRANG 3 days ago 2019-11-11 19:22:02 link permalink

It should be fixed now.

(Special thanks to rumpelstilzchen for the pull request.)

MisterTrouser MisterTrouser 2 days ago 2019-11-12 00:36:02 link permalink

I thank everybody involved!

gillux gillux 3 days ago 2019-11-11 10:35:39 link permalink

Search is back. Thanks for reporting the problem!