When should I use "@check" and when "@needs native check"? I understand how the meanings differ, but it is not clear why I should ever use @NNC, when presumably natives also check the sentences marked with @check.
I do this, too. The alternative would be to make sure that all of the different variants are used every now and then, so that they all appear somewhere in the database. But for me it is easier to add the variations that happen to cross my mind when translating a sentence.
I had some similar issues within a day.
Jos ymmärsin viestin oikein käännösohjelman avulla...

Tatoebaan mahtuu paljon erilaisia lauseita. Minusta on hyödyllistä kirjoittaa lauseita, jotka kuvastavat kielen koko ilmaisuvoimaa. Tietokanta on tietokanta myös lauseille, ei pelkästään niiden käännöksille.

Erityisesti, toisten tuottamaa sisältöä ei kannata kritisoida, ellei se ole kieliopillisesti väärin tai muuten objektiivisen huonoa; esimerkiksi yksittäisiä sanoja, joita ei voi pitää lauseina hyvällä tahdollakaan. Tatoeba on tarpeeksi suuri erilaisille lauseille. Jokaisen voi itse päättää, minkälaista sisältöä tuottaa ja kääntää.

Ehkä joku pitää sanaleikeistä, runollisesta ilmaisusta tai kirosanoista. Siitä vaan. Hänellä luultavasti on suurempi innostus niiden kuin muiden lauseiden kirjoittamiseen, ja niinpä hänen tuottavuutensa vähenisi, jos hänet pakotettaisiin kirjoittamaan tyypillisempiä lauseita.


If I understood the original post correctly with the help of machine translation...

Tatoeba has enough space for several types of sentences (and "sentences"). As long as they are grammatically ok and complete sentences by some reasonable and broad definition of sentence, they should be fine. I think sentences which illustrate the richness of the language are welcome. It is a database of sentences, not only of translations.

In general, I do not recommend criticizing the input of others unless it is grammatically wrong or otherwise objectively poor, such as single words that cannot be considered sentences even with good will. In a do-it-yourself volunteer project the only way of getting the kind of content you prefer is to create it. If someone else is inspired to do different things, within reason, then let them. Tatoeba is big enough for many people contributing different kinds of material. Everyone can still choose what kind of content they create or translate.

Maybe someone enjoys puns, poetic expression or swear words. If forced to create other kinds of input, maybe they would lose motivation or even stop. So let them produce what they are motivated to produce.
I've been learning Danish, and more recently Norwegian, via Anki decks drawing completely or partially from Tatoeba. All the material has been of high quality. Maybe there has been a creative translation here and there.

Incidentally, in the beginning I was confused and amused by the drama going on between Tom and Mary, and wondered why the French language is so important when studying Danish, and where Mette, Bjørn, Anders, and all the other nice Danish names were. Initially I thought that the deck was based on some corpus, and that Danes have surprisingly close connections to France.
The confusion was dispelled when I encountered Tatoeba and noticed the prevalence of Tom and Mary.


I imagine that with French there might also be regional differences. I know Spanish has some.
It is slow and sometimes gives an error saying that the website is not available and to check the blog or something else for updates.
This might be more visible as a separate discussion.
Yes, it is annoying. The problem will become severe if some language is overwhelmed by such bad terms. With español this does not seem to be the case quite yet, as the desired vocabulary looks extensive.
Not that I know of, unless whoever placed the sentence into their vocabulary removes it.

But if the sentences are fine, why not add them to Tatoeba while you are at it?

(If the sentences were short, then one could add them as parts of dialogues or such, but those did not seem to be short.)
I did not know that an asterisk could be used. Thanks. Similar tips seem to apply to Finnish as to Turkish.
Lisäsin juuri sanan "differenciálegyenletek" sanastooni ja se näkyy myös toivotuissa sanoissa. Ehkäpä joku kirjoitti lauseita tai tyhjensi sanastoaan. En tiedä miten käyttäjien jäädyttäminen vaikuttaa sanastoon, jos mitenkään.

I just added "differenciálegyenletek" to my vocabulary and I see it as something that needs sentences. Maybe someone wrote sentences or cleaned up their vocabulary. I do not know how freezing users affects the vocabulary, if it does.
As far as I know, this feature does not currently use stemming at all.

It is not clear if it should. On the one hand, stemming would prevent the same word from appearing multiple times in the wanted-sentences vocabulary, which might be nice, and it would allow for much more diverse sentences even when everyone adds words in the infinitive or some other fixed form.
On the other hand, maybe someone wants examples of a particular form of a word. At least in Finnish, some words have taken a life of their own in phrases or otherwise and no longer have the same meaning as one would expect, and sometimes there are quite clear distinctions that stemming removes.
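To make the trade-off concrete, here is a minimal sketch in Python. It is not how Tatoeba works; `toy_stem` is a made-up, deliberately naive stemmer that only strips a trailing "s", and the word list is invented for illustration.

```python
from collections import Counter


def toy_stem(word: str) -> str:
    """Hypothetical toy stemmer: lowercase and strip one trailing 's'."""
    word = word.lower()
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


# Invented example of words that users might have requested.
requested = ["horse", "horses", "new", "news"]

# Without stemming, every surface form is its own entry.
exact = Counter(requested)

# With stemming, forms are merged under their stem.
stemmed = Counter(toy_stem(w) for w in requested)

print(len(exact))      # four separate entries
print(dict(stemmed))   # two merged entries
```

Merging "horse" and "horses" is the benefit: the word is listed only once. Merging "new" and "news" is the cost: the two words mean different things, and someone who wanted examples of one of them can no longer ask for it specifically.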
A person can delete the words they have added via (replacing 'user' with their own username). I do not know if it is possible to see who has added the sentences there.
You are welcome.

My main feature request is for someone to add sentences to the words I have added, but I know it is challenging to implement.
*Why and how to add sentences with words others want to see*

This is a collection of my thoughts about this subject; maybe someone else is also interested.

1. Go to and choose your native (or very strong) language.

2. You have basically two options: Start from words that are part of no or few sentences (the beginning of the list) or from those that are part of a bit less than ten sentences (the end of the list).

3. Write a sentence or several sentences that use the words.


a. You are adding content that someone wants to see, or at least wanted to see.

b. You are adding to the diversity of the corpus. Since these words are not present (in great quantities), adding more sentences that use them makes the corpus more diverse, by one reasonable measure of diversity.
Furthermore, you are probably adding sentences that you would not otherwise have added; this is a collaboration between two people, one requesting a word and another adding a sentence. This creates more diverse sentences.
If several people add sentences that use a particular word, they are likely to add different types of sentences, hence increasing the diversity even more.

c. You are adding original sentences in the language. In some smaller languages, many sentences are translations. Translated sentences typically have simpler or at least restricted grammar and other distinguishing features. (When searching for more on this, "translationese" is a good keyword to begin with.) Having original sentences in all languages is valuable for the corpus and contributes to the diversity and the quality of the corpus. Of course, having translations is valuable, too, for obvious reasons.


A. Starting from the beginning of the list

Pros: You will be adding sentences that use words which are almost non-existent in the corpus.

Cons: You will meet the same word again and again. If the word appears in no sentences and you add one, then that word now has one sentence and you will face it again when contributing sentences to words with a single sentence only. And again when adding sentences to words with two sentences, etc.
(The page is not dynamic, but if you refresh it or open the next page, the words might have changed positions. Unless you open all the pages at once, this is likely to happen.)
Some languages have poor words - even phrases in other languages - at the beginning of the list. One has to scroll past them.

B. Starting from the end of the list, from words that appear in nine, eight, seven, etc. sentences

Pros: When you add a sentence that uses a word used nine times, it will (later and after a refresh) vanish from the page, since it is used in ten or more sentences. This makes the list shorter, which feels nice. Measurable progress.
The words with some sentences are less likely to be in the wrong language, misspelled, very obscure, etc., so it is easier to contribute sentences that use them.
You will not be meeting the same words again and again when going through the list, since each word you add a sentence to gains a sentence, while you yourself move towards words that are used less and less.

Cons: You are not adding as much to the diversity as you could be, at first.
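The difference between the two approaches can be sketched in a few lines of Python. This is only my mental model of the page, not its actual implementation: `wanted_list` is a hypothetical helper, the sentence counts are invented, and the limit of ten comes from the observation above that words used in ten or more sentences vanish from the list.

```python
def wanted_list(counts: dict[str, int], limit: int = 10) -> list[str]:
    """Words used in fewer than `limit` sentences, least used first."""
    visible = [word for word, n in counts.items() if n < limit]
    return sorted(visible, key=lambda word: (counts[word], word))


# Invented sentence counts for three requested words.
counts = {"teratology": 0, "concordance": 4, "horse": 9}

before = wanted_list(counts)

# Approach B: a sentence for a word used nine times pushes it to ten.
counts["horse"] += 1
# Approach A: a sentence for an unused word only moves it to one.
counts["teratology"] += 1

after = wanted_list(counts)

print(before)
print(after)
```

After both additions "horse" is gone from the list, while "teratology" is still there and will be met again among the words with a single sentence.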

Notes and tips:

All of the following are considered as different "words" for this feature: "horse", "horses", "can lead a horse to". I doubt capital letters matter, but I have not checked this with any rigour. Adding sentences with declined words will still contribute to the corpus and have all the other benefits explained above, but will not be counted by this feature of Tatoeba's user interface.

Simply ignore words that you do not want to add sentences to, for whatever reason. Sometimes I put in a bit of effort to figure out unknown words in my native language, sometimes not.

It is not terribly useful to add phrases that do not give any indication about the meaning of the word. Thus, adding a sentence like "What is teratology?" is not very helpful, as it only suggests that "teratology" is a noun. Likewise, "Jacob likes teratology." is not terribly helpful in this context. "Jacob likes teratology and other oddities." is better.

Similarly, adding variations of a sentence where the word in question has the same role is not very useful in this context. E.g. "Teratology is related to deformities." and "Teratology has to do with deformities." would both be reasonable sentences, but adding both of them as sentences that use the word "teratology" does not reveal more about that word. Thus, it might not be the ideal thing to do. (Adding both of them as translations of a suitable sentence would, on the other hand, be completely fine and a useful thing to do.)

I think it is healthy for a single member to add a sentence or a few to any given word, but trying to clear the entire list by oneself might not be helpful. One is likely to start contributing similar sentences to any given word after the initial inspiration is exhausted. Having several members contribute a sentence or two each would be better for diversity. But if there is only a single member who adds sentences in this way for a given language, then so be it.

If it is difficult to come up with a sentence, I often take a look at a small group of words in the list and try to create a sentence that uses two or even three of them. Constraints breed creativity.

If there is a foreign word on the list, one can (but need not) add a sentence about its pronunciation, etymology, register, social position, meaning, language, script, etc. A single sentence like "Lentokonemekaanikko is a Finnish word." might be okay, but it is better to add more interesting sentences, such as "Lentokonemekaanikko is a Finnish word that combines three different words: "lento" means flight, "kone" machine and "mekaanikko" mechanic (as in a profession or person). The word means someone who repairs, maintains or builds aeroplanes."
Please excuse my English, but I hope the idea is clear: If there is a misspelled word, a foreign word, etc., it might be possible to contribute a meaningful sentence that uses it.
Another example: ""Thier" is a typical misspelling of "their"." would be a reasonable sentence, in my opinion. A sentence about why the misspelling is ubiquitous would make an even more interesting sentence, again in my opinion.
I agree, though short phrases can also be valuable when they are set expressions or use grammar that is specific to the language.

I especially agree about field-specific vocabulary. There is far too little of it, and such terms are difficult to find translations for. Examples of use are also very helpful.
I do agree that following the guidelines would likely improve the corpus. The cost is a greater barrier to entry. I see this mostly on various Stack Exchange sites, where there are lots of community norms, many of which are easy to miss before adding questions or answers. Not everyone is friendly when introducing new people to the best practices. In fact, SE has done work to improve this, like adding a notice that someone is new to the website.

Would it be appropriate to have a list of best practices about what kinds of sentences to add, with the name issue as one item there?

What more could be done when someone new starts using the website? Some kind of mentoring system, informal or official?
I am not sure any policing of names in sentences is worth the trouble. It always adds friction to new people when there are more rules, or, even worse, unwritten community norms like not creating the kinds of sentences one sees all the time in the database already.

If the problem is CK creating too many sentences with Tom and Mary, then discussing the matter with them might be the most constructive course of action.
Quite unrelated, but I had to check what a concordance or compiling one means. Could you add a sentence or two to this effect to Tatoeba? These would be precisely the kind of material that an advanced learner finds useful.

I also added "compile" and "concordance" to my vocabulary.
These are all very common words with huge numbers of sentences. Is there a particular reason for actively adding many more to get the same frequencies as the OpenSubtitles database?