gillux 2019-02-27 04:49:40

Dear Tatoeba contributors,

From this Friday, I will be working on Tatoeba again, thanks to our collaboration with Mozilla (thank you!). I will work to facilitate the use of sentences by Common Voice, but also to improve Tatoeba in general.

One of the ways I would like to achieve this goal is to first ask you what Tatoeba you are dreaming of. I think we focus too much on concrete details and forget to let ourselves dream. Yet dreams are one of the major forces driving us forward. Our GitHub is full of very concrete "little suggestions" and "little problems", but what are we really aspiring to in order to make Tatoeba a project useful to humanity?

I think that all of us who are involved in Tatoeba in one way or another have in our heads a Tatoeba of our dreams: some big ideals, some big crazy ideas, a personal vision of how it should be, which we refrain from expressing. So I am asking you to forget about the details, to forget about the quarrels, to think big and far, to let go a little and tell me frankly: which Tatoeba do you dream of?



CK 2019-02-27 05:36:56, edited 2019-11-01 03:17:42

gillux 2019-02-27 05:49:00

Yes, but see, Trang is probably the only person who expressed that, and yet it's not a dream, it’s a concept.

What do YOU think, CK? Which Tatoeba do you dream of?

CK 2019-02-27 07:36:54, edited 2019-11-01 03:17:48

shekitten 2019-02-27 11:10:57

Isn't that what the system of approve/don't know/reject[/don't vote] is meant to solve?

PaulP 2019-03-02 09:40:21

Yes, but almost nobody is using it. I see a lot of bad sentences without the red mark.

shekitten 2019-03-04 11:10:34

This is something I've mentioned, but it would be very helpful to be able to select sentences by whether they *have* been approved rather than whether they haven't been unapproved. I imagine this is in the works. I think it will be very helpful, though it still requires that people approve sentences in their languages.

CK 2019-03-05 00:30:50, edited 2019-11-01 03:14:28

PaulP 2019-03-05 10:45:03

Very interesting, CK. Thanks! But I would like to see if any member rated one of my sentences "not OK". If they just put "not OK", but don't add a comment, we don't know, do we?

shekitten 2019-03-05 10:50:50

And it would also be nice to search for sentences that have been marked "OK", for my own learning purposes.

e.g., a list of sentences in Turkish that have been marked "OK"

or, a list of sentences in Turkish that are either by a native speaker or marked "OK"
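
The two lists above could be produced by a filter along these lines. This is only a rough Python sketch: the `Sentence` fields, the rating values and the `select` helper are all made up for illustration and are not Tatoeba's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    lang: str                     # ISO 639-3 code, e.g. "tur" for Turkish
    by_native: bool = False       # hypothetical flag: the owner is a native speaker
    ratings: list[str] = field(default_factory=list)  # hypothetical values: "ok", "unsure", "not ok"


def is_marked_ok(sentence: Sentence) -> bool:
    # Simplest possible reading of "marked OK": at least one "ok" rating.
    return "ok" in sentence.ratings


def select(sentences, lang, include_native=False):
    """Return sentences in `lang` that are marked OK, optionally also those by natives."""
    return [
        s for s in sentences
        if s.lang == lang and (is_marked_ok(s) or (include_native and s.by_native))
    ]


corpus = [
    Sentence("Merhaba!", "tur", by_native=True),
    Sentence("Teşekkür ederim.", "tur", ratings=["ok"]),
    Sentence("Bu bir cümle.", "tur", ratings=["unsure"]),
    Sentence("Hello!", "eng", ratings=["ok"]),
]

ok_only = select(corpus, "tur")                          # first list
ok_or_native = select(corpus, "tur", include_native=True)  # second list
```

Whether a sentence with both an "ok" and a "not ok" rating should still count as "marked OK" is a design decision this sketch does not try to settle.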

Ricardo14 2019-02-27 08:26:08

My dream: a Tatoeba to which we could add as much content as possible at once, and where we could "organize" sentences as easily as we organize our flash drives or hard disks, for example (assuming they belong to the same group). That way, I believe Tatoeba would be an "awesomer" resource for studying languages.

By that I mean it'd be great if we could add a mass of sentences at once (maybe with a maximum number of sentences allowed at a time) and mass-tag sentences. In Portuguese, for example, a sentence which begins with "Eu" is always "1st Person Singular".
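
To make the mass-tagging idea concrete, here is a minimal Python sketch. It assumes sentences are held in a plain dictionary keyed by sentence id; the `mass_tag` function, its `limit` parameter and the data layout are hypothetical, not anything Tatoeba provides today.

```python
from typing import Callable


def mass_tag(
    sentences: dict[int, str],
    rule: Callable[[str], bool],
    tag: str,
    tags: dict[int, set[str]],
    limit: int = 1000,
) -> list[int]:
    """Add `tag` to every sentence whose text matches `rule`, up to `limit` sentences.

    Returns the ids of the sentences that were tagged.
    """
    tagged: list[int] = []
    for sentence_id, text in sentences.items():
        if len(tagged) >= limit:
            break  # respect the per-batch maximum
        if rule(text):
            tags.setdefault(sentence_id, set()).add(tag)
            tagged.append(sentence_id)
    return tagged


# Hypothetical sample data.
sentences = {
    1: "Eu gosto de café.",
    2: "Ela mora aqui.",
    3: "Eu sou estudante.",
}
tags: dict[int, set[str]] = {}

# Rule from the post: a Portuguese sentence beginning with "Eu" is 1st person singular.
tagged_ids = mass_tag(sentences, lambda s: s.startswith("Eu "), "1st person singular", tags)
```

The `limit` argument stands in for the "maximum of sentences allowed at a time" mentioned above.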

Impersonator 2019-02-27 08:32:43

I dream of a Tatoeba that makes a conscious effort to support smaller, less-described languages. These face unique challenges, completely different from the challenges of larger languages. I hope those challenges will be thought about from the beginning, and not added as an afterthought.

I dream of a Tatoeba that provides not just translations but also grammatical information and perhaps glosses.

I dream of a Tatoeba that allows adding some context to the sentences, perhaps descriptions. It would make sense to have some free-form descriptions to specify the status of the speakers and their gender (if the language distinguishes between them).

Or maybe images and videos! Sentences about foreign clothes or food would make so much more sense. Something that would make it simpler to incorporate uncommon phrases that might have no direct translation.

As a totally far-fetched example, there is a language, Guugu Yimithirr, that is known for using 'north' and 'east' instead of 'forward' or 'left'. First, I dream we'll have this language in Tatoeba. Second, I dream we'll have a way to specify the geographical position of the speaker in order to map the spatial position of items described in the sentences, to show whether 'left' corresponds to 'east' or to 'west'.

gillux 2019-02-27 10:13:30, edited 2019-02-27 10:17:19

I dream that Tatoeba is a project I can be proud of when I’m showing it to my friends: "Do you know this website, Tatoeba?" "No, let me check it out." The homepage loads instantly. Everything’s localized, neat, beautiful, self-explanatory and easy to use from a smartphone or a computer. It shows some inspiring and featured example sentences. My friend tries to make a search. The results are very relevant and show up almost instantly.

I dream that Tatoeba is a worldwide reference among language enthusiasts. Most professional translators prefer it over closed-source solutions because the results are more diverse and accurate, and all of their colleagues are on it too. Popular dictionaries all include Tatoeba’s sentences to illustrate their definitions. Whenever people want to make a point about whether a particular expression is correct or not, widely used or not, they don’t argue by showing Google’s number of results; they show Tatoeba’s results instead. Tatoeba no longer relies on the ISO to include a new language. It’s the other way around: having a language listed on Tatoeba is a point that may convince the ISO folks to include it too.

I dream that Tatoeba is a key tool for most language teachers around the world to prepare their lessons. Just give Tatoeba a few grammatical concepts and vocabulary items to study, and it gives you the materials you need.

I dream that Tatoeba’s community is huge, diverse and everyone’s equal. There are many active members from all Asian countries and the Global South, and all the minorities on Earth are well represented. Countries that are threatening certain language minorities are constantly trying to block Tatoeba because they can’t stand that these languages are being listed as such on something as famous as Tatoeba. Tatoeba is regularly mentioned on the news whenever a language minority is being threatened.

Guybrush88 2019-02-27 11:23:15

I agree, in particular with the part about language teaching. What I'd like to see is enhancements to tags, such as mass tagging to quickly add new metadata to sentences, so that users can easily find all the sentences with a given tag using the advanced search and, as a consequence, fetch more sentences on a specific grammar topic (I'm aware that some people, myself included, already tag sentences with grammar information, such as the verb tense). This would also work with quotes (the ones that comply with Tatoeba's licenses), since people contributing quotes might want a quick way to provide the necessary attribution for the quotes they submit.

Ricardo14 2019-02-27 10:20:54

I also dream that *all* sentences - whether posted by natives or not (since they're proofread) - have the same "status".

shekitten 2019-02-27 11:15:50

I think marking sentences by native speakers is a good idea. Proofreading is technically possible as well, and hopefully it will eventually be possible (in regular search) to retrieve proofread sentences, not simply native sentences. This leaves the choice to whoever is using the data.

Proofreading is a great thing but it also requires a lot of people to take on the work of proofreading, and this skill isn't the same as translating, so we might want to reach out to people with copy editing skills who aren't necessarily translators.

AlanF_US 2019-02-27 13:50:49, edited 2019-03-02 15:35:25

Thanks for this question, gillux.

My dream is, to paraphrase the Biblical prophet Micah, a Tatoeba where each member can sit under their own vine and fig tree, and work with language speedily and in peace. In other words, each person can customize Tatoeba in such a way that they can do what they want -- search, submit and answer queries, contribute sentences, annotate, proofread, and correct -- in a way that helps themselves and the community, but that does not get in anyone else's way. For instance, people who are strong in a minority language but less strong in a majority language can contribute sentence pairs without worrying about sentences in the weaker language showing up in the list of results of someone who is looking for 100% correct sentences.

Another part of my vision is integration with other communities and tools that are particularly good at their own specialties.

My vine and fig tree orchard would be something like this:

(1) Use any of the following kinds of sites without leaving Tatoeba:
- a dictionary (like Morfix for Hebrew, or Wiktionary for Russian) that can take an inflected form and give me its accented form and etymology, as well as its dictionary form
- a real-world collection of sentence pairs (like Reverso Context)
(2) Make it easy and quick to get from the sentence search stage all the way to submitting a chosen sentence to a flashcard program (like Anki).
(3) Make it easy to get back from Anki to sentence search so I can see other sentences like ones I selected earlier.

(4) Run the following preconfigured search:

Look for sentences containing word A, favoring but not limited to sentences that:
- contain an exact match for word A
- are four to eight words in length
- are owned by users B, C, and D
- are arranged/"randomized" so that sentences with slight variations (for instance, pronoun changes) are less likely to appear near each other
- have a direct translation into language L

(5) Ask for sentences containing a particular word in a particular sense, with the ability to engage in a dialogue (Can it be used in this way? What about this way?).

(6) Contribute a sentence with a link to the query that it was intended to answer.

(7) Let me add accent marks to a Russian sentence, or vowels to a Hebrew sentence, at the time I encounter them, knowing that others can choose to suppress the accents/vowels, once or always, and they won't interfere with searches. Or find a tool that adds them automatically with a very high degree of accuracy (and gives us the ability to edit bad automatic suggestions).

(8) Make it easy for me to change a sentence that is already linked to others:
- allow me to automatically submit comments on the linked sentences ("If sentence A is changed to B, would this sentence still match?")
- allow me to break existing links and create new links easily

(9) Allow me to exclude layout elements, like "Tips" and the wide margins on the Wall, that take up room in my browser.

(10) Let me use Tatoeba from a mobile device easily.

AlanF_US 2019-03-02 16:03:01

To elaborate on item 4:

Currently, we can only specify that search results satisfy a criterion. We can't specify that we want results that may or may not satisfy that criterion but that are sorted so that the ones that do satisfy it occur at the top of the list. This means that I need to guess beforehand how many hits I'm likely to get. If I guess too low, I have to either page through lots of results that are not what I'm looking for, or follow it with a more restrictive search. If I guess too high, I have to follow it with a less restrictive search.
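
The difference between filtering on a criterion and merely favoring it can be sketched in a few lines of Python. This is an illustration only, not how Tatoeba or Sphinx work: the `Sentence` fields, the naive tokenizer and the one-point-per-criterion scoring are all assumptions made for the example, and the "spread out slight variations" preference from item 4 is left out.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    owner: str
    translated_into: frozenset[str] = frozenset()  # languages with a direct translation


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer that strips common punctuation; illustration only.
    return [w.strip(".,!?;:\"'").lower() for w in text.split()]


def preference_score(s: Sentence, word: str, owners: set[str], lang: str) -> int:
    """Count how many of the preferred (not required) criteria a sentence meets."""
    words = tokenize(s.text)
    return sum([
        word.lower() in words,       # contains an exact match for the word
        4 <= len(words) <= 8,        # is four to eight words long
        s.owner in owners,           # is owned by one of the preferred users
        lang in s.translated_into,   # has a direct translation into `lang`
    ])


def ranked_search(sentences, word, owners, lang):
    # The only hard requirement: the word appears somewhere in the text.
    hits = [s for s in sentences if word.lower() in s.text.lower()]
    # sorted() is stable, so sentences with equal scores keep their original order.
    return sorted(hits, key=lambda s: preference_score(s, word, owners, lang), reverse=True)


corpus = [
    Sentence("Cats.", "X"),
    Sentence("The cat sat on the mat.", "B", frozenset({"fra"})),
    Sentence("My cat sleeps.", "Z", frozenset({"fra"})),
    Sentence("A dog barked.", "B"),
]
results = ranked_search(corpus, "cat", {"B", "C", "D"}, "fra")
```

With this approach nothing that contains the word is discarded; the sentences meeting more of the preferences simply come first, so there is no need to guess in advance how restrictive the search should be.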

Our sorting engine, Sphinx, does have sorting modes:

I submitted an enhancement ticket about this:

cojiluc 2019-02-28 07:56:54

Improve Tatoeba's search engine and enrich its options.

Thanuir 2019-02-28 18:38:06

More specialist vocabulary would be extremely useful. Mathematics vocabulary is the most useful for me, and there is a decent amount of relevant sentences, but this is probably a happy accident more than a trend. For many languages it is pretty easy to figure out the common vocabulary, but the more specialized, the harder it becomes.

Translating these sentences is also tricky, since it needs someone proficient with at least two languages and with the field in question.


Process-wise, it would be nice to have sentences in smaller languages translated with some frequency. As is, the sentences more peculiar to such languages and cultures can easily remain untranslated for very long times, for understandable reasons. But still, it would be nice if this was not the case.

soliloquist 2019-03-01 19:42:16

A forum section with subforums for different communities/languages where users can discuss matters and ask questions would be nice.

CK 2019-03-03 01:09:51, edited 2019-11-01 03:14:40

cojiluc 2019-03-13 16:35:13

I think this is a good suggestion. A forum has many advantages over the current Wall. More specifically, I suggest a forum like the Stack Exchange sites. A free and open-source version is available:

The format of Stack Exchange sites is very nice, and they have many advantages over old, traditional forums.

shekitten 2019-03-03 02:15:23

The ability to translate tags would be nice, and a good way to keep Tatoeba language-neutral while avoiding a large number of tags in different languages with the same meaning.

e.g. be able to translate English "proverb" as Esperanto "proverbo" and Interlingua "proverbio", and it would show up differently depending on the user's interface language.

PaulP 2019-03-03 09:40:34

I couldn't agree more, Shekitten.

Thanuir 2019-03-04 08:14:15

Tag synonyms would be a potential way of doing this, since there are at the moment tags "mathematics", "maths" and probably also "math", and they are not translations, but still redundant. I am sure other similar situations exist.
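
One way to picture this, with translated labels treated as just another kind of alias, is a small lookup table that maps every name to a single canonical tag. The Python below is a hypothetical sketch: the `LABELS` and `SYNONYMS` tables, the function names and the use of ISO 639-3 codes for interface languages are assumptions for illustration, not an existing Tatoeba feature.

```python
# Display labels per interface language for each canonical tag (hypothetical data).
LABELS = {
    "proverb": {"eng": "proverb", "epo": "proverbo", "ina": "proverbio"},
    "mathematics": {"eng": "mathematics"},
}

# Redundant tag names that should resolve to a canonical tag (hypothetical data).
SYNONYMS = {"maths": "mathematics", "math": "mathematics"}

# Every synonym and every translated label becomes an alias of its canonical tag.
ALIASES = dict(SYNONYMS)
for canonical, labels in LABELS.items():
    for label in labels.values():
        ALIASES.setdefault(label.lower(), canonical)


def canonical_tag(name: str) -> str:
    """Resolve a tag name, synonym or translated label to its canonical tag."""
    key = name.strip().lower()
    return ALIASES.get(key, key)  # unknown tags are left as they are


def display_label(name: str, ui_lang: str, fallback: str = "eng") -> str:
    """Return the label to show for a tag in the user's interface language."""
    tag = canonical_tag(name)
    labels = LABELS.get(tag, {})
    return labels.get(ui_lang) or labels.get(fallback) or tag
```

For example, `canonical_tag("Maths")` resolves to `"mathematics"`, and `display_label("proverb", "epo")` gives `"proverbo"`. A tag such as "seven syllables" has no entry and is simply passed through unchanged, which sidesteps rather than solves the question of how it should be translated.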

On the downside, having tag synonyms or translations would create a need to discuss them more carefully and decide which tags are actually synonyms and which are not; tags like "seven syllables" would be a problem, as well as concepts which do not translate one-to-one between languages.

shekitten 2019-03-04 11:04:12

> On the downside, having tag synonyms or translations would create a need to discuss them more carefully and decide which tags are actually synonyms and which are not; tags like "seven syllables" would be a problem, as well as concepts which do not translate one-to-one between languages.

This is a problem that also occurs with sentences themselves, but you're right that it would be especially difficult here. I still think it's worth it.