gillux
22 days ago
Dear Tatoeba contributors,

From this Friday, I will be working on Tatoeba again, thanks to our collaboration with Mozilla (thank you!). I will work to facilitate the use of sentences by Common Voice, but also to improve Tatoeba in general.

One of the ways I would like to achieve this goal is to first ask you what Tatoeba you are dreaming of. I think we focus too much on concrete details and forget to let ourselves dream. Yet dreams are one of the major forces driving us forward. Our Github is full of very concrete "little suggestions" and "little problems", but what are we really aspiring to in order to make Tatoeba a project useful to humanity?

I think that we, who are involved in Tatoeba in one way or another, all of us have in our heads a Tatoeba of our dreams, some big ideals, some big crazy ideas, a personal vision of how it should be, but that we refrain from expressing. So I am asking you to forget about the details, to forget about the quarrels, to think big and far, to let go a little and tell me frankly: which Tatoeba do you dream of?

CK
22 days ago - 22 days ago
I like what TRANG said in 2009.

So the concept is : we gather a lot of data, try to organize it, ensure it is of good quality and make it freely accessible, downloadable and redistributable, so that anyone who has a great idea for a language learning application (or a language tool) can just focus on coding the application and rely on us to provide data of excellent quality.

http://blog.tatoeba.org/2009_11_01_archive.html
gillux
22 days ago
Yes, but see, Trang is probably the only person who expressed that, and yet it's not a dream, it’s a concept.

What do YOU think, CK? Which Tatoeba do you dream of?
CK
22 days ago - 22 days ago
I dream that the data would be of good quality so that anyone who had a great idea for a language learning application (or a language tool) could just focus on coding the application.

The way it is now, to use the sentences for language study requires someone (the developer or a trusted colleague) to proofread and choose sentences for the target language to be learned. Otherwise, students are exposed to bad examples of language usage.
shekitten
21 days ago
Isn't that what the system of approve/don't know/reject[/don't vote] is meant to solve?
PaulP
19 days ago
Yes, but almost nobody is using it. I see a lot of bad sentences without the red mark.
shekitten
16 days ago
This is something I've mentioned, but it would be very helpful to be able to select sentences by whether they *have* been approved rather than whether they haven't been unapproved. I imagine this is in the works. I think it will be very helpful, though it still requires that people approve sentences in their languages.
CK
16 days ago - 16 days ago
157 members (less than 0.4% of our members) have added 939,602 OK ratings to 937,368 sentences (about 13% of our sentences).

929,871 (99%) of these ratings were added by the following 20 members.

CK (713,571)
PaulP (142,170)
Guybrush88 (26,706)
bill (24,238)
Selena777 (4,355)
Pfirsichbaeumchen (3,619)
alexmarcelo (2,626)
tulin (2,520)
tornado (2,410)
Wezel (1,459)
soliloquist (1,397)
Raizin (1,289)
Thanuir (1,271)
odexed (1,259)
raggione (852)
umano (809)
Bilmanda (772)
Aiji (765)
Impersonator (734)
Scorpionvenin14 (668)



There are 8,202 non-OK ratings.

You can see a list of members who have added "not OK" ratings.

http://tatoeba.byethost3.com/outdated_ratings.html

Many of these are "outdated ratings", since corrections have been made.


PaulP
15 days ago
Very interesting, CK. Thanks! But I would like to see if any member rated one of my sentences "not OK". If they just put "not OK", but don't add a comment, we don't know, do we?
shekitten
15 days ago
And it would also be nice to search for sentences that have been marked "OK", for my own learning purposes.

e.g., a list of sentences in Turkish that have been marked "OK"

or, a list of sentences in Turkish that are either by a native speaker or marked "OK"
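
The two lists described above boil down to a simple filter. Here is a minimal sketch in Python, assuming each sentence record carries a language code, a flag for whether its owner is a native speaker, and a count of "OK" ratings. The `Sentence` class, its field names, and the sample data are all made up for illustration and do not reflect Tatoeba's actual export format.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    lang: str        # language code, e.g. "tur" for Turkish (assumed field)
    by_native: bool  # True if the owner is a native speaker (assumed field)
    ok_ratings: int  # number of "OK" ratings received (assumed field)


def ok_sentences(sentences, lang):
    """Sentences in `lang` that have at least one "OK" rating."""
    return [s for s in sentences if s.lang == lang and s.ok_ratings > 0]


def trusted_sentences(sentences, lang):
    """Sentences in `lang` that are by a native speaker OR marked "OK"."""
    return [
        s for s in sentences
        if s.lang == lang and (s.by_native or s.ok_ratings > 0)
    ]


# Made-up sample data, only to show the two filters side by side.
SAMPLE = [
    Sentence("Merhaba.", "tur", by_native=True, ok_ratings=0),
    Sentence("Günaydın.", "tur", by_native=False, ok_ratings=2),
    Sentence("Nasılsın?", "tur", by_native=False, ok_ratings=0),
    Sentence("Hello.", "eng", by_native=True, ok_ratings=5),
]
```

With this sample, `ok_sentences(SAMPLE, "tur")` keeps only the rated sentence, while `trusted_sentences(SAMPLE, "tur")` also keeps the unrated one written by a native speaker.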
Ricardo14
22 days ago
My dream: a Tatoeba to which we could add as much content as possible at once, and where we could "organize" sentences as easily as we organize files on our flash drives or hard disks (assuming they belong to the same group). That way, I believe Tatoeba would be an "awesomer" resource for studying languages.

By that I mean it'd be great if we could add a mass of sentences at once (perhaps with a maximum number of sentences allowed at a time) and mass-tag sentences. In Portuguese, for example, a sentence which begins with "Eu" is always "1st Person Singular".
Impersonator
22 days ago
I dream of a Tatoeba that makes a conscious effort to support smaller, less described languages. These face unique challenges, completely different from the challenges of larger languages. I hope those challenges will be thought about from the beginning, and not added as an afterthought.

I dream of a Tatoeba that provides not just translations but other grammatical information and perhaps glosses.

I dream of a Tatoeba that allows adding some context for the sentences: perhaps descriptions. It would make sense to have some free-form descriptions to specify the status of speakers and their gender (if the language distinguishes between them).

Or maybe images and videos! Sentences about foreign clothes or food would make so much more sense. Something that would make it simpler to incorporate uncommon phrases that might have no direct translation.

As a totally far-fetched example, there is a language, Guugu Yimithirr, that is known for using 'north' and 'east' instead of 'forward' or 'left'. First, I dream we'll have this language in Tatoeba. Second, I dream we'll have a way to specify the geographical position of the speaker to map the spatial position of items described in the sentences, to show whether 'left' corresponds to 'east' or to 'west'.
gillux
21 days ago - 21 days ago
I dream that Tatoeba is a project I can be proud of when I’m showing it to my friends: "Do you know this website, Tatoeba?" "No, let me check it out." The homepage loads instantly. Everything’s localized, neat, beautiful, self-explanatory and easy to use from a smartphone or a computer. It shows some inspiring and featured example sentences. My friend tries to make a search. The results are very relevant and show up almost instantly.

I dream that Tatoeba is a worldwide reference among language enthusiasts. Most professional translators prefer it over closed-source solutions because the results are more diverse and accurate, and all of their colleagues are on it too. Popular dictionaries all include Tatoeba’s sentences to illustrate their definitions. Whenever people want to make a point whether a particular expression is correct or not, widely used or not, they don’t argue by showing Google’s number of results; they show Tatoeba’s results instead. Tatoeba no longer relies on the ISO to include a new language. It’s like the other way around: having a language listed on Tatoeba is a point that may convince the ISO folks to include it too.

I dream that Tatoeba is a key tool for most language teachers around the world to prepare their lessons. Just give Tatoeba a few grammatical concepts and vocabulary items to study, and it gives you the materials you need.

I dream that Tatoeba’s community is huge, diverse and everyone’s equal. There are many active members from all Asian countries, the Global South, and all the minorities on Earth are well represented. Countries that are threatening certain language minorities are constantly trying to block Tatoeba because they can’t stand that these languages are being listed as such on something as famous as Tatoeba. Tatoeba is regularly mentioned on the news whenever a language minority is being threatened.
Guybrush88
21 days ago
I agree, in particular with the part about language teaching. What I'd like to see is enhancements to tags, such as mass tagging to quickly provide new metadata about sentences, so that users can easily find all the sentences with a given tag through the advanced search and, as a consequence, fetch more sentences on a specific grammar topic (I'm aware that some people, myself included, already tag sentences with grammar information, such as the verb tense). This would also work for quotes (the ones that comply with Tatoeba's licenses), since people contributing quotes might want a quick way to provide the necessary attribution.
Ricardo14
21 days ago
I also dream that *all* sentences - whether posted by natives or not (since they're proofread) - have the same "status".
shekitten
21 days ago
I think marking sentences by native speakers is a good idea. Proofreading is technically possible as well and it'll hopefully eventually be possible to get data (in regular search) of proofread sentences, not simply native sentences. This leaves the choice to whoever is using the data.

Proofreading is a great thing but it also requires a lot of people to take on the work of proofreading, and this skill isn't the same as translating, so we might want to reach out to people with copy editing skills who aren't necessarily translators.
AlanF_US
21 days ago - 18 days ago
Thanks for this question, gillux.

My dream is, to paraphrase the Biblical prophet Micah, a Tatoeba where each member can sit under their own vine and fig tree, and work with language speedily and in peace. In other words, each person can customize Tatoeba in such a way that they can do what they want -- search, submit and answer queries, contribute sentences, annotate, proofread, and correct -- in a way that helps themselves and the community, but that does not get in anyone else's way. For instance, people who are strong in a minority language but less strong in a majority language can contribute sentence pairs without worrying about sentences in the weaker language showing up in the list of results of someone who is looking for 100% correct sentences.

Another part of my vision is integration with other communities and tools that are particularly good at their own specialties.

My vine and fig tree orchard would be something like this:

INTEGRATION:
(1) Use any of the following kinds of sites without leaving Tatoeba:
- a dictionary (like Morfix for Hebrew, or Wiktionary for Russian) that can take an inflected form and give me its accented form and etymology, as well as its dictionary form
- a real-world collection of sentence pairs (like Reverso Context)
(2) Make it easy and quick to get from the sentence search stage all the way to submitting a chosen sentence to a flashcard program (like Anki).
(3) Make it easy to get back from Anki to sentence search so I can see other sentences like ones I selected earlier.

SEARCH:
(4) Run the following preconfigured search:

Look for sentences containing word A, favoring but not limited to sentences that:
- contain an exact match for word A
- are four to eight words in length
- are owned by users B, C, and D
- are arranged/"randomized" so that sentences with slight variations (for instance, pronoun changes) are less likely to appear near each other
- have a direct translation into language L

QUERIES:
(5) Ask for sentences containing a particular word in a particular sense, with the ability to engage in a dialogue (Can it be used in this way? What about this way?).

(6) Contribute a sentence with a link to the query that it was intended to answer.

ANNOTATION:
(7) Let me add accent marks to a Russian sentence, or vowels to a Hebrew sentence, at the time I encounter them, knowing that others can choose to suppress the accents/vowels, once or always, and they won't interfere with searches. Or find a tool that adds them automatically with a very high degree of accuracy (and gives us the ability to edit bad automatic suggestions).

CORRECTION:
(8) Make it easy for me to change a sentence that is already linked to others:
- allow me to automatically submit comments on the linked sentences ("If sentence A is changed to B, would this sentence still match?")
- allow me to break existing links and create new links easily

LAYOUT:
(9) Allow me to exclude layout elements, like "Tips" and the wide margins on the Wall, that take up room in my browser.

MOBILITY:
(10) Let me use Tatoeba from a mobile device easily.
AlanF_US
18 days ago
To elaborate on item 4:

Currently, we can only specify that search results satisfy a criterion. We can't specify that we want results that may or may not satisfy that criterion but that are sorted so that the ones that do satisfy it occur at the top of the list. This means that I need to guess beforehand how many hits I'm likely to get. If I guess too low, I have to either page through lots of results that are not what I'm looking for, or follow it with a more restrictive search. If I guess too high, I have to follow it with a less restrictive search.

Our search engine, Sphinx, does have sorting modes:

http://sphinxsearch.com/docs/cu...#sorting-modes

I submitted an enhancement ticket about this:

https://github.com/Tatoeba/tatoeba2/issues/1804
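
To make the difference between "must satisfy" and "prefer" concrete, here is a minimal sketch in plain Python. It does not use Sphinx or any Tatoeba code. The record fields (`text`, `owner`, `has_translation`), the function names, and the one-point-per-criterion scoring are all assumptions chosen for illustration.

```python
def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [t.strip(".,!?;:\"'") for t in text.lower().split()]


def preference_score(sentence, word, preferred_owners):
    """One point per preferred criterion that the sentence satisfies."""
    tokens = tokenize(sentence["text"])
    score = 0
    if word.lower() in tokens:                 # exact match for the word
        score += 1
    if 4 <= len(tokens) <= 8:                  # four to eight words long
        score += 1
    if sentence["owner"] in preferred_owners:  # owned by a preferred user
        score += 1
    if sentence["has_translation"]:            # translated into language L
        score += 1
    return score


def ranked_search(sentences, word, preferred_owners):
    """Keep every sentence containing `word`, even as part of a longer
    word, and sort so that the most-preferred sentences come first."""
    hits = [s for s in sentences if word.lower() in s["text"].lower()]
    return sorted(
        hits,
        key=lambda s: preference_score(s, word, preferred_owners),
        reverse=True,
    )


# Made-up sample data.
SAMPLE = [
    {"text": "Cats sleep.", "owner": "x", "has_translation": False},
    {"text": "My cat likes to sleep all day.", "owner": "B", "has_translation": True},
    {"text": "The dog barks.", "owner": "B", "has_translation": True},
]
```

Here `ranked_search(SAMPLE, "cat", {"B"})` returns both sentences that contain "cat", with the seven-word exact match by user B ahead of "Cats sleep.". Nothing is filtered out for failing a preference, so there is no need to guess the number of hits beforehand.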
cojiluc
21 days ago
Improve Tatoeba's search engine and enrich its options.
Thanuir
20 days ago
More specialist vocabulary would be extremely useful. Mathematics vocabulary is the most useful for me, and there is a decent amount of relevant sentences, but this is probably a happy accident more than a trend. For many languages it is pretty easy to figure out the common vocabulary, but the more specialized, the harder it becomes.

Translating these sentences is also tricky, since it needs someone proficient with at least two languages and with the field in question.

...

Process-wise, it would be nice to have sentences in smaller languages translated with some frequency. As is, the sentences more peculiar to such languages and cultures can easily remain untranslated for a very long time, for understandable reasons. But still, it would be nice if this was not the case.
soliloquist
19 days ago
A forum section with subforums for different communities/languages where users can discuss matters and ask questions would be nice.
CK
18 days ago - 18 days ago
The standard forum format might also be a good replacement for the Wall.

Often, when many comments are added to the same Wall post, it gets difficult to follow.

I think that wouldn't happen if we had a standard forum.

To maintain a somewhat similar feel to the website, it might be possible to show the forum titles on the right side of the home page, in much the same way that the current Wall messages are shown.

I wonder if it would be possible to adapt one of the open source online forums, using the same login usernames and passwords as tatoeba.org. Or a CakePHP plugin: https://github.com/CakeDC/cakephp-forum

[EDIT]

Here is a demo of the CakePHP-Forum.

http://cakephp-forum.herokuapp.com/forum
cojiluc
7 days ago
I think this is a good suggestion. A forum has many advantages over the current Wall. Rather than a traditional forum, though, I suggest one like the Stack Exchange sites. A free and open source version is available: https://www.question2answer.org/

The format of the Stack Exchange sites is very nice, and they have many advantages over old traditional forums.
shekitten
18 days ago
The ability to translate tags would be nice, and a good way to keep Tatoeba language-neutral while avoiding a large number of tags in different languages with the same meaning.

e.g. be able to translate English "proverb" as Esperanto "proverbo" and Interlingua "proverbio", and it would show up differently depending on the user's interface language.
PaulP
18 days ago
I couldn't agree more, Shekitten.
Thanuir
17 days ago
Tag synonyms would be a potential way of doing this, since there are at the moment tags "mathematics", "maths" and probably also "math", and they are not translations, but still redundant. I am sure other similar situations exist.

On the downside, having tag synonyms or translations would create a need to discuss them more carefully and decide which tags are actually synonyms and which are not; tags like "seven syllables" would be a problem, as well as concepts which do not translate one-to-one between languages.
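
As a rough illustration of how synonyms and translations could share one mechanism, here is a minimal Python sketch. It assumes every tag variant maps to a single canonical tag, and each canonical tag may have a label per interface language. The two tables, the language codes used as keys, and the function names are hypothetical; deciding what actually goes into the tables is exactly the discussion mentioned above.

```python
# Variant spelling or translation -> canonical tag (hypothetical table).
CANONICAL = {
    "maths": "mathematics",
    "math": "mathematics",
    "proverbo": "proverb",
    "proverbio": "proverb",
}

# Canonical tag -> label per interface language (hypothetical table).
LABELS = {
    "proverb": {"eng": "proverb", "epo": "proverbo", "ina": "proverbio"},
}


def canonical_tag(tag):
    """Resolve a tag to its canonical form; unknown tags map to themselves."""
    key = tag.strip().lower()
    return CANONICAL.get(key, key)


def display_tag(tag, ui_lang):
    """Label to show for `tag` in the given interface language.
    Falls back to the canonical tag when no translation is recorded."""
    canon = canonical_tag(tag)
    return LABELS.get(canon, {}).get(ui_lang, canon)
```

For example, `display_tag("proverbio", "epo")` gives "proverbo", and `canonical_tag("Maths")` gives "mathematics". A tag such as "seven syllables" has no entry and simply stays as it is.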
shekitten
16 days ago
> On the downside, having tag synonyms or translations would create a need to discuss them more carefully and decide which tags are actually synonyms and which are not; tags like "seven syllables" would be a problem, as well as concepts which do not translate one-to-one between languages.

This is a problem that also occurs with sentences themselves, but you're right that it would be especially difficult here. I still think it's worth it.