Wall (6,004 threads)
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
There will be a polyglot conference (starting tomorrow - Oct 16), and I posted something about Tatoeba - https://forum.polyglotconferenc...p-people/1072.
= Story =
For the past few years, I’ve been regularly asking myself about the meaning of Tatoeba. What is Tatoeba? What’s the purpose? Why do people contribute? What kind of sentences should be on Tatoeba?
I’ve discovered that members have very different answers to these questions. Some of us teach kids, some are here for audio recordings, some for data science, some just to support the idea of free access to knowledge… Since we work collaboratively, I thought we’d just have to somehow keep going despite our differences. After all, this diversity of individuals might be one of Tatoeba’s strongest points.
However, the problem comes back when one wants to introduce Tatoeba to new users (or otherwise make Tatoeba newcomer-friendly). What "slogan" text should we put on the home page? Should we keep the random sentence on the home page or not? How do we agree on the right-pane help text of the "add new sentences" page? How do we offer guidance to new contributors without being influenced by our own interests? Corollary: how can new users find their own meaning in Tatoeba (and not quit because of the lack of it)? All this is very controversial and has always been an unsolvable puzzle to me.
(To see an example of how new members question the purpose of Tatoeba, I suggest reading the debrief section of this UX test: https://en.wiki.tatoeba.org/art...test-3#debrief )
= Idea =
But here is an idea. First, let’s think of Tatoeba as a platform. I like the word platform because instead of implying a purpose, it only implies the idea of a place. A place to gather with tools to work together. (If you agree, I’d like to see the word platform used on the home page for guests.)
Then, we introduce the concept of projects. Let’s say the main value of Tatoeba to me is translating dialogs from Japanese to French. I create a project named "Traduisons les dialogues japonais" with a description, the reasons for doing it, how to go about it, some guidelines etc. Now other members, if interested, can join my project, and we can share tips, look after each other, celebrate our own milestones etc. We can do our stuff without ever imposing a particular vision on the rest of the community. Everyone’s free to create or join any project as they see fit. All projects are equal, whether big or small. It’s up to every project team to attract new members. New members can start contributing by simply joining a particular project. I can see at a glance how a particular member is involved in Tatoeba by looking at the projects he or she’s into.
This could be the foundation of a whole new way to contribute. Maybe we don’t need to implement specialized contribution functionalities like word requests any more. Just create a new project.
By the way, Wikipedia uses the concept of projects too, see for instance the Music project: https://en.wikipedia.org/wiki/W...iProject_Music
This sounds like a great idea, as long as:
- there are enough developers interested in the idea
- there are enough willing project leaders and participants
- it takes into account that many projects will eventually run out of energy, while some will not even get started
Let's take the worst case and make sure it isn't a step down from what we already have. Let's say that we put effort into developing the infrastructure for a platform for projects, and then into advertising it, but we get very little interest. Will the site then look more dead, and will that end up working against us? It all depends on how we go about it.
Most of the "projects" that I've gotten involved in so far, or have just considered, have been pretty modest, especially in terms of how many other participants they've involved: usually three at the most. Either I've wanted people to provide something (original sentences, translations, proofreading, audio), or I've responded to a call to provide something (original sentences, translations, proofreading). Maybe I've collected a list of 100 English sentences that I'd like translated into one of the languages I'm learning, and then sent private messages to three native speakers who are active in that language to ask them to do it. Or someone has provided a list of several hundred underrepresented vocabulary words (for instance, via Tatominer [ https://tatominer.imfast.io/eng.html ]), and I've tried to write sentences for them. I would have certainly welcomed infrastructure that made these tasks easier. But it's an open question as to whether I could have motivated people to participate in these projects on a large scale, in such a way that our pooled energy built up and served to keep us motivated.
Maybe in the same way that you're conducting interviews with new people to see how they use the site, it would make sense to do it with longtime community members to find out how they could imagine using this functionality.
Allow "hidden" sentences / boost collaborative work
When contributing translations, I regularly face the following problems:
(1) I know* a translation of a sentence into a non-native language.
E.g. My native language: German; I know* the Japanese translation of an English sentence.
* know a translation = probability of correct translation is around 80% - 100%.
- I'm not native, so I can never be 100% sure.
- Even if it's correct, it's not trusted, as I'm not native in that language
(2) I have an idea of a translation, but I can't settle on a full sentence.
(3) I have an idea of a translation, but it really needs to be discussed first, before it gets published.
My idea would be to add some kind of intermediate step - an "intermediate sentence". It's published for review and for discussion, but it's not published for search results.
E.g. A Japanese native could click on Menu -> Contribute -> Verify translations to see non-published translations of sentences.
Evaluation / Implementation / Discussion
This enhancement of Tatoeba is probably rather big: the exact design and implementation will be very time-consuming. So a cost-benefit calculation would be essential.
Such a mechanism might also increase the overall quality of sentences, as they could first go through a non-public review process. I.e. garbage sentences would get deleted before they were ever published.
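To make the proposal a bit more concrete, here is a minimal sketch of how an "intermediate sentence" could behave. Everything in it is an assumption made for illustration: the `Status` values, the `Sentence` fields and the three functions are hypothetical names and do not describe Tatoeba's actual data model or code.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING_REVIEW = "pending_review"  # hypothetical: visible to reviewers only
    PUBLISHED = "published"            # hypothetical: visible in search results


@dataclass
class Sentence:
    text: str
    lang: str                          # ISO 639-3 code, e.g. "jpn"
    status: Status = Status.PENDING_REVIEW


def search(sentences, query):
    """Return only published sentences whose text contains the query."""
    return [s for s in sentences
            if s.status is Status.PUBLISHED and query in s.text]


def review_queue(sentences, native_lang):
    """Return pending sentences written in the reviewer's native language."""
    return [s for s in sentences
            if s.status is Status.PENDING_REVIEW and s.lang == native_lang]


def approve(sentence):
    """A native speaker accepts the sentence, which makes it searchable."""
    sentence.status = Status.PUBLISHED
```

In this sketch a new sentence starts out as pending, shows up only in `review_queue` for natives of its language, and appears in `search` once someone calls `approve` on it.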
At this time, you can put your suggested translation as a comment.
A number of us have done this and perhaps are still doing this.
When a native speaker sees your translation, he/she can copy and paste it and edit it if needed as the translation.
Some members also go ahead and make the translation, and then release the sentence, allowing a native speaker to adopt the sentence, making necessary corrections if needed.
"At this time, you can put your suggested translation as a comment."
I know a Hungarian contributor who posted hundreds of possible German translations to his Hungarian sentences in the comment section. Sadly, he nearly never got any reactions to those suggestions.
Don't worry, just add the translation and use the 'needs native check' tag.
If it's not perfect, then it will be corrected.
"Tévedni emberi dolog." ("To err is human.")
It looks like a good idea to me. I suggest you post it on Github too, because Wall threads tend to sink into oblivion.
Using comments to suggest translations is clearly inefficient because the information is likely to go unnoticed without ever reaching any potentially interested members (and the original sentence owner gets unnecessarily notified).
While better, using actual sentences marked as 'needs native check' also has many drawbacks. The biggest, in my opinion, is that a lot of sentences produced like that by non-natives are not bad enough to be deleted, yet not good enough to get a native’s approval. And when we don’t know how to improve them, they just stay like that for years.
** Stats & Graphs **
Tatoeba Stats, Graphs & Charts have been updated:
Is there a way to look for a word/sentence in a certain language translated into multiple languages as done in https://nicetranslator.com/?
1. Go to your settings.
2. Go to the "languages" option and enter the languages you are interested in.
"Enter ISO 639-3 codes, separated with a comma (e.g.: jpn,epo,ara,deu). Tatoeba will then only display translations in the languages you indicated. You can leave this empty to display translations in all languages."
3. And then make your searches.
Note that tatoeba.org is a collection of sentences and translations and not machine translations like the site you mentioned, so there will be times on tatoeba.org that a translation will not exist, as opposed to machine translation that will always attempt to create a translation.
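As a rough illustration of what the language setting does, the sketch below parses a comma-separated list of ISO 639-3 codes and keeps only the matching translations, showing everything when the setting is empty. The function names and the `(lang, text)` tuple layout are assumptions for this example, not Tatoeba's actual code.

```python
def parse_language_codes(setting):
    """Split a setting such as "jpn,epo,ara,deu" into a set of codes."""
    return {code.strip().lower() for code in setting.split(",") if code.strip()}


def filter_translations(translations, setting):
    """Keep translations whose language is listed; an empty setting keeps all.

    `translations` is assumed to be a list of (lang, text) tuples.
    """
    wanted = parse_language_codes(setting)
    if not wanted:
        return list(translations)
    return [(lang, text) for lang, text in translations if lang in wanted]
```

Note that a language with no translation in the collection simply yields nothing here, which matches the point above: nothing is generated on the fly.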
I'm not sure if this has been brought up. It would be good if reviews were removed from deleted sentences, because it does not seem possible to remove them manually. It might also be useful if corpus maintainers were able to edit that field, just as they can edit tags.
Hello! Is there a way to search for Esperanto sentences by giving only the root of a word? (Similar to searching for English sentences.) For example, I'd like to type "parol" and find sentences with "parolas", "paroli" and "parolado". Is that possible?
The search engine does that for some languages, but not all.
You can try the asterisk symbol, if Esperanto lends itself to that: search for parol*.
The search function uses Snowball https://snowballstem.org/ to reduce search terms to their word stems when searching in English or other supported languages, so that other forms of the same word are found as well. Unfortunately, Snowball does not support Esperanto yet. To change that, we would need someone who can program and also knows Esperanto to implement a Snowball module for Esperanto. Judging by your profile, you seem to meet both requirements. So if you have the time and inclination ...
Hi Yorwba! The inclination, yes, but no time. :) But thank you very much for the explanations. :)
Just a note: the languages for which we support stem search are listed on the "How to Search for Text" page ( https://en.wiki.tatoeba.org/art...w/text-search# ), which can be found by clicking on "Help" next to the search bar. Currently, the list is as follows:
Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian (Bokmål), Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish
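For anyone wondering what stem search means in practice, here is a toy sketch. It is not Snowball and not a real Esperanto stemmer: the suffix list is a small assumption chosen only to cover the "parolas" / "paroli" / "parolado" example from this thread, and the function names are made up.

```python
# Toy suffix list, checked in order (longest first). An illustrative
# assumption, not the rule set of any real stemmer.
SUFFIXES = ("ado", "as", "is", "os", "us", "i", "o", "a", "e", "u")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least two letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[:-len(suffix)]
    return word


def stem_search(sentences, query):
    """Return sentences containing a word with the same stem as the query."""
    target = toy_stem(query)
    return [s for s in sentences
            if any(toy_stem(w.strip('.,!?"')) == target for w in s.split())]
```

With this, searching for "parol" or "parolas" would both reduce to the stem "parol" and match sentences containing any of the three forms. A real module would also need to handle plural and accusative endings, affixes and exceptions, which this sketch ignores.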
** Many Sentences on Each Page **
The 10,000 Shortest English Sentences with Audio and No Links
The 8,000 Longest English Sentences with Audio and No Links
You can quickly browse these lists of sentences once the pages load.
These work well on my computer, but may not work so well on less powerful computers or smartphones.
There are over 31,000 English sentences with audio that do not yet have any translations.
You can find them with tatoeba.org's random search.
** Tatoeba.org Native Speakers with Native Language Sentences **
Updated: October 3, 2020
6,413 = Native Speaker Usernames with Native Speaker Sentences
145 = The Number of Languages with Identified Native Speaker Contributions
6,731,996 = The Number of Sentences These Members Own in Their Native Languages
You're welcome :-D