Wall (5310 threads)

sharptoothed
2019-03-18 10:38
** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
Guybrush88
2019-03-18 10:48
thanks :)
Ricardo14
2019-03-18 23:07
Thank you so much!
TRANG
2019-03-16 11:43
**Translation of the Terms of Use (CGU)**

We have updated our Terms of Use, and for now they are available only in French. If you understand French well enough and can help us translate them into other languages, I invite you to join us on Transifex:

https://www.transifex.com/tatoe...-terms-of-use/

This is the platform we use to manage translations of the website interface. If you are not sure how to use it, feel free to ask your questions here on the Wall or to send me a private message.

You can also refer to our wiki[1] or to the Transifex documentation[2].

Thank you!

---

[1] https://en.wiki.tatoeba.org/art...ce-translation
[2] https://docs.transifex.com/
JeanM
2019-03-16 17:16
I tried to join the team for English, but I can't select "English".
TRANG
2019-03-17 19:41
I think someone first had to accept your request to join the group. Can you try again now?
JeanM
2019-03-17 23:23
It works, thanks!
CK
2019-03-17 03:07
The 178 Languages with Registered Native Speakers

http://tatoeba.byethost3.com/19...-speakers.html
danepo
2019-02-28 12:46 - 2019-03-01 01:37
Mozilla has publicly released its Common Voice speech dataset
https://www.heise.de/newsticker...i-4323042.html

https://voice.mozilla.org/en/datasets
shekitten
2019-02-28 19:02
Al ĉiuj: Ĉu nomoj krom 'Tomo' ne estus tre utilaj por tiu projekto? Ŝajnas, ke oni maljuste ĉikanas uzantojn, kiuj faras tiajn frazojn.

To everyone: Wouldn't names other than "Tom" be very useful for that project? It seems like users who make those sorts of sentences are unjustly hounded about it.
Ricardo14
2019-03-01 20:45
I agree. Tom has been used too much. I know it has been adopted to avoid repetition and it is not against Tatoeba rules, but I truly believe there are other ways to achieve such a thing.
shekitten
2019-03-02 17:32
I'm not sure Tom has been used too much, but it would be nice — and useful to projects that use our data — if we had other names, especially non-Western ones. OsoHombre has used a lot of Arab names in his sentences, and ended up being repeatedly told to use the name "Tom" over and over after already giving his reasons for not doing so. I think using other names couldn't possibly hurt and is actually very helpful to some use cases, like Mozilla Common Voice.
AlanF_US
2019-03-02 18:06
I believe there was only one person in favor of restricting sentences to this small number of names, and Trang explicitly said that this was not a Tatoeba policy. However, that person is a very prolific contributor, and is not about to start adding sentences with other names. Basically, this comes down to the following: If you want to see sentences with proper nouns other than "Tom", "Mary", "Boston", and "October", add them yourself -- but don't be surprised if you continue to see many more sentences that only use those nouns.
PaulP
2019-03-03 09:47
When I started learning Romanian I used the Anki app based on the Tatoeba corpus. I remember that I got a row of sentences similar to "Tom is sleeping", "Mary is sleeping", "Fadil is sleeping" etc. It is very annoying to translate them all. In my opinion it is quite OK to use other names than Tom, but before adding a sentence please make sure that you don't create near duplicates.
shekitten
2019-03-03 13:22
Is there a standard program that generates these decks? If so, it would likely be possible to change this program in a way that doesn't add near-duplicates to the deck.
Aiji
2019-03-03 14:43
As shekitten suggested, and as frustrating as it can be, the problem lies in the tool, not in the source.

Having near-duplicate sentences could be very useful to a tool trying to extract information from sentences, as a training set, for example.

I think it is harder for the maker of the second tool to create near-duplicates out of nothing than for the maker of the first to avoid near-duplicates that already exist.

As always, most use-case issues lie in poorly designed tools and not in the source (surely, the source could be improved, but you get the point, I think).
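
The idea discussed in this sub-thread, filtering near-duplicates in the deck-building tool rather than in the corpus, could be sketched roughly as follows. This is only an illustration: the `PLACEHOLDER_NAMES` set and both function names are hypothetical, and a real tool would need its own name list and proper handling of inflected names.

```python
import re

# Hypothetical list of placeholder names; a real tool would supply its own.
PLACEHOLDER_NAMES = {"Tom", "Mary", "John", "Alice", "Sami", "Layla", "Fadil", "Salima"}

def name_mask(sentence: str) -> str:
    """Replace every known placeholder name with a generic token."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return " ".join("<NAME>" if t in PLACEHOLDER_NAMES else t.lower() for t in tokens)

def drop_near_duplicates(sentences):
    """Keep only the first sentence seen for each masked pattern, preserving order."""
    seen = set()
    kept = []
    for s in sentences:
        key = name_mask(s)
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept

deck = ["Tom is sleeping.", "Mary is sleeping.", "Fadil is sleeping.", "Tom is eating."]
print(drop_near_duplicates(deck))
```

The corpus itself is left untouched, so a tool that wants the near-duplicates (for example as training data) can still use them.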
Impersonator
2019-03-03 16:27
> If you want to see sentences with proper
> nouns other than "Tom", "Mary", "Boston",
> and "October", add them yourself

This doesn't really work due to the native speaker policy. We're discouraged from adding sentences in languages that are likely to be translated, so the impact we can have is very limited. English native speakers have the unfair advantage of forcing their names (and not just names) on others.

Tatoeba badly needs decolonisation.
Thanuir
2019-03-04 08:09
I agree about decolonization.

Some constructive things to do:

1. Add sentences with varied names of people and places.
2. Add other sentences characteristic of less present cultures - food, clothes, politics, idioms, nature, etc.
3. When you see such sentences in other languages, translate them.
4. When you see someone discouraging others from adding sentences with varied names, for example, reply with a list of reasons why adding varied names is good and desirable.

Tatoeba is a volunteer effort, so the best way of creating change is to simply do it. It is a pity that certain cultures dominate the website, but I do not see a constructive way of preventing that. The Tom, Mary and Boston sentences do add value to the website, so they should not be discouraged, either.
shekitten
2019-03-04 10:20 - 2019-03-04 10:22
I think there are constructive ways of preventing that. It's like something George Orwell said on a different topic (political language): "A man may take to drink because he feels himself to be a failure, and then fail all the more completely because he drinks."

The main way colonial attitudes perpetuate themselves is passive. I don't need to make any personal effort to continue cultural genocide against the Seneca; it's enough that I do nothing about the fact that the English language and English cultural attitudes are the supreme currency on their land. I don't need to steal the land of Indigenous Americans personally; it's enough that my great-grandparents came here and settled on their land after it was already stolen, and that I do nothing to remedy the material harm that has resulted from this theft.

Something we could do, apart from what you're suggesting, is to make it a violation of policy to discourage people from adding non-Western names, places, etc. Such discouragements are assertions of long-existing hierarchies, no matter how politely they are phrased and regardless of whether the user is conscious of this. I am sure there are other things we could do, but if we begin from the point that "nothing constructive can be done," it will be a self-fulfilling prophecy.
Thanuir
2019-03-04 10:48 - 2019-03-04 15:42
My comment was about Tatoeba in particular, in case that was not clear.

I do not know enough about how policy on Tatoeba is settled to comment on that. I would not oppose your suggestion, if it was up to me.

Below is a comment, slightly edited, that I left on a sentence when the contributor was discouraged from adding a sentence with a non-standard name. Maybe you will find it useful when you see this kind of counterproductive behaviour:

...

1. I would not discourage people from contributing, even if they contribute sentences that fit a pattern. They will probably also contribute other things.

2. Writing names in different scripts is non-trivial and language-dependent. Consider the famous mathematician Tikhonov, Tihonov, Tychonoff, etc., and the philosopher Plato or Platon.

3. Declension of names is non-trivial in some languages. For example, in Finnish, Tomi-Tomin-Tomilla, Johannes-Johanneksen-Johanneksella. As such, having different names adds actual linguistic content. This tends to be especially true of foreign names.

4. It is most natural to write sentences that use names of the culture in question.

5. If there were default names, which culture would they be from? I would prefer Väinö and Aino, personally, as they are good and traditional names with ties to Finnish mythology. I am sure everyone else would have different favourites.

6. Different names suggest different genders in different languages. Kari is male in Finland and female in Norway, for example, by default.

7. It takes a lot of effort to police patterns. One can use the same to add new sentences and to translate sentences instead. Furthermore, this would be something one would have to teach most contributors one-by-one. Adding such requirements for contributing is not a good idea.

So: Several such sentences are not needed, but they also do not cause harm. It would take work and be highly impolite to police them. Unifying names would, in general, lead to loss of linguistically relevant content.
shekitten
2019-03-04 10:59
Thanks, those are useful points. And especially #3 is a thing that English speakers can easily miss, and it really shows how useful and even necessary it is to give sentences with multiple names. If the primary common language used on Tatoeba were Russian instead of English, our default proper nouns would probably reflect the language's different declensions. If it were Turkish, our default proper nouns would probably reflect the different types of vowel harmony and consonant changes.
soliloquist
2019-03-03 13:05
> OsoHombre has used a lot of Arab names in his sentences

Actually, the way OsoHombre builds up his corpus isn't much different from CK's. He has his own standard names, too.

Sami <-> Tom
Layla <-> Mary
Fadil <-> John
Salima <-> Alice
Cairo <-> Boston

I think users adding original sentences in large numbers tend to adopt this wildcard policy one way or another. It has its advantages. The question is, if this policy is useful for users individually, will extending it to all original sentences in a language bring more good than harm, or vice versa?

Btw, there's a phone-number search site generating thousands of spam pages by using the patterns of Tom sentences here with different names for SEO purposes. It's interesting to see how many derivations can be done just from a single pattern.

https://www.google.com/search?q...w=1277&bih=538
JeanM
2019-03-15 13:23
I have seen the "do not insert annotations into sentences" policy here: https://en.wiki.tatoeba.org/art...into-sentences

While I am aware of that, I wonder if adding annotations *on top* of sentences (i.e. separately) would be a partial solution here. Below every sentence there could be an extra field, perhaps only displayed to advanced contributors, that allows marking proper nouns, and maybe even other things such as dates (in the simplest form, picture something like the "highlighter" feature of PDF annotation software). This would not have any of the drawbacks listed on the page linked above, as the annotation would be a completely optional separate field that's hidden by default.

The advantages would be that downstream users of the data (e.g. Memrise-style study deck apps, or translation software) could then attempt to replace proper nouns to add some variety to the data. This is obviously not as trivial as I make it sound, as one would have to contend with phenomena such as inflection, but it's certainly a starting point – and inflection could be dealt with downstream, or partially handled by more sophisticated annotation schemata (which could be used to mark gender, declension, etc.).
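
A minimal sketch of what such separate ("stand-off") annotation might look like for a downstream tool, assuming annotations are stored as character offsets next to the untouched sentence. The `Span` type, the label names, and the `substitute` helper are all hypothetical, and the sketch deliberately ignores inflection, which, as noted above, would need to be handled separately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # character offset where the annotated text begins
    end: int     # character offset just past the annotated text
    label: str   # hypothetical label such as "PERSON" or "PLACE"

def substitute(sentence: str, spans: list[Span], replacements: dict[str, str]) -> str:
    """Return a copy of the sentence with annotated spans replaced by label."""
    out = sentence
    # Work from the rightmost span so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if span.label in replacements:
            out = out[:span.start] + replacements[span.label] + out[span.end:]
    return out

sentence = "Tom lives in Boston."
spans = [Span(0, 3, "PERSON"), Span(13, 19, "PLACE")]
print(substitute(sentence, spans, {"PERSON": "Sami", "PLACE": "Cairo"}))
```

Because the spans live outside the sentence text, the sentence itself stays exactly as its author wrote it, which is the point of keeping the annotation optional and hidden by default.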
Thanuir
2019-03-16 07:15
Currently there exist sentence-specific tags, but nothing word-specific, as far as I know.
JeanM
2019-03-15 12:59
As a researcher, I have found this project's data to be really useful for machine learning. Is there a list, somewhere on the website, of research papers that have used Tatoeba data? I think it would be a fantastic way of showcasing the project's impact.

A good starting point: https://scholar.google.co.uk/sc...t=1,5&as_vis=1
CK
2019-03-16 00:49 - 2019-03-16 00:54
http://bit.ly/tatoebalinks

This doesn't have links to research papers yet, but there are links to projects that use the data.
I haven't updated it in a long time, so there are likely others by now.
Pfirsichbaeumchen
2019-03-11 08:37 - 2019-03-11 08:38
Corpus Maintainer Candidate for Danish
Kandidato por iĝi frazara bontenanto por la dana lingvo
Korpuspflegerkandidat für das Dänische

♦ Danepo: https://tatoeba.org/user/profile/danepo.

As usual, please feel free to share your opinions. Send us a message using the link below. Corpus maintainers can make necessary changes where the owner of a sentence no longer responds.

Kiel ĉiam ni petas, ke vi sen hezito diru al ni vian opinion, sendante mesaĝon per la suba ligilo. Frazaraj bontenantoj povas fari necesajn ŝanĝojn en frazoj, kies posedanto ne plu reagas.

Wie immer bitten wir euch, uns ohne Zögern eure Meinung mitzuteilen. Schickt uns dazu mit Hilfe der folgenden Verknüpfung eine Nachricht. Korpuspfleger können notwendige Änderungen an Sätzen vornehmen, deren Besitzer nicht mehr reagiert.

Link (ligilo): http://tatoeba.org/private_mess...rsichbaeumchen
Pfirsichbaeumchen
2019-03-14 10:49 - 2019-03-14 12:18
After much positive feedback, Danepo is now a corpus maintainer.

Post multaj pozitivaj mesaĝoj Danepo nun estas frazara bontenanto.

Nach vielen positiven Rückmeldungen ist Danepo jetzt Korpuspfleger.

😊
PaulP
2019-03-14 18:53
Congratulations, Danepo!
danepo
2019-03-14 21:35
Thanks!
Ricardo14
2019-03-14 19:55
Congratulations, Danepo!
danepo
2019-03-14 21:35
Thanks!
AlanF_US
2019-03-10 17:02
Recently ([1]), I suggested an experiment where a native speaker of language X who is learning language Y could try leaving comments on selected sentences in language X. These comments would contain proposed translations into language Y, together with a request that a native Y speaker submit corrected versions of those sentences as translations. It's been working very well for me, thanks to @odexed, who has been kind enough to respond to my requests. Not only can I increase the chance of getting translations for the sentences that are useful to me, I am also able to compare my proposed translations against a native speaker's. This gives me practice and lets me see where I can improve. odexed also likes contributing sentences that are guaranteed to help someone. So it's more fun for everyone involved. :)

[1] https://tatoeba.org/eng/wall/sh...#message_31468
soliloquist
2019-03-10 20:16
This method may work well with some languages and between friends, but I think English is a special case and this would likely create a bottleneck as it's the 'language Y' for most users.

Another and bigger problem is the difficulty of finding such comments, especially after some time passes. People would leave comments under sentences in different languages. One request would be under, say, a Turkish sentence, and another would be under a Hungarian sentence. One request would be for an English translation while the other would be for German. Categorizing and finding them would be difficult as they begin to pile up.

If there was a search function within comments, it would be easier to find them using some keywords or tags.
AlanF_US
2019-03-11 11:50
> This method may work well with some languages and between friends

Exactly. An approach doesn't need to work for everybody in order to be useful to somebody. Tatoeba is a social medium -- let's be social! You can cultivate relationships with native speakers of the languages you want to translate into. If you don't already have one, you can look up who has been contributing recently in your language Y and has your language X in their profile, and send a private message asking them whether they'd be interested in translating into your language. Then, if they're willing, you can flag that person with an @ sign in your comments. If they have e-mail notifications enabled, they'll see your comments without having to search for them (though I agree that the ability to search within comments would be helpful for various purposes).

When you say that you think English would create a bottleneck, do you mean that you think that explicit demands for people to translate into English would overwhelm the number of native speakers in English? I'm not sure at all that that would be the case, but there's nothing to lose by trying the experiment.
soliloquist
2019-03-11 17:23
Of course, as long as we mention some users on comments for translation requests, they likely get notifications and respond. However, mentioning specific users might reduce the chance of other users' noticing and participation. I thought you were referring to a more general concept addressing the whole community, like the @NNC tag.

I'm not against this btw. It would work just fine. But a broader concept as I mentioned on the previous message (keywords/tags to be used on translation request comments + a search function on comments) could be more effective in the long-term.

> When you say that you think English would create a bottleneck, do you mean that you think that explicit demands for people to translate into English would overwhelm the number of native speakers in English?

Yes, that was my point.
CK
2019-03-12 01:04 - 2019-03-12 02:14
I used to leave such comments for English-Japanese translations and tommy_san and others would leave Japanese-English translations for me that way.

Here are a few examples.

https://tatoeba.org/sentences/s...comment-764923 ちょっと待ってくれる?
https://tatoeba.org/sentences/s...comment-766130 トムの言ったことは的はずれだった。
https://tatoeba.org/sentences/s...comment-752252 トムはハーバードで法律の教育を受けた。
https://tatoeba.org/sentences/s...comment-744791 トムは時々会いに来る。
https://tatoeba.org/sentences/s...comment-716108 トムは泳げない。
https://tatoeba.org/sentences/s...comment-578157 トムは演説をしました。
https://tatoeba.org/sentences/s...comment-578158 トムは演説をしました。
https://tatoeba.org/sentences/s...comment-751721 ボストンまでどうやって行くつもり?
https://tatoeba.org/sentences/s...comment-411752 もう時間だ。

There seem to be 22 such example comments by me left. I usually delete the comments as soon as I notice someone has added the translations, so we did a lot more this way quite successfully.

Another method is to go ahead and contribute the translation, if you are sure it's correct, and then release it (unown it). This is what several members do with the English. I often adopt such sentences when I proofread sentences.
soliloquist
2019-03-12 18:20
I guess you're collecting sentences with such comments in a list. Otherwise it would be difficult to check them. I would rather prefer non-native speakers to create their Turkish sentences and then leave comments saying @NNC. It's possible to filter comments by languages so it would be easier for native speakers to notice and respond. Leaving comments for Turkish translations on non-Turkish sentences may go unnoticed as they move to back pages of the comment feed (which happens pretty fast).
Guybrush88
2019-03-13 05:57
along with this, what about creating sentences, unadopting them and then letting native speakers adopt them? There's already a long list of orphan sentences in many languages ready for adoption
Thanuir
2019-03-13 08:13
Are native speakers adopting them? I adopted all but two Finnish sentences, and I am uncertain about the correctness of those two, but there are about 50 thousand unadopted English sentences, while only 1000 French ones; checking some other languages, this seems to depend mostly on whether anyone has taken the time to go through and adopt the sentences recently.

Maybe some English natives would like to start checking and adopting the English sentences, and others should do the same in their native languages if there is a large number of sentences without owners. It would make this approach a lot more encouraging.
Guybrush88
2019-03-13 09:44
that would be a way also to proofread them and, therefore, improve the quality of the corpus
AlanF_US
2019-03-13 12:45
Personally speaking, adopting sentences is one of my least favorite ways to contribute to the site, and the number of unadopted sentences in a language is one of my least favorite measures of unfinished business. I prefer tasks in which progress can be measured easily, especially when I feel like they are very useful to at least one person in particular. For instance, I find it productive to go through sentences that are tagged "@needs native check" (=@NNC) because I can do something with most of them:

- leave a comment suggesting a particular change, delete the @NNC tag, and add the @change tag
- mark it OK and delete the @NNC tag
- delete the sentence
- mark the sentence not OK (for instance, by using the rating feature) and delete the @NNC tag

It's easy to measure progress because I am taking sentences out of the @NNC bin as I go along. I also feel as though the authors of the sentences have a chance to learn from my comments.

But adopting sentences is different. Often, I come across a sentence that I don't want to adopt, but there's no way for me to mark that decision so that I can skip that sentence in the future. Sometimes my reason for not wanting to adopt the sentence is that the sentence is not necessarily wrong but cannot easily be transformed into something a native speaker would say. Sometimes it's because it expresses a sentiment that I don't share. Sometimes it's because there are a whole bunch of sentences that vary in only a small way (for instance, a pronoun), and I don't see the value in taking the time to adopt all of them.

Furthermore, sentences end up in the unadopted bucket for a variety of reasons. Sometimes it's because the owner leaves after contributing a bunch of bad sentences, while sometimes it's because the owner wrote decent sentences in a non-native language, but wanted a native speaker to take ownership. There's no easy way to determine, at first glance, which sentences fall into which category. With @NNC sentences, it's usually obvious why someone gave them the tag.

Also, the number of unadopted sentences in English is larger by orders of magnitude than the number of sentences tagged @NNC. Going through @NNC sentences takes time, but is something I can manage. Going through the unadopted sentences in English would take a large number of people AND a large amount of time -- on the order of years. I've never been convinced that it's worth it, especially considering the other ways to contribute to the corpus.

However, the situation changes when we talk about arrangements between specific people. If person X agrees to unadopt their non-native sentences with the understanding that person Y will adopt them, then you don't have the "undifferentiated bucket of sentences" problem. I think this points to the same basic attitude that you can see with my discussion with soliloquist above: I favor approaches that involve one-on-one communication, even though I can see the value of infrastructure that allows people to act in ways that are less firmly coupled to specific individuals.
Aiji
2019-03-13 14:05
There are 1000 orphan French sentences left because I adopted several thousand of them already. I don't have time recently to check them so the counter does not go down.

The system of writing translations and then unadopting them has several flaws, most of them explained by Alan. But for me, the main flaw is that impractical sentences are added to the corpus and it becomes very hard to deal with them: typically, sentences that nobody would ever say (or write) but that are grammatically correct (and correct translations). Also, sentences that have plenty of translations despite being incorrect are a pain :) (but that is mainly due to the history of the French corpus ^^)
soliloquist
2019-03-13 17:00
Sure, it would work, too. It's just that when an unowned sentence is adopted and edited by a native speaker, they may not find it necessary to leave a comment about the correction or improvement they made (and they're right about that). However, if the creator of the sentence wants to be involved and informed, they may prefer not to unown the sentence. Personally, I wouldn't encourage non-native speakers to unown their Turkish sentences as long as they're cooperative with suggestions, but I respect other policies preferred by CMs of other languages. If I want to add an Italian sentence, I'll keep your preference in mind. :-)
Ricardo14
2019-03-10 23:24
I'd like to help you too. Please ping me if you want a sentence translated into Portuguese :)
AlanF_US
2019-03-11 11:51
Thank you, Ricardo. I can only concentrate on one language at a time (currently Russian), but if I ever go back to focusing on Portuguese, I'll definitely keep your offer in mind. :)
gillux
2019-02-27 04:49
Dear Tatoeba contributors,

From this Friday, I will be working on Tatoeba again, thanks to our collaboration with Mozilla (thank you!). I will work to facilitate the use of sentences by Common Voice, but also to improve Tatoeba in general.

One of the ways I would like to achieve this goal is to first ask you what Tatoeba you are dreaming of. I think we focus too much on concrete details and forget to let ourselves dream. Yet dreams are one of the major forces driving us forward. Our Github is full of very concrete "little suggestions" and "little problems", but what are we really aspiring to in order to make Tatoeba a project useful to humanity?

I think that we, who are involved in Tatoeba in one way or another, all of us have in our heads a Tatoeba of our dreams, some big ideals, some big crazy ideas, a personal vision of how it should be, but that we refrain from expressing. So I am asking you to forget about the details, to forget about the quarrels, to think big and far, to let go a little and tell me frankly: which Tatoeba do you dream of?

══════════════════════════════════════════════════════

Chers contributeurs de Tatoeba,

À partir du vendredi qui arrive, je vais travailler à nouveau sur Tatoeba, grâce à notre collaboration avec Mozilla (merci à eux !). Je vais travailler à faciliter l’utilisation des phrases par Common Voice, mais aussi à améliorer Tatoeba de manière générale.

Une des façons dont j’aimerais atteindre ce but, c’est de commencer par vous demander de quel Tatoeba vous rêvez. Je pense que nous nous attardons trop sur des détails concrets et que nous oublions de nous laisser aller à rêver. Pourtant, les rêves sont une des forces majeures qui nous poussent à aller de l’avant. Notre Github est rempli de « petites suggestions » et de « petits problèmes » très concrets, mais à quoi aspirons-nous vraiment pour que Tatoeba devienne un projet utile à l'humanité ?

Je pense que nous, qui nous impliquons de près ou de loin dans Tatoeba, nous avons tous dans notre tête un Tatoeba de nos rêves, de grands idéaux, de grandes idées folles, une vision personnelle de comment ça devrait être, mais que nous nous retenons d’exprimer. Alors je vous demande d’oublier les détails, d’oublier les querelles, de penser grand et loin, de vous lâcher un peu et de me dire franchement : de quel Tatoeba rêvez-vous ?
CK
2019-02-27 05:36 - 2019-02-27 05:39
I like what TRANG said in 2009.

So the concept is : we gather a lot of data, try to organize it, ensure it is of good quality and make it freely accessible, downloadable and redistributable, so that anyone who has a great idea for a language learning application (or a language tool) can just focus on coding the application and rely on us to provide data of excellent quality.

http://blog.tatoeba.org/2009_11_01_archive.html
gillux
2019-02-27 05:49
Yes, but see, Trang is probably the only person who expressed that, and yet it's not a dream, it’s a concept.

What do YOU think, CK? Which Tatoeba do you dream of?
CK
2019-02-27 07:36 - 2019-02-27 07:40
I dream that the data would be of good quality so that anyone who had a great idea for a language learning application (or a language tool) could just focus on coding the application.

The way it is now, to use the sentences for language study requires someone (the developer or a trusted colleague) to proofread and choose sentences for the target language to be learned. Otherwise, students are exposed to bad examples of language usage.
shekitten
2019-02-27 11:10
Isn't that what the system of approve/don't know/reject[/don't vote] is meant to solve?
PaulP
2019-03-02 09:40
Yes, but almost nobody is using it. I see a lot of bad sentences without the red mark.
shekitten
2019-03-04 11:10
This is something I've mentioned, but it would be very helpful to be able to select sentences by whether they *have* been approved rather than whether they haven't been unapproved. I imagine this is in the works. I think it will be very helpful, though it still requires that people approve sentences in their languages.
CK
2019-03-05 00:30 - 2019-03-05 06:28
157 members (less than 0.4% of our members) have added 939,602 OK ratings to 937,368 sentences (about 13% of our sentences).

929,871 (99%) of these ratings were added by the following 20 members.

CK (713,571)
PaulP (142,170)
Guybrush88 (26,706)
bill (24,238)
Selena777 (4,355)
Pfirsichbaeumchen (3,619)
alexmarcelo (2,626)
tulin (2,520)
tornado (2,410)
Wezel (1,459)
soliloquist (1,397)
Raizin (1,289)
Thanuir (1,271)
odexed (1,259)
raggione (852)
umano (809)
Bilmanda (772)
Aiji (765)
Impersonator (734)
Scorpionvenin14 (668)
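
As a quick check of the concentration described above, the arithmetic can be reproduced from the totals quoted in the post alone. This sketch uses only those quoted totals, not the per-member list; the variable names are just illustrative.

```python
# Totals as quoted in the post above.
total_ok_ratings = 939_602   # all "OK" ratings
rated_sentences = 937_368    # sentences that received them
top20_ok_ratings = 929_871   # ratings attributed to the 20 listed members
raters = 157                 # members who added any "OK" rating

top20_share = top20_ok_ratings / total_ok_ratings
others_ratings = total_ok_ratings - top20_ok_ratings
others_count = raters - 20

print(f"share of OK ratings from the top 20: {top20_share:.2%}")
print(f"OK ratings from the other {others_count} raters: {others_ratings}")
print(f"OK ratings per rated sentence: {total_ok_ratings / rated_sentences:.4f}")
```

The share rounds to the 99% stated in the post, leaving fewer than ten thousand ratings spread across the remaining raters.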



There are 8,202 non-OK ratings.

You can see a list of members who have added "not OK" ratings.

http://tatoeba.byethost3.com/outdated_ratings.html

Many of these are "outdated ratings", since corrections have been made.


PaulP
2019-03-05 10:45
Very interesting, CK. Thanks! But I would like to see if any member rated one of my sentences "not OK". If they just put "not OK", but don't add a comment, we don't know, do we?
shekitten
2019-03-05 10:50
And it would also be nice to search for sentences that have been marked "OK", for my own learning purposes.

e.g., a list of sentences in Turkish that have been marked "OK"

or, a list of sentences in Turkish that are either by a native speaker or marked "OK"
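
The two lists requested here amount to a simple filter over sentence records. The sketch below assumes a hypothetical record shape (`Sentence` with `by_native` and `ok_ratings` fields) and a hypothetical `trusted` helper; it does not reflect how Tatoeba actually stores or exports this data.

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    text: str
    lang: str            # language code, e.g. "tur"
    by_native: bool      # hypothetical flag: author lists this language as native
    ok_ratings: int = 0  # number of "OK" ratings the sentence has received

def trusted(sentences, lang, ok_only=False):
    """Sentences in `lang` marked OK; unless ok_only, also those by native speakers."""
    result = []
    for s in sentences:
        if s.lang != lang:
            continue
        if s.ok_ratings > 0 or (s.by_native and not ok_only):
            result.append(s)
    return result

corpus = [
    Sentence("Merhaba.", "tur", by_native=True),
    Sentence("Teşekkürler.", "tur", by_native=False, ok_ratings=2),
    Sentence("Hello.", "eng", by_native=True, ok_ratings=1),
]
print([s.text for s in trusted(corpus, "tur", ok_only=True)])
print([s.text for s in trusted(corpus, "tur")])
```

The first call corresponds to "marked OK" only; the second to "by a native speaker or marked OK".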
Ricardo14
2019-02-27 08:26
My dream: a Tatoeba to which we could add as much content as possible at once, and where we could organize sentences as easily as we organize files on our flash drives or hard disks (assuming they belong to the same group). That way I believe Tatoeba would be an "awesomer" resource for studying languages.

By that I mean it would be great if we could add sentences in bulk (maybe with a maximum number of sentences allowed at a time) and tag sentences in bulk. In Portuguese, for example, a sentence which begins with "Eu" is always "1st Person Singular".
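
Bulk tagging of this kind could be sketched as a list of rules applied to many sentences at once. The `bulk_tag` function and the rule format are hypothetical, and the "Eu" rule is taken from the post at face value.

```python
def bulk_tag(sentences, rules):
    """Apply every (predicate, tag) rule to every sentence; return {sentence: tags}."""
    tagged = {}
    for s in sentences:
        tagged[s] = {tag for predicate, tag in rules if predicate(s)}
    return tagged

# Hypothetical rule: a sentence whose first word is "Eu" gets the tag.
rules = [
    (lambda s: s.split()[:1] == ["Eu"], "1st person singular"),
]

batch = ["Eu gosto de café.", "Ele gosta de café.", "Eu moro aqui."]
for sentence, tags in bulk_tag(batch, rules).items():
    print(sentence, sorted(tags))
```

Comparing the first whole word, rather than using a plain prefix check, avoids tagging sentences that merely start with the letters "Eu", such as one beginning with "Europa".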
Impersonator
2019-02-27 08:32
I dream of a Tatoeba that makes a conscious effort to support smaller, less described languages. These face unique challenges, completely different from the challenges of larger languages. I hope those challenges will be thought about from the beginning, and not added as an afterthought.

I dream of a Tatoeba that provides not just translations but other grammatical information and perhaps glosses.

I dream of a Tatoeba that allows adding some context to the sentences: perhaps descriptions. It would make sense to have some free-form descriptions to specify the status of speakers and their gender (if the language distinguishes between them).

Or maybe images and videos! Sentences about foreign clothes or food would make so much more sense. Something that would make it simpler to incorporate uncommon phrases that might have no direct translation.

As a totally far-fetched example, there is a language, Guugu Yimithirr, that is known for using 'north' and 'east' instead of 'forward' or 'left'. First, I dream we'll have this language in Tatoeba. Second, I dream we'll have a way to specify the geographical position of the speaker to map the spatial position of items described in the sentences, to show whether 'left' corresponds to 'east' or to 'west'.
gillux
2019-02-27 10:13 - 2019-02-27 10:17
I dream that Tatoeba is a project I can be proud of when I’m showing it to my friends: "Do you know this website, Tatoeba?" "No, let me check it out." The homepage loads instantly. Everything’s localized, neat, beautiful, self-explanatory and easy to use from a smartphone or a computer. It shows some inspiring and featured example sentences. My friend tries a search. The results are very relevant and show up almost instantly.

I dream that Tatoeba is a worldwide reference among language enthusiasts. Most professional translators prefer it over closed-source solutions because the results are more diverse and accurate, and all of their colleagues are on it too. Popular dictionaries all include Tatoeba’s sentences to illustrate their definitions. Whenever people want to make a point whether a particular expression is correct or not, widely used or not, they don’t argue by showing Google’s number of results; they show Tatoeba’s results instead. Tatoeba no longer relies on the ISO to include a new language. It’s like the other way around: having a language listed on Tatoeba is a point that may convince the ISO folks to include it too.

I dream that Tatoeba is a key tool for most language teachers around the world to prepare their lessons. Just give Tatoeba a few grammatical concepts and vocabulary items to study, and it gives you the materials you need.

I dream that Tatoeba’s community is huge, diverse and everyone’s equal. There are many active members from all Asian countries, the Global South, and all the minorities on Earth are well represented. Countries that are threatening certain language minorities are constantly trying to block Tatoeba because they can’t stand that these languages are being listed as such on something as famous as Tatoeba. Tatoeba is regularly mentioned on the news whenever a language minority is being threatened.
Guybrush88
2019-02-27 11:23
I agree, in particular with the part about language teaching. What I'd like to see is enhancements to tags, such as mass tagging to quickly provide more metadata about sentences, so that users can easily find all the sentences with a given tag using the advanced search and, as a consequence, fetch more sentences on a specific grammar topic (I'm aware that some people, myself included, already tag sentences with grammar information, such as the verb tense). This would also work with quotes (the ones that comply with Tatoeba's licenses), since people contributing quotes might want a quick way to provide the necessary attribution for the quotes they submit.
Ricardo14
2019-02-27 10:20
I also dream that *all* sentences - whether posted by natives or not (since they're proofread) - have the same "status".
shekitten
2019-02-27 11:15
I think marking sentences by native speakers is a good idea. Proofreading is technically possible as well and it'll hopefully eventually be possible to get data (in regular search) of proofread sentences, not simply native sentences. This leaves the choice to whoever is using the data.

Proofreading is a great thing but it also requires a lot of people to take on the work of proofreading, and this skill isn't the same as translating, so we might want to reach out to people with copy editing skills who aren't necessarily translators.
AlanF_US
2019-02-27 13:50 - 2019-03-02 15:35
Thanks for this question, gillux.

My dream is, to paraphrase the Biblical prophet Micah, a Tatoeba where each member can sit under their own vine and fig tree, and work with language speedily and in peace. In other words, each person can customize Tatoeba in such a way that they can do what they want -- search, submit and answer queries, contribute sentences, annotate, proofread, and correct -- in a way that helps themselves and the community, but that does not get in anyone else's way. For instance, people who are strong in a minority language but less strong in a majority language can contribute sentence pairs without worrying about sentences in the weaker language showing up in the list of results of someone who is looking for 100% correct sentences.

Another part of my vision is integration with other communities and tools that are particularly good at their own specialties.

My vine and fig tree orchard would be something like this:

INTEGRATION:
(1) Use any of the following kinds of sites without leaving Tatoeba:
- a dictionary (like Morfix for Hebrew, or Wiktionary for Russian) that can take an inflected form and give me its accented form and etymology, as well as its dictionary form
- a real-world collection of sentence pairs (like Reverso Context)
(2) Make it easy and quick to get from the sentence search stage all the way to submitting a chosen sentence to a flashcard program (like Anki).
(3) Make it easy to get back from Anki to sentence search so I can see other sentences like ones I selected earlier.

SEARCH:
(4) Run the following preconfigured search:

Look for sentences containing word A, favoring but not limited to sentences that:
- contain an exact match for word A
- are four to eight words in length
- are owned by users B, C, and D
- are arranged/"randomized" so that sentences with slight variations (for instance, pronoun changes) are less likely to appear near each other
- have a direct translation into language L

QUERIES:
(5) Ask for sentences containing a particular word in a particular sense, with the ability to engage in a dialogue (Can it be used in this way? What about this way?).

(6) Contribute a sentence with a link to the query that it was intended to answer.

ANNOTATION:
(7) Let me add accent marks to a Russian sentence, or vowels to a Hebrew sentence, at the time I encounter them, knowing that others can choose to suppress the accents/vowels, once or always, and they won't interfere with searches. Or find a tool that adds them automatically with a very high degree of accuracy (and gives us the ability to edit bad automatic suggestions).

CORRECTION:
(8) Make it easy for me to change a sentence that is already linked to others:
- allow me to automatically submit comments on the linked sentences ("If sentence A is changed to B, would this sentence still match?")
- allow me to break existing links and create new links easily

LAYOUT:
(9) Allow me to exclude layout elements, like "Tips" and the wide margins on the Wall, that take up room in my browser.

MOBILITY:
(10) Let me use Tatoeba from a mobile device easily.
AlanF_US
2019-03-02 16:03
To elaborate on item 4:

Currently, we can only specify that search results satisfy a criterion. We can't specify that we want results that may or may not satisfy that criterion but that are sorted so that the ones that do satisfy it occur at the top of the list. This means that I need to guess beforehand how many hits I'm likely to get. If I guess too low, I have to either page through lots of results that are not what I'm looking for, or follow it with a more restrictive search. If I guess too high, I have to follow it with a less restrictive search.

Our search engine, Sphinx, does have sorting modes:

http://sphinxsearch.com/docs/cu...#sorting-modes

I submitted an enhancement ticket about this:

https://github.com/Tatoeba/tatoeba2/issues/1804
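
As a rough illustration of "favoring but not limited to", here is a small Python sketch. It does not use Sphinx or any Tatoeba code; the `Sentence` class, the `rank` function, and the sample data are hypothetical. Every sentence containing the word is kept, and each one is scored by how many of the preferences from item 4 it meets, so the best matches sort to the top instead of the rest being filtered out.

```python
from dataclasses import dataclass, field

@dataclass
class Sentence:
    text: str
    owner: str
    # Language codes for which a direct translation exists (assumed field).
    translations: set = field(default_factory=set)

def rank(sentences, word, owners, lang, min_len=4, max_len=8):
    """Keep sentences containing `word`; sort by number of preferences met."""
    def score(s):
        words = [w.strip('.,!?;:"') for w in s.text.split()]
        return sum([
            word in words,                     # exact match for the word
            min_len <= len(words) <= max_len,  # four to eight words long
            s.owner in owners,                 # owned by a preferred user
            lang in s.translations,            # direct translation exists
        ])
    hits = [s for s in sentences if word.lower() in s.text.lower()]
    # sorted() is stable, so ties keep their original order.
    return sorted(hits, key=score, reverse=True)

corpus = [
    Sentence("Cats are everywhere.", "x"),
    Sentence("The cat sat on the mat.", "B", {"fra"}),
    Sentence("My cat sleeps.", "z", {"fra"}),
]
ranked = rank(corpus, "cat", owners={"B", "C", "D"}, lang="fra")
```

With this scoring, nobody has to guess in advance how restrictive to make the search: loosely matching sentences are still returned, just further down the list. The equal weights are an arbitrary choice, and the de-clustering of near-duplicate sentences from item 4 is left out.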
cojiluc
2019-02-28 07:56
Improve Tatoeba's search engine and expand its options.
Thanuir
2019-02-28 18:38
More specialist vocabulary would be extremely useful. Mathematics vocabulary is the most useful for me, and there is a decent amount of relevant sentences, but this is probably a happy accident more than a trend. For many languages it is pretty easy to figure out the common vocabulary, but the more specialized, the harder it becomes.

Translating these sentences is also tricky, since it needs someone proficient with at least two languages and with the field in question.

...

Process-wise, it would be nice to have sentences in smaller languages translated with some frequency. As is, the sentences more peculiar to such languages and cultures can easily remain untranslated for very long times, for understandable reasons. But still, it would be nice if this was not the case.
soliloquist
2019-03-01 19:42
A forum section with subforums for different communities/languages where users can discuss matters and ask questions would be nice.
CK
2019-03-03 01:09 - 2019-03-03 01:36
The standard forum format might also be a good replacement for the Wall.

Often, when many comments are added to the same Wall post, it gets difficult to follow.

I think that wouldn't happen if we had a standard forum.

To maintain a somewhat similar feel to the website, it might be possible to show the forum titles on the right side of the home page, in much the same way that the current Wall messages are shown.

I wonder if it would be possible to adapt one of the open source online forums, using the same login usernames and passwords as tatoeba.org. Or a CakePHP plugin: https://github.com/CakeDC/cakephp-forum

[EDIT]

Here is a demo of the CakePHP-Forum.

http://cakephp-forum.herokuapp.com/forum
cojiluc
2019-03-13 16:35
I think this is a good suggestion. A forum has many advantages over the current Wall. More specifically, I suggest a forum like the Stack Exchange sites. A free and open-source platform of that kind is available: https://www.question2answer.org/

The format of the Stack Exchange sites is very nice, and it has many advantages over traditional forums.
shekitten
2019-03-03 02:15
The ability to translate tags would be nice, and a good way to keep Tatoeba language-neutral while avoiding a large number of tags in different languages with the same meaning.

e.g. be able to translate English "proverb" as Esperanto "proverbo" and Interlingua "proverbio", and it would show up differently depending on the user's interface language.
PaulP
2019-03-03 09:40
I couldn't agree more, Shekitten.
Thanuir
2019-03-04 08:14
Tag synonyms would be a potential way of doing this, since there are at the moment tags "mathematics", "maths" and probably also "math", and they are not translations, but still redundant. I am sure other similar situations exist.

On the downside, having tag synonyms or translations would create a need to discuss them more carefully and decide which tags are actually synonyms and which are not; tags like "seven syllables" would be a problem, as well as concepts which do not translate one-to-one between languages.
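
A minimal Python sketch of how synonyms and translations could sit side by side, assuming a hand-maintained table; the names `CANONICAL`, `LABELS`, `canonical_tag` and `display_tag`, and the choice to fold case, are all hypothetical. Redundant tags map to one canonical tag, and the label shown depends on the interface language, falling back to the canonical tag when no translation has been agreed on.

```python
# Assumed synonym table: redundant tag -> canonical tag.
CANONICAL = {"maths": "mathematics", "math": "mathematics"}

# Assumed label table: canonical tag -> {interface language code: label}.
LABELS = {"proverb": {"epo": "proverbo", "ina": "proverbio"}}

def canonical_tag(tag):
    """Fold case and whitespace, then resolve known synonyms."""
    key = tag.strip().lower()
    return CANONICAL.get(key, key)

def display_tag(tag, ui_lang):
    """Label for the given interface language, else the canonical tag."""
    canon = canonical_tag(tag)
    return LABELS.get(canon, {}).get(ui_lang, canon)
```

Tags that have no entry in either table, such as "seven syllables", simply pass through unchanged, so the hard cases can be left alone until someone has discussed them.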
shekitten
2019-03-04 11:04
> On the downside, having tag synonyms or translations would create a need to discuss them more carefully and decide which tags are actually synonyms and which are not; tags like "seven syllables" would be a problem, as well as concepts which do not translate one-to-one between languages.

This is a problem that also occurs with sentences themselves, but you're right that it would be especially difficult here. I still think it's worth it.
Ricardo14
2019-03-09 05:49 - 2019-03-09 19:43
**5 years on Tatoeba!**

In 2013 I was invited to join Tatoeba by Shishir (Rocío), and on March 3rd, 2014, I created this account. I can't express how happy and excited I am every single day I spend here.
In these 5 years I got promoted to CM, joined the Tatoeba language team with cueyayotl and Sabretou, helped create tickets on GitHub during Tatoeba days, posted almost 50,000 sentences, and learned a bit about programming... whoa, so much in a short period of time. Thank you!
I'd like to thank everyone who has helped me to be a good member and to learn to manage a community and languages: Rocío, Lisa, Trang, Alex, cueyayotl, Sabretou, odexed, gillux, Paul, Guybrush, Alan, carlosalberto, Amastan, mraz, CK and so many others. Thank you, really.
I know I am far from being a good member, but I hope I can do more to help this awesome website grow. I've been inviting some people to join, like Ergulis and MarinKjp, and I hope more will join us.

Thank you all!!!!!
mraz
2019-03-09 06:03
Dear Ricardo14!

My heartfelt congratulations; I wish you good health, lots of luck, and all the best!

Long live Tatoeba!

Greetings from Budapest (Hungary): mraz

(P.S. Everything has its significance and its reason >m> mraz)
PaulP
2019-03-09 10:23
Congratulations, Ricardo, and thank you too!!
mraz
2019-03-10 06:47
Dear Ricardo14!

Thank you for your thoughtfulness.

Yes! Regards: mraz
sofwath
2019-03-10 03:24
Dhivehi (https://en.wikipedia.org/wiki/Maldivian_language) is missing from the list of languages, and I would like to request that it be added.