Wall (3881 threads)

TRANG
16 hours ago
** New feature: Reviewed sentences / Users corpus **

I've been working on a new feature the past couple of months and it's now finally ready to be tested on the dev website (https://dev.tatoeba.org). I think this is a feature that a lot of users will be happy to have, since it's a step towards improving the quality of Tatoeba's data.


# What's new?

1) You can mark sentences as "correct", "incorrect" or "unsure". http://prntscr.com/7zprx9
2) Each user has a corpus. When you mark a sentence, it is added to your corpus. http://prntscr.com/7zps39
3) On the sentence's details page, you will be able to see who marked which sentence. http://prntscr.com/7zps9d


# What does this imply?

A lot of tags are going to become useless, such as "OK", "@change", "@check", "@needs native check", or basically any tag that suggests whether a sentence is correct.

Everyone will be able to mark sentences for verification because the new feature is available to everyone, as opposed to tags which are available only to advanced contributors and corpus maintainers.

We're going to shift towards a system that is more tolerant of "wrong" sentences. Not right now, but we'll slowly get there.

Based on how everyone is evaluating the correctness of a sentence, we will be able to calculate a score for each sentence. With this score, we can display an icon next to each sentence so that users can have a quick idea whether or not the sentence can be trusted.


# I need your feedback

What I've implemented so far is only the minimum needed for the feature to be useful. There are still a lot of things I have in mind, but before going further, I'd like to make sure we're starting on solid foundations.

So please test the feature on the dev website (https://dev.tatoeba.org), and let me know if there is anything that you find confusing, or if there's any improvement that you think is very necessary before this feature gets released.

I'm planning to include this feature in the update on August 10th.

Thank you!
alexmarcelo
15 hours ago - edited 15 hours ago
I personally find these resources very useful, but I would limit people to only "mark" sentences in the language assigned as "native" in their profile, except for Latin, Ancient Greek, Esperanto and the like.
TRANG
13 hours ago
I don't think such limitations will benefit the project in the long term.

I prefer to go for a more open policy, and let everyone use the feature as they wish, to collect as much data as possible.

We'll keep an eye on how people are marking sentences, we'll find solutions to help/encourage people to mark sentences more accurately, and we'll figure out how to detect false positives (i.e. people who mark correct sentences as incorrect or vice versa).
bandeirante
3 hours ago
That is a sensible idea, but I'm still opposed to it. Sometimes, not often but occasionally, a well-versed outsider is just as competent as, if not more competent than, a native speaker. Happens from time to time. Of course, one should be humble about their own ignorance.
alexmarcelo
15 hours ago
Keeping people from marking their own sentences would be a good thing, too.
Ooneykcall
15 hours ago
Automatically marking sentences as verified by the user who posted them (if they're a native, anyway) would make more sense, I think.
Ooneykcall
14 hours ago
What are we going to do with language varieties? Sentences from one variety (e.g. Australian English) may seem wrong to a person who uses a different variety (e.g. American English). I mean, if most speakers consider a sentence to be 'wrong', that doesn't necessarily mean it's bad. It could be that it is only used by a certain group of speakers. Assigning it a low score based on majority opinion, though, may make it look like it's a bad quality sentence that shouldn't be trusted/used at all, wouldn't it?
TRANG
14 hours ago
> What are we going to do with language varieties? Sentences from one variety
> (e.g. Australian English) may seem wrong to a person who uses a different variety
> (e.g. American English).

We'll have to trust users to not mark sentences as incorrect when the sentence may be correct in another variety of a language.

Obviously, some people will still do it (on purpose or not). But I think we can manage to implement a system that is smart enough to detect when such cases happen.


> Assigning it a low score based on majority opinion

I'm stopping you right here. The score is not going to be based on the "majority" but rather on the trustworthiness of each user. It's still too soon to work out any score anyway. I don't expect us to be able to calculate a score efficiently before another 6 months, maybe not even before another year, because we'll first need to work out some sort of trust system.

In other words, let's say your trust score in Russian is 100, and you mark a Russian sentence as correct. Then 30 people mark the same sentence as incorrect, but all of them have a trust score of 0. Then your opinion will prevail over those of these 30 people.
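
The Russian example above can be sketched in a few lines. This is only an illustration: the trust system does not exist yet, so the rule used here (add up the trust scores behind each label and take the largest total) and the names `weighted_verdict` and `marks` are assumptions, not the planned formula.

```python
from collections import defaultdict


def weighted_verdict(marks):
    """Return the label backed by the most trust.

    marks is a list of (trust_score, label) pairs, where label is
    "correct", "incorrect" or "unsure". Summing trust per label is an
    assumed rule for illustration only.
    """
    totals = defaultdict(float)
    for trust, label in marks:
        totals[label] += trust
    # No marks at all, or only marks from users with zero trust.
    if not totals or max(totals.values()) == 0:
        return "unknown"
    return max(totals, key=totals.get)


# One user with trust 100 says "correct"; 30 users with trust 0 say "incorrect".
marks = [(100, "correct")] + [(0, "incorrect")] * 30
print(weighted_verdict(marks))
```

With these numbers the single trusted mark outweighs the 30 untrusted ones, which is the behaviour described above.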
Ooneykcall
13 hours ago
Fine, but then before the verification system is running, we'll need to have a section somewhere explaining what the verification labels are supposed to mean so that people are more likely to use them the intended way.
---
I see, so you seek to establish a proper trustworthiness formula before proceeding with implementing it, which I can only welcome. Then this is a question for a later time, alright.
TRANG
13 hours ago
> before the verification system is running, we'll need to have a section somewhere
> explaining what the verification labels are supposed to mean so that people are more
> likely to use them the intended way

This is why I need feedback :)

Most people don't read instructions, so I'd like to make the feature as clear as possible, so that even without a detailed explanation of what each choice means, users will still choose correctly.
brauchinet
3 hours ago
And what about the quality of the translation? A sentence may be fine, but still not accurate as a translation. Sometimes this is difficult to verify, sometimes very easy.
Well, there's always the option of unlinking such sentences, but if we have a rating system should translation issues be handled in it or separately?
TRANG
one minute ago
The quality of translations has to be handled separately, and it's not something that is covered yet.
tommy_san
13 hours ago
"Correct" and "incorrect" sound objective, but I think we'd be able to gather more valuable data by collecting subjective judgments. For example, when an Australian member knows that a particular English expression is used by some, but s/he'd never use it her/himself, we'd miss a precious piece of information if s/he simply voted "correct".

Something like these might be more meaningful.
1. I'd use this myself
2. I'm not sure if I'd use this myself
3. I wouldn't use this myself
4. I'm sure this is wrong
Only the last option would lower the "trustworthiness" of the user who added the sentence.
pullnosemans
3 hours ago - edited 3 hours ago
I like this idea. is this inspired by the problem of japanese orphans which sound inadequate to you, but you don't think they're grammatically wrong, if I recall correctly?

anyway, this could be useful to indicate that a sentence may be used e.g. to illustrate a certain grammatical pattern, but with the caveat that the sentence in its entirety sounds weird or unnatural. then users would know which sentences to use for mere examples of a certain construction they may be newly acquiring, and which sentences to actually remember as things you could say in a conversation.
pullnosemans
3 hours ago
are you sure the @change tag is going to become unnecessary? what about sentences that are completely fine, but have missing punctuation or a typo in them?

which actually leads to a more general question: how should we go about such sentences? should they be marked good, or bad, or not at all? what would happen to a sentence that has been voted down because it has an orthographic mistake, then changed to be perfectly good? would it retain its low score?

which again leads to the question: what happens to sentences generally after they have been changed? will their score be reset to 0?
pullnosemans
3 hours ago - edited 3 hours ago
and by the way, I would still like to see this combined with a request feature. :)

there could be a list of sentences asked to be rated, and a sentence would disappear from the list when it has received a certain number of opinions (maybe 3 or 5).

this could especially bring benefits during the beginning times of the feature, when there are millions of sentences to be evaluated, so the probability of coming across one that has been evaluated is very low. if people can request sentences they want to use to be evaluated, there would be some orientation for the users which sentences to evaluate, because they can just go to a list of sentences where there is a need for evaluation right at the moment.
Pfirsichbaeumchen
4 hours ago
►New Advanced Contributor

Lipao:
https://tatoeba.org/user/profile/lipao
pullnosemans
3 hours ago
congratulations!

it's really nice to see that lately many speakers of poorly represented languages have been getting this status. keep at it, lipao!
mraz
3 hours ago - edited 3 hours ago
Wishing you all the best: mraz
ravas
2 days ago
When I search for: What are you doing for dinner?
using: From Any - To Any
I get: No results found

yet here it is:
http://tatoeba.org/eng/sentences/show/2849900
tommy_san
2 days ago
That's because "dinner?" means "dinner" followed by one letter (such as "dinners").
https://tatoeba.org/sentences/s...und&to=und

You can get the sentence if you remove the question mark.
https://tatoeba.org/sentences/s...und&to=und

See this page to learn more about advanced search.
http://en.wiki.tatoeba.org/arti...ow/text-search

I agree this is somewhat counterintuitive.
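
To make the wildcard behaviour concrete, here is a small sketch. It uses Python's standard `fnmatch` module, where `?` also stands for exactly one character; it is not Tatoeba's search engine, and the helper names `word_matches` and `drop_question_marks` are made up for this illustration.

```python
from fnmatch import fnmatchcase


def word_matches(pattern: str, word: str) -> bool:
    """True if word matches pattern, where "?" means exactly one character."""
    return fnmatchcase(word, pattern)


def drop_question_marks(query: str) -> str:
    """Remove "?" so it is not interpreted as a wildcard (the workaround above)."""
    return query.replace("?", "")


print(word_matches("dinner?", "dinners"))  # "dinner" plus one letter
print(word_matches("dinner?", "dinner"))   # nothing after "dinner"
print(drop_question_marks("What are you doing for dinner?"))
```

The first check succeeds and the second fails, which is why a query ending in "dinner?" does not find a sentence ending in the plain word "dinner".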
ravas
2 days ago
Ahh... Thank you!
ravas
yesterday
I suggest creating an option to toggle between two search modes:

mode 1:
- considers punctuation as part of the sentence
- is case insensitive
- no advanced features

mode 2:
- the current advanced search syntax

Mode 1 would be the default. This would mitigate confusion for new users.
cueyayotl
2 days ago
Questions about "tags"

We have established that the "OK" tag may ONLY be added by a native of the language of the sentence.

How about "slang"? If we are told by the owner to tag their sentences, are we allowed to do so, despite the language not being our native language?
AlanF_US
yesterday
Yes.
DostKaplan
2 days ago - edited 2 days ago
There is clearly a sentence (#129586) with this text:

Il venait à peine

But if you search for this string (from French to English), you get no results. If you search (without quotation marks so that it is less restrictive) for:

venait à peine
or just
venait peine

you get 6 results but none of them is sentence #129586.

Why?
tommy_san
2 days ago
See here.
http://prntscr.com/7z1fv8

Unowned sentences and unapproved sentences are excluded by default.
DostKaplan
2 days ago
Shouldn't it say "Is approved" instead of "Is unapproved"?

Is unapproved: No

means it IS approved.
AlanF_US
2 days ago
The labels have the intended meaning.

"Is orphan: No" means that you are excluding orphans, which are likely to be incorrect. Similarly, "Is unapproved: No" means that you are excluding sentences which have been actively marked as unapproved.

If the label said "Is approved: Yes" (that is, if the wording were switched and its meaning was also flipped), it might give the misleading impression that only sentences that have been actively approved will be selected.

You could think of the label as saying "Is deprecated: No". But "deprecated" is a word that fewer users (especially non-native English speakers who are using the interface because it hasn't yet been translated into their languages) are likely to know.
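
A rough sketch of how the two default filters behave, as I understand them. The `Sentence` class, its fields and the `search` function are hypothetical and only model the meaning of "Is orphan: No" and "Is unapproved: No"; they are not the site's actual code or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sentence:
    text: str
    owner: Optional[str]      # None means the sentence is an orphan
    unapproved: bool = False  # True only if someone actively marked it unapproved


def search(sentences: List[Sentence],
           include_orphans: bool = False,
           include_unapproved: bool = False) -> List[Sentence]:
    """Defaults model "Is orphan: No" and "Is unapproved: No"."""
    return [
        s for s in sentences
        if (include_orphans or s.owner is not None)
        and (include_unapproved or not s.unapproved)
    ]


# Made-up example data.
corpus = [
    Sentence("An owned sentence nobody has reviewed.", owner="alice"),
    Sentence("An orphan sentence.", owner=None),
    Sentence("An owned sentence marked unapproved.", owner="bob", unapproved=True),
]
print([s.text for s in search(corpus)])
```

Note that the owned sentence nobody has reviewed is still returned by default, which is the difference from a hypothetical "Is approved: Yes" filter.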

Pfirsichbaeumchen
2 days ago
►New Advanced Contributor Request

Vertigo93:
https://tatoeba.org/deu/user/profile/vertigo93

Rauf (vertigo93) would like to be an advanced contributor. You can learn more about him by clicking on the link to his profile. As always, we ask that you feel free to share your opinion by sending us a private message using the link below.

Advanced contributors can link and tag sentences.

[1] http://tatoeba.org/private_mess...sichbaeumchen.
Buzulkusu
3 days ago
Who blocked me and why?
Impersonator
3 days ago
It seems you've been posting sentences from other sites, without giving proof that the author of the sentence allowed its use on Tatoeba. This may violate copyright and jeopardize Tatoeba. See some of my comments on your sentences for examples.
Buzulkusu
3 days ago
So, when will my block be removed, or will it be removed at all?
pullnosemans
3 days ago
maybe if you promise to really, really, really only contribute turkish sentences from now on, though this is not up to me to decide.

I think you haven't fully understood how tatoeba works. you're supposed to post sentences you, yourself, have thought of. because tatoeba allows all of its sentences to be used freely by anyone, by posting a sentence on tatoeba, you're telling anyone who reads it "please feel free to use this sentence wherever you want". therefore, if you post a sentence that is not your own, you're infringing someone's copyright, because they might not want this sentence to be distributed freely.

if turkish is the only language in which you can think of natural, grammatically acceptable sentences, then this language is the only one in which you should contribute. that is how this site works. if you keep refusing to accept this, maybe it's better for the site to keep you blocked, even though we very happily and with gratitude accept your turkish sentences.
Buzulkusu
3 days ago
Ok, I accept.
Buzulkusu
2 days ago
?
vertigo93
2 days ago - edited yesterday
Dear "pullnosemans", I'm so thankful that you finally made our Turkish contributor understand the rules... I was so sick of his/her "contributions" in Azerbaijani ))
TRANG
2 days ago
You can contact Pfirsichbaeumchen (https://tatoeba.org/eng/user/pr...rsichbaeumchen), or send an email to community-admins@tatoeba.org if you would like to have more information about why you were blocked and when/if you will be unblocked.
Buzulkusu
2 days ago
I sent it. Do you know whether my contribution block can be removed?
TRANG
2 days ago
I don't know exactly, sorry. You will have to be a bit patient, because it can take a few days.
vertigo93
2 days ago
But who is going to delete his (buzulkusu) contributions in Azerbaijani?
DostKaplan
2 days ago
I like being notified by email whenever someone posts a comment to one of my sentences or comments. Why don't I get any email notification when someone sends me a private message on tatoeba?
TRANG
2 days ago
We have an issue about this: https://github.com/Tatoeba/tatoeba2/issues/92

It's not implemented simply because no one has taken the time to work on it. It was never a high enough priority, or nobody was inspired to work on it.
pullnosemans
6 days ago - edited 6 days ago
adding this as a separate thread (instead of as an answer to one of the already existing ones about the topic) because I actually think it's kind of important and I don't want it to be overlooked.

a lot of people are asking for a wishlist feature for certain words, which I think is a good idea.

in addition, I would also suggest, especially for japanese with its tons and tons of orphan sentences, a wishlist for orphans to be adopted, and changed if they're not good.

this way, when someone wants to use an orphan sentence because they cannot find a specific word in a non-orphan, or because all non-orphans are very long or complicated, that orphan can be verified.
pullnosemans
6 days ago - edited 6 days ago
this could even be extended to a general verification wishlist, for all cases in which people for whatever reason are not sure a sentence is okay. we would have to see whether that would end up in a vast sea of requests too big for natives to handle, though.
Guybrush88
6 days ago
I agree with this. Personally I'd like to see something similar for orphan sentences in other languages too, like English and French
pullnosemans
6 days ago
yeah, I'd like it to be universal for all languages. I just mentioned japanese because it has probably the worst orphan-nonorphan ratio on tatoeba.
ricardo14
5 days ago
By TRANG on (2015-04-12 15:36) - https://tatoeba.org/eng/wall/sh...#message_22294

"You may want to fill this form:
https://docs.google.com/forms/d...LbK_8/viewform "
AlanF_US
5 days ago
There's a difference between a wishlist of words and a wishlist of existing sentences. In the case of existing sentences, you can create a list and add the sentences to it. Then you can send someone a link to the list. (You could use a tag to accomplish the same goal.) There's no way to do the same thing with a list of words. However, what I do as a workaround is to create a list of sentences in my own language, English, that use the English equivalents of the target words. Then I send someone a link to that list and request that they translate the sentences in it. This has the advantage of increasing the number and coverage of sentences in English as well as in the target language. Of course, there are disadvantages to this approach. One is that the translator might not use the word that you're interested in. For this reason and others, I agree that a wishlist of words would be useful. However, until we have one, this is at least a temporary solution.

> ... this could even be extended to a general verification wishlist, for all cases in which people for whatever reason are not sure a sentence is okay. we would have to see whether that would end up in a vast sea of requests too big for natives to handle, though.

We already have a tag to serve this purpose: @needs_native_check.
TRANG
5 days ago
The main problem, I think, is simply that we don't have enough active Japanese contributors. It won't matter if we implement a feature to request sentences to be checked and corrected, if there is no contributor with the knowledge to do it, and willing to do it.
pullnosemans
4 days ago
I'm responding to trang because her comment is the farthest down the thread, but this is also a response to alan.

I think a separate feature for this specific purpose could yield much better results than simply using tags or creating lists. maybe it's just me, but I don't have the impression that tags are perceived as very central on tatoeba, apart from the @change tag, maybe (though I have seen some really, really old @change tags that have not been taken care of so far). there are just so many of them that individual tags get lost in the mass.
as for lists, myes, maybe, but I wouldn't like to be constantly bugging people by sending them messages saying "hey, here's a list again for the xth time, happy adopting!". if there were a specific feature, people on here (in case it all works out like I imagine) would see that there is a need for verification among users on tatoeba, and could just once in a while look over a few.

of course, if there are no contributors taking care of the stuff, then there's no helping it, but we'd have to see how that works out. and again, I'm only using japanese as an example, but even if it wouldn't work for japanese in the beginning, it might work for other languages, and it could also change for japanese over time, maybe if just one new contributor registers who is really willing to do it, at least for a while.

as I said, it might be just me, but personally I don't have the impression that tags and lists are very prominent features on tatoeba. maybe this could be changed by encouraging contributors (especially advanced contributors?) to help by checking sentences tagged @needs_native_check and so on, but right now, I just feel like the "verification on demand" feature could be much more prominent on tatoeba.
TRANG
4 days ago
How do you imagine this "verification on demand" feature exactly?

Just so you know, I have some work in progress, which basically tackles the problem of quality in Tatoeba.
Cf. https://tatoeba.org/eng/wall/sh...#message_23646
pullnosemans
3 days ago - edited 3 days ago
I remember reading that you want to "have a dedicated system" for verifying sentences on tatoeba, but I had forgotten it again. anyway, I think it's a great goal. maybe this will help a little on the way:

for verification on demand, I would basically add a button on the top bar (maybe under the "contribute" button, but I actually think a separate, always visible button would be cool) that says something like "check/verify sentences". by clicking it, you get to a page where you can choose to display, in a specific language, sentences that have been asked to be verified (all users can do this by clicking on a button next to the "mark a favourite" and the other buttons). there you can verify sentences as good or bad (or rate them on a scale), and the user who asked for verification gets a message saying the sentence has been evaluated. there could also be an exhortation message telling people to adopt orphans on the list if and only if they're in their native language and they think they are good, or adopt and then change them if they think they're bad.

alternatively, the "sentences to be verified" button could also be grouped under a primary "wishlist" button, along with "words to be added in sentences (can't think of a good name right now)".

also, I would consider limiting the verification feature (or in fact both the verification and the "word wishlist" feature) to contributors with one of the three advanced statuses, or introducing some other "trusted user / verified native language(s)" status.
another possibility would be to introduce a more open "upvote/downvote" feature, where you can leave one up- or downvote on each sentence, and your vote weighs more heavily if you have the sentence's language as a native language in your profile? this would obviously be easy to abuse, but assuming that most people would be mature about it, this could work as well. anyway, there could either be thresholds where a sentence gets deleted at, say, a vote score of -10, and gets a "verified" mark at a score of +10 or something, but we could also simply have the score displayed and leave it at that.
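
a tiny sketch of the upvote/downvote idea, just to make it concrete. the thresholds (-10 and +10) are the ones suggested above; the weight of 2 for native speakers and the names `vote_score` and `status` are assumptions picked for illustration.

```python
NATIVE_WEIGHT = 2  # assumed: a native speaker's vote counts double


def vote_score(votes):
    """votes maps username -> (direction, is_native); direction is +1 or -1.

    Using a dict keyed by username enforces one vote per user per sentence.
    """
    return sum(
        direction * (NATIVE_WEIGHT if is_native else 1)
        for direction, is_native in votes.values()
    )


def status(score, delete_at=-10, verify_at=10):
    """Map a score to the thresholds suggested in the post."""
    if score <= delete_at:
        return "delete"
    if score >= verify_at:
        return "verified"
    return "undecided"


# Five native speakers upvote the same sentence: 5 * 2 = 10.
votes = {f"user{i}": (+1, True) for i in range(5)}
print(vote_score(votes), status(vote_score(votes)))
```

the same functions also cover the "just display the score" variant: call `vote_score` and skip `status`.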

(edit: added later)
you could also use the above mentioned "trusted user / etc." status to exclude their sentences from the "ask to be verified" feature, so that people won't be just clicking the button on every sentence because they want to be extra-sure, or include an "are you sure?" step on non-orphans, with a message saying that you should only ask for sentences to be verified if you have a specific reason to doubt their reliability because it creates work for other users.

these are just my ideas, but I hope you might find some of them useful.
maydoo
3 days ago - edited 3 days ago
For now I have 5 English words I'd like to see on Tatoeba although I have learnt what they mean.

To sanitize, throng, to fend sb/sth off, sleek, to debunk
Thanks in advance.
Objectivesea
3 days ago
Sure thing, maydoo. Here are five sentences for you:

#4418141 Many municipalities have reduced chlorination and now also use ultraviolet light to sanitize the public supply of drinking water.

#4418143 Even the popular senator was surprised that thousands of people would throng to his political rallies.

#4418145 Some musicians and actors hire security guards to help fend off their over-eager fans.

#4418147 After a dentist killed Zimbabwe's most famous lion, animal-rights activists were angry that he planned to mount the lion's head and sleek, bushy mane on an office wall.

#4418149 Although scientists and skeptics have many times managed to debunk conspiracy theories, believers nevertheless maintain an unreasoning attachment to the discredited ideas.
