Has there been any progress on implementing a voting system for reliability?
https://blog.tatoeba.org/2010/0...w-will-we.html
I read this article from a decade ago. It mentioned a timescale of months and the need for at least 20 advanced contributors. As of writing, there are 147 advanced contributors, with 25 in German and 15 in English alone.
Is the problem that no one is willing to work on it from a programming standpoint? Is it the desire to get it done in French first (given that French has only 13 advanced contributors rather than 20)? Or is there some other reason?
This is the feature I would most like to see on Tatoeba and I would be willing to try my hand at implementing a programming solution for it.
I would be surprised if I were the first to feel this way in the last 10 years, so I would like to know: has anyone attempted this before me? What happened to their submission, if they made one? Given how widely voting systems are used on other websites, I can't imagine this being too difficult unless there is something fundamental in the design of Tatoeba that would prevent it. Does anyone know?
My questions
1. Why has there seemingly been no progress on a voting system?
2. Has anyone attempted this?
2.1 Why was their attempt rejected/not implemented?
3. Is anyone working on this now? (Perhaps I could assist.)
4. Is there some reason why such a thing would never work on Tatoeba?
@hamsolo - anyone can join the dev team. That's what Trang posted recently:
"The very first thing you can do is to try and set up Tatoeba on your machine and let us know if you've faced any issue along the way, if there's anything we could simplify and if there's anything we should reword in our documentation. The more simple and easy to understand we can make this onboarding process, the better it is :)
The starting point is our GitHub repository: https://github.com/Tatoeba/tatoeba2
Once you're all set, we can move on to more concrete issues. Just let me know when you're ready!"
Because there are pros and cons. The pros are very constraining, while the cons aren't.
First of all, please think carefully of what would be the advantage of a voting system for you? And then, for all of us? And then, for all that are not us?
And first of all, what would a voting system look like?
Here are some cons (with the current system):
- A voting system needs a threshold. What is a good threshold? A majority of some sort? Even if you are in the minority, it doesn't mean that you're wrong.
- Some people could silently sabotage the corpus to defend a stance.
- We have corpus maintainers to take care of correcting sentences. If a sentence is correct, it is correct. If not, we can simply notify them and they will check if it needs to be corrected. There is no need for a voting system in this situation. If they cannot deal with a sentence, they can search the Internet or ask for some help; that's what we signed up for :P
- We have a "review" feature with which you can mark a sentence as "OK", "unsure", or "not OK". The current system isn't very well integrated with the proofreading process for now, but we have some ideas to improve it and make proofreading less resource-consuming and more efficient. This system relies on the community to point out sentences that may be wrong, but it does not exclude sentences. We're working towards inclusion, not exclusion of contributions.
- The majority of contributions are correct, and should be treated as such.
First of all, please think carefully of what would be the advantage of a voting system for you?
- I get nicer search results
And then, for all of us?
- Better search result rankings, where the most useful results appear higher in the list, are what originally separated Google from its competitors. Tatoeba is an amazing resource, and we all benefit if search results are improved and it becomes more popular.
And then, for all that are not us?
- Same as above.
And first of all, what would a voting system look like?
- A "Did you find this useful?" button, with a count such as "1234 other people did."
Pros
- ranking of search results
- - Triage for popular sentences
- - Most common uses are likely to be ranked first
- - Near duplicates are likely to be filtered by the community
Cons
- Slight overlap in functionality
I ran into a problem yesterday when I was looking up sentences. I found a few sentences that were correct and understandable but ultimately archaic. They fit the binary mold of technically correct or incorrect, but they were poor examples of good, modern, intelligible English.
Furthermore, I understand that only advanced contributors can tag. With a voting system you could triage the never-ending stream of daily sentences so that advanced contributors know which sentences are most valuable to the community. They could then apply their tagging and fixing efforts in the areas where it would make the greatest difference.
The corpus could be sabotaged, but the votes wouldn't necessarily indicate sentence validity, only perceived usefulness.
Let's take the word "know" for example.
The most common uses of know are probably in a sentence such as:
"I don't know"
"I know, don't remind me"
Then less common
"I know kung fu"
"I know algebra"
Then, less common still, perhaps a sentence where someone claims he knows goats in a more biblical sense.
How does a new learner distinguish between knowing kung fu and biblically knowing goats?
Assuming I've guessed the order of use correctly, knowing goats should be at the bottom of the list, indicating to a new learner that while this is technically correct, it may not be the best sentence to put in your Anki deck.
It should also prevent direct duplicates or near duplicates from showing up next to each other. For example "I know!" and "I know." or "I know the Jacksons" and "I know the Smiths". There are obviously differences between these sentences but a voting system could allow the community to decide between seeing lots of near duplicates on the first page then having to search further for a variety of uses or simply voting up (the presumably wider variety of) sentences they found most useful.
I don't understand why a voting system needs an upper threshold. More votes don't make a sentence more correct; they merely indicate its usefulness to people. We could allow sentences to drop below zero, as an indicator that the sentence is not only not useful but perhaps has something wrong with it. Such sentences would be easy to search for and fix, and the score could be reset to zero upon manual review. That said, I am not married to the concept of a downvote system.
We could also grant votes (for example, maybe 10) for manually reviewed sentences that are correct. I know there is already functionality for this sort of feature, but this would affect the ranking of search results rather than just the visibility.
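A minimal sketch of the upvote-only idea above, in Python. This is only an illustration, not Tatoeba's code: the `Sentence` class, the `score` and `rank` functions, and the `REVIEW_BONUS` value of 10 are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical: votes granted to a sentence that a reviewer marked as correct.
REVIEW_BONUS = 10


@dataclass
class Sentence:
    text: str
    upvotes: int = 0           # "found this useful" clicks
    reviewed_ok: bool = False  # manually reviewed and judged correct


def score(sentence: Sentence) -> int:
    """Usefulness score: upvotes plus a fixed bonus for reviewed sentences."""
    return sentence.upvotes + (REVIEW_BONUS if sentence.reviewed_ok else 0)


def rank(sentences: list[Sentence]) -> list[Sentence]:
    """Order search results by score, highest first (ties keep input order)."""
    return sorted(sentences, key=score, reverse=True)


# Made-up vote counts, purely for illustration.
results = [
    Sentence("I know goats.", upvotes=1),
    Sentence("I don't know.", upvotes=40),
    Sentence("I know kung fu.", upvotes=5, reviewed_ok=True),
]

for s in rank(results):
    print(score(s), s.text)
```

With these made-up counts, "I don't know." ranks first, then "I know kung fu." (5 upvotes plus the review bonus), then "I know goats." No downvotes or thresholds are involved; the score only affects ordering.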
Your cons
- Voting system needs threshold
Why? Is this solved by only having upvotes?
- Potential for corpus sabotage
Is this solved by only having upvotes?
- Overlap with existing functionality
True, but this should aid integration with proofreading by allowing for triage.
- We already have Corpus maintainers
Triage makes their job easier.
- The majority of contributions are correct and should be treated as such.
"Knowing goats" and "knowing kung fu" are both correct sentences but I'm sure one is more useful than the other.
You have good points, but most of them show a very common bias. You want Tatoeba to fit a personal use case. Let me explain.
From what you wrote, I could extract one fundamental problem: Help beginner learners of a language to more easily identify what sentences are more common / useful, or on a more general scale, what level of usefulness is provided by a sentence.
Now, that's indeed a very good problem. However, it rests on an erroneous assumption. Tatoeba is not a tool that aims to teach you a language. Oh, it can be used as such, surely. But that is not its fundamental mission. Its fundamental mission is to provide good corpora of sentences (there's a problem with the meaning of "good", but we can discuss that another time). What you describe is a tool (for language learners), while Tatoeba is a source (of data). The tool uses the source in a twisted way to fit its needs, but it cannot ask the source to bend to fit its vision.
Of course, I don't say that the problem you mentioned should be ignored, far from that. But the statement of the problem(s) to be solved shouldn't be biased by "it's easier for learners". Otherwise, you will get people saying: please do this to help my Natural Language Processing algorithm, please implement that because I could use it in the Japanese class I'm teaching.
Now, again, I don't say that those problems should be ignored. However, after listening to these problems, we should try to extract the fundamental issue, free of all biases. I think that is what Trang expressed when she described how we design features.
As a simple illustration, let's take your three pros and let me give exaggerated, simple cons (for the sake of argument):
Pros
- ranking of search results
- - Triage for popular sentences
--> This would bring a vicious (not virtuous) circle of "what's popular gets more popular because it's popular". Tatoeba is not about popularity. Every contribution is considered equal as long as it respects the "quality standard". Then, of course, we could make "popularity" optional, but in the end that would only bring more work than if the problem had been tackled another way, a bias-free way, from the beginning.
- - Most common uses are likely to be ranked first
--> Same problem as above. This implies a bias that shouldn't exist. Also, think about American / Australian / British / etc. or French / Canadian / Senegalese / etc. Saying something is most commonly used because more users come from a particular region seems pretty unfair.
- - Near duplicates are likely to be filtered by the community
--> How so? We cannot consider one good enough, and the other not. And what if I do want near duplicates? They have value in themselves, even if they are a bother in some situations.
The last point (near-duplicates) actually best illustrates the point of view I am trying to defend in my post(s): if there is a problem inherent to Tatoeba, Tatoeba should try to solve it in a way that is completely independent of any particular use (language learning, NLP algorithm, translation tool, etc.).
Most of your other points could be answered in a similar way. In particular, "usefulness" is a difficult concept to handle, and in your post itself I can see a potential contradiction between "the community would vote for what they think best" and "A is more useful than B".
And to summarize my ideas, let me respond quickly to one of your posts below:
- Tags would not help you learn the language or distinguish what is for beginners from what is not, because that is not what they are made for (the functionality suffers from some flaws, but hopefully work will be done soon to improve it).
- If you think the search doesn't allow you to find relevant results (many of us think so), please explain to us why, and we could try improving the search functionality together.
I'm always surprised by the assumption that Tatoeba is primarily geared toward beginning language learners, since in my view, it's not particularly well suited to their needs. Beginning learners need guidance, a path to follow. There are much better places to find that elsewhere. I've always considered Tatoeba far more useful to learners who have had some time to get to know the basics of a language and now want to see examples of words or grammatical features in use.
In line with what Thanuir said, it doesn't take long to know what the basic meanings of "know" are, at which point sentences that contain them become much less useful to the learner than sentences with more advanced meanings. Understanding the biblical sense of "know" is not really a necessity for intermediate or even advanced learners, which means it's not a great example of a verb with a gradation of meanings, but since the example has already been used, I'll stick with it.
It doesn't take much familiarity with a language for someone to figure out that "know" is being used in an unfamiliar context, and therefore, the sentence may be of limited applicability. For instance, perhaps the sentence was this:
"For lo, the ewe goat was comely in appearance, hence the shepherd knew her twice upon the hill."
Even if I did not know English well, the presence of rare words like "lo" and "comely" would suggest to me that this was not an ordinary colloquial sentence and therefore, I would not assume that it's a typical example of an utterance I should expect to produce. Nor would it cause me to throw out what I've already learned about more standard usages of the word and conclude that knowing is something one only does to an attractive goat on a hill.
Another point I wanted to raise is that voting for the usefulness of sentences would be an extremely tedious exercise. It would be hard to convince most people to do it, and for good reason, since, as has been said, the vast majority of sentences on Tatoeba are already useful. Therefore, you'd get a small number of people voting, and hence a biased vote. I would try to convince people to put their effort into more constructive areas.
Of the uses you mentioned, only "knowing goats" is new and interesting to me. The database is not solely for beginning language learners.
> Has there been any progress made on implementing a voting system
> for reliability.
If you ask about progress since the blog article has been published: yes, there has been progress. In 2015, we introduced a feature to review sentences: https://github.com/Tatoeba/tatoeba2/pull/738
It was initially called "Collections" but we recently renamed it to "Reviews". Apart from the name change, this feature has not evolved at all since its introduction. It was introduced as "experimental" and still is today.
> Is the problem that no one is willing to work on the problem from a
> programming sense?
Well, there's a bit of that, but we are not just lacking developers.
There has been a shift in how we design features. It used to be that people would suggest things to change on Tatoeba, and if we felt it was a good idea, we would implement what they suggested. Over time, we learned that this is not a good practice. From that realization, we started trying to first understand the problems, and then design and implement the solutions.
So this idea of having some sort of voting system is just a solution. But a solution to which problem exactly? Is the problem really a problem? Does the solution really solve the problem? Maybe, maybe not. We actually don't have clarity on that.
> I would be surprised if I was the first to feel this way in the last 10 years,
> so I would like to know has anyone attempted this before me?
Besides my implementation of the reviews feature, no one else attempted anything. But I would be more than happy if you could assist in pushing this feature out of its "experimental" status.
Before that, though: you said that this voting system is the feature you would most like to see in Tatoeba. Could you elaborate on why that is? What is the problem or frustration you face when using Tatoeba that you think a voting system would solve?
>So this idea of having some sort of voting system is just a solution. But a solution to which problem exactly? Is the problem really a problem? Does the solution really solve the problem? Maybe, maybe not. We actually don't have clarity on that.
That's exactly it: a solution without a problem. Because then the question still remains: who's going to decide what is right and what is wrong? The supposed "wisdom of the crowd"? Which crowd? Educated crowds or uneducated ones?
If the majority rules, then we will have to accept that the uneducated crowd rules. Is that OK?
So what?!?
Perhaps I wasn't clear in my original post. I agree that the "uneducated crowd" shouldn't dictate what is correct or incorrect, but they could vote on what they found useful and choose not to vote on what they did not find useful. I don't think there is a need for a downvote feature, however an upvote feature could provide better ranking for search results.
The problems as I see them are:
1. Presently we are getting sentences faster than we are tagging them. A voting system could triage sentences for the advanced contributors to tag, so that the sentences most popular with the community are ensured to be correct.
2. Uncommon uses are right next to common uses, and for beginners it's hard to know which is which, as there is no indication of the usefulness of the phrase. (See my comment to Aiji above, specifically the bit about the goats.)
3. When I search for common words I find lots of duplicate and near-duplicate sentences, for example "I know the Jacksons" and "I know the Smiths". Voting could help filter duplicates and near duplicates.
4. The layout of search results is inefficient. A voting system would show people the things other people found particularly useful first.
Would you have a problem with a button that said:
"1234 people found this useful. Did you?"
Honestly, I'm learning Chinese, and many of the characters have multiple meanings that depend heavily on context. When I search for an example sentence, I don't know which sentences show common uses of the word or phrase I'm interested in and which show uncommon, but still correct, uses.
I see the problems as:
1. Presently we are getting sentences faster than we are tagging them. A voting system could triage sentences for the advanced contributors to tag, so that the sentences most popular with the community are ensured to be correct.
2. Uncommon uses are right next to common uses, and for beginners it's hard to know which is which, as there is no indication of the usefulness of the phrase. (See my comment to Aiji above, specifically the bit about the goats.)
3. When I search for common words I find lots of duplicate and near-duplicate sentences, for example "I know the Jacksons" and "I know the Smiths". Voting could help filter duplicates and near duplicates.
4. The layout of search results is inefficient. A voting system would show people the things other people found particularly useful first.
I'll download it and take a look.
> so that the sentences most popular with the community are ensured to be correct.
Err... precisely NO. A majority of the population makes the same mistakes over and over. That’s why education was invented...
“Most popular” = most wrong, in most cases.
> A voting system would show people the things other people found particularly useful first.
So what is actually correct would actually disappear from sight...
I see your point.
> I'm learning Chinese and many of the characters have multiple meanings, heavily based on context. When I search for an example sentence I don't know which sentences are common uses of the word or phrase that I'm interested in and which sentences are uncommon, but still correct uses of the word or phrase I'm interested in.
Could you give some examples of Chinese words you've searched where the search results didn't make clear to you which uses were the common ones?
Maybe you wanted to give an English example to make your problem clearer to non-Chinese speakers, but currently there don't seem to be any sentences involving biblically knowing goats https://tatoeba.org/cmn/sentenc...rom=eng&to=und (I half expected someone to have rectified that as a result of this discussion.) so it's not a very good example.
With a specific instance of the problem to look at, finding a solution should be easier. Maybe that solution will involve some kind of voting, maybe we can come up with something else.
> I half expected someone to have rectified that as a result of this discussion.
Fixed. See:
https://tatoeba.org/eng/sentences/show/8639180
Thanks for explaining your problems, hamsolo.
One thing I can say is that the various problems you mentioned are unlikely to be solved with one single solution. I'll go through them one by one and I will have to interrogate you a bit more on some points, if you don't mind.
> When I search for an example sentence I don't know which sentences are
> common uses of the word or phrase that I'm interested in and which
> sentences are uncommon, but still correct uses of the word or phrase
> I'm interested in.
I'll ask you the same thing as Yorwba on this one. Could you give some examples of Chinese words you've searched where the search results didn't make clear to you which uses were the common ones?
> 1. Presently we are getting sentences faster than we are tagging them.
> Triage sentences for the adv. contributors to tag so that the sentences
> most popular with the community are ensured to be correct.
It is true that Tatoeba doesn't provide any way to find sentences based on popularity, and if your preferred way to contribute would be to proofread the most popular sentences, then we wouldn't be able to fulfill your needs for the time being. I'm wondering what your definition of popularity is, though.
We already have a feature that measures popularity in a way: the favorites (users can favorite a sentence by clicking on the heart icon in the sentence menu).
My questions are:
- Does this "favorite" feature correspond to your definition of popularity, or do we have a different definition of what is popular?
- If this does not measure popularity the way you wished it was measured, then how exactly would you measure popularity?
- And what difference would it make for you to proofread the most favorited sentences compared to the most popular sentences?
> 2. Uncommon uses are right next to common uses and for beginners it's
> hard to know which is which as there is no indication of usefulness of the
> phrase. (See my comment to Aiji above, specifically the bit about the goats.)
I'm not sure I clearly understood your problem from your example about knowing goats.
In the end, my interpretation is that you, as an English speaker learning Chinese at beginner level, often have trouble figuring out which sentences would be the most useful to add to your Anki deck when you browse or search Tatoeba.
If that is a correct interpretation of your situation, then perhaps you could explain to us what your workflow is for using Tatoeba to build your Anki deck?
> 3. When I search for common words I find lots of duplicates and near
> duplicates of words. For example, "I know the Jacksons",
> "I know the Smiths". Filtering duplicates and near duplicates.
On the issue of finding lots of near-duplicates, I recommend you set the sort option to "Random" rather than "Relevance" when you search sentences. It can happen that two near-duplicates appear on the same page, but common words usually have 1000+ results. For common words, it would be extremely unlucky for you to have two near duplicates on the same page.
> 4. Inefficient layout of search results. A voting system would show
> people the things other people found particularly useful first.
Assuming we use a voting system to measure usefulness of search results, I think we would need to associate each vote with a specific search. A sentence cannot be universally more useful than another. Maybe the sentence "I know algebra" would be useless for someone who searched "know" but would be useful for someone who searched "algebra".
But I think upvoting for useful sentences would be very inefficient compared to reporting bad search results. You would need millions of votes and you wouldn't really be sure that those votes will help. On the other hand, just one person reporting to us that a certain sentence was not useful for a certain search could help us make actual improvements.
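To make the idea of tying each vote to a specific search more concrete, here is a rough Python sketch. It assumes votes are stored per (query, sentence) pair; the `SearchVotes` class, its methods, and the sentence ids are hypothetical and do not reflect how Tatoeba's search actually works.

```python
from collections import Counter, defaultdict


class SearchVotes:
    """Counts "useful" votes per (search query, sentence) pair."""

    def __init__(self) -> None:
        # Maps a normalized query to a Counter of sentence ids.
        self._votes = defaultdict(Counter)

    @staticmethod
    def _normalize(query: str) -> str:
        return query.strip().lower()

    def upvote(self, query: str, sentence_id: int) -> None:
        """Record that a sentence was found useful for this particular search."""
        self._votes[self._normalize(query)][sentence_id] += 1

    def rank(self, query: str, sentence_ids: list[int]) -> list[int]:
        """Order results by the votes they received for this query only."""
        counts = self._votes.get(self._normalize(query), Counter())
        return sorted(sentence_ids, key=lambda sid: counts[sid], reverse=True)


# Hypothetical ids: 1 = "I know algebra", 2 = "I don't know"
votes = SearchVotes()
votes.upvote("algebra", 1)
votes.upvote("know", 2)
votes.upvote("know", 2)

print(votes.rank("know", [1, 2]))
print(votes.rank("algebra", [1, 2]))
```

In this sketch "I know algebra" ranks first for the search "algebra" but last for "know", because votes cast under one search do not carry over to another. It also shows the cost mentioned above: every distinct query needs its own votes before the ranking means anything.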
I feel this 4th problem is in the end the same problem as your 2nd problem. The way sentences are ordered feels inefficient for your task of building an Anki deck.
But if your use case here isn't about trying to build an Anki deck, then it would be helpful to know the other contexts in which you have experienced inefficient search results. What did you search exactly and for what purpose did you need to search this? Were you trying to understand the lyrics of a song? Were you trying to write a sentence in Chinese to a Chinese acquaintance?
Q: I'll ask you the same thing as Yorwba on this one. Could you give some examples of Chinese words you've searched where the search results didn't make clear to you which uses were the common ones?
A: 后来. Yorwba and I recently found out that we interpreted this word in two different ways. In my dictionary and in the Chinese Grammar Wiki, this is a word that means "afterwards". This is my understanding of it. However, Yorwba told me that 后来 did mean "afterwards" but had additional connotations, making it mean something closer to "afterwards it was surprisingly revealed", pointing me to a sentence page on Tatoeba where there were admittedly sentences with that connotation.
Fortunately I had studied this phrase on other websites such as the Chinese Grammar Wiki, as well as discussed it with my native Chinese girlfriend, and I knew that while this is a possible connotation of the word, it is certainly not the most common meaning.
However, my biggest problem is that I have only studied 后来 and a few hundred other words to the extent where I can be confident in using them and knowing that others will understand them, leaving more than 90% of the other words in Mandarin full of potential ambiguity. So while I have yet to learn something wrong from Tatoeba, use it, be corrected, remember where I learned that use from, and go back to the source to request a correction, I acknowledge that it is a definite possibility.
Q: - Does this "favorite" feature correspond to your definition of popularity, or do we have a different definition of what is popular?
A: This is likely a good solution to my problem. I will have to play around with this for a while.
Q: If this does not measure popularity the way you wished it was measured, then how exactly would you measure popularity?
A: A visible number next to each sentence saying how many people thought it was a useful sentence. Perhaps we could even show votes from native speakers separately from votes from second-language speakers.
Q: And what difference would it make for you to proofread the most favorited sentences compared to the most popular sentences?
A: At this point I'm not sure and I'll have to play with the favourite feature a bit.
Q: On the issue of finding lots of near-duplicates, I recommend you set the sort option to "Random" rather than "Relevance" when you search sentences. It can happen that two near-duplicates appear on the same page, but common words usually have 1000+ results. For common words, it would be extremely unlucky for you to have two near duplicates on the same page.
A: This is a good solution.
Q: But I think upvoting for useful sentences would be very inefficient compared to reporting bad search results. You would need millions of votes and you wouldn't really be sure that those votes will help. On the other hand, just one person reporting to us that a certain sentence was not useful for a certain search could help us make actual improvements.
A: As an outsider/newbie I honestly can't tell if relying on human maintainers to do this is efficient, and I understand that a sentence/keyword search engine is different from the Google search engine, which looks for sentences/keywords on webpages. However, Google beat Yahoo (who at the time had humans manually review and categorise pages) by relying on an algorithm that was constantly fed new data based on the relevance of its results. (They saw which results were the most popular for a given search and gave them priority ranking.) Doing what Google did is likely beyond me, but the first step is understanding what is popular.
Q: What did you search exactly and for what purpose did you need to search this? Were you trying to understand the lyrics of a song? Were you trying to write a sentence in Chinese to a Chinese acquaintance?
A: As I learn new words or phrases in Chinese I try to make sentences with them, practice using them, etc., and Chinese grammar has a few hard-to-predict differences compared to English grammar. For example, in English we would say that "red" is almost always an adjective: a red ball, red hair, red paint, a red car. In Chinese, colours are nouns, even when used to describe something like a red ball, red hair, red paint or a red car. This changes which words you can use with them. I personally have trouble remembering whether every word I learn is an adjective, noun, verb, etc. Even in English I just remember context, produce a few sentences and figure it out; the fact that "run" or "climb" are verbs is not saved in my memory the way it would be written on the page of a dictionary. I just know how to use them and understand the rules that define the grammar (in English at least). Perhaps it is a flawed approach, but this is also how I am trying to learn Chinese: learn enough sentences and attempt to gain an intuitive understanding of collocations, the same way I do in English.
In short, nothing in particular. I'm just trying to build my mental list of example sentences so I can produce new sentences in the near future. Chinese is hard.