Tatoeba
Ricardo14, May 22, 2019 at 12:45:50 AM UTC

I think this has already been pointed out, but maybe it's a good idea to point it out again.

The vocabulary feature is a great tool. It helps us study languages by letting us add words in languages we don't speak fluently and want to see used in context.

However, looking at https://tatoeba.org/eng/Vocabul...sentences/por, I've found words that don't exist in Portuguese like "decoy", misspelled words like "libro de cabeceira" (it should be "liVro"), and also things like "Posso colocar aqui verbos usamos em nossos dia." (I can add here verbs we use every day) and "Caso precisem de alguma informação com relação ao Português Brasil posso ajuda-los." (In case you need any information about Brazilian Portuguese I can help you).

http://prntscr.com/nrnhj9

It'd be good if we could correct these requests or even delete some.

https://tatoeba.org/eng/Vocabul..._sentences/eng (English)

Make it up - Помириться (Russian)
communicative language teaching (not a word)
That's what this is all about. - Это то из-за чего это всё. (Russian again)
щзхлзщхлщзх (Can't believe this is in Russian or any other language)

http://prntscr.com/nrnj8o

Thanuir, May 22, 2019 at 6:57:57 AM UTC

I agree with the problem of the list being cluttered with typos and other mistakes.

I do think that for example "communicative language teaching" is completely fine on the list, though. One might very well understand all the component words and have example sentences with them without understanding the whole term or having it in any sentence.

With English this is especially bad, since the language uses few compound words; "language teaching" rather than "languageteaching", in marked contrast with many Germanic languages and Finnish, for example.

At least in scientific English there are many terms that refer to a concrete thing, but due to features of the language, they are written as a collection of separate words. "magnetic resonance imaging", for example, or "inverse problem".

There are also cases like the Danish "rejse" (to travel) versus "rejse sig" (reflexive form, to get up e.g. from a sitting position). It should be completely valid to have "rejse sig" as a vocabulary item.

raggione, May 22, 2019 at 10:37:34 AM UTC

Full support - I'd like to get my hands on the requests for German, which badly need to be sorted.

TRANG, May 22, 2019 at 11:34:08 PM UTC

Related issue on GitHub: https://github.com/Tatoeba/tatoeba2/issues/1473

To solve this, one possible approach is the one suggested in the GitHub issue: let corpus maintainers edit/delete any vocabulary items. This approach comes with a couple of problems, however.

1) What if corpus maintainers edit/delete items that could have been valid?

This problem already shows up here in this thread, as you, Ricardo, said you would have deleted "communicative language teaching", while Thanuir would have kept it.

2) Users don't necessarily add vocabulary to get example sentences.

I can imagine that some users just want to make a list of words/phrases they want to learn and they find it useful to have the translations next to each word/phrase. That's probably why you have those items that contain both English and Russian words. Since the vocabulary feature doesn't allow connecting vocabulary between two languages, users will of course put the two languages into one item if they need to. They're not exactly doing something wrong, they're just adapting the feature to their needs. I feel that deleting or editing their items would not be the proper thing to do.

My suggestion would be to allow contributors to ignore items on the "Sentences wanted" page.
- Ignored items would no longer be displayed on the "Sentences wanted" page.
- Ignored items would be individual, which means that if I ignore an item, it is ignored only for me. If other people don't want to see an item, they would have to ignore it themselves.
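The per-member ignore idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `VocabItem`, `IgnoreLists`, and `visible_items` are hypothetical names, not part of the Tatoeba codebase, and the two sample items are taken from the examples earlier in this thread.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VocabItem:
    """A hypothetical vocabulary request (not Tatoeba's real schema)."""
    item_id: int
    lang: str
    text: str


@dataclass
class IgnoreLists:
    """Per-member ignore sets: ignoring an item affects only that member."""
    ignored: dict[str, set[int]] = field(default_factory=dict)

    def ignore(self, member: str, item_id: int) -> None:
        # Record the item in this member's own ignore set.
        self.ignored.setdefault(member, set()).add(item_id)

    def visible_items(self, member: str, items: list[VocabItem]) -> list[VocabItem]:
        # Hide only what this particular member has ignored.
        hidden = self.ignored.get(member, set())
        return [item for item in items if item.item_id not in hidden]


items = [
    VocabItem(1, "por", "livro de cabeceira"),
    VocabItem(2, "por", "decoy"),
]
lists = IgnoreLists()
lists.ignore("Ricardo14", 2)
```

With this data, `lists.visible_items("Ricardo14", items)` contains only item 1, while any other member still sees both items.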

In parallel to that, we could split the vocabulary items into two categories:
- A "regular" list where users just create a vocabulary list for themselves.
- A "needs sentences" list where users add vocabulary for which they explicitly want example sentences. Only vocabulary items in this list would appear on the "Sentences wanted" page.
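As a rough sketch of the two-list split, one could tag each entry with the list it belongs to and build the "Sentences wanted" page from one list only. The names `ListKind`, `VocabEntry`, and `sentences_wanted` are assumptions made for illustration, as are the member names.

```python
from dataclasses import dataclass
from enum import Enum


class ListKind(Enum):
    """Which of the two proposed lists an entry belongs to (hypothetical)."""
    PERSONAL = "regular"
    NEEDS_SENTENCES = "needs sentences"


@dataclass(frozen=True)
class VocabEntry:
    owner: str
    text: str
    kind: ListKind


def sentences_wanted(entries):
    """Return only entries whose owner explicitly asked for example sentences."""
    return [e for e in entries if e.kind is ListKind.NEEDS_SENTENCES]


entries = [
    # A personal note mixing two languages stays off the public page.
    VocabEntry("member_a", "Make it up - Помириться", ListKind.PERSONAL),
    VocabEntry("member_b", "inverse problem", ListKind.NEEDS_SENTENCES),
]
```

Here `sentences_wanted(entries)` returns only the "inverse problem" entry; the bilingual personal note is kept but never shown to contributors.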

Those are my personal ideas. There is no clear decision yet on how we want to handle the "bad" vocabulary items and if you have better ideas, please do share.

gillux, May 23, 2019 at 8:20:27 AM UTC

To me, the root of the problem is that members are unable to put the correct information in the "add vocabulary items" form. And no one can blame them for that. The process of turning an "I want sentences like that" idea into correct values for "language" + "searchable vocabulary item" is indeed difficult, and probably unclear at first. Members don’t exactly know that their vocabulary items are going to be interpreted as search queries, with the number of results displayed, all that listed on a public page. I can see some members are adding whole sentence pairs as vocabulary items, or a semicolon-separated list of items. I can imagine some members assume that what’s under 'my vocabulary items' is only visible to them. I guess some members do not care that much about the flag because they know the language already. After all, when writing down your own vocabulary list, does anyone bother writing the language name?

That is quite a UX challenge in my opinion. I have no great idea to solve it but there is certainly room for improvement. I’m pretty sure most people do not read the bottom-right "Tips" block that says "Add vocabulary that you are learning. If your vocabulary does not exist yet in Tatoeba, other contributors can add sentences for it."

I don’t think a per-member black list would help. I can make out two categories of items people would like to hide: (1) items that are not meant to achieve 10 sentences (because they are garbage, completely invalid or personal) and (2) items that won’t achieve 10 sentences because of incorrect values. As for category 1, everyone wants to hide them from the public list. As for category 2, they are valid requests that anyone can be interested in, it’s just that Tatoeba isn’t smart enough to show the correct number of sentences.

Splitting the vocabulary items into personal vs. wanted sentences is an interesting idea, but in the end I don’t think it will solve the problem of having "forever 0 sentences" items cluttering the list.

Some other ideas:
How about allowing members to edit the language of their own items?
How about allowing corpus maintainers to edit the language of other members’ items?
How about somehow dividing the vocabulary items into two categories: the ones that can be readily used as search queries, and the ones that cannot. For the ones that cannot, instead of having a link to a search, we allow to directly add sentences that are assumed to match the item and are counted as such.
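The last idea, counting either search hits or manually attached sentences, might look roughly like the sketch below. Everything here is assumed for illustration: `VocabRequest`, `sentence_count`, the `searchable` flag, and the toy `fake_search` function with its made-up corpus stand in for whatever the real search would be.

```python
from dataclasses import dataclass, field


@dataclass
class VocabRequest:
    """A hypothetical vocabulary request with an explicit 'searchable' flag."""
    text: str
    searchable: bool
    attached_sentence_ids: set = field(default_factory=set)


def sentence_count(request, search):
    """Count sentences for a request.

    `search` is a hypothetical callable that takes the query text and
    returns the IDs of matching sentences.
    """
    if request.searchable:
        return len(set(search(request.text)))
    # Non-searchable items count only sentences attached by contributors.
    return len(request.attached_sentence_ids)


def fake_search(query):
    # Toy corpus, made up for this example only.
    corpus = {
        1: "This is an inverse problem.",
        2: "Magnetic resonance imaging is expensive.",
    }
    return [sid for sid, sentence in corpus.items() if query in sentence]


plain = VocabRequest("inverse problem", searchable=True)
manual = VocabRequest("rejse sig", searchable=False)
manual.attached_sentence_ids.update({7, 8})
```

In this sketch the count for `plain` comes from the search, while the count for `manual` comes only from the sentences contributors attached to it.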

TRANG, May 25, 2019 at 3:30:42 AM UTC

> That is quite a UX challenge in my opinion.

Oh yes, and we definitely won't be solving everything at once...

> How about allowing members to edit the language of their own items?

Yes, we should. There's an issue about this: https://github.com/Tatoeba/tatoeba2/issues/1238

> How about allowing corpus maintainers to edit the language of other
> members’ items?

There's the problem that corpus maintainers may have wrong assumptions about the language of the item.

> How about somehow dividing the vocabulary items into two categories: the ones
> that can be readily used as search queries, and the ones that cannot. For the ones
> that cannot, instead of having a link to a search, we allow to directly add sentences
> that are assumed to match the item and are counted as such.

I think this issue is related: https://github.com/Tatoeba/tatoeba2/issues/1281


Another idea: we could exclude old items from the "Sentences wanted" page. If an item was added more than 30 days ago (could be more, could be less, could be customizable by the user who is adding sentences), we don't display it anymore. It would show up again only if another user added the item, or if the same user re-added it.
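A minimal sketch of this age cutoff, assuming each item keeps the dates on which members requested it (`is_displayed` and `max_age_days` are hypothetical names, and the dates are invented):

```python
from datetime import datetime, timedelta


def is_displayed(request_dates, now, max_age_days=30):
    """True if any member requested the item within the last `max_age_days`.

    Re-adding an item appends a fresh date, which makes it visible again.
    """
    cutoff = now - timedelta(days=max_age_days)
    return any(added >= cutoff for added in request_dates)


now = datetime(2019, 5, 25)
old_only = [datetime(2019, 3, 1)]                      # requested once, long ago
bumped = [datetime(2019, 3, 1), datetime(2019, 5, 20)]  # re-added recently
```

With the default of 30 days, `old_only` is hidden and `bumped` is shown. Passing a larger `max_age_days` models the "customizable by the user who is adding sentences" part.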

Thanuir, May 25, 2019 at 4:37:40 AM UTC

Could the time limit be keyed to the activity of the user, instead? If a user has not contributed to the site for a year, no longer display their wanted vocabulary. Otherwise, one would have to continuously upkeep their vocabulary for it to survive, which sounds like busywork or a discouragement.

I would also suggest that one month is far too short. If a word is simple, sentences might be added in a month, but they might also be there already. Most legitimate words that linger for long would presumably be obscure or in languages where people do not actively contribute sentences to wanted words. In this case, even a year sounds like a short period of time.
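Keying the limit to the owner's activity rather than to the item's age could be sketched as follows. The function names, the member names, and the one-year default are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def owner_is_active(last_contribution, now, inactivity_days=365):
    """True if the member contributed within the last `inactivity_days`."""
    return now - last_contribution <= timedelta(days=inactivity_days)


def displayed_requests(requests, last_contribution_by_member, now):
    """Keep (member, text) pairs whose owner is still active.

    Members with no recorded contribution are treated as inactive.
    """
    return [
        (member, text)
        for member, text in requests
        if member in last_contribution_by_member
        and owner_is_active(last_contribution_by_member[member], now)
    ]


now = datetime(2019, 5, 25)
last_seen = {
    "active_member": datetime(2019, 4, 1),
    "gone_member": datetime(2017, 1, 1),
}
requests = [
    ("active_member", "rejse sig"),
    ("gone_member", "щзхлзщхлщзх"),
]
```

Under this rule an old request from an active member stays visible without any upkeep, and only requests from members who stopped contributing drop off the list.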

TRANG, May 25, 2019 at 10:59:43 PM UTC

> Could the time limit be keyed to the activity of the user, instead? If a user has not
> contributed to the site for a year, no longer display their wanted vocabulary.
> Otherwise, one would have to continuously upkeep their vocabulary for it to survive,
> which sounds like busywork or a discouragement.

Well, we would also need to take into account the contributor's point of view (and by "contributor", I mean the person who is creating sentences based on vocabulary). If they have no idea what to do with a vocabulary item (because they're not inspired, or because the item is incorrect) and that item stays on the "Sentences wanted" page until one year after the vocabulary owner becomes inactive, that would be pretty long.

I get your point though. Having to actively "bump" your vocabulary requests every once in a while can be annoying.

We could implement the time limit as an option rather than a default. By default we could still display vocabulary items of all dates, then the contributor can choose to only show the more recent ones if there are too many undesirable items.

But I'm starting to wonder if the approach of displaying vocabulary items is the best approach. Perhaps we should instead display users who are requesting vocabulary so that the focus is more about connecting members with each other. It would be more engaging and, I feel, more efficient for getting rid of "bad" vocabulary items.

Since I would browse the items of a specific user, I would know who to contact to let them know that something is wrong in their vocabulary list, and they can correct it. And if they never correct it, it probably wouldn't bother anyone because the items would be "isolated" in that user's vocabulary list, as opposed to being mixed up with everyone else's items like it is now.
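Displaying requests per member instead of as one mixed list amounts to a simple grouping step. A small sketch, with `group_by_member` and the member names being hypothetical:

```python
from collections import defaultdict


def group_by_member(requests):
    """Group (member, text) pairs so each member's items stay together."""
    grouped = defaultdict(list)
    for member, text in requests:
        grouped[member].append(text)
    return dict(grouped)


requests = [
    ("member_a", "magnetic resonance imaging"),
    ("member_b", "щзхлзщхлщзх"),
    ("member_a", "inverse problem"),
]
by_member = group_by_member(requests)
```

A questionable item then appears only under its owner's name, so a contributor knows whom to contact and can skip that member's list entirely.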

Thanuir, May 26, 2019 at 5:55:24 AM UTC

I use the vocabulary feature both ways. In case a use case is interesting:

1. When reading a text and encountering a word I do not understand, I sometimes add it to my vocabulary. (This depends on the context where I am reading, more than anything else. How much time I have, for example.) Likewise, if there is a word I will or would like to use, I add it to my vocabulary. When adding something to my vocabulary, I also check the sentences that already are there that the built-in search finds.

I very occasionally go through my vocabulary, removing words that I am satisfied with, and checking the entries Tatoeba finds with the search for those terms I do not yet quite understand.

2. As a contributor, I sometimes go through the wanted Finnish words and add sentences to those. The problem is that I seem to be the only one doing this with any regularity, so I tend to add similar sentences several times. Sometimes I come up with a clever idea for a sentence and find that it has already been added, often by me. There are several wanted Finnish words, so I can just skip the ones that I find hard, or occasionally I do a little bit of research to find how the word is actually used or to find out some facts about it, or even what it precisely means if the word is obscure enough.

I do not think there are any wrong Finnish vocabulary items at the moment. If there were, I would just have to ignore them, or add them once to a sentence that points out the misspelling or misunderstanding. Adding ten such sentences to get the word out of the vocabulary list would not be inspiring at all. But a single sentence would be fine.

...

In my experience, adding words to the vocabulary is like putting a postcard in a bottle and throwing it into the sea - maybe someone will find it and add the word at some point, but there are absolutely no guarantees, and it never happens fast.

Languages: Mostly Danish and both Norwegians, occasional scientific or obscure English, couple of obscure Finnish words, and maybe something else.

Likewise, the list of requested Finnish vocabulary seems to be fairly static, so I consider it as a source of inspiration and a puzzle game, as well as a way of getting to know some obscure words, rather than as actively helping someone in the moment. Actively helping someone would be nice, though, but it also works as is for me.