
Wall (5415 threads)

footycouch
2019-06-01 20:56
The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.
sharptoothed
2019-05-27 12:48
* Tatoeba Most Translated Sentences Charts *

Tatoeba "250 Most Translated Sentences" and "Most Translated Sentences In Each Language" charts have been updated:

https://tatoeba.j-langtools.com/mosttranslated/
https://tatoeba.j-langtools.com...ed/bylang.html
deniko
2019-05-28 08:09 - 2019-05-28 08:10
There's something odd: the "Translations Unique" and "Translations All" columns don't match between the two charts. I'd expect them to be the same, I guess.

For example, according to "Most Translated Sentences In Each Language", this sentence:

"One, two, three, four, five, six, seven, eight, nine, ten."

has 145 unique translations, and 172 translations in total.

According to "250 Most Translated Sentences", this same sentence has 109 unique translations and 129 in total.

This is just one example; it seems to happen for most of the sentences listed.
sharptoothed
2019-05-28 08:47
There must be a bug in the script that counts "250 Most Translated Sentences", I guess. I'll try to fix it, but not very soon. :-(
sharptoothed
2019-05-29 06:42
Solved. There was no bug, actually. The charts were based upon different database dumps. :-)
deniko
2019-05-29 07:47
Thanks a lot! The data looks good now.
Ricardo14
2019-06-01 20:08
Thank you!
cojiluc
2019-05-30 14:49 - 2019-05-31 00:07
Could some English-speaking users add some sentences with the words below? Thank you.

mollify, cognisant, histrionic, supine, extemporaneous, attune.

P.S.
I added these words to "Sentences wanted" several months ago, but nothing has come of it.
TRANG
2019-05-30 21:13
You can try as well to contact directly some of the English contributors by private message.

You can find English speakers by browsing this page:
https://tatoeba.org/eng/users/for_language/eng

Or you can check who has contributed English sentences lately:
https://tatoeba.org/eng/sentenc...&sort=modified

But I'm taking note that it would be useful to have an option for contributors to say "I want to help others by creating sentences for requested vocabulary", and to have a page that lists these contributors, so that people like you can more easily find someone to help you.

Right now you unfortunately need to go through many user profiles just to find a few people who would be potentially happy to help.
deniko
2019-05-31 13:45
> You can try as well to contact directly some of the English contributors by private message.

I'm not the OP, but contacting native speakers via PM seems too intrusive to me; personally, I would never do it.

Leaving a message on the wall, which is basically an open forum, seems less intrusive.
TRANG
2019-05-31 19:57
I am maybe pessimistic here, but my feeling was that cojiluc might not get any reply with this message on the Wall. I think many people don't pay much attention to the Wall. That's why I suggested the option to directly contact other members.

Contacting people via PM of course depends on how "shy" you feel about it. But I don't think it hurts more than asking a stranger for direction when you are lost in a new city.
Ricardo14
2019-06-01 20:06
> I am maybe pessimistic here, but my feeling was that cojiluc might not get any reply with this message on the Wall.

Agreed. I still believe it's worth trying, but yes, contacting members through PM seems to work much better. Perhaps we could have a "ping system" (@member, or @members who speak language X / whose native language is X). Such a system works on the Duolingo Incubator, the place where we create courses (I'm in charge of creating the French for Portuguese speakers and Portuguese for French speakers courses there). That way, people would be notified by PM, email, and/or an on-site alert, depending on which channels they have allowed Tatoeba to use. Good idea, isn't it?

> Contacting people via PM of course depends on how "shy" you feel about it. But I don't think it hurts more than asking a stranger for direction when you are lost in a new city.

Tatoeba is a peaceful place where people are warm-hearted :D
Ricardo14
2019-06-01 20:00
I'd have another approach... Well, kinda another approach.

Since I "want" translations into English, I'd go to Home --> Latest contributions --> Show more (https://tatoeba.org/eng/contributions/latest)

OK. Now I can see the users who made the last 200 contributions.

From there I'd look for people who have posted in English. At the time I'm posting this, only two people have posted sentences in English, as you can see here -> http://prntscr.com/nwe606

So there are two profiles. I'd go to each of them, since they just posted sentences on Tatoeba, which might mean that they're active members. I'd decide whether I want translations from native speakers only or from anyone who speaks English. Then I'd ask him or her to translate the sentence for me via PM.
Thanuir
2019-05-31 13:14
I have not had much luck with people writing sentences for my vocabulary items (in English or otherwise), but maybe some English-speaker will be inspired.

Are you interested in the mathematical definition of "mollify" (in mathematical analysis, i.e. convolution with a smooth function of compact support) or other meanings?
Ricardo14
2019-06-01 20:00
From my experience, contacting people from (https://tatoeba.org/eng/contributions/latest) has helped me much more...
Ricardo14
2019-06-01 19:53
Please, could everyone translate the following sentence:

https://tatoeba.org/eng/sentences/show/2278

"You may be disappointed if you fail, but you are doomed if you don't try."

Thank you. :)
Dalyup
2019-05-26 21:20
Sometimes when I try to play audio files via the random sentence function, I cannot hear anything. It always works when I click on the sentence's entry and listen to it there.

For example, this happened just now with the English translation of this sentence: https://tatoeba.org/deu/sentences/show/6618773

Any idea why this might be happening? I am using Firefox 67.0 (64-Bit).
Colbo
2019-05-27 16:01
Maybe you need drivers for your sound card?
Have you installed the drivers? It's usually Realtek HD Audio.
deniko
2019-05-28 08:17 - 2019-05-28 09:23
I can confirm: this also happens in Chrome and Edge, so it's probably not browser-related.

The sound works fine if you click on a sentence, or if you see it in the search results, but it doesn't work in the random sentence widget.

So to reproduce it, you need to click the "Show another" button on the main page a few times to get a sentence with audio. For example, I happened to see this sentence when I was experimenting:

https://i.imgur.com/T4FnEim.png

Then you click on the loudspeaker button and there's no sound.

Once you click on the sentence itself to proceed to its page (#6559124 in this case) the sound works fine.

So the problem is probably with the Random Sentence widget (or however you call this feature).

EDIT: I played a bit more with it and the pattern is really weird.

So if you open the main page and the random sentence you see first (or one of its translations) has audio, it plays fine, and all the subsequent sentences with audio that you get after pressing "Show another" work fine too.

If you open the main page and the first sentence you see has no audio (and neither do any of its translations), and you then press "Show another" enough times to get a sentence with audio, the audio won't play.
Dalyup
2019-05-28 08:53 - 2019-05-28 08:53
@Colbo, my sound drivers are installed, so it's definitely not that.

@deniko, I just tried the pattern you outlined and got the same result (with Firefox and Chrome). It seems something is not loaded if the first random sentence does not contain audio.
TRANG
2019-05-30 21:45
There is indeed a bug. I created an issue on GitHub:
https://github.com/Tatoeba/tatoeba2/issues/1898

Special thanks to @deniko for figuring out the steps to reproduce systematically.
Colbo
2019-05-28 10:32
Everything works fine on my side.
deniko
2019-05-28 10:35
Well, I can reproduce it very reliably following the instructions I described above.

If it's not happening to everyone, this means Tatoeba works in mysterious ways.

Colbo
2019-05-30 13:44
Hey, peeps, remove irrelevant comments.
If the sentence was changed according to your indications, you can remove your comment.
Colbo
2019-05-30 13:52
Also, please give me the right to link sentences with the same meaning.
It will improve the corpus a lot.
jegaevi
2019-05-30 14:23
https://en.wiki.tatoeba.org/art...d-contributors
This is how you can apply to become an advanced contributor, and if you become one you will be able to link sentences.
Colbo
2019-05-28 12:17
Hey guys, how can I link sentences?
I find a lot of sentences that have similar meanings, and I would like to link them.
Though I'm not proficient in all the languages, I can understand them, and they definitely have the same meaning.
soliloquist
2019-05-28 14:10
Just copy and paste the sentence you want to link as a new translation (pay attention to the flags). The duplicate-merging script will link them together.

This method is an alternative, indirect way of linking for normal contributors. It's a lot easier for advanced contributors. They can link sentences with a single click.
Aiji
2019-05-29 13:25
I'm not sure what exactly you want to link, but sentences in the same language that have similar meanings might not need to be linked to each other. Sentences should definitely be linked to matching translations, though. I think that's also one reason why linking is reserved for advanced contributors: they first need to thoroughly understand what it is all about.

Also, you can refer to different guides available at the bottom of the page. (First steps, guide to be a good contributor, etc.)
soliloquist
2019-05-29 19:56
By the way, I don't think linking sentences in the same language is wrong as long as they convey the same idea. I find it rather useful for studying synonymous words and phrases (especially if they're not linked to a sentence in a different language). Most projects using Tatoeba's data focus on links between different languages, but Tatoeba could be used for monolingual studies, too. And the current linking system is perhaps the most suitable way to do it, unless there were two types of links (one for translations and one for synonyms). Some users leave comments to point out synonymous or closely related sentences, but I don't find that very useful for searching and studying.

Finding such sentences could also be possible by linking them separately to the same sentence in a different language and searching for indirectly-linked results, but this is still not a translation-independent approach. One shouldn't have to find a matching sentence to be linked to in a foreign language before creating synonymous groups in his native language. This would likely end up treating languages like English or French as superior to other languages.
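A minimal Python sketch of the indirect approach described above. It assumes that links are stored as undirected pairs of sentence IDs and that each ID maps to a language code; all IDs and data here are hypothetical. The code finds same-language sentences two hops away through a shared translation. It also shows the limitation: without a pivot sentence in another language, nothing is found.

```python
from collections import defaultdict


def build_graph(links: list[tuple[int, int]]) -> dict[int, set[int]]:
    # Treat every translation link as undirected.
    graph: dict[int, set[int]] = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def indirect_same_language(graph: dict[int, set[int]], lang: dict[int, str],
                           sentence_id: int) -> set[int]:
    # Collect sentences reachable in exactly two hops (via any linked sentence)...
    two_hops: set[int] = set()
    for neighbor in graph.get(sentence_id, set()):
        two_hops |= graph.get(neighbor, set())
    two_hops.discard(sentence_id)
    # ...and keep only those in the same language as the starting sentence.
    return {s for s in two_hops if lang[s] == lang[sentence_id]}


# Hypothetical data: 10 and 11 are French pivot sentences.
links = [(1, 10), (2, 10), (3, 11)]
lang = {1: "eng", 2: "eng", 3: "eng", 10: "fra", 11: "fra"}
graph = build_graph(links)
print(indirect_same_language(graph, lang, 1))  # {2}
print(indirect_same_language(graph, lang, 3))  # set(): no English sentence shares a pivot
```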
gillux
2019-04-06 17:59 - 2019-04-06 18:24
I’ve been playing around with our default search ranking algorithm. I insist on the "default" part because that’s what the vast majority of visitors use. I also focus on searches that do not use double quotes or any special trick. Just plain words. Again because that’s what the vast majority of visitors use.

Our current way of ranking results is pretty basic: it searches for sentences that include all the words (possibly stemmed) and sorts them by the total number of words in the sentence.

A problem with this approach is that the order of the words is ignored. The top result of searching for "you go there" is "There you go!" because it’s a shorter sentence than "You may go there."
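Here is a minimal Python sketch of the ranking as described. It assumes a naive tokenizer (lowercased runs of word characters) and no stemming. The function names are hypothetical, and this is not Tatoeba's actual search code, which runs inside the search engine. The sketch reproduces the "you go there" example:

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, keep runs of word characters (no stemming).
    return re.findall(r"\w+", text.lower())


def rank_by_word_count(query: str, sentences: list[str]) -> list[str]:
    # Keep sentences containing every query word, in any order...
    wanted = set(tokenize(query))
    matches = [s for s in sentences if wanted <= set(tokenize(s))]
    # ...then put the shortest sentences first; word order is never checked.
    return sorted(matches, key=lambda s: len(tokenize(s)))


print(rank_by_word_count("you go there", ["You may go there.", "There you go!", "Go away."]))
# ['There you go!', 'You may go there.']
```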

Ignoring word order is especially catastrophic for languages without word boundaries, like Chinese, because the searched characters can match in any order, producing something totally unrelated. For example, the results for "可不可" in Chinese are cluttered by irrelevant "不可something". Same for kana words in Japanese.

In order to address this problem, I tentatively tweaked the default ranking algorithm on https://dev.tatoeba.org/ into something that prioritizes, in the following order:

1. sentences that contain an exact match (as if searching for ="you go there")
2. sentences having the "longest common subsequence" (LCS, [1])
3. sentences having the least number of words

[1] https://docs.manticoresearch.co...anking-factors
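The three criteria can be sketched in Python as a sort key, assuming the same naive tokenizer with no stemming. The `lcs_length` helper is the textbook word-level longest common subsequence. Manticore's own `lcs` ranking factor is computed inside the engine, and its exact definition may differ from this, so treat this as an illustration of the ordering rather than the production ranker:

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase runs of word characters, no stemming.
    return re.findall(r"\w+", text.lower())


def contains_phrase(query: list[str], words: list[str]) -> bool:
    # Criterion 1: the query words appear contiguously and in order.
    n = len(query)
    return any(words[i:i + n] == query for i in range(len(words) - n + 1))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Criterion 2: word-level longest common subsequence (dynamic programming,
    # keeping only the previous row of the table).
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rank_relevance(query: str, sentences: list[str]) -> list[str]:
    q = tokenize(query)
    matches = [s for s in sentences if set(q) <= set(tokenize(s))]

    def key(sentence: str) -> tuple[bool, int, int]:
        words = tokenize(sentence)
        # False sorts before True, so exact matches come first; then a longer
        # LCS (negated so larger is earlier); then fewer words (criterion 3).
        return (not contains_phrase(q, words), -lcs_length(q, words), len(words))

    return sorted(matches, key=key)


print(rank_relevance("you go there", ["There you go!", "You may go there.", "Can you go there?"]))
# ['Can you go there?', 'You may go there.', 'There you go!']
```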

However, I don’t know if this new ranking suits everyone out there. What do you think?

You can compare the search results on https://tatoeba.org/ (old ranking) and https://dev.tatoeba.org/ (new ranking). You can run a search on tatoeba.org, and then add "dev." in the URL bar and press alt+return to open a new tab.
AlanF_US
2019-04-06 22:10 - 2019-04-06 22:11
I do prefer a ranking that favors exact matches over stemmed matches. Longest common subsequence also sounds good. But sentences having the least number of words are often not the ones I want to see most. I prefer slightly longer ones that give me more context. For that reason, I always choose random ordering. It doesn't always put the sentences that I want at the very top, but at least I have a good chance of finding them without having to go through pages and pages of very short sentences. Also, providing a mix of sentences that is random with regard to sentence length lets people see more diversity. I think that's a good thing.
CK
2019-04-07 02:34 - 2019-04-07 02:37
Maybe you wouldn't want this for the default search, but I wonder if it would be possible to add a "minimum word" option to the advanced search. This might prove useful. For example, members could still opt to sort by length, but start by showing sentences that are over a certain length up to 1,000 results.

More involved, perhaps, but an additional idea might be to have a "maximum length" option, too. This would allow members to have search results displayed randomly between 5 and 12 words in length, for example.
AlanF_US
2019-04-07 13:42
Yes, I can imagine that a "minimum length" and a "maximum length" option would be useful. However, the nice thing about favoring sentences that meet a certain criterion rather than limiting them to that criterion is that if there are not enough sentences that meet it, you will automatically see the other ones without having to remove the criterion and do another search. I imagine that if I set a minimum and/or maximum length, I would often eliminate some of the fallback sentences I'd like to see, and then I'd have to do a follow-up search.

Indeed, I wouldn't want the default search to be optimized for my particular needs, which would be something like "favor sentences from five to ten words in length". However, I worry that optimizing it for choosing the shortest sentences would pessimize it for people like me, whereas leaving out a criterion of length would allow people to see a variety of sentences, short and long.
Thanuir
2019-04-08 14:48
If I could choose and there were no computational costs:

1. An exact match of the sentence.
2. Sentences with exact match of the query as part of it.
3. Sentences with all the exact words, but possibly in different order or with other words between them.
4. Sentences with all the words, but with stemming and the order might be different etc.
5. Sentences with all but one of the words (with stemming and could be in any order).
6. Sentences with all but two of the words (with stemming and could be in any order).
7. And so on.
8. Sentences with even a single searched word (with stemming).

Random order within the categories. (Some of the categories could be sorted into even finer subcategories, but probably not worth it.)

For example search: haluan kalastaa tänään
1. Haluan kalastaa tänään.
2. Minä haluan kalastaa tänään, niin kuin eilenkin.
3. Tänään minä haluan kalastaa. Haluan kuitenkin kalastaa tänään.
4. Haluankin kalastaa tänään. Haluatteko te tänään tai huomenna kalastamaan?
5. Haluatteko elokuviin tänään?
6-8. Karhut kalastavat lohia.

The idea would be to first have the precise phrase and then to have increasingly distantly related phrases, which hopefully would still give some understanding of the involved words.
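The categories above can be sketched in Python as a tier function, followed by a seeded shuffle and a stable sort, so that the order is random inside each tier but reproducible for a given seed. The `crude_stem` function (first four letters) is a hypothetical stand-in used only to make the Finnish example work; it is not a real Finnish stemmer. The tokenizer is the same naive assumption as elsewhere.

```python
import random
import re
from typing import Callable, Optional


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def contains_phrase(query: list[str], words: list[str]) -> bool:
    n = len(query)
    return any(words[i:i + n] == query for i in range(len(words) - n + 1))


def identity(word: str) -> str:
    return word


def crude_stem(word: str) -> str:
    # Hypothetical toy stemmer for the example only: keep the first 4 letters.
    return word[:4]


def tier(query: str, sentence: str, stem: Callable[[str], str] = identity) -> Optional[int]:
    q, s = tokenize(query), tokenize(sentence)
    if s == q:
        return 1  # the whole sentence is the query
    if contains_phrase(q, s):
        return 2  # the query appears verbatim inside the sentence
    if set(q) <= set(s):
        return 3  # all exact words, any order or with gaps
    stemmed_q = {stem(w) for w in q}
    missing = len(stemmed_q - {stem(w) for w in s})
    if missing == len(stemmed_q):
        return None  # no query word at all: not a result
    return 4 + missing  # 4 = all stemmed words, 5 = all but one, ...


def tiered_search(query: str, sentences: list[str], seed: int,
                  stem: Callable[[str], str] = identity) -> list[str]:
    scored = [(tier(query, s, stem), s) for s in sentences]
    hits = [pair for pair in scored if pair[0] is not None]
    # Seeded shuffle, then a stable sort by tier: random order within each
    # tier, but the same seed always produces the same page.
    random.Random(seed).shuffle(hits)
    hits.sort(key=lambda pair: pair[0])
    return [s for _, s in hits]


query = "haluan kalastaa tänään"
examples = [
    "Haluan kalastaa tänään.",
    "Minä haluan kalastaa tänään, niin kuin eilenkin.",
    "Tänään minä haluan kalastaa.",
    "Haluatteko te tänään tai huomenna kalastamaan?",
    "Haluatteko elokuviin tänään?",
    "Karhut kalastavat lohia.",
]
print([tier(query, s, crude_stem) for s in examples])  # [1, 2, 3, 4, 5, 6]
```

With this toy stemmer the example sentences land in the same categories Thanuir assigned them. A real implementation would need a proper stemmer for each language.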
gillux
2019-05-14 14:17 - 2019-05-14 14:20
@Thanuir @AlanF_US @CK

Thank you for your feedback. I agree about the relative uselessness of showing very short sentences first. The idea of randomizing the results within a category, as Thanuir suggested, is appealing (given that the order is deterministic), but I’m afraid it could be a little confusing. I temporarily set up https://dev.tatoeba.org/ like that; please let me know what you think.

Or, if we are to rank using the number of words, what would be the ideal number? Not too long and not too short. It depends on the language, of course. Here are some stats about the average number of words per sentence in every language on Tatoeba: https://gist.github.com/jiru/81...5917dc18325fc2

I wonder if we could use these numbers to boost the ranking of sentences whose number of words is close to the average, with a formula like rank = -abs(average - words)
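A minimal Python sketch of that formula. The `average` of 6.0 words is a hypothetical placeholder, not a value taken from the gist, and word counting uses a naive regex. A higher rank is better, and a sentence exactly at the average gets the maximum rank of 0.

```python
import re


def word_count(text: str) -> int:
    # Naive word count: runs of word characters.
    return len(re.findall(r"\w+", text))


def closeness_rank(words: int, average: float) -> float:
    # rank = -abs(average - words); 0 is the best possible rank.
    return -abs(average - words)


def sort_by_closeness(sentences: list[str], average: float) -> list[str]:
    # Highest rank first; Python's sort is stable, so ties keep their input order.
    return sorted(sentences, key=lambda s: closeness_rank(word_count(s), average), reverse=True)


short = "I am here."                                            # 3 words
mid = "She bought a new red car."                               # 6 words
long = "He said that he would call me back early tomorrow."     # 10 words
print(sort_by_closeness([long, short, mid], average=6.0))
# ['She bought a new red car.', 'I am here.', 'He said that he would call me back early tomorrow.']
```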
AlanF_US
2019-05-14 16:09
> Or, if we are to rank using the number of words, what would be the ideal number?

When I look at the lists of sentences that I've compiled for my own learning, the length of the sentences does tend to be pretty close to the average for the language (5+ for both Hebrew and Russian).
Thanuir
2019-05-15 07:38
One issue with randomness is that reproducing problems or strange behaviour would be more difficult. Maybe displaying the sentences newest first would be an alternative that adds some amount of randomness while retaining reproducibility?

...

Using the average number of words, as suggested by @gillux, would create an alternating pattern that @CK suggests, but without the initial emphasis on slightly longer sentences. It would not be terribly difficult to create a function that would have the type of behaviour that @CK wants when the absolute value in gillux's formula was replaced by it. The function would have to be a piecewise defined function with three linear pieces, or a more complicated one. I do not know how computationally expensive it is to deal with a piecewise defined function.
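One way to read the piecewise idea is sketched below in Python. It replaces `abs()` with a penalty made of three linear pieces: a steep slope below the average, a flat plateau just above it, and a gentler slope beyond the plateau. The breakpoints and slopes are hypothetical placeholders, and whether this shape matches what @CK wants is an assumption. Evaluating it costs a couple of comparisons per sentence, roughly the same as `abs()`.

```python
def piecewise_penalty(words: int, average: float, plateau: float = 3.0,
                      short_slope: float = 2.0, long_slope: float = 1.0) -> float:
    """Three linear pieces, used in place of abs() in rank = -abs(average - words).

    All default parameter values are hypothetical placeholders.
    """
    if words < average:
        # Piece 1: sentences shorter than average are penalized steeply.
        return short_slope * (average - words)
    if words <= average + plateau:
        # Piece 2: no penalty from the average up to average + plateau.
        return 0.0
    # Piece 3: beyond the plateau, longer sentences are penalized gently.
    return long_slope * (words - average - plateau)


def piecewise_rank(words: int, average: float) -> float:
    return -piecewise_penalty(words, average)


print([piecewise_penalty(w, 6.0) for w in (4, 6, 9, 10, 12)])  # [4.0, 0.0, 0.0, 1.0, 3.0]
```

The pieces meet at the breakpoints (both sides give 0 at `average` and at `average + plateau`), so the ranking does not jump when a sentence gains or loses a word there.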

...

One thing I believe would be good is to show sentences that contain most of the right words, but not all. For example, I tried searching for the English idiom "to talk through one's hat" using two queries, "talk through one's hat" and "He is talking through his hat.", with no results. Presumably the sentences are not there. But if the sentence were there with "Tom" rather than "he", or with she/her, I would not find it. The same goes for the sentence with you or I.

The use case for searching for the idiom might be that I am trying to understand it or that I would like to translate it. Both would be helped by the search finding sentences which match only some of the words in the search query, but presenting them after the sentences with all the words.
gillux
2019-05-15 12:35
Yes, we definitely need randomness to be reproducible (and unpredictable, to avoid "rank boost threats") if this is the direction we’re taking. If I give you a search result URL, I expect that you see the same results as me, and that it stays more or less the same for a little time. I believe it is technically possible to produce a random, deterministic and unpredictable order.
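One common way to get an order that is random-looking, deterministic and unpredictable is to sort by a keyed hash. The Python sketch below sorts sentence IDs by an HMAC-SHA256 of the query, a coarse time period (for example a date string) and the ID. The secret key and the choice of period are hypothetical. The same URL gives the same order within a period, and without the server-side key nobody can predict where a sentence will land. The hashed input contains only the query and the period, with nothing about the visitor.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it never appears in URLs, so contributors
# cannot predict (or game) where their sentences will land.
SECRET_KEY = b"replace-with-a-real-secret"


def order_key(sentence_id: int, query: str, period: str, key: bytes = SECRET_KEY) -> bytes:
    # Keyed hash of (query, period, sentence id): identical inputs always give
    # the same bytes, and the result cannot be computed without the key.
    message = f"{query}\x00{period}\x00{sentence_id}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).digest()


def pseudo_random_order(ids: list[int], query: str, period: str,
                        key: bytes = SECRET_KEY) -> list[int]:
    return sorted(ids, key=lambda i: order_key(i, query, period, key))


ids = [101, 202, 303, 404, 505]
print(pseudo_random_order(ids, "winter", "2019-05-15"))
```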

I am concerned that boosting sentences having a number of words close to average is going to be detrimental to diversity, because it’s an incentive for contributors to produce standard-sized sentences. Isn’t there a risk of uniformization of the corpus? Or do we actually want more example sentences that are "efficient" and "standard"? I’d like to know @TRANG's opinion on that matter.

The idea of showing newest sentences first (after sorting by exact matches and LCS) is interesting. It surely adds some randomness, but it’s also an incentive to produce new sentences, and it gives more exposure to new or active users.

Since there is no consensus on an alternative to sorting by number of words, for the moment I’m going to change the default search ranking the way I described on the first post of this thread. We may further improve it later on.
Aiji
2019-05-17 00:18
I did not completely follow the conversation, but I'd like to answer the following:

> I am concerned that boosting sentences having a number of words close to average is going to be detrimental to diversity, because it’s an incentive for contributors to produce standard-sized sentences. Isn’t there a risk of uniformization of the corpus? Or do we actually want more example sentences that are "efficient" and "standard"? I’d like to know @TRANG's opinion on that matter.

From my personal experience, that would definitely uniformize the corpus in its shape (not necessarily in its content). As you said, it's not so difficult to imagine that boosting sentences with a number of words close to the average of a language would produce a huge amount of "standard-size" contributions.
For Romance languages, for example, we would probably get something close to
Subject + Verb + Article + Adjective + Complement
and, still from a personal point of view, that would produce SO many similar and boring sentences to translate that I'm pretty convinced it would hinder my contributions. The most interesting sentences are CLEARLY NOT the ones around the average. Of course, you may encounter some nice expressions or interesting words, but the vast majority would be "I lent a pen to Tom." instead of more elaborate, interesting sentences.
I'm always translating from the English corpus, and when I do a lot of translating at once, I always end up skipping several sentences because I feel like "this sentence AGAIN?!" I can only imagine how I'd feel if the search were biased toward serving me more similar sentences... (I know English is special, but I guess the problem would be similar for at least the top 10 languages.)
TRANG
2019-05-19 12:29
> I am concerned that boosting sentences having a number of words close to
> average is going to be detrimental to diversity, because it’s an incentive
> for contributors to produce standard-sized sentences.

The main factor for a diverse corpus is to have a diverse group of contributors, in my opinion. Next to that, the search ranking probably has very little impact on the kind of sentences that people create.

If contributors were paid every time their sentences are displayed in a search result, then I guess that would be a high enough incentive to produce sentences based on the ranking. But even then, unless they earn a living out of it, I think they will still naturally produce standard-sized sentences no matter the ranking because it's just easier to produce such sentences.

So no worries about influencing diversity here.

What we have to consider is: what is the default usage of the search that we're trying to cover?

My personal usage:
- I'm trying to figure out how to say something in a foreign language and I'm missing vocabulary or grammar knowledge.
- I saw a new word/phrase in a foreign language and I want to understand its meaning or see examples of how it's used.

For these use cases, shorter sentences are in general easier to analyze. So it makes sense to order by number of words.
But if the sentence is too short, it may be lacking context and may not be as useful as a longer sentence. So prioritizing average-sized sentences could make sense.
But average-sized sentences might not always be the most useful for everyone either. Randomizing the results also makes sense: it simply means we don't want to make assumptions about what size is "best".

Random order sounds appealing actually, but I wouldn't change to that until we gather more specific information about the issues of ordering by number of words.

I looked at the pageviews for the search in Google Analytics for the month of April.
- Pageviews with order=words: 12,990
- Pageviews with order=random: 8,339
- Pageviews with order=created: 1,198
- Pageviews with order=modified: 259
(Total pageviews for /sentences/search: 223,174)

It seems that when given the choice, people choose in majority to order by words.
gillux
2019-05-20 10:40
Thank you for the numbers, that’s valuable information.

This shows that 90% of the visitors making a search are using the "simple search" (top bar or front page), and 10% the advanced search (advanced search page or "more search criteria" block).
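The arithmetic behind this split is shown below, using TRANG's April numbers. It assumes, as the reasoning here does, that only advanced-search pageviews carry an `order=` parameter:

```python
# April pageviews from TRANG's post; only the advanced search sets order=.
advanced = {"words": 12_990, "random": 8_339, "created": 1_198, "modified": 259}
total_search = 223_174  # all /sentences/search pageviews

with_order = sum(advanced.values())
share = with_order / total_search

print(with_order)                                         # 22786
print(f"{share:.1%} advanced, {1 - share:.1%} simple")    # 10.2% advanced, 89.8% simple
for name, views in advanced.items():
    # Share of each ordering among advanced-search pageviews.
    print(f"order={name}: {views / with_order:.1%}")
```

Among advanced-search pageviews, order=words accounts for about 57% and order=random for about 37%.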

> It seems that when given the choice, people choose in majority to order by words.

However it’s not a fair choice because you can’t tell the visitors who made a choice (clicking on the dropdown, examining the choices and choosing) from the ones who didn’t (glancing over the dropdown or not even seeing it, and using the default value). order=words being the default, I believe it is overrepresented.

I find the number for order=random surprisingly high.
AlanF_US
2019-05-20 10:58
I agree.
Guybrush88
2019-05-20 11:14
Personally, I use order=random in the advanced search because it generally gives me more diversity (whether I'm translating sentences or tagging existing ones). Sorting by number of words tends to present similar patterns, which I use whenever I want to translate or tag the same pattern.
deniko
2019-05-20 11:21
Same here.

I use the "fewer words first" mode when I use tatoeba to actually find a way to translate something, and the "random" mode when searching for sentences to translate.

If I need exact matches, I still use "fewer words first", but together with syntax like ="gesundheit"
TRANG
2019-05-25 05:17
I've extracted the stats since January 2018 to have a broader view on advanced search usage:
https://docs.google.com/spreads...it?usp=sharing

I would have thought that the order=random was high because it's the default option on the "Translate sentences" page, but it was high even before.

November 2018 is when we changed the default option on the "Translate sentences" from "created" to "modified" to "random" (cf. https://github.com/Tatoeba/tatoeba2/issues/1351). It's actually interesting to see how that influenced the order=created and order=modified.

I'm not sure why the "random" option spiked so much in March...

But in any case, there have been a few months where the advanced search was used more often with order=random than with order=words, even though order=words is the default.
gillux
2019-05-26 13:52
Very interesting! Based on this data, I suggest that we ditch order=created and modify the "relevance" algorithm to randomize results within exact matches and LCS matches.
TRANG
2019-05-26 23:18
Yes, we can definitely introduce randomness into the "relevance" algorithm. At the very least, we can experiment with it for a month or two, see if anyone complains, and then check the analytics to decide whether to keep it.

I'm not entirely sure about removing order=created, but I cannot think of a use case where ordering by created does something more useful than ordering by modified. I can only think of one inconvenience: if someone has bookmarked or has shared links to search results with order=created, those links would not work as initially intended anymore. I don't think that's a blocking issue for ditching order=created though, considering how little it is used.
CK
2019-05-26 23:59
> I suggest that we ditch order=created and modify the "relevance" algorithm ...

I'd vote to keep the "order=created." I often use it.
Couldn't we have both?
gillux
2019-05-27 05:15
May I ask what is your use case?
CK
2019-05-27 05:20 - 2019-05-27 05:44
- Find recently-contributed sentences in English on List 907 that don't yet have audio, so I can record them.

https://tatoeba.org/eng/sentenc...=&sort=created

- Find my own recently-contributed sentences without audio.

https://tatoeba.org/eng/sentenc...=&sort=created

- Find recently-contributed Japanese sentences that have no English translations

- Find recently-contributed English sentences that have been released, so I can adopt the good ones.

Perhaps there are other things I do from time to time, too.





kemushi69
2019-05-22 01:10
Reproducible randomness is certainly possible from a mathematical/cryptographic point of view. You do need to think carefully about whether it's possible to reverse-engineer personal information from generated pseudo-random (permalink) seeds, though.
CK
2019-05-16 02:01
Note that the "advanced search" on dev.tatoeba.org doesn't work properly now.

This search should show English sentences with the word "winter" sorted by fewest words.

> https://dev.tatoeba.org/eng/sen...io=&sort=words
gillux
2019-05-16 14:28
Thanks. I reverted it back to normal.
cojiluc
2019-05-27 13:59
An alternative sorting algorithm would be sorting by votes. I remember reading a suggestion in the discussions on the Wall to implement a voting system (positive or negative votes) that lets users vote on sentences. Here are some advantages:

1- Good sentences or high quality sentences (in any sense you consider) are likely to have more positive votes
2- Bad sentences are likely to have more negative votes

In this way, we can have an alternative sorting algorithm (not necessarily for the default sorting, but just an alternative sorting).
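A minimal Python sketch of such an alternative sort. It assumes a hypothetical record with separate up- and down-vote counts (Tatoeba's real data model may differ) and ranks by the net score. A reverse option surfaces the worst-rated sentences, which could help anyone looking for sentences to fix.

```python
from dataclasses import dataclass


@dataclass
class RatedSentence:
    # Hypothetical record; not Tatoeba's actual schema.
    sentence_id: int
    text: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        return self.upvotes - self.downvotes


def sort_by_votes(sentences: list[RatedSentence], worst_first: bool = False) -> list[RatedSentence]:
    # Best-rated first by default; worst_first=True puts low scores on top.
    return sorted(sentences, key=lambda s: s.score, reverse=not worst_first)


pool = [RatedSentence(2, "B", 0, 1), RatedSentence(3, "C"), RatedSentence(1, "A", 2, 0)]
print([s.sentence_id for s in sort_by_votes(pool)])                    # [1, 3, 2]
print([s.sentence_id for s in sort_by_votes(pool, worst_first=True)])  # [2, 3, 1]
```

A plain net score cannot tell an unrated sentence from one with one positive and one negative vote, so it says little while most sentences have very few ratings.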
Thanuir
2019-05-27 14:44
This would be more relevant when there are more people rating sentences. If implementation is not difficult, then one could, of course, do it already, even though most sentences have between zero and two ratings of any kind.

Also, sometimes one might want to see bad sentences (to fix them or neutralize them). This is not really a concern with the default search, but might be with custom searching.
Ricardo14
2019-05-22 00:45
I think it has already been pointed out, but maybe it's a good idea to point it out again.

The vocabulary feature is a great tool. It helps us study languages by adding words in languages we don't speak fluently and want to see used in context.

However, looking at https://tatoeba.org/eng/Vocabul...sentences/por, I've found words that don't exist in Portuguese, like "decoy", misspelled words like "libro de cabeceira" (it should be "liVro"), and also things like "Posso colocar aqui verbos usamos em nossos dia." (I can add here verbs we use every day) and "Caso precisem de alguma informação com relação ao Português Brasil posso ajuda-los." (In case you need any information about Brazilian Portuguese I can help you)

http://prntscr.com/nrnhj9

It'd be good if we could correct these requests or even delete some.

https://tatoeba.org/eng/Vocabul..._sentences/eng (English)

Make it up - Помириться (Russian)
communicative language teaching (not a word)
That's what this is all about. - Это то из-за чего это всё. (Russian again)
щзхлзщхлщзх (Can't believe this is Russian or any other language)

http://prntscr.com/nrnj8o
Thanuir
2019-05-22 06:57
I agree with the problem of the list being cluttered with typos and other mistakes.

I do think that for example "communicative language teaching" is completely fine on the list, though. One might very well understand all the component words and have example sentences with them without understanding the whole term or having it in any sentence.

With English this is especially bad, since the language uses few compound words; "language teaching" rather than "languageteaching", in marked contrast with many Germanic languages and Finnish, for example.

At least in scientific English there are many terms that refer to a concrete thing, but due to features of the language, they are written as a collection of separate words. "magnetic resonance imaging", for example, or "inverse problem".

There are also cases like the Danish "rejse" (to travel) versus "rejse sig" (reflexive form, to get up e.g. from a sitting position). It should be completely valid to have "rejse sig" as a vocabulary item.
raggione
2019-05-22 10:37
Full support. I'd like to get my hands on the requests for German, which badly need sorting.
TRANG
2019-05-22 23:34
Related issue on GitHub: https://github.com/Tatoeba/tatoeba2/issues/1473

To solve this, one possible approach is the one suggested in the GitHub issue: let corpus maintainers edit/delete any vocabulary items. This approach comes with a couple of problems however.

1) What if corpus maintainers edit/delete items that could have been valid?

This problem already shows up here in this thread, as you, Ricardo, said you would have deleted "communicative language teaching", while Thanuir would have kept it.

2) Users don't necessarily add vocabulary to get example sentences.

I can imagine that some users just want to make a list of words/phrases they want to learn and they find it useful to have the translations next to each word/phrase. That's probably why you have those items that contain both English and Russian words. Since the vocabulary feature doesn't allow linking vocabulary items across two languages, users will of course put the two languages into one item if they need to. They're not exactly doing something wrong, they're just adapting the feature to their needs. I feel that deleting or editing their items would not be the proper thing to do.

My suggestion would be to allow contributors to ignore items on the "Sentences wanted" page.
- Ignored items would no longer be displayed on the "Sentences wanted" page.
- Ignored items would be individual, which means that if I ignore an item, it is ignored only for me. If other people don't want to see an item, they would have to ignore it themselves.

In parallel to that, we could split the vocabulary items into two categories:
- A "regular" list where users just create a vocabulary list for themselves.
- A "needs sentences" list where users add vocabulary for which they explicitly want example sentences. Only vocabulary items in this list would appear on the "Sentences wanted" page.
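As a rough illustration of how the per-user ignore list and the "needs sentences" list could work together, here is a minimal Python sketch. The `VocabItem` class, its fields, and the `wanted_items` function are hypothetical and not taken from Tatoeba's code; it assumes each item carries a flag saying whether it was added to the "needs sentences" list, and that each contributor has their own set of ignored item IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabItem:
    # Hypothetical fields; Tatoeba's real data model may differ.
    item_id: int
    lang: str
    text: str
    needs_sentences: bool  # True if the owner put it on the "needs sentences" list


def wanted_items(items, ignored_ids, lang=None):
    """Return the items one contributor would see on "Sentences wanted".

    - Only items explicitly flagged as needing sentences are shown.
    - Items in this contributor's own ignore set are hidden for them only.
    - Optionally restrict the result to one language code.
    """
    return [
        item
        for item in items
        if item.needs_sentences
        and item.item_id not in ignored_ids
        and (lang is None or item.lang == lang)
    ]


items = [
    VocabItem(1, "dan", "rejse sig", True),
    VocabItem(2, "dan", "rejse", True),
    VocabItem(3, "dan", "rejse", False),  # personal list only, never shown
]

# Contributor A has ignored item 2; contributor B has ignored nothing.
print([i.item_id for i in wanted_items(items, ignored_ids={2})])    # [1]
print([i.item_id for i in wanted_items(items, ignored_ids=set())])  # [1, 2]
```

Because the ignore set belongs to a single contributor, hiding an item for one person leaves it visible to everyone else, while items on a purely personal list never reach the public page at all.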

Those are my personal ideas. There is no clear decision yet on how we want to handle the "bad" vocabulary items and if you have better ideas, please do share.
gillux
2019-05-23 08:20
To me, the root of the problem is that members are unable to put the correct information in the "add vocabulary items" form. And no one can blame them for that. The process of turning an "I want sentences like that" idea into correct values for "language" + "searchable vocabulary item" is indeed difficult, and probably unclear at first. Members don’t exactly know that their vocabulary items are going to be interpreted as search queries, with the number of results displayed, all that listed on a public page. I can see some members are adding whole sentence pairs as vocabulary items, or a semicolon-separated list of items. I can imagine some members assume that what’s under 'my vocabulary items' is only visible to them. I guess some members do not care that much about the flag because they know the language already. After all, when writing down your own vocabulary list, does anyone bother writing the language name?

That is quite of a UX challenge in my opinion. I have no great idea to solve it but there is certainly room for improvement. I’m pretty sure most people do not read the bottom-right "Tips" block that says "Add vocabulary that you are learning. If your vocabulary does not exist yet in Tatoeba, other contributors can add sentences for it."

I don’t think a per-member blacklist would help. I can make out two categories of items people would like to hide: (1) items that are not meant to achieve 10 sentences (because they are garbage, completely invalid or personal) and (2) items that won’t achieve 10 sentences because of incorrect values. As for category 1, everyone wants to hide them from the public list. As for category 2, they are valid requests that anyone can be interested in, it’s just that Tatoeba isn’t smart enough to show the correct number of sentences.

Splitting the vocabulary items into personal vs. wanted sentences is an interesting idea, but in the end I don’t think it will solve the problem of having "forever 0 sentences" items cluttering the list.

Some other ideas:
How about allowing members to edit the language of their own items?
How about allowing corpus maintainers to edit the language of other members’ items?
How about somehow dividing the vocabulary items into two categories: the ones that can be readily used as search queries, and the ones that cannot. For the ones that cannot, instead of having a link to a search, we allow to directly add sentences that are assumed to match the item and are counted as such.
TRANG
2019-05-25 03:30
> That is quite of a UX challenge in my opinion.

Oh yes, and we definitely won't be solving everything at once...

> How about allowing members to edit the language of their own items?

Yes, we should. There's an issue about this: https://github.com/Tatoeba/tatoeba2/issues/1238

> How about allowing corpus maintainers to edit the language of other
> members’ items?

There's the problem that corpus maintainers may have wrong assumptions about the language of the item.

> How about somehow dividing the vocabulary items into two categories: the ones
> that can be readily used as search queries, and the ones that cannot. For the ones
> that cannot, instead of having a link to a search, we allow to directly add sentences
> that are assumed to match the item and are counted as such.

I think this issue is related: https://github.com/Tatoeba/tatoeba2/issues/1281


Another idea: we could exclude old items from the "Sentences wanted" page. If an item was added more than 30 days ago (the limit could be longer or shorter, or customizable by the user who is adding sentences), we wouldn't display it anymore. It would show up again only if another user added the item, or if the same user re-added it.
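A minimal Python sketch of that age cutoff, under stated assumptions: the `visible_requests` function and the shape of `requests` (item text mapped to the list of times it was added or re-added) are hypothetical, not Tatoeba's actual schema. The cutoff is a parameter so that a contributor could choose their own limit.

```python
from datetime import datetime, timedelta


def visible_requests(requests, now, max_age=timedelta(days=30)):
    """Return the item texts still shown on "Sentences wanted".

    `requests` maps an item's text to a list of datetimes, one per time any
    user added (or re-added) that item. An item stays visible as long as its
    most recent addition is no older than `max_age`, so a fresh add by
    another user, or a re-add by the same user, brings it back.
    """
    return sorted(
        text
        for text, added_at in requests.items()
        if added_at and now - max(added_at) <= max_age
    )


now = datetime(2019, 5, 25)
requests = {
    "mollify": [datetime(2019, 1, 10)],                        # old, never re-added
    "supine": [datetime(2019, 1, 10), datetime(2019, 5, 20)],  # re-added recently
    "attune": [datetime(2019, 5, 1)],
}

print(visible_requests(requests, now))  # ['attune', 'supine']
# A contributor who prefers a longer window still sees the old request.
print(visible_requests(requests, now, max_age=timedelta(days=365)))
```

Keying the check on the most recent addition, rather than the first one, is what lets a re-add reset the clock.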
Thanuir
2019-05-25 04:37
Could the time limit be keyed to the activity of the user, instead? If a user has not contributed to the site for a year, no longer display their wanted vocabulary. Otherwise, one would have to continuously upkeep their vocabulary for it to survive, which sounds like busywork or a discouragement.

I would also suggest that one month is far too short. If a word is simple, sentences might be added in a month, but they might also be there already. Most legitimate words that linger for long would presumably be obscure or in languages where people do not actively contribute sentences to wanted words. In this case, even a year sounds like a short period of time.
TRANG
2019-05-25 22:59
> Could the time limit be keyed to the activity of the user, instead? If a user has not
> contributed to the site for a year, no longer display their wanted vocabulary.
> Otherwise, one would have to continuously upkeep their vocabulary for it to survive,
> which sounds like busywork or a discouragement.

Well, we would also need to take into account the contributor's point of view (and by "contributor", I mean the person who is creating sentences based on vocabulary). If they have no idea what to do with a vocabulary item (because they're not inspired, or because the item is incorrect) and that item stays on the "Sentences wanted" page until one year after the vocabulary owner becomes inactive, that would be pretty long.

I get your point though. Having to actively "bump" your vocabulary requests every once in a while can be annoying.

We could implement the time limit as an option rather than a default. By default we could still display vocabulary items of all dates, then the contributor can choose to only show the more recent ones if there are too many undesirable items.

But I'm starting to wonder whether displaying vocabulary items is the best approach. Perhaps we should instead display users who are requesting vocabulary so that the focus is more about connecting members with each other. It would be more engaging and, I feel, more efficient for getting rid of "bad" vocabulary items.

Since I would browse the items of a specific user, I would know who to contact to let them know that something is wrong in their vocabulary list, and they can correct it. And if they never correct it, it probably wouldn't bother anyone because the items would be "isolated" in that user's vocabulary list, as opposed to being mixed up with everyone else's items like it is now.
Thanuir
2019-05-26 05:55
I use the vocabulary feature both ways, in case a user's perspective is interesting:

1. When reading a text and encountering a word I do not understand, I sometimes add it to my vocabulary. (This depends on the context where I am reading, more than anything else. How much time I have, for example.) Likewise, if there is a word I will or would like to use, I add it to my vocabulary. When adding something to my vocabulary, I also check the sentences that already are there that the built-in search finds.

I very occasionally go through my vocabulary, removing words that I am satisfied with, and checking the entries Tatoeba finds with the search for those terms I do not yet quite understand.

2. As a contributor, I sometimes go through the wanted Finnish words and add sentences to those. The problem is that I seem to be the only one doing this with any regularity, so I tend to add similar sentences several times. Sometimes I come up with a clever idea for a sentence and find that it has already been added, often by me. There are several wanted Finnish words, so I can just skip the ones that I find hard, or occasionally I do a little bit of research to find how the word is actually used or to find out some facts about it, or even what it precisely means if the word is obscure enough.

I do not think there are any wrong Finnish vocabulary items at the moment. If there were, I would just have to ignore them, or add them once to a sentence that points out the misspelling or misunderstanding. Adding ten such sentences to get the word out of the vocabulary list would not be inspiring at all. But a single sentence would be fine.

...

In my experience, adding words to the vocabulary is like putting a postcard in a bottle and throwing it into the sea - maybe someone will find it and add the word at some point, but there are absolutely no guarantees, and it never happens fast.

Languages: Mostly Danish and both written forms of Norwegian, occasional scientific or obscure English, a couple of obscure Finnish words, and maybe something else.

Likewise, the list of requested Finnish vocabulary seems to be fairly static, so I consider it as a source of inspiration and a puzzle game, as well as a way of getting to know some obscure words, rather than as actively helping someone in the moment. Actively helping someone would be nice, though, but it also works as is for me.
jegaevi
2019-05-25 18:55
Dear Hungarian members!
Over the past 15 days I have read through a total of 10,180 sentences, and I commented on many of them. I had planned to review all of the sentences so that there would be as few errors as possible among the Hungarian sentences. I did not mean to hurt, harass, or judge anyone by doing this, and I am truly sorry if anyone took it badly. I don't know whether it's worth continuing, because honestly, I've really lost my enthusiasm for it. It also seems that others see it only as harassment and consider me disrespectful because of it. I'd rather stick to translating and uploading audio. I will not apologize to anyone, because I believe I have done nothing that would warrant it. I still very much welcome comments under my sentences; in fact, I'm very glad when someone takes the trouble to correct me or just make a suggestion.

Happy Tatoeba-ing to everyone! :)
maaster
2019-05-25 21:10
I appreciate your efforts, and as far as I'm concerned it's no problem if you comment on my sentences. However, we are not all the same: some people's style is not very endearing, a grumpy-bear style. Some even consider themselves holy, untouchable and infallible; they will never thank you for your remarks or even deign to reply to them.
The main thing is that you find joy in what you do here and learn from it.