
Wall (5,772 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.


Ooneykcall, March 28, 2020 at 9:28 AM

So Tom_Facts was suspended. Kinda sad, isn't it? Obviously an alt account made to post joke sentences, which may technically be against the rules, but those sentences were all linguistically sound; they just didn't make ordinary sense, for example this one: #8639644. It was a nice set of jokes that in a way hearkened back to "Tom and Mary in the land of sentences" from the olden days. I am very much not happy that those funny sentences are rendered illegitimate and closed for further translations. Could an admin please unblock them, since they are grammatically correct and understandable, regardless of not making sense in the real world?

TRANG, March 28, 2020 at 9:30 AM (edited March 28, 2020 at 9:48 AM)

It was not for offensive reasons, nor because some don't make sense, but for legal reasons.

See my comment here:
https://tatoeba.org/eng/sentenc...omment-1167887

TRANG, March 28, 2020 at 9:45 AM

In case this situation is saddening more people, then please know that there is a technical solution: https://github.com/Tatoeba/tatoeba2/issues/1659

But somebody has to volunteer to implement it.

Until that is implemented, we unfortunately cannot accept contributions when they were very obviously mass-copied from a certain source, and that source is not compatible with CC BY.

Alternatively, someone could try to negotiate with the owners of https://chucknorrisfacts.net/ and see if they would be willing to allow reuse of their content.

Ooneykcall, March 28, 2020 at 10:13 AM

> Alternatively, someone could try to negotiate with the owners of https://chucknorrisfacts.net/ and see if they would be willing to allow reuse of their content.

That's what I'm going to do. It's just a joke after all; I figure they wouldn't be stuck-up about it. I think preserving the sentences in their original form, with Chuck Norris as the protagonist, would be better though. So if this goes well, I suppose what we do is this: you set them free (of Tom_Facts' "ownership"), I adopt them, mass edit Tom to Chuck Norris, edit my translations accordingly, and ask deniko to edit his. I think no one else translated those sentences other than the PIN code one. Bit tedious, but there are only 192 sentences, so it wouldn't take that long.

TRANG, March 28, 2020 at 10:39 AM

You would have to convince them to officially add their licensing terms on the website. If you find a contact point via email, please include team@tatoeba.org in CC.

I would not be as confident as you that they would agree to allow reuse of their content under a CC BY compatible license. In the end it depends on what their ad revenue is.

Making their content CC BY compatible means that anyone can technically and legally just copy their content and create a website that directly competes with them. As a result, there will be a risk that fewer and fewer people visit their website. If they make little to no money from ads, they won't care about this risk. But if they make a significant amount of money, I would be extremely surprised if they agreed to take such a risk.

Ooneykcall, March 28, 2020 at 11:06 AM (edited March 28, 2020 at 11:10 AM)

In that case, if we keep 'Tom', wouldn't it help solve the issue if the site admins agreed to accept derivative content and not direct copying? That's different, since a hypothetical "not Chuck Norris facts" site obviously couldn't compete, as people know it's supposed to be about Chuck Norris and would think something's fishy.

I don't see why it has to be an official request, really... I mean, we're just normal people having fun, not sleazy lawyers. It's my personal initiative/request and I've sent it as such. If I'm granted any permission I'll have the email for proof, no problem.

TRANG, March 28, 2020 at 12:13 PM

> In that case, if we keep 'Tom', wouldn't it help solve the issue if the
> site admins agreed to accept derivative content and not direct copying.

No, it wouldn't be compatible with CC BY. We wouldn't be able to keep these sentences as part of the corpus that we distribute because of this scenario:

- They allow derivatives of their jokes.
- We also allow derivatives of our sentences (that's the nature of CC BY).
- Someone copies sentences from Tatoeba into their website and changes every sentence with "Tom" to "Chuck Norris".
- Someone has then indirectly copied sentences from the Chuck Norris facts website.

> I don't see why it has to be an official request, really...

Because it would be irresponsible and disrespectful to ignore intellectual property.

https://en.wiki.tatoeba.org/art...ting-sentences

Ooneykcall, March 28, 2020 at 1:26 PM

> - They allow derivatives of their jokes.
> - We also allow derivatives of our sentences (that's the nature of CC BY).
> - Someone copies sentences from Tatoeba into their website and changes every sentence with "Tom" to "Chuck Norris".
> - Someone has then indirectly copied sentences from the Chuck Norris facts website.

Are you expecting that to happen, really?
You didn't use to be so concerned with lawyerese, I reckon. Sad times if you have to fear someone could cry havoc over this, because nobody in their right mind would, but sick minds specialise in giving sound minds a headache.

> Because it would be irresponsible and disrespectful to ignore intellectual property.

That's no answer as to why there would be any need for officialdom. I'm communicating as myself, a private person not a legal entity. All I need is to make sure the admin(s) of that site approve of adding their sentences here and have no mind to object to it.

Chuck Norris jokes weren't invented by that website either; it's the other way around. They've added new jokes over the years, obviously, but most of those seem to have been submitted by others. How much of that copyright actually holds, hmm.

TRANG, March 28, 2020 at 3:09 PM

> You didn't use to be so concerned with lawyerese I reckon

I am not concerned when the sample of copied content is so minimal that we can still make the assumption that the contributor came up with the sentences on their own.

I have however always been concerned whenever we could very clearly identify the original source of the copied content.

> That's no answer as to why there would be any need for officialdom.

I will try to explain more clearly.

As long as *all* our sentences are exported into a dataset that we distribute under CC BY, it would be irresponsible and disrespectful to ignore intellectual property. We, Tatoeba, have some liability with regard to what is inside this dataset.

Yes, it is crowdsourced and it is not possible for us to monitor every single sentence. But that doesn't take away our responsibility to try our best to comply with the law. If we can't do it, we have to stop releasing our content under CC BY. But I'm not going to sacrifice open data just for Chuck Norris jokes.

Personally I'm not going to be nitpicky about a handful of non-CC BY sentences. But there's a point where it starts to be a bit too much. Having close to 200 sentences gathered under the same account makes it very, very obvious what the source is. And when it becomes very obvious what the source is, it is too much.

Now again, the main blocking point is that *all* our sentences are exported into a dataset that we distribute under CC BY. If we could somehow exclude some sentences from our exports, that would solve the problem. Well it turns out that sentences set as "unapproved" are not exported. So that has been our temporary solution for keeping non CC BY sentences in Tatoeba.

But this poses another problem: we will have sentences in red even though they are correct. To solve that, we have a technical solution: https://github.com/Tatoeba/tatoeba2/issues/1659.

Once that is implemented, we can much more safely allow people to copy sentences from other sources because we can easily remove obvious non CC BY content from our CC BY dataset.

The content will no longer be labelled as "CC BY" data, it will just be on our website labelled as "no license" or "unknown license" or whatever else that makes it clear it's not for reuse (or if one chooses to reuse, it will be at their own risk).
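To make the idea concrete, here is a minimal sketch of an export that leaves out unapproved sentences and anything without a CC BY license. It is only an illustration, not Tatoeba's actual export code: the `Sentence` class, the `license` and `unapproved` fields, and the `EXPORTABLE_LICENSES` set are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of licenses considered safe to include in the export.
EXPORTABLE_LICENSES = {"CC BY"}


@dataclass
class Sentence:
    id: int
    text: str
    license: Optional[str]    # hypothetical field; None stands for "no license" / "unknown license"
    unapproved: bool = False  # hypothetical flag mirroring the "unapproved" status


def exportable(sentences):
    """Return only the sentences that may go into the CC BY dataset."""
    return [
        s for s in sentences
        if not s.unapproved and s.license in EXPORTABLE_LICENSES
    ]


corpus = [
    Sentence(1, "An original sentence.", "CC BY"),
    Sentence(2, "A sentence copied from another site.", None),
    Sentence(3, "A sentence set as unapproved.", "CC BY", unapproved=True),
]
export_ids = [s.id for s in exportable(corpus)]
```

With a per-sentence license field like this, sentence 2 could stay visible on the website without having to be shown in red, and it would still be left out of the distributed dataset.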

We will still encourage everyone to cite the sources, to give attribution where it's due. We will also still remove content from Tatoeba if the original authors ask us to do so. But we don't have to worry about license compatibility.

Also, please be aware that there is intellectual work in creating a collection of sentences, even if each sentence of the collection is individually free of intellectual property.

For instance you can create a list of "1000 most common sentences". If you take each sentence individually, you can't argue that you own these sentences; millions of people have used them before you. But if someone were to take your exact list and publish it on their language learning website as "1000 sentences for beginners", then they basically ripped off your work. Because it is intellectual work to come up with criteria for selecting what is more common.

In the case of Chuck Norris facts, they built a website, they have people accepting or rejecting the submissions, they set up a whole infrastructure to collect and share these jokes. It *is* work, it *is* intellectual value. They cannot claim that each joke belongs to them, but they can claim that the collection of jokes belongs to them. Not only that, but they have ads on their website, so it is also money. Personally, when money is involved, I don't want to make any sort of optimistic assumption.

I hope this clarifies things.

Ooneykcall, March 28, 2020 at 4:04 PM (edited March 28, 2020 at 4:04 PM)

Thanks for taking your time to present a detailed account. It's not like I had zero understanding, but having an explanation laid out is quite helpful; nice of you to write one up.

This analogy doesn't seem to apply to Tatoeba as it is, though, since sentences are added individually? I mean, I *could* add 1000 common sentences from such a list as you describe, and this couldn't be a copyright violation since nobody owns any of those sentences individually. It seems there's no violation as long as I don't combine them into a list similar to the original one; only then am I arguably using someone else's work.

I have to agree that adding those sentences under an alt account set up specifically for that purpose makes it too obvious. But if a regularly contributing account/user, such as me, added common sentences from a certain source along with other sentences so their existence as a relatively large set wouldn't be visible, it would be realistically no problemo, hmm? Since it only becomes a real problem when you can easily trace those sentences and conceive of them as a single list, rather than stumble upon individual items that do not suggest a bigger list exists. That is, if those sentences aren't of the sort that makes you think right away they might come from a single author, which Internet memes are not, as anyone may want to create more phrases with the same jocular pattern.

AlanF_US, March 28, 2020 at 4:54 PM

Though I realize you're probably posing your question hypothetically, it sounds to me like "Who cares whether it's wrong as long as we can come up with a scheme for getting away with it?" Aside from the ethical problem, it seems like a bad idea to rely on our speculations, as non-lawyers, about what might or might not get us in trouble with the law. In any case, I don't think Tatoeba is so hard up for sentences that we need to steal them.

Ooneykcall, March 31, 2020 at 2:14 PM

> Who cares whether it's wrong as long as we can come up with a scheme for getting away with it?

You mean legally wrong. Common jokes shouldn't be copyrightable if we're talking morally, but oh well.

Aiji, March 31, 2020 at 12:15 AM

Let me add a pinch of explanation because I think it's important. The short answer is: no, it won't work. It could even worsen things, because a lawyer could accuse you of being fully aware of your copyright infringement and trying to hide it (an aggravating circumstance). It's like saying "I launder my dirty 2 million, but only 2,000 at a time. The authorities won't catch me so it's safe."

The scenario is simple. Suppose I own the copyright on content of some sort, here, sentences. I build a script to search the web for potential copyright infringement (if I have a lot of money, I pay for such a script). Above a certain threshold (chosen more or less arbitrarily), it is decided that "potential" becomes "very likely". The two common ways to proceed from here are the following:
- I'm a relatively nice person, so I contact you, explain the problem and ask you to kindly deal with the situation.
- I don't care at all, and demand the removal of the copyrighted content, otherwise I'll send you my lawyer.

That's how it works on YouTube for example. Copyright owners won't care who the content creators are, they'll strike their videos, period. You're just a science communicator who makes instructive videos? Well, too bad you used 17 seconds of Black Eyed Peas. Talking about life in the U.S.? Well, don't use the Mario main theme.
Sometimes the content creator will make a phone call, find a reasonable person and reach a happy outcome (or they belong to a powerful network that can negotiate...), sometimes they won't.

And don't think that popular stuff, folklore, or something that everybody knows is safe. That's how copyright scammers work on YouTube. They claim ownership of any piece of content that is not copyrighted to be able to strike videos and get the corresponding share of the money. Sometimes they even try to claim copyright over content that was officially released for free use...

Of course, there's no need to be paranoid. But being aware that some people are merciless helps you stay cautious. Since Tatoeba asks people to add their own sentences, it's normal to expect that people will be cautious about adding copyrighted (or possibly copyrighted) content.
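
As a rough illustration of the kind of threshold check described in the scenario above, here is a minimal sketch. Everything in it is an assumption made for the example: the function names, the normalization (trimming and lowercasing), the 0.5 threshold, and the sample sentences. A real detection script would be far more elaborate.

```python
def overlap_ratio(candidate_sentences, owned_sentences):
    """Share of candidate sentences that also appear in the owned collection."""
    # Normalize lightly so trivial differences in case or spacing still match.
    owned = {s.strip().lower() for s in owned_sentences}
    if not candidate_sentences:
        return 0.0
    matches = sum(1 for s in candidate_sentences if s.strip().lower() in owned)
    return matches / len(candidate_sentences)


def classify(ratio, threshold=0.5):
    """Above the (arbitrary) threshold, "potential" becomes "very likely"."""
    return "very likely" if ratio >= threshold else "potential"


owned = ["The cat sleeps.", "The dog barks.", "The bird sings."]
candidate = ["The cat sleeps.", "the dog barks. ", "I like tea.", "The bird sings."]

ratio = overlap_ratio(candidate, owned)
verdict = classify(ratio)
```

In this sample, three of the four candidate sentences match, so the ratio is 0.75 and the check reports "very likely". Spreading the same sentences among many unrelated ones lowers the ratio but does not remove the matches.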

rumpelstilzchen, March 31, 2020 at 4:11 AM

> This analogy doesn't seem to apply to Tatoeba as it is since sentences are added individually though? I mean, I *could* add 1000 common sentences from such a list as you describe, and this couldn't be copyright violation since nobody owns any of those sentences individually. It seems there's no violation as long as I don't combine them into a list similar to the original one; only then am I arguably using someone else's work.

Have you ever heard of "database right"? https://en.wikipedia.org/wiki/Database_right

TRANG, March 28, 2020 at 10:48 AM

As for changing back "Tom" to "Chuck Norris", I would recommend against it. Keeping Tom is okay; these are jokes that have been customized for Tatoeba.

On a side note, I would like to point out that before Tom, there was Christopher Columbus:
https://tatoeba.org/eng/tags/sh..._with_tag/1158

CK, March 28, 2020 at 11:10 AM

> I would like to point out that before Tom, there was Christopher Columbus:

I don't think this is true.

In English, the first "Columbus" sentence is #35544. The first "Tom" sentence was #1780. There were 34 other "Tom" sentences with numbers under 35544.

There are 652 sentences with Tom under #467000 and 18 "Columbus" sentences.

The first "Christopher Columbus" sentence is #536592. There are 671 "Tom" sentences with numbers under this.
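
Counts like these can be reproduced with a small script. The sketch below is only an illustration: it assumes the sentences are available as (id, text) pairs, and the `count_below` helper and the sample data are made up for the example rather than taken from the corpus.

```python
import re


def count_below(sentences, word, max_id):
    """Count sentences with an id under max_id that contain word as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return sum(
        1 for sentence_id, text in sentences
        if sentence_id < max_id and pattern.search(text)
    )


# Made-up sample data, not real corpus entries.
sample = [
    (1780, "Tom is here."),
    (2000, "Tomorrow is Monday."),    # must not count as a "Tom" sentence
    (35544, "Columbus sailed west."),
    (40000, "Tom left."),
]

tom_before_columbus = count_below(sample, "Tom", 35544)
```

The word-boundary match keeps words such as "Tomorrow" out of the count.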

TRANG, March 28, 2020 at 11:14 AM

I didn't mean that there was no sentence with Tom before. I meant that when it comes to making jokes with a protagonist who has superhuman abilities, this trend appeared in Tatoeba with "Christopher Columbus" before "Tom".

deniko, March 28, 2020 at 8:42 PM

> Alternatively, someone could try to negotiate with the owners of https://chucknorrisfacts.net/

These jokes about Chuck Norris are basically modern folklore, I feel like it would be super weird for someone to claim any copyright on them. I'm sure chucknorrisfacts.net is not the only website that collects this folklore. Are you sure they were copied from there and they're not part of Tom_Facts's own collection? The jokes are literally everywhere.

https://www.reddit.com/r/ChuckNorris/

https://parade.com/968666/parad...-norris-jokes/

etc.

I remember reading jokes about him in this style long before chucknorrisfacts.net was created (year 2017, according to whois lookup).

Tom_Facts_Vol2, March 29, 2020 at 1:33 AM

+∞

Thanuir, March 30, 2020 at 5:33 AM

A few might be found here under a CC license: https://commons.wikimedia.org/w...related_images

Tom_Facts, March 31, 2020 at 12:35 AM

👍

rumpelstilzchen, March 31, 2020 at 3:54 AM

Just because the picture of a sentence is CC licensed doesn't make the sentence itself CC licensed.

Thanuir, March 31, 2020 at 5:27 AM

This file seems to be CC BY-licensed: https://commons.wikimedia.org/w...indow_sign.jpg

The title of the file contains the sentence. (The picture does, too.) The title of the file is a part of the file. Hence, the title seems to be CC-licensed, too. Or am I missing something?

TRANG, March 31, 2020 at 10:03 PM

Reusing sentences that can be seen in pictures that were published under CC BY can give us a safety net, but it relies on the assumption that whoever uploaded these pictures was knowledgeable enough about intellectual property.

If @Tom_Facts (or anyone) can justify that there is a legit, creative or intellectually demanding process behind finding and converting Chuck Norris facts into Tom facts, then (to me at least) it would be a better defense than saying "I extracted these jokes from CC BY pictures".

If the process is "I'm browsing random Chuck Norris facts on some website(s) and I take those that I like, replace the name with Tom and add them to Tatoeba", the intellectual added value is... too minimal.

Thanuir, April 1, 2020 at 6:24 AM

I'm not familiar with French copyright law; the best Finnish source I found is this one: http://www.kysy.fi/kysymys/voik...kijanoikeuksia

It does not mention any court cases about the threshold of originality for jokes. For many jokes the copyright has expired due to age, and many are part of general folklore. "Pikku Kalle" jokes and "A Finn, a Swede and a Norwegian" jokes probably belong to these categories, but "Chuck Norris" jokes may be too recent.

(This is one example of how copyright as a concept harms and hinders creative, useful, and in part also scientific work, just as monopolies in general are harmful. Copyright should be abolished or at least greatly weakened. It is harmful to culture and science.)

rumpelstilzchen, April 1, 2020 at 9:22 AM

The problem isn't about whether a single joke is protected by copyright or not (it most likely isn't, and neither are the majority of the sentences on Tatoeba, IMHO).

The problem is scraping a lot of jokes/sentences from another database where we don't know its license because databases can be considered as creative works by themselves independent of whether their items are creative works or just facts (i.e. protected by copyright or not).

Now the question is whether compiling a list of jokes is considered a creative work. I'm pretty sure it is in Europe, which has rather strict database laws (see the Wikipedia article I've mentioned in another message). But even in the US, which as far as I know doesn't have such strict laws, it seems likely, as the following passage shows:

"An example of a database that is protected as a compilation would be a database of selected quotations from U.S. Presidents. The individual quotations themselves may or may not be subject to copyright protection. However, the selection of the quotations involves enough original, creative expression that it is protected by copyright. Therefore, a database of quotations will be protected by copyright as a compilation even though some of the quotations are not protected." (from https://www.bitlaw.com/copyright/database.html )

So as long as nobody can reliably prove that the owner of chucknorrisfacts.net is OK with scraping many or all of the jokes, I think it's better to stop adding them.

Thanuir, April 1, 2020 at 11:07 AM

I agree that the collection as a whole, or in substantial part, may be protected by copyright, and that using it without permission is impolite.

TRANG, March 29, 2020 at 7:27 PM

First, please take the time to fully read my reply to Ooneykcall:
https://tatoeba.org/eng/wall/sh...#message_34643

> I remember reading jokes about him in this style long before
> chucknorrisfacts.net was created (year 2017, according to whois lookup).

Note that the domain chucknorrisfacts.net may have been registered in 2017, but the website itself has existed since 2008 (according to their footer). It was probably under another domain before 2017.

If the "2008" indicated in the footer is a lie and they exist only since 2017, then they did a great job at replicating website design from 15 years ago...

> Are you sure they were copied from there and they're not part of Tom_Facts's
> own collection?

@Tom_Facts did not explicitly tell me they copied from chucknorrisfacts.net and I do not possess secret government agency surveillance tools, nor do I possess mind reading powers. So no, I'm not sure.

But when I check some of the sentences, I'm getting results like these: https://imgur.com/a/zzLQJQL. I can't help but be very suspicious that a large part of those jokes are copied from chucknorrisfacts.net, as it happens to be the common denominator in these results.

I'm very well aware that these jokes are everywhere. I know many of these jokes, I laughed at many of these jokes, and I wish, just as much as many people here, that we didn't have to worry, ever, about copyright or licensing or intellectual property.

But that won't change the fact that inserting these jokes on a large scale poses a legal risk for Tatoeba. The more that are added, the more the risk grows.

If some content has been copied and re-copied, it wouldn't necessarily and magically become CC BY compatible. So even if @Tom_Facts didn't copy directly from chucknorrisfacts.net, but from a website that itself copied from chucknorrisfacts.net, then it can still be a problem.

And I will insist again: the main problem is not that these jokes are published on Tatoeba, the main problem is that they are *incorrectly licensed under CC BY*.

Is there any platform or blog that publishes Chuck Norris facts under a Creative Commons license? No. There's none. So we cannot be thinking "Well, everyone else uses these jokes, why can't we?". We have different legal constraints, that's why we can't.

Now you are lucky, Andreas (aka. @rumpelstilzchen) has volunteered to tackle the issue https://github.com/Tatoeba/tatoeba2/issues/1659. But until it is ready and deployed, please (everyone), don't copy more jokes and memes into Tatoeba. Don't translate them if you see any that might be legally shady. Ask people to stop if you see anyone doing it. Just wait till we have the proper features in place, or just create your own jokes instead. I have no time to play the cop so I count on everyone's cooperation. Thanks.

Aiji, March 31, 2020 at 1:09 PM (edited March 31, 2020 at 1:11 PM)

What's New on Tatoeba? - Your weekly recap °10


UPDATES

A small week it was. Some discussions here, some updates of code there, everybody needs a rest sometimes. Stay tuned next week for nice new features!


ON THE WALL

※ hamsolo474 started a thread that led to a discussion of various topics, such as the quality of a sentence and good search results https://tatoeba.org/fra/wall/show_message/34604

※ gillux performed a new UX test, this time on a user very familiar with Tatoeba. You can also help us by performing this kind of test and writing a summary :) https://tatoeba.org/fra/wall/show_message/34618

※ Ooneykcall started a thread that led to a discussion of copyright and licensing https://tatoeba.org/fra/wall/show_message/34630

※ Ibdx discussed adding more sentences containing words whose meanings are often looked for https://tatoeba.org/fra/wall/show_message/34645


CONTRIBUTIONS AND LANGUAGES

※ 15 689 sentences added this week. You can check daily activity on this page https://tatoeba.org/eng/contrib...ivity_timeline

※ This week, two languages were added, bringing the number of languages on Tatoeba to 355! Thanks to Ricardo14 and gillux for coordinating this.
On zorgzikhnit's request, Bislama has been added https://en.wikipedia.org/wiki/Bislama

On MarijnKp's request, Saterland Frisian has been added https://en.wikipedia.org/wiki/Saterland_Frisian

※ Some of our members helped translate the website (credited by their Transifex usernames). In arbitrary order:
gorkaazk, elenacristina260, herrsilen, SAmiri, Les90, yanis.batura, Aiji, gillux, arh, MarijnKp, RyckRichards, fjay69, Gulo_Luscus, robin0van0der0vliet, shekitten, Mohsin_Ali, Yorwba, easononizuka, maxine22zhang, Silja, Thanuir, Guybrush88, 58karel, michel.smts2, robin0van0der0vliet, sabretou, small_snow

----------

If you'd like to help with the development of Tatoeba, report issues, or are just curious, have a look at the GitHub repository: https://github.com/Tatoeba/tatoeba2

If you want to help us translate the website to your language, you can join us on Transifex: https://www.transifex.com/tatoe...ite/dashboard/ and check this article on the wiki https://en.wiki.tatoeba.org/art...ce-translation

----------

Fun fact: Mary Poppins’ “supercalifragilisticexpialidocious” actually appears in some dictionaries.


Last week recap: https://tatoeba.org/fra/wall/show_message/34592
See this recap on the blog: https://blog.tatoeba.org/2020/0...kly-recap.html

hamsolo474, March 26, 2020 at 3:43 AM

Has there been any progress made on implementing a voting system for reliability?

https://blog.tatoeba.org/2010/0...w-will-we.html

I read this article from a decade ago, and it mentioned a timescale of months and the need for at least 20 advanced contributors. As of writing, there are 147 advanced contributors, with 25 in German and 15 in English alone.

Is the problem that no one is willing to work on it from a programming standpoint? Is it the desire to get it done in French first (given that French only has 13 advanced contributors rather than 20)? Or is there some other reason?

This is the feature I would most like to see on Tatoeba and I would be willing to try my hand at implementing a programming solution for it.

I would be surprised if I were the first to feel this way in the last 10 years, so I would like to know: has anyone attempted this before me? What happened to their submission, if they made one? Given how widely voting systems are used on other websites, I can't imagine this being too difficult unless there is something fundamental in the design of Tatoeba that would prevent it. Does anyone know?

My questions
1. Why has there seemingly been no progress on a voting system?
2. Has anyone attempted this?
2.1 Why was their attempt rejected/not implemented?
3. Is anyone working on this now? (perhaps I could assist)
4. Is there some reason why such a thing would never work on Tatoeba?

Ricardo14, March 26, 2020 at 5:10 AM

@hamsolo - anyone can join the dev team. That's what Trang posted recently:

"The very first thing you can do is to try and set up Tatoeba on your machine and let us know if you've faced any issue along the way, if there's anything we could simplify and if there's anything we should reword in our documentation. The more simple and easy to understand we can make this onboarding process, the better it is :)

The starting point is our GitHub repository: https://github.com/Tatoeba/tatoeba2

Once you're all set, we can move on to more concrete issues. Just let me know when you're ready!"

Aiji, March 26, 2020 at 7:05 AM (edited March 26, 2020 at 7:06 AM)

Because there are pros and cons. The pros are very constraining, while the cons aren't.

First of all, please think carefully of what would be the advantage of a voting system for you? And then, for all of us? And then, for all that are not us?
And first of all, what would a voting system look like?

Here are some cons (with the current system):
- A voting system needs a threshold. What is a good threshold? A majority of some sort? Even if you are in the minority, it doesn't mean that you're wrong.
- Some people could silently sabotage the corpus to defend a stance.
- We have corpus maintainers to take care of correcting sentences. If a sentence is correct, it is correct. If not, we can simply notify them and they will check whether it needs to be corrected. There is no need for a voting system in this situation. If they cannot deal with a sentence, they can search the Internet or ask for help; that's what we signed up for :P
- We have a "review" feature with which you can mark a sentence as "OK", "unsure", or "not OK". The current system isn't very well integrated with the proofreading process for now, but we have some ideas to improve it and make proofreading less resource-consuming and more efficient. This system relies on the community to point out sentences that may be wrong, but it does not exclude sentences. We're working towards inclusion, not exclusion of contributions.
- The majority of contributions are correct, and should be treated as such.

hamsolo474 hamsolo474 10 days ago March 27, 2020 at 1:45 AM link permalink

First of all, please think carefully of what would be the advantage of a voting system for you?
- I get nicer search results

And then, for all of us?
- Better search result rankings, where the most useful results appear higher in the list, are what originally separated Google from its competitors. Tatoeba is an amazing resource, and we all benefit if search results are improved and it becomes more popular.

And then, for all that are not us?
- Same as above.

And first of all, what would a voting system look like?
- A "Did you find this useful?" button, with a counter such as "1234 other people did."

Pros
- ranking of search results
- - Triage for popular sentences
- - Most common uses are likely to be ranked first
- - Near duplicates are likely to be filtered by the community

Cons
- Slight overlap in functionality

I ran into a problem yesterday when I was looking up sentences: I found a few sentences that were correct and understandable but ultimately archaic. They fit the binary mold of technically correct or incorrect, but they were poor examples of good, modern, intelligible English.

Furthermore, I understand that only advanced contributors can tag. With a voting system you could triage the never-ending stream of daily sentences so that advanced contributors know which sentences are most valuable to the community; then they could apply their tagging and fixing efforts in the areas where it would make the greatest difference.

The corpus could be sabotaged, but the votes wouldn't necessarily indicate sentence validity, only perceived usefulness.
Let's take the word "know", for example.
The most common uses of "know" are probably in sentences such as:
"I don't know"
"I know, don't remind me"

Then less common
"I know kung fu"
"I know algebra"

Then less common, perhaps a sentence where someone claims he knows goats in a more biblical sense.

How does a new learner distinguish the difference in meaning between knowing kung fu and biblically knowing goats?

Assuming I've guessed the order of use correctly, knowing goats should be at the bottom of the list, indicating to a new learner that while this is technically correct, it may not be the best sentence to put in your Anki deck.

It should also prevent direct duplicates or near duplicates from showing up next to each other. For example, "I know!" and "I know.", or "I know the Jacksons" and "I know the Smiths". There are obviously differences between these sentences, but a voting system would let the community choose: either see lots of near duplicates on the first page and have to search further for a variety of uses, or simply vote up the (presumably wider variety of) sentences they found most useful.
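To make "near duplicate" concrete, here is a rough sketch of one automatic way to spot them. Note that this is not the voting approach itself but a swapped-in technique (word-overlap, i.e. Jaccard similarity), and the function names and the 0.55 cut-off are purely hypothetical choices for illustration, not anything Tatoeba does.

```python
import re

def tokens(text):
    # Lowercase word tokens; punctuation is ignored.
    return set(re.findall(r"\w+", text.lower()))

def similarity(a, b):
    # Jaccard similarity: shared words divided by all distinct words.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def drop_near_duplicates(sentences, threshold=0.55):
    # Keep a sentence only if it is not too similar to one already kept.
    # The threshold is an arbitrary cut-off chosen for this example.
    kept = []
    for s in sentences:
        if all(similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept

sample = [
    "I know!",
    "I know.",
    "I know the Jacksons.",
    "I know the Smiths.",
    "I know kung fu.",
]
filtered = drop_near_duplicates(sample)
```

With this sample, "I know." and "I know the Smiths." are treated as near duplicates of sentences earlier in the list, while "I know kung fu." survives. Whether hiding such sentences is desirable at all is exactly what is being debated in this thread.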

I don't understand why a voting system needs an upper threshold. More votes don't make a sentence more correct; they merely indicate its usefulness to people. We could allow sentences to drop below zero, as an indicator that the sentence is not only not useful but perhaps also that something is wrong with it. Then it would be easy to find and fix, and it could be reset to zero upon manual review. But I am not married to the concept of a downvote system.

We could also grant votes (for example, maybe 10) to manually reviewed sentences that are correct. I know there is already functionality for this sort of feature, but this would affect the ranking of search results rather than just the visibility.
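A minimal sketch of the ranking idea described above, assuming upvotes only plus a fixed bonus for sentences that passed manual review. The names `Sentence`, `REVIEW_BONUS` and `rank` are hypothetical and not part of Tatoeba's code; the bonus of 10 is just the example figure mentioned above.

```python
from dataclasses import dataclass

# Hypothetical bonus granted to a sentence that a reviewer marked as correct.
REVIEW_BONUS = 10

@dataclass
class Sentence:
    text: str
    upvotes: int = 0
    reviewed_ok: bool = False

    def score(self) -> int:
        # Upvotes only (no downvotes), plus the review bonus if applicable.
        return self.upvotes + (REVIEW_BONUS if self.reviewed_ok else 0)

def rank(sentences):
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(sentences, key=lambda s: s.score(), reverse=True)

results = rank([
    Sentence("I know kung fu.", upvotes=3),
    Sentence("I don't know.", upvotes=40),
    Sentence("I know algebra.", upvotes=0, reviewed_ok=True),
])
ordered_texts = [s.text for s in results]
```

Here the reviewed but unvoted sentence lands between the other two, which is the point of the bonus: review affects ordering without hiding anything.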

Your cons
- Voting system needs threshold
Why? Is this solved by only having upvotes?

- Potential for corpus sabotage
Is this solved by only having upvotes?

- Overlap with existing functionality
True, but this should aid integration with proofreading by allowing for triage.

- We already have Corpus maintainers
Triage makes their job easier.

- The majority of contributions are correct and should be treated as such.
"Knowing goats" and "knowing kung fu" are both correct sentences but I'm sure one is more useful than the other.

Aiji Aiji 10 days ago March 27, 2020 at 4:49 AM link permalink

You have good points, but most of them show a very common bias. You want Tatoeba to fit a personal use case. Let me explain.
From what you wrote, I could extract one fundamental problem: Help beginner learners of a language to more easily identify what sentences are more common / useful, or on a more general scale, what level of usefulness is provided by a sentence.

Now, that's indeed a very good problem. However, it rests on an erroneous assumption. Tatoeba is not a tool that aims to teach you a language. Oh, it can be used as such, surely. But that is not its fundamental mission. Its fundamental mission is to provide good corpora of sentences (there's a problem with the meaning of "good", but we can discuss that another time). What you describe is a tool (for language learners), while Tatoeba is a source (of data). The tool uses the source in a twisted way to fit its needs, but it cannot ask the source to bend to fit its vision.

Of course, I don't say that the problem you mentioned should be ignored, far from that. But the statement of the problem(s) to be solved shouldn't be biased by "it's easier for learners". Otherwise, you will get people saying: please do this to help my Natural Language Processing algorithm, please implement that because I could use it in the Japanese class I'm teaching.
Now, again, I don't say that those problems should be ignored. However, after listening to these problems, we should try to extract the fundamental issue, free of all biases. I think that is what Trang expressed when she described how we design features.

As a simple illustration, let's take your three pros and let me give exaggerated, simple cons (for the sake of argument):

Pros
- ranking of search results
- - Triage for popular sentences
--> This would bring a vicious (not virtuous) circle of "what's popular gets more popular because it's popular". Tatoeba is not about popularity. Every contribution is considered equal as long as it respects the quality standard. Then, of course, we could make "popularity" optional, but in the end that would only bring more work than if the problem had been tackled another way, a bias-free way, from the beginning.
- - Most common uses are likely to be ranked first
--> Same problem as above. This implies a bias that shouldn't exist. Also, think about American / Australian / British / etc. or French / Canadian / Senegalese / etc. Saying something is most commonly used because more users come from a particular region seems pretty unfair.
- - Near duplicates are likely to be filtered by the community
--> How so? We cannot consider one good enough, and the other not. And what if I do want near duplicates? They have value in themselves, even if they are a bother in some situations.

The last point (near duplicates) actually best illustrates the point of view I am trying to defend in my post(s): If there is a problem inherent to Tatoeba, Tatoeba should try to solve it in a way that is completely independent of any particular use (language learning, NLP algorithm, translation tool, etc.).

Most of your other points could be answered in a similar way. In particular, "usefulness" is a difficult concept to handle, and in your post itself I can see a potential contradiction between "the community would vote for what they think best" and "A is more useful than B".


And to summarize my ideas, let me quickly answer one of your posts below:
- Tags would not help you learn the language or distinguish what is for beginners and what is not, because that is not what they are made for (the functionality suffers from some flaws, but hopefully work will be done soon to improve it).
- If you think the search doesn't allow you to find relevant results (many of us think so), please explain to us why, and we can try improving the search functionality together.

AlanF_US AlanF_US 10 days ago, edited 10 days ago March 27, 2020 at 5:16 PM, edited March 27, 2020 at 5:19 PM link permalink

I'm always surprised by the assumption that Tatoeba is primarily geared toward beginning language learners, since in my view, it's not particularly well suited to their needs. Beginning learners need guidance, a path to follow. There are much better places to find that elsewhere. I've always considered Tatoeba far more useful to learners who have had some time to get to know the basics of a language and now want to see examples of words or grammatical features in use.

In line with what Thanuir said, it doesn't take long to know what the basic meanings of "know" are, at which point sentences that contain them become much less useful to the learner than sentences with more advanced meanings. Understanding the biblical sense of "know" is not really a necessity for intermediate or even advanced learners, which means it's not a great example of a verb with a gradation of meanings, but since the example has already been used, I'll stick with it.

It doesn't take much familiarity with a language for someone to figure out that "know" is being used in an unfamiliar context, and therefore, the sentence may be of limited applicability. For instance, perhaps the sentence was this:

"For lo, the ewe goat was comely in appearance, hence the shepherd knew her twice upon the hill."

Even if I did not know English well, the presence of rare words like "lo" and "comely" would suggest to me that this was not an ordinary colloquial sentence and therefore, I would not assume that it's a typical example of an utterance I should expect to produce. Nor would it cause me to throw out what I've already learned about more standard usages of the word and conclude that knowing is something one only does to an attractive goat on a hill.

Another point I wanted to raise is that voting for the usefulness of sentences would be an extremely tedious exercise. It would be hard to convince most people to do it, and for good reason, since, as has been said, the vast majority of sentences on Tatoeba are already useful. Therefore, you'd get a small number of people voting, and hence a biased vote. I would try to convince people to put their effort into more constructive areas.

Thanuir Thanuir 10 days ago March 27, 2020 at 6:31 AM link permalink

Of the uses you mentioned, only "knowing goats" is new and interesting to me. The database is not only for beginning language learners.

TRANG TRANG 11 days ago March 26, 2020 at 1:39 PM link permalink

> Has there been any progress made on implementing a voting system
> for reliability.

If you ask about progress since the blog article has been published: yes, there has been progress. In 2015, we introduced a feature to review sentences: https://github.com/Tatoeba/tatoeba2/pull/738

It was initially called "Collections" but we recently renamed it to "Reviews". Apart from the name change, this feature has not evolved at all since its introduction. It was introduced as "experimental" and still is today.

> Is the problem that no one is willing to work on the problem from a
> programming sense?

Well, there's a bit of that, but we are not just lacking developers.

There has been a shift in how we design features. It used to be that people would suggest things to change on Tatoeba and, if we felt it was a good idea, we would implement what they suggested. Over time, we learned that this is not a good practice. From that realization, we started trying to first understand the problems, and then design and implement the solutions.

So this idea of having some sort of voting system is just a solution. But a solution to which problem exactly? Is the problem really a problem? Does the solution really solve the problem? Maybe, maybe not. We actually don't have clarity on that.

> I would be surprised if I was the first to feel this way in the last 10 years,
> so I would like to know has anyone attempted this before me?

Besides my implementation of the reviews feature, no one else attempted anything. But I would be more than happy if you could assist in pushing this feature out of its "experimental" status.

Before that though, you said that this voting system is the feature you would like to see the most in Tatoeba. Could you elaborate on why that is? What is the problem/frustration that you are facing when using Tatoeba that you think a voting system would solve?

sacredceltic sacredceltic 10 days ago March 26, 2020 at 8:19 PM link permalink

>So this idea of having some sort of voting system is just a solution. But a solution to which problem exactly? Is the problem really a problem? Does the solution really solve the problem? Maybe, maybe not. We actually don't have clarity on that.

That's exactly it: a solution without a problem. Because then the question still remains: who's going to decide what is right and what is wrong? The supposed "wisdom of the crowd"? Which crowd? Educated crowds or uneducated ones?
If the majority rules, then we will have to accept that the uneducated crowd rules. Is that OK?

So what?!?

hamsolo474 hamsolo474 10 days ago March 27, 2020 at 2:01 AM link permalink

Perhaps I wasn't clear in my original post. I agree that the "uneducated crowd" shouldn't dictate what is correct or incorrect, but they could vote on what they found useful and choose not to vote on what they did not find useful. I don't think there is a need for a downvote feature, however an upvote feature could provide better ranking for search results.

The problems as I see them are:
1. Presently we are getting sentences faster than we are tagging them. Triage sentences for the adv. contributors to tag so that the sentences most popular with the community are ensured to be correct.
2. Uncommon uses are right next to common uses and for beginners it's hard to know which is which as there is no indication of usefulness of the phrase. (See my comment to Aiji above, specifically the bit about the goats.)
3. When I search for common words I find lots of duplicates and near duplicates of words. For example, "I know the Jacksons", "I know the Smiths". Filtering duplicates and near duplicates.
4. Inefficient layout of search results. A voting system would show people the things other people found particularly useful first.

Would you have a problem with a button that said:
"1234 people found this useful. Did you?"

sacredceltic sacredceltic 10 days ago March 26, 2020 at 8:21 PM link permalink

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

sacredceltic sacredceltic 10 days ago March 26, 2020 at 8:22 PM link permalink

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

hamsolo474 hamsolo474 10 days ago March 27, 2020 at 2:11 AM link permalink

Honestly, I'm learning Chinese, and many of the characters have multiple meanings, heavily based on context. When I search for an example sentence, I don't know which sentences are common uses of the word or phrase that I'm interested in and which are uncommon but still correct uses.

I see the problems as
1. Presently we are getting sentences faster than we are tagging them. Triage sentences for the adv. contributors to tag so that the sentences most popular with the community are ensured to be correct.
2. Uncommon uses are right next to common uses and for beginners it's hard to know which is which as there is no indication of usefulness of the phrase. (See my comment to Aiji above, specifically the bit about the goats.)
3. When I search for common words I find lots of duplicates and near duplicates of words. For example, "I know the Jacksons", "I know the Smiths". Filtering duplicates and near duplicates.
4. Inefficient layout of search results. A voting system would show people the things other people found particularly useful first.

I'll download it and take a look.

sacredceltic sacredceltic 10 days ago March 27, 2020 at 6:49 AM link permalink

> so that the sentences most popular with the community are ensured to be correct.

Err... precisely NO. A majority of the population makes the same mistakes over and over. That’s why education was invented...
“Most popular” = most wrong, in most cases.

sacredceltic sacredceltic 10 days ago March 27, 2020 at 6:53 AM link permalink

> A voting system would show people the things other people found particularly useful first.

So what is actually correct would actually disappear from sight...
I see your point.

Yorwba Yorwba 9 days ago March 27, 2020 at 8:44 PM link permalink

> I'm learning Chinese and many of the characters have multiple meanings, heavily based on context. When I search for an example sentence I don't know which sentences are common uses of the word or phrase that I'm interested in and which sentences are uncommon, but still correct uses of the word or phrase I'm interested in.

Could you give some examples of Chinese words you've searched where the search results didn't make clear to you which uses were the common ones?

Maybe you wanted to give an English example to make your problem clearer to non-Chinese speakers, but currently there don't seem to be any sentences involving biblically knowing goats (https://tatoeba.org/cmn/sentenc...rom=eng&to=und), so it's not a very good example. (I half expected someone to have rectified that as a result of this discussion.)

With a specific instance of the problem to look at, finding a solution should be easier. Maybe that solution will involve some kind of voting, maybe we can come up with something else.

AlanF_US AlanF_US 9 days ago March 27, 2020 at 9:06 PM link permalink

> I half expected someone to have rectified that as a result of this discussion.

Fixed. See:
https://tatoeba.org/eng/sentences/show/8639180

TRANG TRANG 6 days ago March 30, 2020 at 10:25 PM link permalink

Thanks for explaining your problems, hamsolo.

One thing I can say is that the various problems you mentioned are unlikely to be solved with one single solution. I'll go through them one by one and I will have to interrogate you a bit more on some points, if you don't mind.

> When I search for an example sentence I don't know which sentences are
> common uses of the word or phrase that I'm interested in and which
> sentences are uncommon, but still correct uses of the word or phrase
> I'm interested in.

I'll ask you the same thing as Yorwba on this one. Could you give some examples of Chinese words you've searched where the search results didn't make clear to you which uses were the common ones?

> 1. Presently we are getting sentences faster than we are tagging them.
> Triage sentences for the adv. contributors to tag so that the sentences
> most popular with the community are ensured to be correct.

It is true that Tatoeba doesn't provide any way to find sentences based on popularity, and if your preferred way to contribute would be to proofread the most popular sentences, then we wouldn't be able to fulfill your needs for the time being. I'm wondering what your definition of popularity is, though.

We have a feature that somehow measures popularity already: the favorites (users can favorite a sentence by clicking on the heart icon in the sentence menu).

My questions are:
- Does this "favorite" feature correspond to your definition of popularity, or do we have a different definition of what is popular?
- If this does not measure popularity the way you wished it was measured, then how exactly would you measure popularity?
- And what difference would it make for you to proofread the most favorited sentences compared to the most popular sentences?

> 2. Uncommon uses are right next to common uses and for beginners it's
> hard to know which is which as there is no indication of usefulness of the
> phrase. (See my comment to Aiji above, specifically the bit about the goats.)

I'm not sure if I could clearly understand your problem with your example about knowing goats.

In the end my interpretation is that you, as an English speaker who is learning Chinese at beginner level, when you browse/search Tatoeba for sentences to add to your Anki deck, you are often having trouble figuring out which sentences would be the most useful to add to your deck.

If that is a correct interpretation of your situation, then perhaps you could explain to us what your workflow is for using Tatoeba to build your Anki deck?

> 3. When I search for common words I find lots of duplicates and near
> duplicates of words. For example, "I know the Jacksons",
> "I know the Smiths". Filtering duplicates and near duplicates.

On the issue of finding lots of near-duplicates, I recommend you set the sort option to "Random" rather than "Relevance" when you search sentences. It can happen that two near-duplicates appear on the same page, but common words usually have 1000+ results. For common words, it would be extremely unlucky for you to have two near duplicates on the same page.

> 4. Inefficient layout of search results. A voting system would show
> people the things other people found particularly useful first.

Assuming we use a voting system to measure the usefulness of search results, I think we would need to associate each vote with a specific search. A sentence cannot be universally more useful than another. Maybe the sentence "I know algebra" would be useless for someone who searched "know" but would be useful for someone who searched "algebra".
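To illustrate what "associating each vote with a specific search" could mean, here is a small sketch that keys every vote on the pair (search query, sentence). The class `SearchVotes`, its methods and the sentence IDs are all hypothetical; nothing like this exists in Tatoeba as far as this thread says.

```python
from collections import defaultdict

class SearchVotes:
    """Hypothetical store of 'useful' votes, counted per (query, sentence)."""

    def __init__(self):
        self._votes = defaultdict(int)

    @staticmethod
    def _key(query, sentence_id):
        # Normalise the query so "Know " and "know" count as the same search.
        return (query.strip().lower(), sentence_id)

    def upvote(self, query, sentence_id):
        self._votes[self._key(query, sentence_id)] += 1

    def count(self, query, sentence_id):
        return self._votes.get(self._key(query, sentence_id), 0)

    def rank(self, query, sentence_ids):
        # Order results for this query only; votes from other searches are ignored.
        return sorted(sentence_ids, key=lambda sid: self.count(query, sid), reverse=True)

votes = SearchVotes()
ALGEBRA_SENTENCE, DONT_KNOW_SENTENCE = 1, 2  # made-up sentence IDs
votes.upvote("algebra", ALGEBRA_SENTENCE)
votes.upvote("algebra", ALGEBRA_SENTENCE)
votes.upvote("know", DONT_KNOW_SENTENCE)

for_algebra = votes.rank("algebra", [DONT_KNOW_SENTENCE, ALGEBRA_SENTENCE])
for_know = votes.rank("know", [ALGEBRA_SENTENCE, DONT_KNOW_SENTENCE])
```

The same two sentences come out in opposite orders for the two searches, which is the behaviour described above. It also hints at the cost: votes are spread across every distinct query, so each individual count stays small.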

But I think upvoting for useful sentences would be very inefficient compared to reporting bad search results. You would need millions of votes and you wouldn't really be sure that those votes will help. On the other hand, just one person reporting to us that a certain sentence was not useful for a certain search could help us make actual improvements.

I feel this 4th problem is in the end the same problem as your 2nd problem. The way sentences are ordered feels inefficient for your task of building an Anki deck.

But if your use case here isn't about trying to build an Anki deck, then it would be helpful to know what are the other contexts in which you have experienced inefficient search results. What did you search exactly and for what purpose did you need to search this? Were you trying to understand the lyrics of a song? Were you trying to write a sentence in Chinese to a Chinese acquaintance?

hamsolo474 hamsolo474 6 days ago March 31, 2020 at 4:36 AM link permalink

Q: I'll ask you the same thing as Yorwba on this one. Could you give some examples of Chinese words you've searched where the search results didn't make clear to you which uses were the common ones?

A: 后来. Yorwba and I recently found out that we interpreted this word in two different ways. In my dictionary and in the Chinese Grammar Wiki, this is a word that means "afterwards". This is my understanding of it. However, Yorwba told me that 后来 did mean "afterwards" but had additional connotations, making it mean something closer to "afterwards it was surprisingly revealed", and pointed me to a sentence page on Tatoeba where there were admittedly sentences with that connotation.

Fortunately, I had studied this phrase on other websites such as the Chinese Grammar Wiki, as well as discussed it with my native Chinese girlfriend, and I knew that while this is a possible connotation of the word, it is certainly not the most common meaning.

However, my biggest problem is that I have only studied 后来 and a few hundred other words to the extent where I can be confident in using them and knowing that others will understand them, leaving more than 90% of the other words in Mandarin full of potential ambiguity. So while I have yet to learn something wrong from Tatoeba, use it, be corrected, remember where I learned it, go back to the source, and request that it be corrected, I acknowledge that it is a definite possibility.

Q: - Does this "favorite" feature correspond to your definition of popularity, or do we have a different definition of what is popular?

A: This is likely a good solution to my problem. I will have to play around with this for a while.

Q: If this does not measure popularity the way you wished it was measured, then how exactly would you measure popularity?

A: A visible number next to each sentence saying how many people thought this was a useful sentence. Perhaps we could even show votes from native speakers and from second-language speakers separately.

Q: And what difference would it make for you to proofread the most favorited sentences compared to the most popular sentences?

A: At this point I'm not sure and I'll have to play with the favourite feature a bit.

Q: On the issue of finding lots of near-duplicates, I recommend you set the sort option to "Random" rather than "Relevance" when you search sentences. It can happen that two near-duplicates appear on the same page, but common words usually have 1000+ results. For common words, it would be extremely unlucky for you to have two near duplicates on the same page.

A: This is a good solution.

Q: But I think upvoting for useful sentences would be very inefficient compared to reporting bad search results. You would need millions of votes and you wouldn't really be sure that those votes will help. On the other hand, just one person reporting to us that a certain sentence was not useful for a certain search could help us make actual improvements.

A: As an outsider/newbie, I honestly can't tell if relying on human maintainers to do this is efficient, and I understand that a sentence/keyword search engine is different from the Google search engine, which looks for sentences/keywords on webpages. However, Google beat Yahoo (who at the time had humans manually review and categorise pages) by relying on an algorithm that was constantly fed new data based on the relevance of its results. (They saw which results were the most popular for a given search and gave them priority ranking.) Doing what Google did is likely beyond me, but the first step is understanding what is popular.


Q: What did you search exactly and for what purpose did you need to search this? Were you trying to understand the lyrics of a song? Were you trying to write a sentence in Chinese to a Chinese acquaintance?

A: As I learn new words or phrases in Chinese, I try to make sentences with them, practice using them, etc., and Chinese grammar has a few hard-to-predict differences compared to English grammar. For example, in English we would say that "red" is almost always an adjective: a red ball, red hair, red paint, a red car. In Chinese, colours are nouns, even when used to describe something like a red ball, red hair, red paint or a red car. This changes which words you can use with them. I personally have trouble remembering every word I learn as an adjective, noun, verb, etc. Even in English I just remember context, produce a few sentences and figure it out; the fact that "run" or "climb" are verbs is not saved in my memory the way it would be written on the page of a dictionary. I just know how to use them and understand the rules that define the grammar (in English at least). Perhaps it is a flawed approach, but this is also how I am trying to learn Chinese: learn enough sentences and attempt to gain an intuitive understanding of collocations, the same way I do in English.

In short, nothing in particular. I'm just trying to build my mental list of example sentences so I can produce new sentences in the near future. Chinese is hard.

Hybrid Hybrid 8 days ago March 29, 2020 at 5:46 PM link permalink

I hope that everyone is doing well despite the virus. Stay strong and far away from each other! 😊

Ricardo14 Ricardo14 7 days ago March 29, 2020 at 9:32 PM link permalink

Thanks, Hybrid! The same to you!

Hybrid Hybrid 6 days ago March 31, 2020 at 12:15 AM link permalink

Thank you.

Balamax Balamax 7 days ago March 29, 2020 at 11:05 PM link permalink

Try to stay inside the Solar system. Otherwise the interstellar police will have reasonable questions for you. :)

Hybrid Hybrid 6 days ago March 31, 2020 at 12:16 AM link permalink

Thank you. Is going to Pluto allowed or is that outside of the Solar System?

Balamax Balamax 6 days ago March 31, 2020 at 12:27 AM link permalink

It takes about five hours for sunlight to reach Pluto.

gillux gillux 10 days ago March 27, 2020 at 6:30 AM link permalink

I am publishing a new UX test: https://en.wiki.tatoeba.org/art...show/ux-test-4 This time I’ve performed the test on somebody who is very familiar with Tatoeba already, so I don’t know if I can really call it a UX test. That said, it contains relevant feedback, including about the use of Tatoeba in a teaching environment.

rumpelstilzchen rumpelstilzchen 10 days ago March 27, 2020 at 6:05 PM link permalink

Thanks for another UX test.

Just a quick comment:
> Having the list as a DOC file would be useful to R. because he can freely edit the text. Currently, he has to copy and paste sentences from the CSV file into a Word document to edit them the way he wants.

I don't use Word (or any word processor), so this may be a strange question, but why can't Word open/import the CSV file? It is a simple plain-text file.

Guybrush88 Guybrush88 9 days ago, edited 9 days ago March 27, 2020 at 9:25 PM, edited March 27, 2020 at 9:33 PM link permalink

Sometimes I tried to open the CSV file with the exported sentences in LibreOffice (the import feature was still working, and I used the exported file to grab the sentences I wanted to mass-translate more quickly), and the software always told me that the file was too big and that not everything was shown. Using gedit (on Linux) and Notepad++ (on Windows) worked for my purpose, since they showed all the sentences.

Yorwba Yorwba 9 days ago March 27, 2020 at 9:58 PM link permalink

Putting all the exported sentences into a single .DOC file would probably make Word choke, but the excerpt from the UX test report is specifically about exporting lists, where file size is less likely to be a problem.

gillux gillux 9 days ago March 28, 2020 at 7:29 AM link permalink

I think nothing’s preventing Word from opening the CSV file as plain text. However, I assume the CSV file extension is associated with Excel, so opening it with Word is rather counter-intuitive. I imagine users have to right click → "open with", and then find Word in whatever selection box pops up. Compare this with simply double-clicking on the file.

As pointed out in the test, most users are not familiar with the CSV format to begin with, so they don’t know whether they should open it with Word, Excel or whatever.

To put it another way: of course, if you have the knowledge and the skills you can do whatever you want with whatever file format.

rumpelstilzchen rumpelstilzchen 9 days ago March 28, 2020 at 12:30 PM link permalink

> As pointed out in the test, most users are not familiar with the CSV format to begin with, so they don’t know whether they should open it with Word, Excel or whatever.

So should we add some info text about how to open the file in a word processor? Or change the file extension to TXT?
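
For anyone who wants to try the plain-text route in the meantime, here is a rough sketch of how an exported list could be turned into a text file with one sentence per line. It assumes the export is tab-separated with the sentence ID in the first column and the sentence text in the second; that layout and the function name `export_to_plain_text` are assumptions for illustration, not a description of the actual export code.

```python
import csv
import io


def export_to_plain_text(tsv_text: str) -> str:
    """Return one sentence per line, dropping the ID column.

    Assumes each row is "<id><TAB><sentence text>" (an assumed layout).
    """
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        lines.append(row[1])
    return "\n".join(lines) + "\n"


# Sample rows built from two sentences mentioned elsewhere on this page.
sample = (
    "8355426\tTom bought Mary a dress.\n"
    "8355425\tTom bought a dress for Mary.\n"
)
print(export_to_plain_text(sample), end="")
```

The result can be saved with a .txt extension, which most systems open in a text editor or word processor on double-click.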

Eccles17 Eccles17 11 days ago March 25, 2020 at 8:56 PM link permalink

Hi. I'm a full-stack developer.

How can I help?

Thanks.

TRANG TRANG 11 days ago March 25, 2020 at 11:36 PM link permalink

Thank you for offering your help, Eccles!

The very first thing you can do is to try and set up Tatoeba on your machine and let us know if you've faced any issue along the way, if there's anything we could simplify and if there's anything we should reword in our documentation. The simpler and easier to understand we can make this onboarding process, the better it is :)

The starting point is our GitHub repository: https://github.com/Tatoeba/tatoeba2

Once you're all set, we can move on to more concrete issues. Just let me know when you're ready!

Eccles17 Eccles17 10 days ago March 27, 2020 at 1:00 AM link permalink

Thanks!

CK CK 26 days ago, edited 26 days ago March 11, 2020 at 6:22 AM, edited March 11, 2020 at 7:27 AM link permalink

I have 2 questions for contributors who translate from English into their languages.

1. Do you think that it isn't useful for me to contribute English sentences with the same meanings and that I should stop doing this?

2. If you like the idea of having various English sentences with the same meaning, do you think that sentences that are interchangeable should be directly linked to each other?



◼ ◼ Question 1 Details

It's been suggested by two people that it isn't useful for me to contribute English sentences with the same meaning.

[#8355426] Tom bought Mary a dress. (CK) *audio*
[#8355425] Tom bought a dress for Mary. (CK) *audio*

[#8589486] Tom has brains. (CK) *audio*
[#1024870] Tom is smart. (CK) *audio*
[#1024985] Tom is intelligent. (CK) *audio*
[#1025488] Tom has a good head on his shoulders. (CK) *audio*

... and many others, some are word order differences, some are vocabulary differences, some have additional words for clarity. In many cases, either or all versions of the sentences are very close in terms of frequency of use.


◼ I've already asked a few members who have translated such sentences this question. Here are their replies.

► Ergulis
https://tatoeba.org/eng/sentenc...omment-1165044

I am not offended by that at all and I think that it is entirely ok and even useful.

It is always good for learners to know all the variants of the same sentence, if possible: with or without conjunction, using different word orders/patterns, etc.

I do the same when translating, although I don't add all the possibilities every time.


► soliloquist
https://tatoeba.org/eng/sentenc...omment-1164178

> I wonder if you, too, think this isn't useful and that I should stop doing this.

On the contrary, I find adding alternative translations useful; not only for learners, but also for native speakers. It's one of my favorite things on Tatoeba.

#6704076


► Aiji
https://tatoeba.org/eng/sentenc...omment-1164177

Well English has a lot of this. I think that, I had I would I'd, etc.

I think it's useful to have correct translations, because I never know where to place the object when using those darned phrasal verbs. I can understand however that it's irritating to translate several times the same thing, but I think that's a problem inherent to Tatoeba. Also, maybe they had this reaction because you added them as original sentences, but if they were added as translations their reaction would be different? I don't know. I guess arguments can be given for both sides.


► danepo
https://tatoeba.org/eng/sentenc...omment-1164172

I think that's extremely useful. That's one of the things that distinguishes Tatoeba from other bilingual or multilingual corpora.

Here's a lot of Danish translations of the sentence "Tom was very drunk.":
#4211106

2 sentences with the same meaning is a kind of paraphrasing, I think.

https://www.google.com/search?q...rasing+tatoeba


► marafon
https://tatoeba.org/eng/sentenc...omment-1164171
(The question was: I wonder if you, too, think this isn't useful, and that I should stop doing this.)

I don't think so, CK.


► Pfirsichbaeumchen
https://tatoeba.org/eng/sentenc...omment-1164187

I think the best thing to do is to try to steer a middle course. Do neither of the extremes: neither completely forego adding them, nor add them in every possible case.

[#8592554] I think that I can trust you. (CK) *audio*
[#8592555] I think I can trust you. (CK) *audio*

This is a very common thing to say. The word "that" may be omitted by some people while others leave it in. I think it's worth having both, but should we systematically add a variant to every possible sentence of this type? Can we have a clear conscience about leaving it to chance whether someone else will add a possible variant of a sentence we have just written? I think each of us has tried to add a sentence that already existed, or found out that there were variants out there. It comes with a little extra work, but I don't think it matters so much. I think it's OK. We can let it happen. :)

If I was given a choice whether I wanted you to add "I think that I can trust you" to "I think I can trust you" or come up with something new like these ...

[#8594680] Mary usually wears earrings.
[#8594682] Try not to be so pessimistic.

... I would clearly choose the latter. They are simple, but they are fresh and useful. You are one of the people who can really make a difference. :)



◼ ◼ Question 2 Details

There is an issue about this on GitHub.
https://github.com/Tatoeba/tato...ment-594273857

My suggestion was to allow members to link same language equivalents as we have been doing, using the same linking system, and then put such linked-sentences under the main sentence, labeled as "Same language equivalents."


► Here is a link to see all English-English links. You will have to visit each sentence's page and look at the logs to see who actually did the linking.

https://tatoeba.org/eng/sentenc...e=yes&orphans=


► There are, of course, related sentences that are not interchangeable, so I wouldn't be linking these kinds of sentences.

There are many examples in which related English sentences can be translated by just one sentence in another language, but are not interchangeable in English, due to tense differences, pronoun differences, etc.

Here are a couple of examples, but there are many others.

ukr [#8604202] Том дуже гучно розмовляє.
Tom talks very loudly.
Tom is talking very loudly.

jpn [#192448] りんごを食べています。
He's eating an apple.
She's eating an apple.
They're eating apples.
We're eating apples.
I'm eating an apple.

Thanuir Thanuir 26 days ago March 11, 2020 at 7:47 AM link permalink

1. Expressing the same thing in several ways is useful. I think the most natural way to do that is to translate the same foreign-language sentence in several ways. If you are adding new sentences (rather than translating), I think it would be better to add varied and diverse sentences. Of course, language-specific proverbs and sayings are good to add even if the same sentence can already be found elsewhere.

ENG summary: It is good to add several similar translations to a given sentence. When adding original sentences, it is good to add idioms and language-specific expressions, but otherwise a greater variety of original sentences trumps a large group of similar or even synonymous original sentences.

2. I am cautious about linking sentences in the same language, because even similar sentences often differ in nuance.

Rockaround Rockaround 26 days ago March 11, 2020 at 8:46 AM link permalink

1. I think it is useful to have several variations of the same meaning. I actually wish it were the case in the languages I learn.

2. As long as the technical solution that was mentioned for separating the same-language links isn't available, I am kind of against it, as they should already appear in the indirect translations. Once it's available, I think it would be a nice addition.

Ricardo14 Ricardo14 26 days ago March 11, 2020 at 11:44 AM link permalink

1 - It's really useful to have variants of a sentence and we (I mean people that are studying English) learn a lot. You can express yourself in different ways, it happens to all languages and it's important to know these ways.

2 - Actually, I'm not in favor of it. "I'm here." and "She's here." mean different things. Both use the same structure and verb, but the meaning changes a lot between them. Same as "I study English." and "I'm studying English." In the 1st sentence, it's something that I do punctually but the 2nd one tells what I'm doing now.

deniko deniko 26 days ago March 11, 2020 at 11:58 AM link permalink

> Actually, I'm not in favor of it. "I'm here." and "She's here."

It would definitely be incorrect to link "I'm here" and "She's here".

I believe CK is talking about linking "I'm here" and "I am here", or something like "I think she's cool" with "I think she is cool" and "I think that she's cool", etc.

deniko deniko 26 days ago March 11, 2020 at 12:03 PM link permalink

1. I think it's EXTREMELY useful to have similar sentences in the same language that express the same or fairly similar meaning.

This way, if I add the following sentence in Ukrainian:

Я знаю, що я ідіот.

I would appreciate if I could link it to all of these:

I know I'm an idiot.
I know I am an idiot.
I know that I'm an idiot.
I know that I am an idiot.

It's very useful to a language learner to be able to see all these variants, whenever possible.

2. I think we should carry on linking sentences with the same meaning in the same language together, but only when the difference is really trivial. For example, I'd link "I am a cat" and "I'm a cat", but not "I'm a cat" and "I'm a tomcat".

AlanF_US AlanF_US 26 days ago, edited 26 days ago March 11, 2020 at 12:49 PM, edited March 11, 2020 at 12:57 PM link permalink

> 1. Do you think that it isn't useful for me to contribute English sentences with the same meanings and that I should stop doing this?

The question does not mention quantity. It omits the fact that you use automation to generate your sentences, that you add huge quantities of them, and that your sentences in general are very similar to each other in multiple aspects (names, vocabulary, structure, difficulty). So any additional elimination of variety is like flooding a store that already contains thousands of nearly identical items with even more. Not only does it make the corpus boring, it makes good translations harder to find, since they're scattered over near-duplicate sentences.

You're not the only person to add sets of sentences that differ in only small ways from each other (the pronoun, for instance), and while I'm sure that those who do it have good intentions, I wish they would put their energy into other areas. It's easy enough to find elsewhere (Wiktionary, for instance) how to conjugate a verb. It's much harder to find the meaning of a word captured in a realistic sentence. This is Tatoeba's key mission, the thing that makes it unique, and I believe we should focus on it.

I find it interesting that you've argued so hard for drastic reduction of the number of names in sentences as a means of preventing near-duplicates, and yet you're arguing for intentionally adding another kind of near-duplicates. You should think about that inconsistency.

As for whether sentences that are interchangeable should be linked together, I don't have a problem with it.

AlanF_US AlanF_US 26 days ago, edited 26 days ago March 11, 2020 at 5:23 PM, edited March 11, 2020 at 5:27 PM link permalink

Here are the sentence and comments that led to this discussion:

https://tatoeba.org/eng/sentences/show/8585408

And here is a search query that shows a lot of the near-duplicate sentences being added:

https://tatoeba.org/eng/sentenc...rt_reverse=yes

Pfirsichbaeumchen Pfirsichbaeumchen 25 days ago, edited 25 days ago March 11, 2020 at 7:12 PM, edited March 11, 2020 at 7:12 PM link permalink

It's interesting to note that all of those search results also have the "do that" placeholder for more concrete actions.

AlanF_US AlanF_US 25 days ago March 11, 2020 at 9:16 PM link permalink

It's also interesting to note that some of the sentences are wrong:

Do you think Tom will allow Mary do that? (#6071420)
Do you think that Tom will allow Mary do that? (#8180321)

This happened with another set of CK's sentences, namely these:

Tom said he's glad that that that's going to happen. (#7180272)
Tom said that he's glad that that that's going to happen. (#7180016)

When you write large numbers of near-duplicate sentences, they're boring not only to translate, but to proofread. Therefore, whether or not you're the author, it's likely that mistakes will slip by. CK reviewed both of them as OK, and he tagged one of them (#6071420) "List 907", meaning that he proofread it and thought it was suitable for language learners. I don't think there's any better evidence that quantity of sentences can interfere with quality.

Thanuir Thanuir 25 days ago, edited 25 days ago March 11, 2020 at 8:22 PM, edited March 11, 2020 at 8:23 PM link permalink

In point one you specifically asked how you should proceed.

I suggest that you add interesting, one-of-a-kind English sentences that differ from other sentences. I suggest that you not add large numbers of repetitive sentences that are not direct translations of a foreign-language sentence.
This would make the English sentences, and with them all sentences, more interesting and more varied. I, for example, translate from English fairly rarely because its sentences repeat themselves so much. I get bored of them quickly and switch languages.

In addition, I suggest that when you translate sentences from Japanese or other languages, you can add as many or as few translations as you like. (There certainly are Japanese sentences that have not been translated into English and whose author's native language is Japanese. There is even more to translate if you drop the native-speaker requirement.)

AlanF_US AlanF_US 25 days ago March 11, 2020 at 9:06 PM link permalink

Thanuir's message in English, via Google Translate:

In section one you specifically asked what you should do.

I suggest you add unique English phrases that are interesting and different from other sentences. I suggest that you do not add large numbers of repetitive phrases that are not direct translations of a foreign phrase.
This would make the English phrases and, consequently, all the phrases more interesting and versatile. For example, I rarely translate from English because its sentences repeat themselves so much. I get tired of them quickly and change languages.

Also, I suggest that when you translate sentences from Japan or other languages, you can add as many or as few translations as you want. (Japanese sentences that have not been translated into English and whose author's mother tongue is Japanese can be found. Even more translations can be found if you omit the native language requirement.)

Hybrid Hybrid 25 days ago March 11, 2020 at 11:14 PM link permalink

Although I don't translate English into other languages, I think that they're both useful.

Ooneykcall Ooneykcall 25 days ago March 12, 2020 at 12:02 AM link permalink

The gist of the issue seems to be about applying moderation (as always, really). It's good to have some interchangeable sentences that only differ in spelling or syntax for learning purposes, such as "I know you don't like Tom" & "I know that you don't like Tom", "I'm hungry" and "I am hungry", "Blue is my favorite color" and "Blue is my favourite colour". Adding tens of thousands of such sentences is overkill, however, as hordes of very similar sentences look monotonous and dull, possibly making new users less excited about the project... and, of course, adding such sentence pairs (triplets, etc.) adds less value to it than adding two different sentences.

AlanF_US AlanF_US 25 days ago March 12, 2020 at 4:55 PM link permalink

Well put.

deniko deniko 25 days ago, edited 25 days ago March 12, 2020 at 5:10 PM, edited March 12, 2020 at 5:12 PM link permalink

> Adding tens of thousands of such sentences is overkill

You don't have to add them, but I don't see how someone adding tens of thousands of similar sentences does any harm. You find it dull, but I don't. It's useful for the learners, it's useful for the AI as the source of data, we are not afraid the database will take up too much space just because of it, so there seems to be little to discuss.

Let's add more similar sentences! Join the race.

Thanuir Thanuir 24 days ago March 12, 2020 at 6:50 PM link permalink

In principle I agree, but I would hope that similar sentences would come about as translations. That way they would have more immediate value and would more easily get linked to other sentences.

Ooneykcall Ooneykcall 24 days ago March 12, 2020 at 10:44 PM link permalink

It doesn't harm the project all in all, so sure CK can do as he pleases. I'm just saying focusing on more varying sentences would be quite a bit more helpful, imo.

deniko deniko 24 days ago, edited 24 days ago March 13, 2020 at 9:25 AM, edited March 13, 2020 at 9:26 AM link permalink

> focusing on more varying sentences would be quite a bit more helpful

I don't necessarily disagree, but I want to point out that we're all volunteers here, we all have our own ideas of how to contribute and be useful, but also have fun. The beauty of this project is that the rules are really loose - basically, contribute complete well-written sentences and don't be rude - and we all can focus on what we feel is more useful or just more fun. That's why all attempts to ban certain names (be it Sami or Tom), or impose certain names on us, or to ban similarly sounding but valid and natural translations/sentences freak me out a lot. Because if something like this makes it into the rules, I feel like it will make Tatoeba a less welcoming place compared to what it is now.

Impersonator Impersonator 24 days ago March 13, 2020 at 9:34 AM link permalink

> I don't see how someone adding tens of thousands of similar sentences does any harm

I do.

> It's useful for the learners,

Not really. Learners benefit more from diverse sentences. So, if too much attention is given to translating the same similar sentences about Tom and Mary learning French in Boston, this is a loss for learners.

It *could* be OK to add similar sentences if Tatoeba interface were changed. E.g. if 'Random sentences' was a weighted random that gave similar sentences less prominence, and so on. Unfortunately, changing the interface this way is difficult. So, for now, when you add too many repetitive sentences, you diminish chances of other sentences being translated.
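
To make the weighted-random idea a bit more concrete, here is a minimal sketch, not how Tatoeba's 'Random sentences' actually works. It assumes sentences have already been grouped into clusters of near-duplicates (the `cluster_of` mapping is hypothetical and would have to come from somewhere else), and it weights each sentence by 1 / cluster size so that a cluster of many variants is as likely to be drawn as a single unique sentence.

```python
import random
from collections import Counter


def similarity_weights(sentences, cluster_of):
    """Weight each sentence by 1 / (number of sentences sharing its cluster)."""
    sizes = Counter(cluster_of[s] for s in sentences)
    return [1.0 / sizes[cluster_of[s]] for s in sentences]


def pick_weighted(sentences, cluster_of, rng=random):
    """Pick one sentence so that every cluster is equally likely overall."""
    weights = similarity_weights(sentences, cluster_of)
    return rng.choices(sentences, weights=weights, k=1)[0]


sentences = [
    "I know I'm an idiot.",
    "I know I am an idiot.",
    "I know that I'm an idiot.",
    "Mary usually wears earrings.",
]
# Hypothetical cluster labels: the first three are variants of one sentence.
cluster_of = {
    sentences[0]: "A",
    sentences[1]: "A",
    sentences[2]: "A",
    sentences[3]: "B",
}

print(pick_weighted(sentences, cluster_of))
```

With these weights the three variants together get the same total weight as the one unrelated sentence, which is the "less prominence" effect described above. The hard part, deciding which sentences count as similar, is left out entirely.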

> it's useful for the AI as the source of data

Not really. There's a problem of overfitting: if AIs are fed too much similar, automatically generated data, they might end up drawing the wrong conclusions. That's why it's important to have a diverse corpus.

> we are not afraid the database will take up too much space just because of it

Are we? I personally think it's better to keep the database smaller, if possible. It might not be a problem for Tatoeba, but a huge database certainly makes re-using the data more difficult.

deniko deniko 24 days ago March 13, 2020 at 9:41 AM link permalink

> Not really. Learners benefit more from diverse sentences.

Learners do benefit from diverse sentences, I don't say there should be no diversity. But they also benefit from seeing variants of the same sentence, especially the beginners, but it's mildly useful even for advanced learners.

There's no contradiction in expanding diversity and adding variants of the same sentence. Those are just different tasks, each useful in its own way.

Pfirsichbaeumchen Pfirsichbaeumchen 24 days ago March 13, 2020 at 10:26 AM link permalink

Those arguments are very theoretical.

If the "learner" is a dumb machine, then perhaps they will benefit from having tens of thousands of similar sentences. If it's a decently intelligent human learner (even a beginner), they will simply get bored. 🙂

We are quite unlikely to ever beat the mountain of those mass-imported sentences (the majority of the 1.3 million English sentences) with our handiwork. It does make a difference what our most productive contributors choose to put their energy into.

deniko deniko 24 days ago March 13, 2020 at 10:33 AM link permalink

> If the "learner" is a dumb machine, then perhaps they will benefit from having tens of thousands of similar sentences. If it's a decently intelligent human learner (even a beginner), they will simply get bored.

Well, your arguments seem to be no less theoretical than mine, aren't they? Personally, I've never been bored by similar sentences, moreover, I thoroughly enjoy being able to study similar variants. I'm probably not a decently intelligent human learner by your standard.

Also, it's not like we are really exposed to all those tens of thousands of sentences at all times. We usually search by a keyword or a phrase, get a dozen sentences, and we study that, when we're learning something. Being able to see similar variants within this set is very helpful.

Pandaa Pandaa 24 days ago, edited 24 days ago March 13, 2020 at 10:51 AM, edited March 13, 2020 at 5:32 PM link permalink

"We like diversity. Unleash your creativity! Avoid using the same words, names, topics, or patterns over and over again."

deniko deniko 24 days ago, edited 24 days ago March 13, 2020 at 11:24 AM, edited March 13, 2020 at 11:41 AM link permalink

I wish I could understand you, Pandaa :)

(EDIT: and also other people who chose to use their own language, which is a valid choice that I respect, of course)

Pfirsichbaeumchen Pfirsichbaeumchen 24 days ago, edited 24 days ago March 13, 2020 at 10:59 AM, edited March 13, 2020 at 10:59 AM link permalink

> I'm probably not a decently intelligent human learner by your standard.

Of course that's not what I was trying to say. I'm sorry if it sounded like that, Denis. 🙁

> Also, it's not like we are really exposed to all those tens of thousands of sentences at all times. We usually search by a keyword or a phrase, get a dozen sentences, and we study that, when we're learning something.

I'm exposed to them all the time, directly or indirectly. Most of the time, when I'm trying to use Tatoeba to find phrases that I need, I either get no hits at all or several pages of the "Tom, Mary, French, Boston" type, through which I have to browse to find a useful entry. It's similar to what Alan said somewhere else.

We are not really arguing about the existence of variants, though. I agree that they are, in general, useful. What led to this discussion was the massive linking of same-language sentences (and generally doing things massively as a kind of automated process), cf. https://tatoeba.org/deu/sentences/show/8585408.

deniko deniko 24 days ago, edited 24 days ago March 13, 2020 at 11:24 AM, edited March 13, 2020 at 2:04 PM link permalink

> I'm sorry if it sounded like that

Don't worry, it didn't sound like that at all. I was just being mildly sarcastic :)

> What led to this discussion was the massive linking of same-language sentences

I read that discussion too, and I don't see any crime there. Those sentences should absolutely be linked, this saves us (translators) a lot of effort. You translate "I know I'm crazy" once and then just link it to "I know that I'm crazy", and also to "I know that I am crazy" because they're already linked - so they're there, all together.

As opposed to translating "I know I'm crazy". Then stumbling upon "I know that I'm crazy" in a year and translating it from scratch. And then translating "I know that I am crazy" again from scratch in 6 months.

One might argue it even brings more diversity, because if they're not linked originally, I translate three sentences that are the same separately, but if they're linked, I translate them in one go, link my translation to all three, and then when I'm looking for other sentences to translate I translate some other sentences, not that one.

As for the automated process or semi-automated process, I don't have an opinion on that. It depends how it is automated, how many mistakes that introduces, what kind of mistakes, and how promptly they're fixed.


EDIT. An example to illustrate my point. I just stumbled upon those 4 sentences, which are really 4 variants of the same sentence:

https://i.imgur.com/8sPK9AZ.png

I translated one of them, and linked my translation to all four. Now, when a learner of English stumbles upon my Ukrainian sentence, they would be able to see 4 ways to translate it:

https://i.imgur.com/zK4fy9t.png

At the same time, I didn't waste my time translating all four of them at some point of my translator's career, so the diversity is unaffected - I keep translating different sentences.

Linking them helps all of us a lot.
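
The "translate once, link to all variants" workflow can be pictured as grouping sentences into clusters through their same-language links. Below is a small sketch using a union-find structure; the class name `LinkClusters` and its methods are made up for illustration and say nothing about how Tatoeba stores or resolves links internally.

```python
class LinkClusters:
    """Group sentences that are connected by same-language links (union-find)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's cluster, registering x if new."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record that sentences a and b are interchangeable."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def cluster(self, x):
        """Return every sentence reachable from x through links, sorted."""
        root = self.find(x)
        return sorted(s for s in list(self.parent) if self.find(s) == root)


clusters = LinkClusters()
clusters.link("I know I'm crazy.", "I know that I'm crazy.")
clusters.link("I know that I'm crazy.", "I know that I am crazy.")

# All three variants end up in one cluster, so a translator who handles one
# of them can find the other two without searching for them separately.
print(clusters.cluster("I know I'm crazy."))
```

Note that this treats links as fully transitive, which only makes sense for sentences that really are interchangeable; "I'm a cat" and "I'm a tomcat" should never be linked in the first place.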

Aiji Aiji 24 days ago March 13, 2020 at 12:39 PM link permalink

As a side note, could you tell me (either here or by PM) what your usual process is when you translate? By that I mean the criteria you use for your search or anything else that could be relevant.

Hopefully, that will be helpful to develop a way to deal with this situation.

deniko deniko 24 days ago, edited 24 days ago March 13, 2020 at 2:02 PM, edited March 13, 2020 at 2:03 PM link permalink

When I binge translate, I use this link:

https://tatoeba.org/eng/sentenc...o=&sort=random

Translate everything from English (not necessarily English, but mostly), sentences that have audio (I turn that off though when I translate from other languages), and that have no direct translations into Ukrainian, sort order - random.

So, obviously, if we have a "cluster" of linked English sentences and at least one of them is translated into Ukrainian, I won't see any of them, which makes sense.

I also like browsing Ukrainian sentences and creating direct links from indirect, when it's appropriate.

Sometimes I just search by keywords or expressions when I'm looking for something in particular.

Pandaa Pandaa 24 days ago, edited 24 days ago March 13, 2020 at 9:57 AM, edited March 13, 2020 at 10:02 AM link permalink

+1
Plus, it's not only learners for whom this is not so useful, but translators too: if someone translates thousands upon thousands of identical sentences, there is no diversity, and it only dulls them.
More than one translator has reported that their knowledge of certain languages has only worn away since they started translating on Tatoeba.

Ooneykcall Ooneykcall 24 days ago March 13, 2020 at 4:22 PM link permalink

Being able to exclude sentences owned by specific users from the search would go a long way towards ensuring diversity in one's personal search results. Actually, it would be entirely good enough for now if one could just exclude CK, CM, CF etc.
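
Until something like that exists on the site, the same effect can be approximated on downloaded data. A minimal sketch, assuming each search result is a dictionary with `owner` and `text` keys (that shape and the function name `exclude_owners` are assumptions, not an existing Tatoeba feature or API):

```python
def exclude_owners(results, excluded_owners):
    """Drop results whose owner is in excluded_owners (case-insensitive)."""
    blocked = {name.lower() for name in excluded_owners}
    return [r for r in results if r["owner"].lower() not in blocked]


# Made-up sample results; "example_user" is a placeholder owner name.
results = [
    {"owner": "CK", "text": "Tom is smart."},
    {"owner": "example_user", "text": "Try not to be so pessimistic."},
]

print(exclude_owners(results, ["ck"]))
```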

CK CK 11 days ago March 26, 2020 at 8:26 AM link permalink

Does it seem that most members understand the reasons for doing the following, and that it would be OK to continue doing so?

1. Contribute various sentences in the same language that have the same meanings.

2. Link sentences in the same language that are interchangeable.

Thanuir Thanuir 11 days ago March 26, 2020 at 9:49 AM link permalink

1. It's OK, but preferably when you add them all as translations of the same sentence in another language. Varied sentences are also better. Sentences with the same meaning will appear on their own anyway as sentences get translated from one language to another.

2. If I remember correctly, you were specifically asked to avoid this, because it looks odd in the new sentence view. I don't have an opinion on the matter as such.

TRANG TRANG 18 days ago March 18, 2020 at 10:04 PM link permalink

**Transcriptions in the new sentence design**

The transcription feature is ready to be tested on the dev website. I encourage everyone to test it, even if you have never used this feature before.

https://dev.tatoeba.org/

As usual, let me know here if you find any issue or find it too confusing to use.

Thank you!

fjay69 fjay69 18 days ago, edited 18 days ago March 19, 2020 at 6:26 AM, edited March 19, 2020 at 9:01 AM link permalink

Note that you have to enable "Always show transcriptions and alternative scripts" in your settings.

TRANG TRANG 18 days ago March 19, 2020 at 8:42 AM link permalink

You don't need to enable "Always show transcriptions and alternative scripts" to be able to see or edit transcriptions, but it does make them more noticeable. You will otherwise have to expand the sentence menu in order to see/edit them.

Languages for which you can edit transcriptions are Mandarin Chinese (pinyin) and Japanese (furigana).

Some other languages have transcriptions but they won't be editable: Cantonese and Uzbek.

fjay69 fjay69 18 days ago March 19, 2020 at 9:00 AM link permalink

OK. In the new design I can't review a machine-generated transcription if the sentence doesn't belong to me.

TRANG TRANG 18 days ago March 19, 2020 at 11:20 AM link permalink

Since you are not an advanced contributor on the dev website, that's normal.

I have just changed your status to advanced contributor, so you can try to edit generated transcriptions of sentences that are not yours.

fjay69 fjay69 18 days ago March 19, 2020 at 11:44 AM link permalink

I see. Can we, instead of hiding the "Edit transcription" icon, make it inactive with the tooltip "You have to be an advanced contributor to edit transcriptions"?

TRANG TRANG 17 days ago March 19, 2020 at 9:37 PM link permalink

We can, but would that really be necessary? In your case you were confused not to see the button because you were used to seeing it on the main website. For regular contributors, I think it should be intuitive that they can't edit transcriptions of other people's sentences (just like they can't edit other people's sentences).
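
For illustration only, the rule described here could be sketched in Python as below. The `User` class and `can_edit_transcription` function are hypothetical names, not Tatoeba's code, and the sketch only covers the two roles mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical model: only the two fields this thread talks about.
    name: str
    advanced: bool = False  # True for an advanced contributor


def can_edit_transcription(user: User, sentence_owner: str) -> bool:
    """Regular contributors may edit transcriptions of their own sentences;
    advanced contributors may edit transcriptions of any sentence."""
    return user.advanced or user.name == sentence_owner


regular = User("alice")
advanced = User("bob", advanced=True)

print(can_edit_transcription(regular, "alice"))   # own sentence
print(can_edit_transcription(regular, "carol"))   # someone else's sentence
print(can_edit_transcription(advanced, "carol"))  # advanced contributor
```

With a check like this, the interface could either hide the button or show it disabled with a tooltip; the rule itself stays the same.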

fjay69 fjay69 17 days ago March 20, 2020 at 4:39 AM link permalink

From https://en.wiki.tatoeba.org/art...d-contributors
"After you become an Advanced Contributor, you will be able to do these 2 things.
1. link and unlink sentences
2. tag sentences"
Should we add
3. edit transcriptions of any sentences
?

TRANG TRANG 16 days ago March 21, 2020 at 5:17 PM link permalink

I have updated the article.

fjay69 fjay69 16 days ago March 21, 2020 at 5:52 PM link permalink

Thanks!

TRANG TRANG 12 days ago March 24, 2020 at 11:46 PM link permalink

I have made some changes to how editing transcriptions works.

If you have time, please go to the dev website, make sure you have enabled the new sentence design and try to edit some transcriptions.

https://dev.tatoeba.org/

If you have already edited transcriptions before: let me know if the new way is intuitive enough and if there's anything in the old design that you are missing in the new design.

If you have never edited transcriptions before: let me know if you could easily figure out how it works. Be aware that only Japanese and Mandarin Chinese have editable transcriptions.

Thanks!

Aiji Aiji 12 days ago, edited 12 days ago March 25, 2020 at 1:16 AM, edited March 25, 2020 at 1:17 AM link permalink

I tried to shake the system a little bit and strange things happened. Quickly:

- I went to https://dev.tatoeba.org/fra/sentences/show/701209
- I tried adding spaces inside the curly brackets and clicked Save -> Bad Request is displayed in red under "Transcription". Not sure that's a good message to display.
- I tried adding spaces after the curly bracket -> Bad Request.
- I tried removing spaces to put back the transcription to what it was originally -> Bad Request. But I guess that it's because I didn't put everything back correctly.

- I edit the sentence to be "あり得ねぇー。." and click Save. The transcription becomes "あり得{え}ねぇー。.{}"
- Again, I click edit and change the sentence to "あり得ねぇー。" (the transcription is still "あり得{え}ねぇー。.{}"), but I click on "Mark as reviewed" instead of "Save". The sentence is still "あり得ねぇー。."
- I click edit again. Inside the edit block, the sentence is "あり得ねぇー。" (my unsaved change was kept, I guess) and the transcription is "あり得ねぇー。.{}".
- I want to modify the transcription, but now the "Mark as reviewed" button is gone!
- I don't know what to do, so I click on "Reset". The edit block closes. The sentence I see is "あり得ねぇー。" while the transcription I see is "あり得ねぇー。.{}".
- If I open the edit block, the same is displayed. However, if I reload the page, the sentence and the transcription are the same.
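
To make the mismatch in the steps above concrete, here is a minimal Python sketch. It assumes, based only on the examples in this thread, that a reading is written in curly brackets right after the characters it annotates. The function names are hypothetical and this is not how the site actually validates transcriptions.

```python
import re

# Assumed markup, inferred from the examples above: "あり得{え}ねぇー。"
READING = re.compile(r"\{[^{}]*\}")


def strip_readings(transcription: str) -> str:
    """Remove every {reading} group, leaving only the base text."""
    return READING.sub("", transcription)


def matches_sentence(sentence: str, transcription: str) -> bool:
    """True if the transcription's base text is exactly the sentence."""
    return strip_readings(transcription) == sentence


def has_empty_reading(transcription: str) -> bool:
    """True if the transcription contains an empty {} group."""
    return "{}" in transcription


sentence = "あり得ねぇー。"
stale = "あり得{え}ねぇー。.{}"  # transcription left over from the earlier edit

print(matches_sentence(sentence, "あり得{え}ねぇー。"))  # consistent pair
print(matches_sentence(sentence, stale))               # out of sync
print(has_empty_reading(stale))                        # stray {} present
```

A check of this kind would flag the state I ended up in, where the sentence and the transcription no longer describe the same text.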

TRANG TRANG 12 days ago March 25, 2020 at 12:47 PM link permalink

The error message should be "The transcription could not be saved". I'm not sure why locally it displays properly but on the dev website it doesn't...

The extra brackets that are added to the transcription are also a behavior that differs from what I see locally.

For the interactions with the reset and review buttons, the current implementation makes these buttons completely independent from the rest of the form. Clicking on one of them will ignore any other modification and only apply what the button is supposed to do (either resetting the transcription or marking it as reviewed).

The use cases they cover are:
- What if you want to reset a transcription to the auto-generated value?
- What if an auto-generated transcription is correct but has a warning, how do you remove that warning?

Integrating these use cases as part of the overall edit workflow is tricky...

Aiji Aiji 12 days ago March 25, 2020 at 1:29 PM link permalink

Also, I was wondering if it was necessary to have the "edit transcription" inside the "edit sentence" functionality. Since the line for transcription has no buttons, maybe it would be simpler (and more intuitive maybe, maybe not) to have a separate button, on the transcription line, for the "edit transcription" functionality only.

I'm aware I have a bias as a long-time user. It wouldn't be difficult to get used to the way you implemented it. Just sharing ideas.

gillux gillux 12 days ago March 25, 2020 at 3:20 PM link permalink

It used to be like that, but it was changed for the reasons mentioned in this discussion: https://github.com/Tatoeba/tatoeba2/pull/2224

Ricardo14 Ricardo14 12 days ago March 25, 2020 at 9:02 AM link permalink

## Translations on Tatoeba User Interface (U.I.) ##

» As many people have joined us, many questions have arisen about this platform (Transifex) and the translation of the UI. That said, in the coming days I'm going to start creating a FAQ on the Tatoeba Wiki - https://en.wiki.tatoeba.org/. Stay tuned here and on the Wall. Until then, I'd like to get some more questions from you. Please send them to me by email - rvj1444@gmail.com - or just PM me.

» New content has been added, and this will continue whenever Tatoeba gets updated. For **now**, the following languages have the "ready for use" status (all the strings translated):

- Basque
- Esperanto
- Finnish
- Portuguese
- Russian
- Spanish

I hope more languages get this status soon (I'll reply to this thread when it happens, until the next "Translations on Tatoeba User Interface (U.I.)" update).

» The developers have created a way to keep us, the U.I. translators, from translating what shouldn't be translated, such as things in brackets.

» The developers have found a way (at least for now) to provide some context by adding screenshots to the strings' comments. So please, whenever you have a question, check whether the respective string already has a screenshot - http://prntscr.com/rmdbxy

Thanks to **everyone** who has been contributing, and welcome to the new members.

If you want to help us translate the website into your language, you can join us on Transifex: https://www.transifex.com/tatoe...ite/dashboard/, check this article on the wiki https://en.wiki.tatoeba.org/art...ce-translation and PM me.

Thanuir Thanuir 13 days ago March 24, 2020 at 5:46 AM link permalink

The following sentences could just as well be said by a man. The "female speaker" tag seems odd and unnecessary. I suggest removing it from these.

https://tatoeba.org/spa/sentences/show/1496250
(after all, both men and women date, and either can be forbidden from doing so)

https://tatoeba.org/spa/sentences/show/5422768
(because of gender-neutral marriage, but also because the distinction between the words "fiancé" and "fiancée" is apparently disappearing from English)

The rest are probably matters of fashion. Both history and current fashion offer plenty of counterexamples to the gendering of these sentences.

https://tatoeba.org/spa/sentences/show/34510
https://tatoeba.org/spa/sentences/show/36857
https://tatoeba.org/spa/sentences/show/36858
https://tatoeba.org/spa/sentences/show/41673
https://tatoeba.org/spa/sentences/show/57274
https://tatoeba.org/spa/sentences/show/58071
https://tatoeba.org/spa/sentences/show/60774
https://tatoeba.org/spa/sentences/show/60775

Thanuir Thanuir 13 days ago March 24, 2020 at 6:41 AM link permalink

The aim is to make the site friendlier, and part of that is not making unnecessary gendered assumptions.