So Tom_Facts was suspended. Kinda sad, isn't it? Obviously an alt account made to post joke sentences, which may technically be against the rules, but those sentences were all linguistically sound, they just didn't make ordinary sense, for example this one: #8639644. It was a nice set of jokes that in a way hearkened back to "Tom and Mary in the land of sentences" from the olden days. I am very much not happy that those funny sentences are rendered illegitimate and closed for further translations. Could an admin please unblock them, since they are grammatically correct and understandable, regardless of not making sense in the real world?
It was not for offensive reasons, nor because some don't make sense, but for legal reasons.
See my comment here:
https://tatoeba.org/eng/sentenc...omment-1167887
In case this situation is saddening more people, then please know that there is a technical solution: https://github.com/Tatoeba/tatoeba2/issues/1659
But somebody has to volunteer to implement it.
Until that is implemented, we unfortunately cannot accept contributions when they were very obviously mass-copied from a certain source, and that source is not compatible with CC BY.
Alternatively, someone could try to negotiate with the owners of https://chucknorrisfacts.net/ and see if they would be willing to allow reuse of their content.
> Alternatively, someone could try to negotiate with the owners of https://chucknorrisfacts.net/ and see if they would be willing to allow reuse of their content.
That's what I'm going to do. It's just a joke after all, figures they wouldn't be stuck-up about it. I think preserving the sentences in their original form, with Chuck Norris as the protagonist, would be better though, so if this goes well, I suppose what we do is you set them free (of Tom_Facts' "ownership"), I adopt them and mass edit Tom to Chuck Norris and edit my translations accordingly and ask deniko to edit his and I think no one else translated those sentences other than the PIN code one. Bit tedious but there are only 192 sentences, it wouldn't take that long.
You would have to convince them to officially add their licensing terms on the website. If you find a contact point via email, please include team@tatoeba.org in CC.
I would not be as confident as you that they would agree to allow reuse of their content under a CC BY compatible license. In the end it depends on what their ad revenue is.
Making their content CC BY compatible means that anyone can technically and legally just take their content and create a website that directly competes with them. As a result, there will be a risk that fewer and fewer people visit their website. If they make little to no money from ads, they won't care about this risk. But if they make a significant amount of money, I would be extremely surprised if they agreed to take such a risk.
In that case, if we keep 'Tom', wouldn't it help solve the issue if the site admins agreed to accept derivative content and not direct copying? That's different, since a hypothetical "not Chuck Norris facts" site obviously couldn't compete, as people know it's supposed to be about Chuck Norris and would think something's fishy.
I don't see why it has to be an official request, really... I mean, we're just normal people having fun, not sleazy lawyers. It's my personal initiative/request and I've sent it as such. If I'm granted any permission I'll have the email for proof, no problem.
> In that case, if we keep 'Tom', wouldn't it help solve the issue if the
> site admins agreed to accept derivative content and not direct copying.
No, it wouldn't be compatible with CC BY. We wouldn't be able to keep these sentences as part of the corpus that we distribute because of this scenario:
- They allow derivatives of their jokes.
- We also allow derivatives of our sentences (that's the nature of CC BY).
- Someone copies sentences from Tatoeba into their website and changes every sentence with "Tom" to "Chuck Norris".
- Someone has then indirectly copied sentences from the Chuck Norris facts website.
> I don't see why it has to be an official request, really...
Because it would be irresponsible and disrespectful to ignore intellectual property.
https://en.wiki.tatoeba.org/art...ting-sentences
> - They allow derivatives of their jokes.
> - We also allow derivatives of our sentences (that's the nature of CC BY).
> - Someone copies sentences from Tatoeba into their website and changes every sentence with "Tom" to "Chuck Norris".
> - Someone has then indirectly copied sentences from the Chuck Norris facts website.
Are you expecting that to happen, really?
You didn't use to be so concerned with lawyerese, I reckon. Sad times if you have to fear someone could cry havoc over this, because nobody in their right mind would, but sick minds specialise in giving sound minds a headache.
> Because it would be irresponsible and disrespectful to ignore intellectual property.
That's no answer as to why there would be any need for officialdom. I'm communicating as myself, a private person not a legal entity. All I need is to make sure the admin(s) of that site approve of adding their sentences here and have no mind to object to it.
Chuck Norris jokes weren't invented by that website either; it's the other way around. They've added new jokes over the years, obviously, but most of those seem to have been submitted by others. How much of that copyright actually holds, hmm.
> You didn't use to be so concerned with lawyerese I reckon
I am not concerned when the sample of copied content is so minimal that we can still make the assumption that the contributor came up with the sentences on their own.
I have however always been concerned whenever we could very clearly identify the original source of the copied content.
> That's no answer as to why there would be any need for officialdom.
I will try to explain more clearly.
As long as *all* our sentences are exported into a dataset that we distribute under CC BY, it would be irresponsible and disrespectful to ignore intellectual property. We, Tatoeba, have some liability with regard to what is inside this dataset.
Yes, it is crowdsourced and it is not possible for us to monitor every single sentence. But that doesn't take away from us the responsibility to try our best to be compliant with the laws. If we can't do it, we have to stop releasing our content under CC BY. But I'm not going to sacrifice open data just for Chuck Norris jokes.
Personally I'm not going to be nitpicky about a handful of non-CC BY sentences. But there's a point where it starts to be a bit too much. Having close to 200 sentences gathered under the same account makes it very, very obvious what the source is. And when it becomes very obvious what the source is, it is too much.
Now again, the main blocking point is that *all* our sentences are exported into a dataset that we distribute under CC BY. If we could somehow exclude some sentences from our exports, that would solve the problem. Well it turns out that sentences set as "unapproved" are not exported. So that has been our temporary solution for keeping non CC BY sentences in Tatoeba.
But this poses another problem: we will have sentences in red even though they are correct. To solve that, we have a technical solution: https://github.com/Tatoeba/tatoeba2/issues/1659.
Once that is implemented, we can much more safely allow people to copy sentences from other sources because we can easily remove obvious non CC BY content from our CC BY dataset.
The content will no longer be labelled as "CC BY" data; it will just be on our website, labelled as "no license" or "unknown license" or whatever else makes it clear it's not for reuse (or, if one chooses to reuse it, it will be at their own risk).
We will still encourage everyone to cite the sources, to give attribution where it's due. We will also still remove content from Tatoeba if the original authors ask us to do so. But we don't have to worry about license compatibility.
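To make the idea a bit more concrete, here is a minimal Python sketch of such an export filter. It is not Tatoeba's actual code: the `Sentence` class, its `license` field and the `export_cc_by` function are hypothetical names made up for illustration. Only the principle (anything not explicitly labelled CC BY stays out of the distributed dataset) comes from the explanation above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Sentence:
    sentence_id: int
    text: str
    # Hypothetical field: None stands for "no license" / "unknown license".
    license: Optional[str]


def export_cc_by(sentences: Iterable[Sentence]) -> List[Sentence]:
    """Keep only the sentences explicitly labelled CC BY for the export."""
    return [s for s in sentences if s.license == "CC BY"]


# Illustrative data only: the second sentence stays visible on the website
# but is left out of the exported dataset.
corpus = [
    Sentence(1, "Tom likes tea.", "CC BY"),
    Sentence(2, "Tom counted to infinity. Twice.", None),
]
exported = export_cc_by(corpus)
```

With a filter like this, the sentences no longer need to be marked as "unapproved" just to keep them out of the export.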
Also, please be aware that there is intellectual work in creating a collection of sentences, even if each sentence in the collection is individually free of intellectual property.
For instance you can create a list of "1000 most common sentences". If you take each sentence individually, you can't argue that you own these sentences; millions of people have used them before you. But if someone were to take your exact list and publish it on their language learning website as "1000 sentences for beginners", then they basically ripped off your work. Because it is intellectual work to come up with criteria for selecting what is more common.
In the case of Chuck Norris facts, they built a website, they have people accepting or rejecting the submissions, they set up a whole infrastructure to collect and share these jokes. It *is* work, it is intellectual value. They cannot claim that each joke belongs to them, but they can claim that the collection of jokes belongs to them. Not only that, but they have ads on their website, so it is also money. Personally, when money is involved, I don't want to make any sort of optimistic assumption.
I hope this clarifies things.
Thanks for taking your time to present a detailed account. It's not like I had zero understanding, but having an explanation laid out is quite a bit helpful; nice of you to write one up.
This analogy doesn't seem to apply to Tatoeba as it is since sentences are added individually though? I mean, I *could* add 1000 common sentences from such a list as you describe, and this couldn't be copyright violation since nobody owns any of those sentences individually. It seems there's no violation as long as I don't combine them into a list similar to the original one; only then am I arguably using someone else's work.
I have to agree that adding those sentences under an alt account set up specifically for that purpose makes it too obvious. But if a regularly contributing account/user, such as me, added common sentences from a certain source along with other sentences so their existence as a relatively large set wouldn't be visible, it would be realistically no problemo, hmm? Since it only becomes a real problem when you can easily trace those sentences and conceive of them as a single list, rather than stumble upon individual items that do not suggest a bigger list exists. That is, if those sentences aren't of the sort that makes you think right away they might come from a single author, which Internet memes are not, as anyone may want to create more phrases with the same jocular pattern.
Though I realize you're probably posing your question hypothetically, it sounds to me like "Who cares whether it's wrong as long as we can come up with a scheme for getting away with it?" Aside from the ethical problem, it seems like a bad idea to rely on our speculations, as non-lawyers, about what might or might not get us in trouble with the law. In any case, I don't think Tatoeba is so hard up for sentences that we need to steal them.
> Who cares whether it's wrong as long as we can come up with a scheme for getting away with it?
You mean legally wrong. Common jokes shouldn't be copyrightable if we're talking morally, but oh well.
Let me add a pinch of explanation because I think it's important. The short answer is: no, it won't work. It could even worsen things, because a lawyer could accuse you of being fully aware of your copyright infringement and trying to hide it (an aggravating circumstance). It's like saying "I launder my dirty 2 million, but only 2,000 at a time. The authorities won't catch me so it's safe."
The scenario is simple. Suppose I own the copyright on content of some sort, here, sentences. I build a script to search the web for potential copyright infringement (if I have a lot of money, I pay for such a script). Above a certain threshold (chosen more or less arbitrarily), it is decided that "potential" becomes "very likely". The two common ways to proceed from here are the following:
- I'm a relatively nice person, so I contact you, explain the problem and ask you to kindly deal with the situation.
- I don't care at all, and demand the removal of the copyrighted content, otherwise I'll send you my lawyer.
That's how it works on YouTube, for example. Copyright owners won't care who the content creators are, they'll strike their videos, period. You're just a science communicator who makes instructive videos? Well, too bad you used 17 seconds of Black Eyed Peas. Talking about life in the U.S.? Well, don't use the Mario Main Theme.
Sometimes the content creator will make a phone call, find a reasonable person and reach a happy outcome (or they belong to a powerful network that can negotiate...), sometimes they won't.
And don't think that popular stuff, folklore, or something that everybody knows is safe. That's how copyright scammers work on YouTube. They claim ownership of any piece of content that is not copyrighted to be able to strike videos and get the corresponding share of the money. Sometimes they even try to claim copyright over content that was officially released for free use...
Of course, there's no need to be paranoid. But being aware that some people are merciless helps you stay cautious. Since Tatoeba asks people to add their own sentences, it's normal to expect that people will be cautious about adding copyrighted (or possibly copyrighted) content.
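For the curious, the "script with a threshold" from the scenario above could look roughly like this in Python. This is only a sketch under my own assumptions, not a description of any real detection tool: the function names, the exact-match comparison after normalization and the default threshold of 0.5 are all made up for illustration.

```python
import re
from typing import Iterable, Set


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(no_punct.split())


def overlap_ratio(candidates: Iterable[str], source: Iterable[str]) -> float:
    """Fraction of candidate sentences that also appear in the source."""
    source_set: Set[str] = {normalize(s) for s in source}
    normalized = [normalize(c) for c in candidates]
    if not normalized:
        return 0.0
    hits = sum(1 for c in normalized if c in source_set)
    return hits / len(normalized)


def very_likely_copied(
    candidates: Iterable[str], source: Iterable[str], threshold: float = 0.5
) -> bool:
    """Flag the candidates once the overlap reaches the (arbitrary) threshold."""
    return overlap_ratio(candidates, source) >= threshold
```

The point is that the check looks at the share of matching sentences, not at how they were added, so spreading the copies among other contributions only lowers the ratio; it doesn't make the matches disappear.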
> This analogy doesn't seem to apply to Tatoeba as it is since sentences are added individually though? I mean, I *could* add 1000 common sentences from such a list as you describe, and this couldn't be copyright violation since nobody owns any of those sentences individually. It seems there's no violation as long as I don't combine them into a list similar to the original one; only then am I arguably using someone else's work.
Have you ever heard of "database right"? https://en.wikipedia.org/wiki/Database_right
As for changing "Tom" back to "Chuck Norris", I would recommend against it. Keeping Tom is okay; these are jokes that have been customized for Tatoeba.
On a side note, I would like to point out that before Tom, there was Christopher Columbus:
https://tatoeba.org/eng/tags/sh..._with_tag/1158
> I would like to point out that before Tom, there was Christopher Columbus:
I don't think this is true.
In English, the first "Columbus" sentence is #35544. The first "Tom" sentence was #1780. There were 34 other "Tom" sentences with numbers under 35544.
There are 652 sentences with Tom under #467000 and 18 "Columbus" sentences.
The first "Christopher Columbus" sentence is #536592. There are 671 "Tom" sentences with numbers under this.
I didn't mean that there was no sentence with Tom before. I meant that when it comes to making jokes with a protagonist who has superhuman abilities, this trend appeared in Tatoeba with "Christopher Columbus" before "Tom".
> Alternatively, someone could try to negotiate with the owners of https://chucknorrisfacts.net/
These jokes about Chuck Norris are basically modern folklore, I feel like it would be super weird for someone to claim any copyright on them. I'm sure chucknorrisfacts.net is not the only website that collects this folklore. Are you sure they were copied from there and they're not part of Tom_Facts's own collection? The jokes are literally everywhere.
https://www.reddit.com/r/ChuckNorris/
https://parade.com/968666/parad...-norris-jokes/
etc.
I remember reading jokes about him in this style long before chucknorrisfacts.net was created (year 2017, according to whois lookup).
+∞
A few might be found here under a CC license: https://commons.wikimedia.org/w...related_images
👍
Just because a picture of a sentence is CC licensed doesn't make the sentence itself CC licensed.
This file seems to be CC BY licensed: https://commons.wikimedia.org/w...indow_sign.jpg
The title of the file contains the sentence. (The picture does, too.) The title of the file is a part of the file. Hence, the title seems to be CC-licensed, too. Or am I missing something?
Reusing sentences that can be seen in pictures that were published under CC BY can give us a safety net, but it relies on the assumption that whoever uploaded these pictures was knowledgeable enough about intellectual property.
If @Tom_Facts (or anyone) can justify that there is a legit, creative or intellectually demanding process behind finding and converting Chuck Norris facts into Tom facts, then (to me at least) it would be a better defense than saying "I extracted these jokes from CC BY pictures".
If the process is "I'm browsing random Chuck Norris facts on some website(s) and I take those that I like, replace the name with Tom and add them to Tatoeba", the intellectual added value is... too minimal.
I'm not familiar with French copyright law; the best Finnish source I found is this: http://www.kysy.fi/kysymys/voik...kijanoikeuksia
It doesn't mention any court cases about the threshold of originality for jokes. For many jokes, copyright has lapsed due to age, and many are part of general folklore. "Pikku Kalle" ("Little Kalle") jokes and "A Finn, a Swede and a Norwegian" jokes probably belong to these categories, but "Chuck Norris" jokes may be too recent.
(This is one example of how copyright as a concept harms and complicates creative, useful and partly also scientific work, just as monopolies in general are harmful. Copyrights should be abolished or at least greatly weakened. They are damaging to culture and science.)
The problem isn't about whether a single joke is protected by copyright or not (it most likely isn't, like the majority of the sentences on Tatoeba, IMHO).
The problem is scraping a lot of jokes/sentences from another database where we don't know its license because databases can be considered as creative works by themselves independent of whether their items are creative works or just facts (i.e. protected by copyright or not).
Now the question is whether compiling a list of jokes is considered a creative work. I'm pretty sure it is in Europe, which has rather strict database laws (see the Wikipedia article I've mentioned in another message). But even in the US, which as far as I know doesn't have such strict laws, it seems likely, as the following passage shows:
"An example of a database that is protected as a compilation would be a database of selected quotations from U.S. Presidents. The individual quotations themselves may or may not be subject to copyright protection. However, the selection of the quotations involves enough original, creative expression that it is protected by copyright. Therefore, a database of quotations will be protected by copyright as a compilation even though some of the quotations are not protected." (from https://www.bitlaw.com/copyright/database.html )
So as long as nobody can reliably prove that the owner of chucknorrisfacts.net is ok with scraping many or all of the jokes, I think it's better to stop adding them.
I agree that the collection as a whole, or in significant part, may be protected by copyright, and that using it without permission is impolite.
First, please take the time to fully read my reply to Ooneykcall:
https://tatoeba.org/eng/wall/sh...#message_34643
> I remember reading jokes about him in this style long before
> chucknorrisfacts.net was created (year 2017, according to whois lookup).
Note that the domain chucknorrisfacts.net may have been registered in 2017, but the website itself has existed since 2008 (according to their footer). It was probably under another domain before 2017.
If the "2008" indicated in the footer is a lie and they have existed only since 2017, then they did a great job at replicating website design from 15 years ago...
> Are you sure they were copied from there and they're not part of Tom_Facts's
> own collection?
@Tom_Facts did not explicitly tell me they copied from chucknorrisfacts.net and I do not possess secret government agency surveillance tools, nor do I possess mind reading powers. So no, I'm not sure.
But when I check some of the sentences, I'm getting results like these: https://imgur.com/a/zzLQJQL. I can't help but be very suspicious that a large part of those jokes are copied from chucknorrisfacts.net, as it happens to be the common denominator in these results.
I'm very well aware that these jokes are everywhere. I know many of these jokes, I laughed at many of these jokes, and I wished just as much as many people here that we didn't have to worry, ever, about copyright or licensing or intellectual property.
But that won't change the fact that inserting these jokes on a large scale poses a legal risk for Tatoeba. The more that are added, the more the risk grows.
If some content has been copied and re-copied, it wouldn't necessarily and magically become CC BY compatible. So even if @Tom_Facts didn't copy directly from chucknorrisfacts.net, but from a website that itself copied from chucknorrisfacts.net, then it can still be a problem.
And I will insist again: the main problem is not that these jokes are published on Tatoeba, the main problem is that they are *incorrectly licensed under CC BY*.
Is there any platform or blog that publishes Chuck Norris facts under a Creative Commons license? No. There's none. So we cannot be thinking "Well, everyone else uses these jokes, why can't we?". We have different legal constraints, that's why we can't.
Now you are lucky, Andreas (aka. @rumpelstilzchen) has volunteered to tackle the issue https://github.com/Tatoeba/tatoeba2/issues/1659. But until it is ready and deployed, please (everyone), don't copy more jokes and memes into Tatoeba. Don't translate them if you see any that might be legally shady. Ask people to stop if you see anyone doing it. Just wait till we have the proper features in place, or just create your own jokes instead. I have no time to play the cop so I count on everyone's cooperation. Thanks.