Tatoeba
Thanuir, March 25, 2019 at 10:27:54 AM UTC

What to do with red sentences?

There are some sentences which are, as far as I know, completely valid, but they are shown in red and cannot be translated. (E.g. https://tatoeba.org/fin/sentences/show/3313641 and https://tatoeba.org/fin/sentences/show/415526 )

The sentences can still be linked. Adopting them does not change the status. I have not tested if editing them changes the red status.

What is up with these sentences and what, if anything, should one do with them? (I am guessing they are from frozen contributors, but I do not know if that is the sole criterion.)

TRANG, March 25, 2019 at 5:43:59 PM UTC

Currently, the only solution is to contact an admin so they "re-approve" those sentences.

It would definitely be useful (and it has been considered) to remove the red warning when the sentence gets adopted by a trusted user, but this has yet to be implemented.

Thanuir, March 27, 2019 at 7:15:52 PM UTC

A personal message informed me that some of the red sentences might be copyright violations.

1. If they are copyright violations, they should be removed, not marked red.

I am not sure they can be copyright violations; if there is a French teachers' organization, they might have a good idea of where the boundaries of copyright violation lie in cases like these. I would be very surprised if a collection of standard, common phrases were a copyright violation, but I know only a little, and that is about Finnish law, not French law.

2. If they are of utterly awful quality, they could be removed en masse.

3. If they are salvageable or reasonable, I see no benefit to the red status. Maybe I am missing it. Automatic tagging with @needs native check, maybe, instead?

TRANG, March 27, 2019 at 8:47:55 PM UTC

If they were clear copyright violations, we would of course remove them. For instance if you're gonna copy-paste every sentence from the Harry Potter books, that would be an issue and we wouldn't keep the sentences.

In our case, the red sentences are in an unclear status. Many of them aren't really out of the ordinary and sound like something that anyone could come up with. Some of them are actually only a problem because they are not CC BY compatible (for instance, copied from CC BY-SA content) and would actually be fine once we implement this: https://github.com/Tatoeba/tatoeba2/issues/1659

As for other sentences that are not subject to copyright issues, most of the time when we mark them as red, they were not reviewed one by one. They were just the result of us figuring out that one user has a lot of problematic sentences, and no one has time to check each sentence individually. So we mark them all red to reduce the risk that they get translated, because fixing a sentence that has already been translated is often a mess. It's better to fix them first and then remove the red mark.

CK, March 28, 2019 at 12:18:37 AM UTC, edited November 1, 2019 at 3:28:22 AM UTC

[not needed anymore- removed by CK]

Thanuir, March 28, 2019 at 5:39:44 AM UTC

That would not be obvious in the Finnish legal system. The original source would have to be unique enough to be considered a "teos", i.e. demonstrate unique artistic vision and be something another person would not have come up with.

This would be unlikely to be true of a short list of common greetings and their translations. A more unique translation would certainly qualify.

But the French copyright law is probably different.

TRANG, March 28, 2019 at 11:14:43 PM UTC

It is true that lists can be copyrighted, but it's not as straightforward when you take into account fair use. You'd have to consider how much of the list has been copied, and of course whether the list itself had some originality or not.

For instance let's say I have copied 20 basic sentences from a very large dataset of thousands and thousands of sentences. It would be impossible for the copyright owner of the dataset to claim a copyright infringement.

If anything, we should be able to keep very basic sentences within Tatoeba, even if they have been copied, as long as they are not presented within Tatoeba in a list that looks nearly identical to the source.

For instance, if I'm a new user and I've copied 20 basic sentences from a blog post, and those are my only sentences in Tatoeba, then there could be copyright infringement due to the sentences being listed on my sentences page. But if I unadopt all these sentences, and they become blended into the corpus, then there is almost no more risk of copyright infringement because I have effectively removed the list.

On the other hand, let's assume someone wants to create a list of basic sentences in Tatoeba, and they come upon an interesting blog where the author compiled a list of good sentences for beginners to learn. They decide to re-create this list in Tatoeba. Coincidentally, all the sentences in that blog post already exist in Tatoeba. Well, the list that the user created in Tatoeba can be considered copyright infringement despite the fact that the sentences existed already. But we would just need to delete the list to solve the copyright issue; we wouldn't have to delete the sentences.

My point is that when dealing with copyright on lists, we have to look at the places where Tatoeba lists sentences ("My sentences" page, lists, tags, favorites). This is where the list copyright infringement can make sense and it is the list that we would need to delete, not the items in the list. For the items, we have to look at each of them individually and evaluate if on their own they have any originality or not. And if not, it is fine to keep them.

Thanuir, March 28, 2019 at 5:41:49 AM UTC

I can, with some confidence, say that "Vi ses!" is okay in Swedish.

TRANG, March 29, 2019 at 12:00:56 AM UTC

I'll remove the red mark on the sentences you mentioned.

I've also created an issue on this topic: https://github.com/Tatoeba/tatoeba2/issues/1847

Feel free to suggest other solutions if you have any better ideas.