Tatoeba
TRANG, July 31, 2018 at 5:28:53 PM UTC

**Testing the detection of original sentences**

As one of the first steps towards allowing multiple licenses in Tatoeba, we've been working on distinguishing "original" sentences from translations.

https://github.com/Tatoeba/tatoeba2/issues/1589

We have deployed a first iteration on the dev website and need your help testing it.

If you go to the page of a sentence, for instance
https://dev.tatoeba.org/eng/sentences/show/576307, you'll see in the "Logs" section whether the sentence is original or a translation of another sentence.
Have a look around and see if you notice anything wrong regarding which sentences are marked as "original" and which aren't. There may be some tricky cases where deduplication comes into play.

We also haven't decided yet how to display this information. For now it's shown in the Logs section, but that may not be the best place. Here are some other options:
https://github.com/Tatoeba/tato...ment-408595821
Any feedback on that would be appreciated as well.

Thank you!

Ricardo14, August 1, 2018 at 10:35:21 AM UTC

Some questions came to mind:

1st - Do I have to set up each sentence to CC0 or is there a way to "convert" them all?

2nd - Isn't there a "symbol" which would be easier to identify if the sentence is CC0 or CC-BY-2.0?

3rd - Can we have the same sentence (duplicates) that belong to different licenses?

I'm Ricardo. (CC-BY-2.0)
I'm Ricardo. (CC0 1.0)

Guybrush88, August 1, 2018 at 11:45:02 AM UTC

> 3rd - Can we have the same sentence (duplicates) that belong to different licenses?
>
> I'm Ricardo. (CC-BY-2.0)
> I'm Ricardo. (CC0 1.0)

Generally speaking, duplicates are merged, so I think there should be a single license for them, but personally I don't think it would be useful to have multiple licenses for exactly identical sentences. If I were someone who chooses to reuse data from Tatoeba, I would find it very confusing to have completely different licenses for a single sentence, like having two different licenses for a sentence like "I'm Ricardo".

Ricardo14, August 2, 2018 at 1:33:35 AM UTC

> but personally I don't think it would be useful to have multiple licenses for exactly identical sentences.

Agreed, but I don't know how it's going to work. If we won't have two identical sentences with different licenses, which one will prevail?

> If I were someone who chooses to reuse data from Tatoeba, I would find it very confusing to have completely different licenses for a single sentence, like having two different licenses for a sentence like "I'm Ricardo".

Maybe not. Depending on how and why I want to use this, I should look for sentences written on a specific license, shouldn't I?

Guybrush88, August 2, 2018 at 10:15:42 PM UTC

> I should look for sentences written on a specific license, shouldn't I?

I agree with this, but with different licenses applied to the same sentence, I have the same question as you: which one will prevail?

And this is why I find it confusing.

TRANG, August 3, 2018 at 5:18:51 AM UTC

Note that for now we are only at the step of identifying which of the existing sentences are original. We are not dealing with licenses yet. You can see a license dropdown on the "Add sentence" page because you are an admin on the dev website. However, this feature is not meant to be used yet and is limited to admins.

> 1st - Do I have to set up each sentence to CC0 or is there a way to "convert" them all?

When we officially allow people to change the license of their original sentences, we will of course provide a way to convert all sentences.

> 2nd - Isn't there a "symbol" which would be easier to identify if the sentence is CC0 or CC-BY-2.0?

There will be more licenses than just CC0 and CC-BY. Having just one symbol would not be enough.

> 3rd - Can we have the same sentence (duplicates) that belong to different licenses?
> ...
> If we won't have two identical sentences with different licenses, which one will prevail?

This case could happen, yes. We haven't made a final decision on what to do yet, but provided that both duplicates are "original" (i.e. the contributor came up with the sentence themselves rather than adding it as a translation), it would seem more logical to me for the least restrictive license to prevail.