Tatoeba
FeuDRenais, October 19, 2010 at 9:23:19 PM UTC

Deliberately Creating Duplicates Instead of Linking (because it's actually *faster*):

Should this behavior be encouraged?

sacredceltic, October 19, 2010 at 9:40:51 PM UTC

could you illustrate the issue, please?

kebukebu, October 19, 2010 at 10:28:31 PM UTC

Are you referring to people adding new sentences or translations instead of taking the time to search for similar or identical sentences in the language they are using? (I've run across countless numbers of these, especially in Esperanto and a few in Spanish, French, English, etc.)

One of the problems is that people either don't look at the indirect translations, or probably more commonly, the identical sentence is linked two or more degrees away from the sentence being translated. I've been adding as many links as I can for correct translations to try to reduce this kind of occurrence...

The other issue is that only trusted users can add more than one link to a new sentence, which could be frustrating to recently arrived polyglots. Of course, the only way around that is for more users to become trusted... That's the kind of thing that comes in due course...

FeuDRenais, October 19, 2010 at 10:44:42 PM UTC

Okay, I'll add some detail.

What I mean is that if the duplicate-removal script is run often enough, it automatically fuses the links of the merged duplicates (to the best of my knowledge, or at least it should). In this sense, adding exact duplicates doesn't really do harm, and it can save significant time versus linking. It's a really wasteful solution, but this is not the environment, so we can probably be wasteful. Here's an example:

You're translating from Esperanto to English, and you see an Esperanto sentence with an indirect English translation. The sentence is really simple (say, fewer than ten words). Linking it would require either a) leaving a comment, b) adopting and linking, or c) using the linking address, if you know that trick. All of these take time (anywhere from 30 seconds to a minute, let's say). On the other hand, if you simply add a duplicate as a direct translation, it'll take you 3 seconds, and you'll move on. Then, when the duplicate-removal script is run, it'll automatically remove your duplicate but keep the link. Overall, you've now linked by deliberately making a duplicate, and the time saved is tenfold.

I guess my overall argument is: if the duplicate-removal script is run often enough, there's no harm in having duplicates, since they just turn into links.

kebukebu, October 19, 2010 at 11:22:07 PM UTC

Well, if the script works like you say:

- combines all links for the duplicate sentences
- leaves only the oldest (lowest number) duplicate remaining
- runs once or twice a week, perhaps?

Then it seems like a practical option. Is there an older thread that discusses the script in detail? I have only heard legends of it :)

One last question: does the script retain (possibly useful) comments from deleted duplicates?

FeuDRenais, October 19, 2010 at 11:27:55 PM UTC

Yea, it's mythical... but I've seen it run once. It was a pretty glorious day. The number of English sentences jumped from 155,000 to 153,000... just like that!

I think it gets brought up in threads occasionally. I don't know of specific ones.

By "possibly useful", you mean "comments"? Otherwise, I think the priority ladder goes like this:

1) with audio
2) with owner
3) age

(so audio is kept... but I guess comments are lost, though technically still there for a deleted sentence?)
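
For illustration only, here is a minimal Python sketch of the behaviour described in this thread: duplicates are grouped by language and text, one is kept according to the priority ladder above (audio, then owner, then age), and the links of the removed copies are fused into the kept one. This is not the actual Tatoeba script. The `Sentence` class, its fields, and `merge_duplicates` are hypothetical names, and the sketch assumes that links are symmetric and that a lower id means an older sentence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Sentence:
    # Hypothetical model, not Tatoeba's real schema.
    id: int                       # assumption: lower id = older sentence
    lang: str
    text: str
    has_audio: bool = False
    owner: Optional[str] = None
    links: Set[int] = field(default_factory=set)  # ids of direct translations


def keep_priority(s: Sentence):
    # Sort key for the ladder: with audio first, then with owner, then oldest.
    return (not s.has_audio, s.owner is None, s.id)


def merge_duplicates(sentences: List[Sentence]) -> Dict[int, Sentence]:
    """Merge sentences that share language and text; return survivors by id."""
    by_id = {s.id: s for s in sentences}
    groups: Dict[tuple, List[Sentence]] = {}
    for s in sentences:
        groups.setdefault((s.lang, s.text), []).append(s)

    for group in groups.values():
        if len(group) < 2:
            continue
        keeper, *removed = sorted(group, key=keep_priority)
        removed_ids = {s.id for s in removed}
        for dup in removed:
            # Fuse the duplicate's links into the kept sentence.
            keeper.links |= dup.links
            # Repoint every neighbour from the duplicate to the kept sentence.
            for other_id in list(dup.links):
                other = by_id.get(other_id)
                if other is None:
                    continue
                other.links.discard(dup.id)
                if other.id != keeper.id:
                    other.links.add(keeper.id)
            del by_id[dup.id]
        # Drop links to removed copies and any self-link, which covers the
        # case where two duplicates were directly linked to each other.
        keeper.links -= removed_ids | {keeper.id}
    return by_id
```

In this sketch the comments of a removed sentence are not handled at all, which matches the open question here about whether they survive a merge.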

kebukebu, October 19, 2010 at 10:37:52 PM UTC

OR...

Are you referring to purple who have the capability to add links, but they don't like typing the URL formulas with the two sentence numbers over and over, so they create a duplicate so they can use the graphical links on their own sentence, then wait for the duplicate removal script to clean up after them?

In that case, the solution would be to show the graphical link/unlink icons (chain/scissors) to trusted users for ALL sentences, whether they own the sentence or not.

kebukebu, October 19, 2010 at 10:40:25 PM UTC

* purple -> people. Silly smartphone.

sacredceltic, October 19, 2010 at 10:46:32 PM UTC

>so they create a duplicate so they can use the graphical links on their own sentence, then wait for the duplicate removal script to clean up after them?

I don't see a problem with this, since the point is to create translation links. Whether the deduplication script does it or not is irrelevant.
Many translations are too many hops away to be linked by a non-moderator, and duplicating a sentence is thus a good way to ensure that the link will be created.
It is actually even better, since there is no risk of accidentally linking sentences that shouldn't be linked.

FeuDRenais, October 19, 2010 at 10:50:34 PM UTC

K, I just took the liberty of demonstrating the benefits of this on Muiriel's sentence (sorry, Muiriel):

http://tatoeba.org/eng/sentences/show/574051

Here, I had five linked translations, and she added a translation to one of them. Instead of notifying her and telling her to link to the other 4 (since she could), I created a blatant duplicate of hers and linked it myself to all of the others (this was relatively fast). Now, mine will be deleted, but her original will inherit all the links (once the script is run...).

ludoviko, October 19, 2010 at 11:17:35 PM UTC

A funny and efficient way to solve a really existing problem. Congratulations!

kebukebu, October 20, 2010 at 12:46:25 AM UTC

I shall watch, and pray :P

Well, I'm glad we're raising users' awareness of the duplicate removal script. This is a useful and timesaving method, which forefoot doesn't cause any problems.

By the way, how does the script react when the two duplicates are directly linked, as in your example? I suppose it just unlinks them in the process of deleting...

kebukebu, October 20, 2010 at 12:48:56 AM UTC

* forefoot = hopefully? I don't even know. That's the beauty of Swype.

FeuDRenais, October 20, 2010 at 12:51:32 AM UTC

No clue... I would actually like one of the bosses' opinions on this before I start creating (or advocating the creation of) blatant duplicates.

Well, in any case... even if it's not run every one or two weeks but one or two months, it's not like there's some committee that does a rigorous "Tatoeba Quality Check" on a weekly basis, right? (right?)

Demetrius, October 20, 2010 at 1:00:14 AM UTC

In fact, the users should do this =>>> You're in that committee.

FeuDRenais, October 20, 2010 at 1:02:42 AM UTC

Yea, but I often neglect my duties.

CK, October 20, 2010 at 1:53:54 AM UTC, edited October 26, 2019 at 4:19:18 AM UTC

[not needed anymore- removed by CK]

FeuDRenais, October 20, 2010 at 1:57:20 AM UTC

On an unrelated note, it would be nice to be able to edit our own comments/posts directly (instead of having to delete+repost).

Dorenda, October 21, 2010 at 9:31:45 PM UTC

> I doubt if comments get merged.
I think they do.

Swift, October 20, 2010 at 2:51:36 AM UTC

Thanks for bringing this up. I've been wondering about the same for a while.
For those who do want to link sentences the "old-fashioned way", I made a little form to handle the link trick: http://martin.swift.is/tatoeba/ . This way, it's just a question of copying and pasting sentence numbers.

FeuDRenais, October 20, 2010 at 2:55:22 AM UTC

Neat-o!

But, question. Does your site distinguish trusted/non-trusted users?

Swift, October 20, 2010 at 4:12:16 AM UTC

Tatoeba takes care of that. :-)

sacredceltic, October 20, 2010 at 9:17:36 AM UTC

This way, one doesn't see what is being linked or updated...Very secure indeed!

Demetrius, October 20, 2010 at 10:20:20 AM UTC

Linking will still be visible on the sentence page, in the edit history.

Swift, October 20, 2010 at 11:02:27 AM UTC

The form has no impact on security whatsoever.

sacredceltic, October 20, 2010 at 11:11:26 AM UTC

Yes, but as in any craft, you can't do a good job if you don't see what you're doing...
Linking 2 sentences is an ART which requires MUCH thought, with the sentences clearly laid out in front of one's eyes to make a decision.

Swift, October 20, 2010 at 11:55:18 AM UTC

This looks like a PEBKAC problem, not an issue with security. Feel free to have the last word, though.
Should you want a form that collects the two sentences before you link them, send me a private message and I'll see what I can do.

sacredceltic, October 20, 2010 at 12:12:59 PM UTC

Surely, using English acronyms helps international understanding very much...
I'm sure all the non-anglophones who already struggle to follow our debates will appreciate...

Demetrius, October 20, 2010 at 12:43:47 PM UTC

+10

sacredceltic, October 20, 2010 at 12:53:27 PM UTC

@Demetrius; What is your scale?

TRANG, October 20, 2010 at 5:13:08 AM UTC

I wouldn't encourage it. The duplicate-removal script is not completely reliable; it's only a quick solution for dealing with duplicates. One problem I see is that the script doesn't create log entries, so if sentences were linked by the script, you won't see that in the logs.

FeuDRenais, October 20, 2010 at 11:36:14 AM UTC

Can the script be improved? :-)

Swift, October 20, 2010 at 12:02:19 PM UTC

I can't remember whether this is going to be in the new Tatoeba version or if it was just on my wish-list, but I once discussed the option of displaying more distantly related sentences for the purposes of linking.

For completely unrelated sentences, some sort of interface feature might be created. Ideally, we should come up with a way that's faster and more reliable than adding a duplicate sentence. Perhaps a "link a sentence" feature where you can search for sentences to link: type in a few words, get the results of exact-phrase matches and select the one that you want.
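
As a rough illustration of the search step in such a "link a sentence" feature, here is a minimal Python sketch. It is only an assumption of how the lookup could work, not an existing Tatoeba function: `find_link_candidates` and the in-memory `sentences` mapping (id to a `(lang, text)` pair) are hypothetical, and a real implementation would query the database or a search index instead.

```python
def find_link_candidates(sentences, phrase, lang=None):
    """Return (id, text) pairs whose text contains `phrase` as an exact phrase.

    Matching ignores case and collapses runs of whitespace. `sentences` is a
    hypothetical mapping of sentence id to a (lang, text) pair.
    """
    needle = " ".join(phrase.split()).casefold()
    if not needle:
        return []
    results = []
    for sentence_id, (s_lang, text) in sorted(sentences.items()):
        if lang is not None and s_lang != lang:
            continue
        haystack = " ".join(text.split()).casefold()
        if needle in haystack:
            results.append((sentence_id, text))
    return results


# Usage: the user types a few words, then picks one candidate to link.
sentences = {
    1: ("eng", "I like tea."),
    2: ("eng", "I like coffee."),
    3: ("fra", "J'aime le thé."),
}
candidates = find_link_candidates(sentences, "like tea", lang="eng")
```

The user would still see the full text of each candidate before confirming, so the link is made with both sentences in view.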

Not sure if this would be worth the development effort before we have the new database to play with.

MUIRIEL, October 20, 2010 at 11:59:36 AM UTC

There is another problem: there are too many examples like the following that were probably created by the duplication script:
Sentence number 127615 (http://tatoeba.org/eng/sentences/show/127615) was obviously merged with 454246 (http://tatoeba.org/eng/sentences/show/454246) at some point, but 455197 (http://tatoeba.org/eng/sentences/show/455197), which was linked to 127615, didn't get linked to 454246.
I saw several cases like that one, and they aren't easy to find, so the number of undetected cases is probably horrible...
I don't like the duplication script.

MUIRIEL, October 20, 2010 at 12:06:18 PM UTC

*deduplication