Deliberately Creating Duplicates Instead of Linking (because it's actually *faster*):
Should this behavior be encouraged?
could you illustrate the issue, please?
Are you referring to people adding new sentences or translations instead of taking the time to search for similar or identical sentences in the language they are using? (I've run across countless numbers of these, especially in Esperanto and a few in Spanish, French, English, etc.)
One of the problems is that people either don't look at the indirect translations, or probably more commonly, the identical sentence is linked two or more degrees away from the sentence being translated. I've been adding as many links as I can for correct translations to try to reduce this kind of occurrence...
The other issue is that only trusted users can add more than one link to a new sentence, which could be frustrating to recently arrived polyglots. Of course, the only way around that is for more users to become trusted... That's the kind of thing that comes in due course...
Okay, I'll add some detail.
What I mean is that if the duplicate-removal script is run often enough, it automatically fuses the links of the merged duplicates (to the best of my knowledge, or at least it should). In this sense, adding exact duplicates doesn't really do harm, and it can save significant time versus linking. It's a really wasteful solution, but it's not as if we're harming the environment, so we can probably afford to be wasteful. Here's an example:
You're translating from Esperanto to English, and you see an Esperanto sentence with an indirect English translation. The sentence is really simple (say, fewer than ten words). Linking it would require either a) leaving a comment, b) adopting and linking, or c) using the linking address, if you know that trick. All of these take time (anywhere from 30 seconds to a minute, let's say). On the other hand, if you simply add a duplicate as a direct translation, it'll take you 3 seconds, and you'll move on. Then, when the duplicate-removal script is run, it'll automatically remove your duplicate but keep the link. Overall, you've linked the sentences by deliberately making a duplicate, in roughly a tenth of the time.
I guess my overall argument is: if the duplicate-removal script is run often enough, there's no harm in having duplicates, since they just turn into links.
Well, if the script works like you say:
- combines all links for the duplicate sentences
- leaves only the oldest (lowest number) duplicate remaining
- runs once or twice a week, perhaps?
Then it seems like a practical option. Is there an older thread that discusses the script in detail? I have only heard legends of it :)
One last question: does the script retain (possibly useful) comments from deleted duplicates?
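For anyone trying to picture the behaviour described above, here is a minimal Python sketch of such a merge. It is not Tatoeba's actual script: the data layout (a dict of id -> (language, text), plus a set of id pairs for links) and the name `merge_duplicates` are assumptions made up for illustration. It keeps the lowest-numbered duplicate and re-points every link at it.

```python
from collections import defaultdict


def merge_duplicates(sentences, links):
    """Hypothetical merge: keep the lowest id per (language, text) group.

    sentences: dict mapping id -> (language, text)
    links: set of (id, id) pairs, treated as undirected
    """
    # Group sentence ids by identical language and text.
    groups = defaultdict(list)
    for sid, (lang, text) in sentences.items():
        groups[(lang, text)].append(sid)

    # Map every id to the id that survives the merge.
    keeper_of = {}
    for ids in groups.values():
        keeper = min(ids)
        for sid in ids:
            keeper_of[sid] = keeper

    # Re-point each link at the survivors; drop links that collapse
    # onto a single sentence.
    merged_links = set()
    for a, b in links:
        a, b = keeper_of[a], keeper_of[b]
        if a != b:
            merged_links.add((min(a, b), max(a, b)))

    kept = {sid: s for sid, s in sentences.items() if keeper_of[sid] == sid}
    return kept, merged_links


# The Esperanto example: sentence 3 is a deliberate duplicate of 2,
# added as a direct translation of 1.
sentences = {
    1: ("epo", "Mi amas vin."),
    2: ("eng", "I love you."),
    3: ("eng", "I love you."),
}
kept, merged = merge_duplicates(sentences, {(1, 3)})
```

In this sketch, a link between two duplicates simply disappears, since both ends collapse into the same sentence. Whether the real script behaves the same way is not something this thread establishes.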
Yea, it's mythical... but I've seen it run once. It was a pretty glorious day. The number of English sentences jumped from 155,000 to 153,000... just like that!
I think it gets brought up in threads occasionally. I don't know of specific ones.
By "possibly useful", you mean "comments"? Otherwise, I think the priority ladder goes like this:
1) with audio
2) with owner
3) age
(so audio is kept... but I guess comments are lost, though technically still there for a deleted sentence?)
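The ladder above can be written as a sort key. This is only a sketch of the ordering as I understand it from this thread (audio first, then having an owner, then age, with a lower number meaning older). The `Sentence` class and `pick_survivor` are hypothetical names, not anything from the real script.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    id: int                # lower id is assumed to mean older
    has_audio: bool
    owner: Optional[str]   # None for an orphaned sentence


def pick_survivor(duplicates):
    """Pick the duplicate to keep: audio > owner > age (assumed order)."""
    # False sorts before True, so "not has_audio" puts audio first,
    # and "owner is None" puts owned sentences before orphans.
    return min(duplicates, key=lambda s: (not s.has_audio, s.owner is None, s.id))


dups = [
    Sentence(id=10, has_audio=False, owner="alice"),
    Sentence(id=20, has_audio=True, owner=None),
    Sentence(id=5, has_audio=False, owner=None),
]
survivor = pick_survivor(dups)
```

With these three, the sentence with audio wins even though it is the newest and has no owner. Without it, the owned sentence would beat the older orphan.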
OR...
Are you referring to purple who have the capability to add links, but they don't like typing the URL formulas with the two sentence numbers over and over, so they create a duplicate so they can use the graphical links on their own sentence, then wait for the duplicate removal script to clean up after them?
In that case, the solution would be to show the graphical (chain/scissors) icons to trusted users for ALL sentences, whether they own the sentence or not.
* purple -> people. Silly smartphone.
>so they create a duplicate so they can use the graphical links on their own sentence, then wait for the duplicate removal script to clean up after them?
I don't see a problem with this, since the point is to create translation links. Whether the deduplication script does it or not is irrelevant.
Many translations are too many hops away to be linked by a non-moderator, and duplicating a sentence is thus a good way to ensure that the link will be created.
It is actually even better, since there is no risk of accidentally linking sentences that shouldn't be linked.
K, I just took the liberty of demonstrating the benefits of this on Muiriel's sentence (sorry, Muiriel):
http://tatoeba.org/eng/sentences/show/574051
Here, I had five linked translations, and she added a translation to one of them. Instead of notifying her and telling her to link to the other 4 (since she could), I created a blatant duplicate of hers and linked it myself to all of the others (this was relatively fast). Now, mine will be deleted, but her original will inherit all the links (once the script is run...).
A funny and efficient way to solve a very real problem. Congratulations!
I shall watch, and pray :P
Well, I'm glad we're raising users' awareness of the duplicate removal script. This is a useful and timesaving method, which forefoot doesn't cause any problems.
By the way, how does the script react when the two duplicates are directly linked, as in your example? I suppose it just unlinks them in the process of deleting...
* forefoot = hopefully? I don't even know. That's the beauty of Swype.
No clue... I would actually like one of the bosses' opinions on this before I start creating (or advocating the creation of) blatant duplicates.
Well, in any case... even if it's not run every one or two weeks but one or two months, it's not like there's some committee that does a rigorous "Tatoeba Quality Check" on a weekly basis, right? (right?)
In fact, the users should do this =>>> You're in that committee.
Yea, but I often neglect my duties.
[not needed anymore- removed by CK]
On an unrelated note, it would be nice to be able to edit our own comments/posts directly (instead of having to delete+repost).
> I doubt if comments get merged.
I think they do.
Thanks for bringing this up. I've been wondering about the same for a while.
For those who do want to link sentences the "old-fashioned way", I made a little form to handle the link trick: http://martin.swift.is/tatoeba/ . This way, it's just a question of copying and pasting sentence numbers.
Neat-o!
But, question. Does your site distinguish trusted/non-trusted users?
Tatoeba takes care of that. :-)
This way, one doesn't see what is being linked or updated...Very secure indeed!
Linking will still be visible on the sentence page, in the edit history.
The form has no impact on security whatsoever.
Yes, but as in any craft, you can't do a good job if you don't see what you're doing...
Linking 2 sentences is an ART which requires MUCH thought, with the sentences clearly laid out in front of one's eyes to make a decision.
This looks like a PEBKAC problem, not an issue with security. Feel free to have the last word, though.
Should you want a form that collects the two sentences before you link them, send me a private message and I'll see what I can do.
Surely, using English acronyms helps international understanding very much...
I'm sure all the non-anglophones who already struggle to follow our debates will appreciate...
+10
@Demetrius; What is your scale?
I wouldn't encourage it. The duplicate script is not completely reliable, it's only a quick solution to deal with duplicates. One problem I see is that the script doesn't create log entries so you won't get to see in the logs that certain sentences got linked if they were linked with the script.
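To make the logging point concrete, here is a rough Python sketch of a link transfer that records what it does. It is purely illustrative: the tuple-based log format and the name `transfer_links` are invented here and say nothing about how the real script or the real logs are structured.

```python
def transfer_links(keeper, removed, links):
    """Move the links of `removed` onto `keeper` and return (links, log).

    links: set of (id, id) pairs, treated as undirected.
    log: list of ("unlink" | "link", id, id) tuples (hypothetical format).
    """
    # Normalise pairs so (a, b) and (b, a) count as the same link.
    links = {(min(a, b), max(a, b)) for a, b in links}
    new_links = set()
    log = []
    for a, b in links:
        if removed not in (a, b):
            new_links.add((a, b))
            continue
        other = b if a == removed else a
        log.append(("unlink", removed, other))
        if other == keeper:
            continue  # a link between the two duplicates just goes away
        pair = (min(keeper, other), max(keeper, other))
        if pair not in links:
            # Only log links that did not exist before the merge.
            log.append(("link", keeper, other))
        new_links.add(pair)
    return new_links, log


new_links, log = transfer_links(keeper=2, removed=3, links={(1, 3), (2, 4), (3, 4)})
```

Each inherited link leaves a "link" entry, so someone reading the log could tell that sentence 2 became linked to sentence 1 because of the merge rather than by hand.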
Can the script be improved? :-)
I can't remember whether this is going to be in the new Tatoeba version or if it was just on my wish-list, but I once discussed the option of displaying more distantly related sentences for the purposes of linking.
For completely unrelated sentences, some sort of interface feature might be created. Ideally, we should come up with a way that's faster and more reliable than adding a duplicate sentence. Perhaps a "link a sentence" feature where you can search for sentences to link: type in a few words, get the results of exact-phrase matches and select the one that you want.
Not sure if this would be worth the development effort before we have the new database to play with.
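As a rough idea of what the search half of such a "link a sentence" feature might do, here is a small Python sketch. Everything in it is an assumption for illustration (the in-memory dict, the name `find_exact_phrase`, the optional language filter); a real implementation would presumably query the database instead.

```python
def find_exact_phrase(sentences, phrase, lang=None):
    """Return (id, text) pairs whose text contains `phrase`, ordered by id.

    sentences: dict mapping id -> (language, text)
    lang: optional language code to restrict the search
    """
    needle = phrase.casefold()  # case-insensitive comparison
    matches = []
    for sid in sorted(sentences):
        slang, text = sentences[sid]
        if lang is not None and slang != lang:
            continue
        if needle in text.casefold():
            matches.append((sid, text))
    return matches


sentences = {
    1: ("epo", "Mi amas vin."),
    2: ("eng", "Do you love me?"),
    3: ("eng", "I love you."),
}
candidates = find_exact_phrase(sentences, "love you", lang="eng")
```

The user would then pick one of the candidates to link. Note that this is a plain substring match, so it does not respect word boundaries.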
There is another problem: there are too many examples like the following that were probably created by the duplication script:
Sentence number 127615 (http://tatoeba.org/eng/sentences/show/127615) was obviously merged with 454246 (http://tatoeba.org/eng/sentences/show/454246) at some point, but 455197 (http://tatoeba.org/eng/sentences/show/455197), which was linked to 127615, didn't get linked to 454246.
I saw several cases like that one, and they aren't easy to find, so the number of undetected cases is probably horrible...
I don't like the duplication script.
*deduplication