Guybrush88, April 14, 2014, 11:04:09 UTC

ABOUT MANUAL DELETION OF EXACT DUPLICATES

This sentence I added: http://tatoeba.org/ita/sentences/show/3173766
was manually deleted by another CM just because it was an exact duplicate of an existing sentence. This should stop, since there's already a script that does this automatically, even if it isn't run very often. Please stop deleting exact duplicates manually: it's already an automated job, and the fact that the script is run infrequently is no reason for other users to dictate which sentences may be added.

Selena777, April 14, 2014, 13:32:43 UTC

Why do you care whether another user deletes it or an automatic script deletes it? What's the difference for you?

AlanF_US, April 14, 2014, 15:04:49 UTC

The difference between manual and automatic duplicate merging is that an automatic procedure, if written correctly, can preserve the comments and history of all instances of the sentence. It can also preserve audio, if it exists, for one of the sentences. (If and when we allow multiple audio files for a single sentence in the future, it will preserve all the audio.)

The problem is that we don't have a good automatic duplicate merging mechanism right now. The script that was used in the past was run infrequently (perhaps only several times a year). In part, this was because it was a very slow procedure with many restrictions on when it could be run. It also lost audio links. We're in the process of writing new code. I consider this our second highest near-term priority, after restoring access to the wiki and our help/FAQ pages.

In the meantime, there are two "camps": those who believe that contributors should look for existing sentences before adding possible duplicates, and those who believe that this is too burdensome and thus should be a task left to the software, whenever it is ready. I'm not going to comment on the arguments made on both sides, other than to say they both have some validity in my eyes.

However, there is much that can be done to maintain peace in the time before the duplicate merging code is written. It's a good idea for everyone to look for possible duplicates (especially when contributing sentences consisting of common words). On the other hand, I don't see much use in corpus maintainers deleting duplicate sentences. When we have such a high number of duplicate sentences (~25,000 for English is the figure I've seen), any manual deletion will have no real impact. Nor does it seem to prevent people from contributing duplicate sentences. It does, however, succeed in annoying people and leading to arguments, both of which are counterproductive.

Thus I urge everyone to be patient until the duplicate merging code is complete and in use, and to refrain from either contributing duplicate sentences or from deleting the ones that exist.
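
As a rough illustration of the automatic merging described in this post (keep one instance, carry over comments and history from all instances, keep audio if one of them has it), here is a minimal sketch. It is not Tatoeba's actual script: the `Sentence` record, its field names, and the `merge_duplicates` function are all hypothetical, and the choice to keep the lowest id is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sentence:
    # Hypothetical record; field names are illustrative, not Tatoeba's schema.
    id: int
    lang: str
    text: str
    comments: list = field(default_factory=list)
    history: list = field(default_factory=list)
    translations: set = field(default_factory=set)  # ids of linked sentences
    audio: Optional[str] = None


def merge_duplicates(sentences):
    """Merge sentences with identical (lang, text).

    Assumption: the sentence with the lowest id survives. The surviving
    records are modified in place. Returns (survivors, redirect), where
    redirect maps each removed id to the id that replaced it.
    """
    groups = {}
    for s in sorted(sentences, key=lambda s: s.id):
        groups.setdefault((s.lang, s.text), []).append(s)

    survivors = []
    redirect = {}
    for group in groups.values():
        keeper, *dupes = group
        for d in dupes:
            # Preserve comments, history, and translation links of every instance.
            keeper.comments.extend(d.comments)
            keeper.history.extend(d.history)
            keeper.translations |= d.translations
            # Keep one audio file if any instance has one.
            if keeper.audio is None:
                keeper.audio = d.audio
            redirect[d.id] = keeper.id
        survivors.append(keeper)

    # Re-point links that referred to a removed sentence, and drop self-links.
    for s in survivors:
        s.translations = {redirect.get(t, t) for t in s.translations} - {s.id}
    return survivors, redirect
```

Manual deletion, by contrast, simply drops the duplicate record, so its comments, history, and links are lost unless someone copies them over by hand.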

Guybrush88, April 14, 2014, 15:21:41 UTC

>Why do you care whether another user deletes it or an automatic script deletes it? What's the difference for you?

The difference is that if native speakers search for exact duplicates just to delete them, they won't have time to find mistakes in sentences and post corrections in comments, which, in my opinion, is more urgent than manually removing duplicates.

Selena777, April 14, 2014, 17:23:40 UTC

I agree with you, but every user has the right to decide what he/she prefers to do. If someone doesn't like posting corrections in comments, we can't force him/her, can we?

I also agree that the quality management system on Tatoeba.org can be improved. I think every sentence, regardless of whether it was created by a native speaker or a non-native speaker, should be checked, corrected if necessary, and tagged OK. I also think it would be better if we could see who tagged it. Then, if we trust this user, we can trust all the sentences he/she tagged, regardless of who created them.

I also suggest creating a list of users who are willing to check sentences, so that users who are not sure about the quality of their contributions can ask them.

Gulo_Luscus, April 14, 2014, 19:56:30 UTC

@Selena777

You can see which user added a tag. When you move your mouse over the tag, you see "user: 1234, date: ..."
You just need to append that number to the end of this link:
http://tatoeba.org/eng/users/show/

(http://tatoeba.org/eng/users/show/1234)

For example:
http://tatoeba.org/eng/sentences/show/3131311
The OK tag says "user: 52497"

http://tatoeba.org/eng/users/show/52497 and that's me.
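
The lookup described above can be sketched in a few lines. This is only an illustration: the helper names `user_id_from_tooltip` and `user_profile_url` are made up for the example, and the tooltip format is assumed to be exactly the "user: 1234, date: ..." text quoted in this post.

```python
import re

# Base link taken from the post; the user number is appended to it.
PROFILE_BASE = "http://tatoeba.org/eng/users/show/"


def user_id_from_tooltip(tooltip: str) -> int:
    """Extract the number after "user:" from a tag tooltip (assumed format)."""
    match = re.search(r"user:\s*(\d+)", tooltip)
    if match is None:
        raise ValueError(f"no user number found in tooltip: {tooltip!r}")
    return int(match.group(1))


def user_profile_url(user_id: int) -> str:
    """Build the profile link by appending the user number to the base link."""
    if user_id <= 0:
        raise ValueError("user number must be positive")
    return f"{PROFILE_BASE}{user_id}"


# The example from the post: the OK tag on sentence 3131311 says "user: 52497".
url = user_profile_url(user_id_from_tooltip("user: 52497, date: ..."))
```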

Selena777, April 14, 2014, 20:04:13 UTC, edited April 15, 2014, 5:44:37 UTC

Thanks.
This way is rather tricky, though...
Also, I think it would be good if more than one user could tag the same sentence.

CK, April 14, 2014, 21:25:44 UTC, edited October 30, 2019, 7:22:28 UTC

[not needed anymore- removed by CK]

Guybrush88, April 14, 2014, 21:32:23 UTC, edited April 14, 2014, 21:33:25 UTC

But then we could even miss new translations by doing that. Plus, users might actually learn something from those sentences, so it's far from a waste of time if they translate them, even if they produce duplicates.

CK, April 14, 2014, 21:33:31 UTC, edited October 30, 2019, 7:22:40 UTC

[not needed anymore- removed by CK]

Guybrush88, April 14, 2014, 21:39:50 UTC

Well, I don't think it's such a problem if users sometimes add duplicates by chance, so I don't see the need to rush to delete them just because the deduplication script isn't run that often. They'll eventually be merged, so if any user adds new translations to them, those translations will end up on a single sentence after the merge.