[not needed anymore- removed by CK]
Has any progress been made on the development of a new duplicate merging script?
I just sent saeb a note to ask.
The former merging script worked wonders. Why on Earth was it ever changed ?!
[not needed anymore- removed by CK]
Data is lost when Corpus maintainers merge sentences manually and forget to merge the comments, tags, and list memberships attached to them.
And the update histories, without which the comments don't make sense...
The update history should at least be editable.
What do you mean? History shouldn't be changed...
This is true indeed. Before we manually delete a duplicate, we should check whether someone has put the sentence into a list.
Also, I think that merging with a script would be preferable, to make sure that any effort made by any user to add a sentence, even if it's a duplicate, is credited in the sentence logs. If two different users add the same sentence, why should we credit only one of them for having added it and completely ignore the other?
Maybe the author and the comments of the deleted sentence could be appended to the end of the comments of the sentence that is kept.
But what criterion is used to decide which of the two sentences is deleted and which is kept?
wallebot wanted to know what criteria would be used to decide which sentences are deleted and which are kept. A number of developers had a brainstorming session where we collected requirements for the new script. You can see it on the wiki: http://en.wiki.tatoeba.org/arti...icate-merging# The discussion was written for developers rather than users, but you still may find it useful. Throughout the discussion, you can see some mention of the criteria. For instance, the new script should save a sentence with audio preferentially over a sentence without. The old script did not do this, which meant that audio was lost when the script was run.
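To make the requirements raised in this thread concrete, here is a minimal sketch of a merge step. It is not the script saeb is writing. The `Sentence` record, its field names, and the `merge_duplicates` function are hypothetical, and breaking ties by the lowest ID is an assumption. The only rules taken from the discussion are that a sentence with audio is kept in preference to one without, that comments, tags, and lists are carried over, and that every contributor stays credited in the log.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Hypothetical record; the field names are illustrative, not Tatoeba's schema.
    id: int
    lang: str
    text: str
    owner: str
    has_audio: bool = False
    comments: list = field(default_factory=list)
    tags: set = field(default_factory=set)
    lists: set = field(default_factory=set)
    log: list = field(default_factory=list)


def merge_duplicates(duplicates):
    """Merge sentences with identical text into a single surviving sentence."""
    # A sentence with audio wins; ties go to the lowest id (an assumption).
    keeper = min(duplicates, key=lambda s: (not s.has_audio, s.id))
    for dup in duplicates:
        if dup is keeper:
            continue
        # Carry over everything attached to the sentence being removed.
        keeper.comments.extend(dup.comments)
        keeper.tags |= dup.tags
        keeper.lists |= dup.lists
        keeper.log.extend(dup.log)
        # Keep crediting the user who added the removed duplicate.
        keeper.log.append(f"merged #{dup.id} (added by {dup.owner})")
    return keeper


older = Sentence(1, "fra", "Je lui montre.", "alice", comments=["first!"], lists={"basics"})
newer = Sentence(2, "fra", "Je lui montre.", "bob", has_audio=True, tags={"spoken"})
kept = merge_duplicates([older, newer])
print(kept.id, kept.comments, kept.tags, kept.lists, kept.log)
```

In this example the newer sentence survives because it has audio, and it inherits the comment and the list membership of the older one, so nothing is lost by the merge.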
I asked saeb when he expected to have the script ready. He told me on Saturday, when we were chatting on IRC, that he thought he could submit it that day, along with some tests. However, apparently it didn't happen. I'll ask him again.
Thank you very much.
Can a sentence have more than one audio recording?
It would be useful if sentences could have more than one audio recording, for example to hear different accents.
Well, it would certainly be advantageous to hear the examples pronounced by both a male and a female voice. The pronunciation in different regions would also be interesting. But I don't think this is the most urgent matter, as for the time being an overwhelming majority of the sentences have no audio at all.
I don't think it is a priority to change the software so that it accepts several audio recordings per sentence. But I think there should be a plan B for when there are two audio recordings for a single sentence.
I think that adding a comment with a link to the other audio recordings would be enough. If it becomes necessary in the future to support several audio recordings, they can be recovered from the comments.
In written Chinese, a sentence can be spoken in more than one language, for example Mandarin and Cantonese.
Even though they are pronounced radically differently from each other, my understanding is that the various Chinese dialects use essentially the same written form. There is sometimes a distinction between traditional hanzi and the simplified ones now used in modern China, but as I understand it, the two styles of hanzi are not specifically associated with individual dialects like Mandarin, Cantonese, Xiang, Min, Wu, etc.
[not needed anymore- removed by CK]
Duplicates can be introduced in different ways:
1) because the same sentence is being simultaneously translated into the same target language by different people who are unaware of each other.
This is largely a side effect of showing the latest sentences on the front page: every newcomer jumps on what is being shown and translates it...
Maybe we should not show latest contributions on the front page but on a different, dedicated page...
But this might also happen because several people translate simultaneously from the same list, and I can't see how to avoid this
2) because different sentences, in the same or different languages, end up being translated into the same target sentence.
That is unavoidable, especially since the two source sentences that are being translated are not directly linked.
When I translate "I show her" and then "I show him" into French, I have to create a duplicate, because these 2 sentences are initially unlinked, but they have the same translation in French: « Je lui montre ». I can't see how to avoid this.
All the procedures I have seen so far, based on JavaScript gimmicks, are awkward (you need to copy sentence numbers...) and inconvenient, and they don't work on all devices.
I want to be able to translate what I want from any device, at any time.
One way I could see would be to be able to link my latest translations to a sentence from a list. But that would work only if the 2 sentences are close to each other in the list I'm translating...
Does Tatoeba tell if a sentence has already been added? If not, it would be a good idea to develop a system to do so. :)
The search doesn't find recently added sentences. It may work for older sentences, but I think it would be a heavy load for the server.
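For what it's worth, a check at the moment a sentence is added does not have to go through the search engine. Here is a minimal sketch, assuming an exact-match lookup keyed by language and normalized text. The `SentenceIndex` class, its `add` method, and the normalization rule (Unicode NFC plus collapsed whitespace) are all hypothetical; this is not a description of how Tatoeba works. A lookup like this costs about the same for a sentence added a second ago as for an old one.

```python
import unicodedata


def normalize(text):
    """Canonical form used only for comparison (an assumed rule)."""
    # NFC-normalize Unicode and collapse runs of whitespace.
    return " ".join(unicodedata.normalize("NFC", text).split())


class SentenceIndex:
    """Hypothetical in-memory index keyed by (language, normalized text)."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def add(self, lang, text):
        """Return (sentence_id, created); reuse the id if the text already exists."""
        key = (lang, normalize(text))
        if key in self._ids:
            # Already present: the caller can link to this id instead of duplicating.
            return self._ids[key], False
        sentence_id = self._next_id
        self._next_id += 1
        self._ids[key] = sentence_id
        return sentence_id, True


index = SentenceIndex()
print(index.add("fra", "Je lui montre."))   # translation of "I show her"
print(index.add("fra", "Je lui montre."))   # translation of "I show him"
```

The second call returns the ID of the existing sentence rather than creating a new one, so the « Je lui montre » case above would end up as one sentence linked to both English sources. On a real server the dictionary would be a unique database index rather than an in-memory structure, which is again an assumption on my part.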