Wall (6,960 threads)


sysko, April 30, 2011 at 9:26:27 PM UTC

I'm going to run some maintenance operations on the database, so Tatoeba will be down for a few minutes. It shouldn't take long.

sysko, April 30, 2011 at 10:24:05 PM UTC

If the number of sentences decreases over the next few hours, that's normal: the duplicates are being removed.

Swift, May 1, 2011 at 11:17:39 AM UTC

Awesome, indeed. Noticed a pretty deep dip. Do you know how many duplicates it merged?

sysko, May 1, 2011 at 11:35:35 AM UTC

around 13 000.

ludoviko, May 1, 2011 at 11:47:44 PM UTC

Wonderful, just very wonderful :-(((

You, sysko, do spend your time to take away 13 000 duplicates - but it seems that up to now you did not attack the cause of all these 13 000 duplicates. You wrote to me, some time ago, you wanted to make a program so that all translations linked via other languages would be linked - but I still can't see that. I have a look every week or so.

It is a pity that people translated 13 000 sentences - just to see them thrown away.

I am waiting. I would like to promote Tatoeba. But I won't write anything good about Tatoeba, when people translating sentences in Tatoeba will see them thrown away afterwards.

sysko, May 2, 2011 at 12:01:53 AM UTC

Writing that new script actually took me 20 minutes of my own personal time. Moreover, the duplicates were already there, so handling them didn't delay the new version.

The main reason for the delay is that my personal life is __really__ busy. I'm really sorry that I can't work on Tatoeba as much as I would like, but I do have job obligations too :(

ludoviko, May 2, 2011 at 11:16:51 AM UTC

A gigantic waste of time

If it is two minutes per translation, then it's 26 000 minutes for 13 000 translations, or 433 hours of work by Tatoeba contributors thrown away, more than two months' work.

How about thinking about a rapid solution of the problem?

TRANG, May 4, 2011 at 1:39:50 AM UTC

> A gigantic waste of time

I would like to suggest another point of view about this :)

Imagine we would do this instead: whenever a user adds a translation or new sentence, we check whether it already exists or not, and if it exists, instead of adding it, we just add the necessary links. Would you feel this is as much of a waste of time?
I'm going to guess not so much (at least I would feel that way). But well, in both cases, people still have to spend time typing their translations... The duplicate removal script leads to the same kind of result as what I described, except it does it with some delay.
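
A minimal sketch of the "check on insert" idea described above, assuming an in-memory stand-in for the database. The `Corpus` class and its method names are hypothetical and are not part of the Tatoeba code base; the point is only that an existing sentence gets a new link instead of a second copy.

```python
from dataclasses import dataclass, field


@dataclass
class Corpus:
    """Hypothetical in-memory corpus: sentences keyed by (language, text)."""
    ids: dict = field(default_factory=dict)    # (lang, text) -> sentence id
    links: set = field(default_factory=set)    # frozenset({id_a, id_b})
    next_id: int = 1

    def add_sentence(self, lang, text):
        """Return the id of the sentence, creating it only if it is new."""
        key = (lang, text)
        if key not in self.ids:
            self.ids[key] = self.next_id
            self.next_id += 1
        return self.ids[key]

    def add_translation(self, source_id, lang, text):
        """Link source_id to the translation, reusing an existing sentence."""
        target_id = self.add_sentence(lang, text)
        if target_id != source_id:
            self.links.add(frozenset((source_id, target_id)))
        return target_id


corpus = Corpus()
eng = corpus.add_sentence("eng", "I love you.")
fra = corpus.add_sentence("fra", "Je t'aime.")
# Two contributors type the same German translation from different sources.
first = corpus.add_translation(eng, "deu", "Ich liebe dich.")
second = corpus.add_translation(fra, "deu", "Ich liebe dich.")
print(first == second, len(corpus.ids), len(corpus.links))
```

The second contributor still has to type the sentence, which is the point made in the post: the end state is the same as merging afterwards, only without the delay.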

You also have to take into account that many (...most? ...all?) members in Tatoeba translate for practice and/or for learning. I can't speak in the name of everyone, but I can say at least that as far as I'm concerned, I wouldn't feel I've wasted my time if 10% or more of my contributions happened to be duplicates and were "merged". Because when I contribute... or rather contributed, I also got to practice and to learn a lot, and it was much less boring than doing typical translation homework for my language classes.
Of course, if others don't feel that way, I can perfectly understand, and they are free to stop contributing (they even should), until we release the new version. I know it's been a while since we've talked about it, but it's a pretty big task and it's not surprising that it's taking that long.


Now, regarding your question of a quick solution. Well, the duplicate removal script was the quick solution.

Another "meantime" but less quick solution is to get more people to link sentences... but I think people have much more fun translating than linking.

Someone could also try making lists of "sentences that could potentially be linked but are not linked". All the information needed to make such a list is here:
http://tatoeba.org/eng/download...mple_sentences
And from there we could probably do something to display the lists in some way and get people to create links more quickly.
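
One possible way to build such a list, sketched under the assumption that the links are available as pairs of sentence ids (the exact layout of the downloadable files is not described in this thread, and `candidate_links` is a hypothetical helper): two sentences that share a common translation but are not linked to each other are reported as candidates.

```python
from collections import defaultdict
from itertools import combinations


def candidate_links(links):
    """Return sorted pairs that share a translation but are not linked."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    candidates = set()
    for pivot, linked in neighbours.items():
        # Every pair of sentences linked to the same pivot is a candidate,
        # unless the two are already linked directly.
        for x, y in combinations(sorted(linked), 2):
            if y not in neighbours[x]:
                candidates.add((x, y))
    return sorted(candidates)


# Sentence 2 is linked to both 1 and 3, but 1 and 3 are not linked.
print(candidate_links([(1, 2), (2, 3)]))
```

A real list would still need a human to confirm each pair, since two translations of the same sentence are not always translations of each other.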

And we still need a feature that allows multi-link translations, so that those speaking more than 2 languages can add a translation that links to more than one language.


Finally, regarding "How could we help you with your personal life?", we simply (or not so simply) need to get more people involved in the project as a whole. As Swift mentioned it, I started writing about the matter this weekend: http://blog.tatoeba.org/2011/04...s-to-help.html
And I will be writing more...


Anyway, I hope that gives you a better picture :)

FeuDRenais, May 4, 2011 at 1:49:40 AM UTC

Out of curiosity, how much time would it take to do something as brute as run a similar_text() comparison between a new translation and all the existing sentences already in the database? Really, really long, I'm guessing?

sysko, May 4, 2011 at 4:28:00 PM UTC

I did it for you. Here is the raw request (so without even processing the result):

select 1 from sentences where text = "I love you."
1 row in set (1,24 sec)

So with 2000 contributions a day, you'd lose about an hour a day purely on checking. Moreover, it doesn't solve the problem that you still need to type the sentence before knowing that it already exists.
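
As a rough check of the order of magnitude, the two figures quoted in the post (1.24 seconds per lookup, 2000 contributions a day) can be multiplied out. This sketch uses only those two numbers; any extra processing around the query is not included, which may be why the post rounds up to an hour.

```python
QUERY_SECONDS = 1.24          # time measured for one exact-match lookup
CONTRIBUTIONS_PER_DAY = 2000  # daily contributions quoted in the post

total_seconds = QUERY_SECONDS * CONTRIBUTIONS_PER_DAY
total_minutes = total_seconds / 60

print(f"{total_seconds:.0f} seconds, about {total_minutes:.0f} minutes per day")
```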

Zifre, May 4, 2011 at 7:46:35 PM UTC

Could you add a column to the table with an MD5 hash? It would take a long time to generate for all existing sentences but it would greatly speed up searching for duplicates.
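
A minimal sketch of the hash-column idea, using Python's built-in `sqlite3` as a stand-in for the site's MySQL database. The table layout, the `text_hash` column and the helper names are assumptions made for illustration; only the general technique (index a fixed-length hash, then confirm on the full text) comes from the post.

```python
import hashlib
import sqlite3


def text_hash(text):
    """Fixed-length fingerprint of a sentence (not used for security)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sentences ("
    "id INTEGER PRIMARY KEY, lang TEXT, text TEXT, text_hash TEXT)"
)
# The index on the short hash column is what makes the lookup cheap.
conn.execute("CREATE INDEX idx_sentences_hash ON sentences (text_hash)")


def insert_sentence(lang, text):
    conn.execute(
        "INSERT INTO sentences (lang, text, text_hash) VALUES (?, ?, ?)",
        (lang, text, text_hash(text)),
    )


def find_duplicates(lang, text):
    """Look up by hash first, then confirm language and full text."""
    rows = conn.execute(
        "SELECT id FROM sentences "
        "WHERE text_hash = ? AND lang = ? AND text = ?",
        (text_hash(text), lang, text),
    )
    return [row[0] for row in rows]


insert_sentence("eng", "I love you.")
insert_sentence("fra", "Je t'aime.")
print(find_duplicates("eng", "I love you."))
```

Comparing the full text after the hash match guards against the rare case of two different sentences sharing a hash.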

sysko, May 4, 2011 at 8:06:03 PM UTC

Yup, we can. Actually MD4 will be enough, as we're not looking for a secure hash. It's just that I decided to use the little time I have to develop the new version, and I didn't realize a few months ago that there was an inherent, tricky bug in MySQL when the previous script was run under high load and concurrency.

Now, with this script, which will be run at least once a week and doesn't require any modification to the current PHP code, I think the result will be more or less the same as doing the check in real time with a hash (as I've said to FeuDRenais, this kind of solution still requires the user to type and validate the sentence).

But if Trang or someone else wants to code it, sure.

FeuDRenais, May 4, 2011 at 6:39:00 PM UTC

That's good speed though. If you partitioned by language and string length it probably would go fast enough not to bother a regular user too much.

But I agree that it doesn't really make sense if the duplicate script is going to do the job anyway.
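
The partitioning suggested above can be sketched as follows, assuming sentences are grouped by language and character count so that a lookup only compares against sentences in the same group. `PartitionedSentences` is a hypothetical illustration, not a description of how the Tatoeba database is organized.

```python
from collections import defaultdict


class PartitionedSentences:
    """Sentences bucketed by (language, length) to shrink each scan."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, lang, text):
        self.buckets[(lang, len(text))].append(text)

    def find(self, lang, text):
        """Return (found, number of sentences compared)."""
        bucket = self.buckets.get((lang, len(text)), [])
        for compared, candidate in enumerate(bucket, start=1):
            if candidate == text:
                return True, compared
        return False, len(bucket)


index = PartitionedSentences()
for lang, text in [
    ("eng", "I love you."),
    ("eng", "I love him."),
    ("eng", "Hello."),
    ("fra", "Je t'aime."),
]:
    index.add(lang, text)

# Only the two 11-character English sentences are compared.
print(index.find("eng", "I love him."))
```

This reduces the work per lookup but, as the replies point out, the server still spends that time on every contribution.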

sysko, May 4, 2011 at 6:43:22 PM UTC

The problem is that it doesn't affect only you but everyone. And on a server running at more than 100% of its capacity, it means one hour less for doing other stuff.

sysko, May 4, 2011 at 6:45:45 PM UTC

Because it doesn't mean "I, and only I, am going to wait 1.24 seconds more" but "for 1.24 s the server will do nothing except look for my sentence".

FeuDRenais, May 4, 2011 at 1:13:59 AM UTC

> A gigantic waste of time

Are you joking, or are you seriously criticizing sysko? Because I don't think this is a fair argument to criticize by.

You could also say things like "if everyone gave a dollar a day to such and such charity, starvation in such and such country would not be a problem". And it'd be true (maybe), and you could make similar arguments for lots of situations. But sysko isn't a machine, and I don't think he thought of optimizing Tatoeba to within 1% global accuracy when he coded it.

In short: Thanks for all the work, sysko. I don't think 13,000 duplicates is as catastrophic as some people are making them out to be. We waste many more minutes each day on far less productive things. "Work thrown away"? Not at all.

sacredceltic, May 4, 2011 at 1:33:11 AM UTC

I also think that if we have a good deduplication procedure, as is once again the case, then creating duplicates is not so serious, and it creates new links. I create duplicates myself for that purpose.
If contributors are upset that their sentences disappear, it is their responsibility to check that their sentences don't already exist in the database before inserting them. The contributor's guide states this clearly.
If a later version makes it possible to detect a duplicate as soon as it is created, that will be a plus, but it won't keep the duplicate's creator from having typed it for nothing anyway, since it has to exist in order to be detected... So the time spent creating it would be lost in any case.

ludoviko, May 4, 2011 at 2:05:20 PM UTC

The procedure of checking whether a sentence has already been translated into a given language is rather difficult. I have done it from time to time, but it takes a while and it's complicated. Has sacredceltic already done it a few times for certain sentences?

For me the solution is simple: I'm not translating for the moment. And I'm no longer sending messages about Tatoeba to the Esperanto lists.

Besides, this concerns a lot of Esperanto sentences. I remember a list of more than 3000 duplicates in Esperanto. Plus, probably, a large share of the 13 000 sentences thrown away a few days ago. FeuDRenais's proportion should take the languages into account. How many Esperanto sentences have been thrown away so far?

FeuDRenais, May 4, 2011 at 3:00:54 PM UTC

> FeuDRenais's proportion should take the languages into account.

"All languages are equal on Tatoeba."

sacredceltic, May 4, 2011 at 3:12:11 PM UTC

> Has sacredceltic already done it a few times for certain sentences?

I'm not authorized to link translations, but when I had that ability, I made extensive use of it between English, French, German, Spanish and... Esperanto!
Now I often create duplicates, because it's my only way to link sentences so that they don't end up in my lists of untranslated sentences, which annoys me more than anything...
I'd rather have duplicates that will be merged later than keep finding the same stupid sentences in my to-translate lists forever...
Besides, to make a duplicate, all you need is copy and paste... it doesn't take me much longer, it's just a bit weirder...

FeuDRenais, May 4, 2011 at 1:50:50 AM UTC

+1

FeuDRenais, May 4, 2011 at 1:29:34 AM UTC

Also, (840000-13000)/840000 = 98.5% efficiency. I don't know what everyone's standards are, but that's pretty damn good, IMO.

(or is the total number of duplicates over all time >> 13000?)
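
The percentage above can be reproduced from the two figures in the post (840,000 sentences in total, 13,000 merged). A small sketch of the arithmetic, with no other inputs assumed:

```python
total_sentences = 840_000
merged_duplicates = 13_000

share_kept = (total_sentences - merged_duplicates) / total_sentences
share_merged = merged_duplicates / total_sentences

print(f"kept: {share_kept:.1%}, merged: {share_merged:.1%}")
```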

sacredceltic, May 4, 2011 at 1:35:33 AM UTC

>(or is the total number of duplicates over all time >> 13000?)

No, it's just the number since the last time a deduplication procedure was run, a few months ago...

FeuDRenais, May 4, 2011 at 1:45:48 AM UTC

I wonder what the worst case number is.

Let's just go nuts, and say that it's, I dunno, 80,000. So, roughly 10% of your contributions are duplicates.

I would say that *even then*, if you told me that 1 out of every 10 of my translations was a link instead of a brand new thing, I wouldn't throw a fit and start complaining about mass inefficiency. It's really not that big of a deal...

ludoviko, May 2, 2011 at 10:49:37 AM UTC

So the question would be: How could we help you with your personal life? :-) Or with your job? Or, more realistically: It's now at least six months since I began to write messages on the wall and to you and to Trang about the problem. If you don't have time to solve that - is there no other solution? Someone else who could do it?

To constantly throw away users' contributions is just not very kind to them. Do you and Trang really want to continue that periodically? Do you want me to put a message on the wall about the subject every week or so? What solution do you have besides waiting for Godot?

Swift, May 2, 2011 at 12:45:05 PM UTC

Well, there are two issues here: The scope of the problem and the solutions to it.

Regarding the scope, I don't think these 13 000 translations took an average of a minute to translate. Generally the duplicates are the sort of simple sentences that are likely to be added twice to the corpus. Something like "I'm waiting for Godot" rather than the longer sentences.

From the logs, I reckon it's closer to a few seconds. Still a considerable amount of time and I don't think anyone is terribly happy with the situation.

At the same time, as Muiriel and I have pointed out, should the same sentence be entered twice as A and A' and then subsequently translated to different languages B and C, then merging A and A' will leave a single sentence A, with links to both B and C where nothing is lost.

It's also possible that B and C are entered in different languages and each of these is translated into the same language with sentences A and A'. When A and A' are merged, the end result is again a sentence A with links to B and C, indirectly linking the two latter. The inefficiency only arises if it takes longer to translate a sentence than to find, verify and link two sentences that don't share a translation. I think that's rather uncommon.

The final case is when sentence A is translated into B, which is translated into C and then back into the original language with sentence A'. Again here, there is value in that last translation as it effectively provides a link between C and A. With the current Tatoeba system, I think making a quick duplicate is actually the simplest way to link such sentences (even with Zifre's awesome Greasemonkey script).
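
The three cases above can be sketched with a small merge function, assuming links are stored as unordered pairs of sentence labels. `merge_duplicates` is a hypothetical helper written for illustration; it is not the actual deduplication script.

```python
def merge_duplicates(links, kept, removed):
    """Re-point every link on `removed` to `kept` and drop self-links."""
    merged = set()
    for a, b in links:
        a = kept if a == removed else a
        b = kept if b == removed else b
        if a != b:
            merged.add(frozenset((a, b)))
    return merged


# Cases 1 and 2: A and A' are linked to different sentences B and C.
links = {frozenset(("A", "B")), frozenset(("A'", "C"))}
print(merge_duplicates(links, kept="A", removed="A'"))

# Case 3: A -> B -> C -> A'. After the merge, C is linked to A.
chain = {frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("C", "A'"))}
print(merge_duplicates(chain, kept="A", removed="A'"))
```

In each case the kept sentence ends up carrying every link the removed duplicate had, so no translation work is lost.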

So, the problem isn't so great, in my mind, but it would still be nice to do something about it. The duplicate script currently takes a long time to run and slowed the server down considerably yesterday.

As Trang mentioned in her post from yesterday: http://blog.tatoeba.org/2011/04...s-to-help.html , there's still a while until collaborative work on the new version can begin. At the same time, anyone is free to take part in maintaining the current version or developing new features for it. One idea would be to create a "safe" contribution feature that would first check for related translations. This would perfectly fit translators who spend more time on translations.

But as Trang noted, there are loads of things related to Tatoeba that people can do, and the more people actively join the various "departments" the more pressure there will be on sysko to use his days off to code in some dingy little office than prance around the streets or hills in the fresh spring air. ;-)

Ough... I think I was going to conclude this by pulling these threads together into some nice conclusion, but this has been way too long and I need to get to my work so that I can enjoy some of the Icelandic spring (did I mention we had snow yesterday...?).

Swift, May 1, 2011 at 1:49:14 PM UTC

Great. Thanks for taking care of this!

xtofu80, May 1, 2011 at 7:28:32 PM UTC

Currently Tatoeba is awfully slow. Is this due to the duplication removal operation, or is there another issue?

sysko, May 1, 2011 at 8:04:11 PM UTC

Yup, the script is currently updating the logs. Sorry for the inconvenience. Next time it should be a lot faster, as this time the sentences hadn't been deduplicated for months.

Zifre, April 30, 2011 at 10:53:02 PM UTC

Awesome!

How does this handle comments, links, etc.? Will links made by the system show up in the logs as if they were made by whoever made the link on the duplicate sentence that is being removed?

sysko, April 30, 2011 at 11:11:21 PM UTC

At the end (when I post the message saying it's finished :P), to keep it simple, everything will be moved to the sentence that is kept.

So if you posted a comment on a duplicate, it will be moved too. The same goes for links, and the history on the right will be shown as if all the links had been made to this one sentence.

FeuDRenais, May 1, 2011 at 3:31:08 AM UTC

bug: the kept sentence appears twice in the link list

sysko, May 1, 2011 at 6:09:02 AM UTC

Not a bug; the script is not finished yet.

sysko, May 1, 2011 at 6:13:59 AM UTC

we will regenerate the link list/sentence list just after :)

U2FS, May 2, 2011 at 12:51:22 PM UTC

Awesome job so far. Just wondering, is there a way one could find out which duplicate sentences he/she owned?

arcticmonkey, April 30, 2011 at 3:04:24 PM UTC

I just got a spam message from this user:

http://tatoeba.org/eng/user/profile/angel

Pharamp, April 30, 2011 at 3:07:00 PM UTC

Thanks for reporting it on the Wall. I also received it, and already told Trang about it :)

Swift, April 28, 2011 at 8:24:20 PM UTC

** A quick note on derivative works **

The Creative Commons licenses[1] currently popular with the kids these days all have an attribution clause which I understand we cannot honour. In the case of content from Wikipedia, one could link to the article and author list to satisfy the attribution clause[2] in the comments to the sentence in question, as well as each translation of it.

That would, however, only work for the corpus as it's accessible through the tatoeba.org interface; it wouldn't allow us to distribute these sentences (including the translations) in the downloadable CSV files in its current form.

Now, this isn't really such a big deal -- in particular if compared with the mountain of pain tracking all the different licenses across the database. It just means that we have to come up with our own original sentences. We do that every day, so it shouldn't be too difficult. If it is, one can always use one's time to translate existing sentences.

Bottom line: Don't add anything with any sort of license. Just make your own sentences or translate others.

Now, most people reading this are probably well aware of the issue, but everyone should take a little bit of care when they come across a sentence which looks like it may be from a copyrighted source. It's a lot better if it gets caught soon, before anyone wastes time on translations that we're going to have to delete in the end.

For more on licensing issues, see Trang's blog post from January this year: http://blog.tatoeba.org/2011/01...d-content.html

[1] http://creativecommons.org/licenses/
[2] http://wikimediafoundation.org/wiki/Terms_of_Use

Shishir, April 28, 2011 at 8:30:04 PM UTC

Swift, I've translated it into Spanish, because I think this is an important issue :)

Traducción en español del comentario de Swift:

La licencia Creative Commons [1] que es tan popular entre los niños de hoy en día tiene una cláusula de atribución que a mi entender no podemos cumplir. En el caso de contenido de la Wikipedia, se podría indicar la dirección del artículo y el autor para satisfacer dicha cláusula [2] dejando un comentario en la frase en cuestión, al igual que en cada traducción de dicha frase.

Sin embargo, solo funcionaría en el corpus siempre que se acceda a él por medio del interfaz de tatoeba.org; no permitiría distribuir estas oraciones (incluyendo sus traducciones) en los archivos CSV descargables en su forma actual.

En este momento no es un gran problema – especialmente si lo comparamos con el problema que sería tener que rastrear todas las diferentes licencias por toda la base de datos. Sólo significa que tenemos que crear nuestras propias frases originales. Lo hacemos cada día, de modo que no debería ser muy difícil. Si lo es, siempre se puede emplear el tiempo traduciendo frases que ya existan.

Por último: No añadáis nada que tenga alguna clase de licencia. Sólo cread vuestras propias frases o traducid otras que ya formen parte del corpus.

Ahora, la mayor parte de la gente que haya leído esto serán conscientes de este problema, pero todo el mundo debería tener un poco de cuidado cuando se crucen con una frase que parezca que pueda provenir de una fuente con derechos de autor (copyright). Es mejor que se encuentre cuanto antes, antes de que alguien pierda el tiempo haciendo traducciones que vamos a tener que acabar eliminando.

Para más información acerca de temas de licencia, véase la entrada de enero de este año en el blog de Trang:

http://blog.tatoeba.org/2011/01...d-content.html

Scott, April 29, 2011 at 4:45:09 PM UTC

Tatoeba could have a special way of inserting CC sentences with an attribution clause by including a field where one would enter the attribution.

I don't know if the Tatoeba admins are interested in doing this, but it could be possible.

kooler, April 29, 2011 at 11:50:32 AM UTC

I cannot tag my sentence. There is no input field in the "Tag" block. Is it because I'm a "user" and don't have the right to do so?

Swift, April 29, 2011 at 11:58:49 AM UTC

Yes, you have to be a so-called "trusted user". See this FAQ entry: http://tatoeba.org/eng/faq#add-tag

kooler, April 29, 2011 at 12:00:12 PM UTC

Thank you!!

Daga007, April 28, 2011 at 3:31:04 PM UTC

Hi, there are already sentences in "asturianu" (Asturian) uploaded by Duernu: http://tatoeba.org/spa/sentences/of_user/Duernu

If "asturianu" could be added to the list of languages available on Tatoeba... :)
http://upload.wikimedia.org/wik...turias.svg.png

Thanks in advance

Pharamp, April 28, 2011 at 4:05:40 PM UTC

http://tatoeba.org/eng/wall/sho...3#message_5873
http://tatoeba.org/eng/faq#new-language

Please follow the steps described there :)
I already made the flag when Duernu requested it.

In any case, you can keep adding sentences in Asturian if you put them in a list (you can create one here: http://tatoeba.org/eng/sentences_lists/index).

Daga007, April 28, 2011 at 4:18:04 PM UTC

Great. Thanks.

inmachan, April 28, 2011 at 12:49:56 PM UTC

Hi! More questions over here...
I see that we have a main sentence and the translations below it, so at first I assumed that the main one was the original. However, if I open one of those translations, then the one that used to be the main sentence becomes just another translation.

How can I tell which one is the original? Put another way, if for example I see that two sentences don't match exactly, how do I know which one is mistranslated?

I hope I've explained myself...

U2FS, April 28, 2011 at 12:59:52 PM UTC

Hi! You can tell first of all by looking at the number of the sentence you are viewing and of the one you saw before or will see after. Also, the history on the right can help you find out which sentences were written first.

inmachan, April 28, 2011 at 1:09:07 PM UTC

True, I hadn't noticed that they are numbered. Thank you very much!

arcticmonkey, April 28, 2011 at 6:16:45 AM UTC

How about adding IPA as a language?

papabear, April 28, 2011 at 7:03:11 AM UTC

Perhaps, in a far-flung postapocalyptic future, we can offer IPA transcriptions of our audio recordings.

arcticmonkey, April 28, 2011 at 4:09:41 PM UTC

Seeing that transcribing into IPA is a rather tedious task, I don't think adding IPA would clutter up the system too much. It would be more of a task for the linguists amongst us anyway. As far as different language varieties go, I don't see them as a problem. To my mind, having transcriptions in multiple varieties is an asset rather than a burden. On the other hand, there's no need for completeness. One transcription is still better than no transcription at all.

Swift, April 28, 2011 at 4:49:14 PM UTC

I think this might be better implemented as a separate layer, similar to, and possibly as an extension to, the current audio examples, but with some information about the dialect.

arcticmonkey, April 28, 2011 at 7:55:40 PM UTC

I agree.

Daga007, April 26, 2011 at 1:17:13 AM UTC

Hi. I'm new here! I've been browsing the web for a few hours and I'm really impressed.

I come from meneame.net too (as a matter of fact, I was the one who uploaded the "meneo" XD).

Congratulations to everyone, I'm sure you'll go very far :)

szaby78, April 27, 2011 at 8:04:03 PM UTC

Is that a reddit-like site?

Daga007, April 27, 2011 at 8:50:23 PM UTC

Yep, but in Spanish. http://www.meneame.net/

Duernu, April 27, 2011 at 9:10:49 PM UTC

Actually, it's like Digg.

yasnak1977, April 26, 2011 at 2:08:12 AM UTC

I came from meneame too.

heffeque, April 26, 2011 at 4:39:26 AM UTC

Me too! (^_^)

TRANG, April 26, 2011 at 9:16:48 AM UTC

> as a matter of fact, I was the one who uploaded
> the "meneo" XD

Oh, so it was you who almost made our server crash :P Just kidding, thank you!! :D

And welcome to all the new members who found us via meneame.net and who are reading this. I hope you guys will stick around :)

oscarpi oscarpi April 26, 2011 April 26, 2011 at 3:57:45 AM UTC link Permalink

First of all, congratulations: this strikes me as an impressive project, and it has my admiration and applause. I'm a programmer and I've been thinking about a system like this for years. I reached the following conclusions, which I hope will be of some use:

1) Besides letting people enter sentences, it needs some automatic system that crawls the web, captures sentences, and proposes them to users for translation (the most frequent ones, for example).

2) It needs an easy environment for translating long texts, which will ultimately become a set of new sentences entered into the system.

3) Some kind of "metasentences", a formal language, or simple regular expressions, so that one can specify "I am a friend of <proper noun>".

4) Word categories. For example, the category <color>, which can be replaced by any of the colors.

5) Integration of a word translator that offers suggestions, plus a dictionary for the terms.

6) A certain metalanguage: a text taken from a film transcript or subtitle is not the same as a sentence from a book. It would be very powerful if Tatoeba were the system used to translate subtitles.

7) Establish the concept of "Tatoeba ready": any text that has all of its sentences described in at least one language and that immediately passes an automatic translation process. Is my tweet translatable into Russian or Japanese? Is a Wikipedia entry Tatoeba ready?

8) A system that lets you write and see, as you go, whether the system validates the text in one (or several) given languages. E.g.: I want my text to be translatable into Arabic, but I don't know Arabic, and I agree to adapt my sentences so that it is translatable into Arabic. This presupposes that over time Tatoeba will have millions of sentences.

9) Use case "feedback scheme": someone creates a text (newspaper, blog, etc.) and, once it is finished, enters it into Tatoeba. The system tells them which sentences already exist and which are missing for a complete translation (into English, for example). The user then enters the missing sentences, making the text translatable by Tatoeba. Afterwards they can publish the text on their website with a link to the translation that Tatoeba generates.

10) The same use case can serve a translator of a minority language: a translator wants to translate a New York Times text into Galician and only has to enter the missing sentences. In the end, anyone who accesses Tatoeba from the original text will enjoy the Galician translation.

11) Some analysis of where sentences come from. Sentences have their context. It would be interesting for famous quotes to show the author. If a sentence appears in a book by José Saramago, it would be interesting to know that, as well as whether the sentence is used in South America but not in Spain, or dates from the 15th century, or belongs to a song. Tags that identify colloquial, formal, polite language, etc.

12) Based on all this, Tatoeba could ultimately analyze a text and show some knowledge about it: "It is colloquial language, the sentences are very frequent, the terms are rarely used, etc."

13) Tatoeba as a container for the translation tables commonly used in programs, so that if I want to translate my program into Arabic I can take them from Tatoeba. Typical phrases such as "Log out", "Go to help", or "Restart program". In the end, my program would only have to fetch an index table stored in Tatoeba to work in another language. Gnome as a system: Tatoeba READY?
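
The "feedback scheme" in idea 9 and the "Tatoeba ready" check in idea 7 could be sketched roughly as follows. This is only an illustration under loud assumptions: `CORPUS` is a hypothetical in-memory mapping standing in for Tatoeba's data (not its real schema or any real API), the function names are made up for the example, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical in-memory corpus: source sentence -> {language code: translation}.
# This stands in for Tatoeba's data; it is not the site's real schema or API.
CORPUS = {
    "Hello.": {"spa": "Hola."},
    "How are you?": {"spa": "¿Cómo estás?"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def coverage_report(text, target_lang, corpus=CORPUS):
    """Return (covered, missing): sentences with and without a translation."""
    covered, missing = [], []
    for sentence in split_sentences(text):
        translation = corpus.get(sentence, {}).get(target_lang)
        if translation is None:
            missing.append(sentence)
        else:
            covered.append((sentence, translation))
    return covered, missing


def is_tatoeba_ready(text, target_lang, corpus=CORPUS):
    """A non-empty text is 'ready' when none of its sentences is missing."""
    covered, missing = coverage_report(text, target_lang, corpus)
    return bool(covered) and not missing
```

For a text such as "Hello. How are you? See you soon!" with target language `"spa"`, `coverage_report` would list the first two sentences as covered and the last one as missing, which is the feedback the author would act on before publishing.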

TRANG TRANG April 26, 2011 April 26, 2011 at 9:28:34 AM UTC link Permalink

Thanks for sharing your ideas, oscarpi :) I don't have much time right now, but I'll try to reply to you this week.

oscarpi oscarpi April 27, 2011 April 27, 2011 at 10:48:02 AM UTC link Permalink

I wrote to you in Spanish, since it said one could write in any language... Thanks for paying attention to me :-)
Good luck with your project.

sysko sysko April 26, 2011 April 26, 2011 at 6:26:58 AM UTC link Permalink

(I ran it through Google Translate, shame on me.) Let's make it the first "Tatoeba ready" Wall message :)

(really sorry that I don't speak a single word of Spanish)

jorgearestrepo jorgearestrepo April 27, 2011 April 27, 2011 at 2:55:44 PM UTC link Permalink

I think your contribution is excellent. I hope at least some of your ideas can be implemented soon.

inmachan inmachan April 27, 2011 April 27, 2011 at 8:22:54 AM UTC link Permalink

Hi! I have a question about creating sentences. I hope I'm asking it the right way and in the right place:
I've created three sentences in total; the first two are translations and the third is my own. The thing is, that last one doesn't show up anywhere except when I look at my profile. Can new sentences only be contributed after first contributing several translations? Or did I do something wrong?

Pharamp Pharamp April 27, 2011 April 27, 2011 at 10:13:24 AM UTC link Permalink

Your third sentence ("Si fuera invisible no tendría que vestirme.") will show up in searches for, say, "vestirme", "invisible", etc. after a little while. The database is updated once a week, on Saturdays if I remember correctly; the update isn't automatic yet :)

inmachan inmachan April 27, 2011 April 27, 2011 at 10:32:04 AM UTC link Permalink

Ahh! Got it. Thanks a lot!