Comments on sysko's sentences

@xtofu80 But simple does not imply fast. In fact, exact sentence matching in a database of nearly 500,000 sentences is simple but not at all fast; otherwise I would not have chosen the more complex but faster duplicate removal script ^^
@feuDRenais, yep, good idea. I will add it to the todo list, but to be honest, don't expect it for a looooooong time.
@CK it could be an idea, but the question is "which sentences to add to this smaller set?" Maybe basic sentences like "I love you" etc., but it will not solve the problem entirely.
@sacredceltic, yep, I think so. Except for one user recently, since I've been on Tatoeba the only reasons for duplicates have been the ones you give: either the sentence was too "indirect" to be seen, or people did not search before adding (for not-so-common sentences, I can understand that one does not check the existence of every single sentence one adds).
So far the best solution I've found is the duplicate removal script, run once a week. It handles every case (even sentences that look identical but are not in the same language) and keeps links, tags and audio.
The two problems are the following:
it's not real time
it depends on the database structure, and so needs to be slightly modified each time we add a new feature linking to sentences (which happens every 6 months ^^)
Anyway, a non-real-time solution will always have the second drawback, because you can't know whether people add tags / add to a list / add to [add here whatever future feature],
and a real-time solution will need to be really fast (fast meaning < 0.1s).
The fact is I was extremely busy this week, so I didn't find the time to readapt the duplicate removal script, but this is now my top priority.
Personally, I think real time is not really that important, and once a week (or twice a week if you want) is enough for now.
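
To make the idea concrete, here is a minimal sketch of what such a batch merge could look like. It is not the actual Tatoeba script: the in-memory dicts and the names `merge_duplicates`, `sentences`, `links` and `tags` are assumptions made up for illustration (the real script works on the database and also handles audio).

```python
def merge_duplicates(sentences, links, tags):
    """Merge sentences sharing the same (language, text).

    sentences: dict id -> (lang, text)      (hypothetical layout)
    links:     set of (id, id) translation links
    tags:      dict id -> set of tag names
    Returns (kept_sentences, new_links, new_tags).
    """
    canonical = {}  # (lang, text) -> id of the sentence we keep
    remap = {}      # every sentence id -> id that survives
    for sid in sorted(sentences):
        # The lowest id wins; identical text in another language
        # has a different key, so it is never merged.
        remap[sid] = canonical.setdefault(sentences[sid], sid)

    kept = {sid: sentences[sid] for sid in set(remap.values())}

    new_links = set()
    for a, b in links:
        a, b = remap[a], remap[b]
        if a != b:  # drop links that would point to themselves
            new_links.add((min(a, b), max(a, b)))

    new_tags = {}
    for sid, names in tags.items():
        new_tags.setdefault(remap[sid], set()).update(names)

    return kept, new_links, new_tags
```

The second drawback shows up directly here: every new feature that points at a sentence id would need its own remapping step, like the ones for `links` and `tags`.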

Right, our fault. For the moment we store the user's selected language only when it is changed, which explains why, on the same page, the translation popup keeps the last selected language (by default, autodetect).
We will try to fix it ASAP. In the meantime, a quick and dirty solution is to first translate one sentence on some page and select the language you want (let's say Esperanto), and then do your search; the default option will then be "Esperanto" (but I know it's not convenient :$, I will try to look at this tomorrow).

No, it's currently generated on the fly. And in fact, it's not because they're misdetected that they're romanized incorrectly, but because the software we use (adso) has some known bugs with traditional sentences containing some specific characters. So even storing the traditional/simplified status and changing it will not solve the problem.
I've emailed the adso guys and they promised that the next version of adso will correct this problem. In the meantime we plan to store the romanization, so once that is done (but I can't promise when :( ), we will be able to correct them.

For the first problem, as blay_paul said, the search engine is updated once a week; for performance/architecture reasons we can't update it in real time yet.
For the second problem, I first need to create the index for the new languages we added this week, and I was a bit busy so I didn't find time to do it :$
I will do my best to do it today.

done

sorry, now it should be ok :)

Yep, you're right, I will do it :)

@Blay_paul, now the comment page can be filtered by the language of the commented sentence
for example, only comments on Japanese sentences: http://tatoeba.org/fre/sentence_comments/index/jpn

Even if I don't always agree with Sacredceltic, his previous message was absolutely not about conspiracy.
Let's try to avoid this kind of ad hominem attack, which only makes things worse.

So for those who want to know, the question was: "Will choosing the language directly instead of "autodetect" when adding sentences slow down the process of adding a sentence, and will it slow down the server?"
My answer is that, since for the moment we use Google's API for language detection (even though we plan to use our own system; I'm not a fan of depending on Google stuff), it will not slow down the server, and it may speed up the time it takes to add a sentence (but I wonder if it's "visible" to a user).
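
A tiny sketch of why picking the language can only save time: the detection call is simply skipped. The function name `resolve_language`, the `"auto"` marker and the `detect` callback are hypothetical; `detect` stands in for whatever external detection service is used and is not a real API.

```python
def resolve_language(text, chosen, detect):
    """Return the language code to store for a new sentence.

    chosen: the language picked in the form, or "auto" (assumed marker).
    detect: callable doing the (slow, external) detection; hypothetical.
    """
    if chosen != "auto":
        # The user told us the language: no detection round trip needed.
        return chosen
    return detect(text)
```

The work done on the server itself is the same in both cases; the only difference is whether the extra detection round trip happens.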

*I think we should NOT restrict

I think we should restrict people on the language used, because I know some contributors who are not confident in English, or at least not confident with the vocabulary of the question/suggestion they want to make.
So I think we should leave the choice of language to the one who asks the question, whether he wants to and can do it in English. It would be a pity to make someone not ask a question only because he can't express himself in English.

For now, language detection is done via the Google API (for historical reasons; we are waiting for an upcoming overhaul of the site to move to our own automatic detection algorithms).
So it does not put any more load on the server, but specifying the language makes adding faster since there is one less thing to do. That said, I have never taken the time to compare the two to see whether the time saved on the user's side is "observable".
In fact, the option is mostly for languages not supported by Google's API, or ones that produce a lot of wrong detections (such as Shanghainese, Latin, etc.)

The fact is that the "educational" use exception implies that you attribute the source. It would not be a problem if sentences were by Tatoeba users, on Tatoeba, for Tatoeba only. The problem comes from the fact that we release those sentences under a licence which authorizes all possible uses, which includes commercial and non-educational use. Moreover, the export for the moment does not include attribution to the original author, and even for educational purposes you need to attribute.

I agree with you, and we will still be able to judge if the system works badly in some "extreme" case (I have none in mind, but my point was that even if it's automatic, human judgement will still be there to decide), so I think sacredceltic's idea is not bad.

sure :)

I think it's more about having a way to clearly tell the difference between a "just added sentence / waiting for validation" and a "you can trust us, it's a natural and correct sentence", rather than discouraging non-natives from contributing in a given language.
@sacredceltic, yep, I agree with you, we do need such a system. But as with automatic correction, the main problem is the time the developers of Tatoeba have and the power of the server behind it, because such an algorithm, even if easy to write on paper and not so hard to write in code, is tricky to make without needing some computation, checks in the database, etc. And unfortunately the server, with the current features and current activity, has already reached its limit.
So unfortunately this will take some time to implement, until we find a fast/optimized way to do it.

maybe 6 months ^^ you can view it by clicking on "view more" at the top of the "last contributions" :)

For this, the quick and dirty solution I see is to create a homepage for moderators, with "last contributions in language", "last comments on sentences in language", "last link/unlink". For the moment we only have this: http://tatoeba.org/eng/contributions/index/eng
That's why this week we will discuss with Trang starting a new version of Tatoeba, more specific to contributor preferences (language/status etc.), to take into account the multiplication of languages and the increasing speed of contribution.

For the narrow no-break spaces before ?!;: etc.: so that everyone saves time, and since the rule is very often unknown to people learning French, and even to natives (myself first, until recently), and given that on some operating systems inserting them is not trivial, I will try to very quickly write a script that corrects this typographical mistake in French sentences every week. And when I have more time I will start something more general and in "real time" to correct/flag this kind of small oversight.
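
As a rough idea of what the weekly pass could do to each French sentence, here is a small sketch. It is only an illustration, not the planned script: the function name `fix_french_spacing` is made up, and skipping a colon followed by `//` (so that URLs are left alone) is an assumption of mine.

```python
import re

NNBSP = "\u202f"  # narrow no-break space

# A word character (anything but whitespace or one of ?!;:), then any
# ordinary/no-break spaces, then the punctuation mark. A colon followed
# by "//" is skipped so that URLs are not touched (assumption).
_PUNCT = re.compile(r"(?<=[^\s?!;:])[ \u00a0\u202f]*([?!;]|:(?!//))")


def fix_french_spacing(sentence):
    """Put exactly one narrow no-break space before ? ! ; : in a sentence."""
    return _PUNCT.sub(NNBSP + r"\1", sentence)
```

Running it twice gives the same result as running it once, so a weekly job can safely go over sentences that were already corrected.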