sysko's wall messages - Tatoeba

sysko August 13, 2010 at 11:15:22 PM UTC

@xtofu80 But simple does not imply fast. In fact, exact sentence matching in a database of nearly 500,000 sentences is simple but not fast at all; otherwise I would not have chosen the more complex but faster duplicate removal script ^^

@feuDRenais, yep, good idea, I will add it to the todo list, but to be honest, don't expect it for a looooooong time.

@CK it could be an idea, but the question is "which sentences to add to this smaller set?" Maybe basic sentences like "I love you" etc., but it will not solve the problem entirely.

@sacredceltic, yep, I think so. Apart from one user recently, since I've been on Tatoeba the only causes of duplicates have been the ones you give: either the existing sentence was too "indirect" to be seen, or people did not search before adding (for not-so-common sentences, I can understand that one does not check the existence of every single sentence one adds).

So far the best solution I've found is the duplicate removal script, run once a week. It handles every case (even sentences that look identical but are not in the same language) and keeps links, tags and audio.

The two problems are the following:
it's not real time
it depends on the database structure, and so needs to be slightly modified each time we add a new feature linking to sentences (which happens every 6 months ^^)

Anyway, a non-real-time solution will always have the second drawback, because you can't know whether people add tags / add to a list / add to [add here whatever future feature],
and a real-time solution will need to be really fast (under 0.1s).

The fact is I was extremely busy this week, so I didn't find the time to readapt the duplicate removal script, but this is now my current priority.
Personally, I think real time is not really that important, and once a week (or, if you want, twice a week) is enough for now.
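
To make the idea of a batch duplicate removal pass concrete, here is a minimal in-memory sketch. It is not the actual Tatoeba script: the `Sentence` class, its fields and `merge_duplicates` are hypothetical names chosen for illustration, and a real version would work against the database tables rather than Python objects.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Hypothetical record; the real database schema is not described here.
    id: int
    lang: str
    text: str
    links: set = field(default_factory=set)  # ids of linked translations
    tags: set = field(default_factory=set)
    has_audio: bool = False


def merge_duplicates(sentences):
    """Merge sentences sharing the same (lang, text); keep the lowest id.

    Note: the surviving Sentence objects are modified in place.
    """
    groups = defaultdict(list)
    for s in sentences:
        # Same text in a different language is NOT a duplicate.
        groups[(s.lang, s.text)].append(s)

    kept, remap = [], {}
    for group in groups.values():
        group.sort(key=lambda s: s.id)
        keeper = group[0]
        for dup in group[1:]:
            # Carry over everything attached to the removed sentence.
            keeper.links |= dup.links
            keeper.tags |= dup.tags
            keeper.has_audio = keeper.has_audio or dup.has_audio
            remap[dup.id] = keeper.id
        kept.append(keeper)

    # Re-point links that targeted a removed id, and drop self-links.
    for s in kept:
        s.links = {remap.get(i, i) for i in s.links} - {s.id}
    return kept
```

Every new kind of data attached to a sentence needs its own "carry over" line in the inner loop, which is the second drawback mentioned above.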

sysko August 9, 2010 at 9:05:35 PM UTC

Exactly, our fault. For the moment we store the user's selected language only when it is changed, which explains why, on the same page, the translation popup keeps the last selected language (the default being autodetect).
We will try to fix it ASAP. In the meantime, a quick and dirty workaround is to first translate one sentence on some page and select the language you want (let's say Esperanto), and then do your search; the default option will then be "Esperanto" (I know it's not convenient :$, I will try to look at this tomorrow).

sysko August 9, 2010 at 5:13:20 PM UTC

No, it's currently generated on the fly. In fact, it's not because they're misdetected that they're romanized incorrectly, but because the software we use (adso) has some known bugs with traditional sentences containing certain characters. So even storing the traditional/simplified status and changing it will not solve the problem.
I've emailed the adso guys, and they promised that the next version of adso will correct this problem. In the meantime we plan to store the romanization, so once that is done (but I can't promise when :( ), we will be able to correct them.

sysko August 8, 2010 at 10:59:06 AM UTC

For the first problem, as blay_paul said, the search engine is updated once a week; for performance/architecture reasons we can't update it in real time yet.
For the second problem, I first need to create the index for the new languages we added this week, and I was a bit busy so I didn't find time to do it :$

I will do my best to do it today.

sysko August 8, 2010 at 10:54:54 AM UTC

done

sysko August 8, 2010 at 10:51:41 AM UTC

sorry, now it should be ok :)

sysko August 8, 2010 at 10:46:47 AM UTC

Yep, you're right, I will do it :)

sysko August 7, 2010 at 9:10:44 PM UTC

@Blay_paul, the comment page can now be filtered by the language of the commented sentence,
for example only comments on Japanese sentences: http://tatoeba.org/fre/sentence_comments/index/jpn

sysko August 5, 2010 at 10:13:28 AM UTC

Even if I don't always agree with Sacredceltic, his previous message said absolutely nothing about a conspiracy.
Let's try to avoid this kind of ad hominem attack, which only makes things worse.

sysko August 4, 2010 at 10:53:08 PM UTC

So, for those who want to know, the question was: "does choosing the language directly instead of "autodetect" when adding sentences slow down the process of adding a sentence, and will it slow down the server?"
My answer: since for the moment we use Google's API for language detection (even though we plan to use our own system; I'm not a fan of depending on Google stuff), it will not slow down the server, and it may even reduce the time it takes to add a sentence (though I wonder if that's "visible" to a user).
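
A rough sketch of why picking the language explicitly can only save time: the detection call is simply skipped. The names `resolve_language`, `AUTO` and the `detect` callback are hypothetical; `detect` stands in for whatever external detection service is used and is not a real API.

```python
AUTO = "auto"  # hypothetical marker for the "autodetect" choice


def resolve_language(text, chosen, detect):
    """Return the language code to store for a newly added sentence.

    `detect` is a stand-in for the external detection call (an assumption,
    not the real API). It is only invoked when the user left "autodetect".
    """
    if chosen and chosen != AUTO:
        return chosen  # no external request needed
    return detect(text)
```

Since the detection runs on someone else's service, the server does the same amount of work either way; only the round trip for the user changes.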

sysko August 4, 2010 at 10:49:33 PM UTC

*I think we should NOT restrict

sysko August 4, 2010 at 10:47:49 PM UTC

I think we should restrict people on the language used, because I know some contributors who are not confident in English, or at least not confident with the vocabulary of the question/suggestion they want to make.
So I think we should leave the choice of language to the person asking the question, whether or not they want to (or can) ask it in English. It would be a pity to keep someone from asking a question only because they can't express themselves in English.

sysko August 4, 2010 at 10:43:46 PM UTC

For now, language detection is done via the Google API (for historical reasons; we are waiting for an upcoming overhaul of the site to switch to our own automatic detection algorithms).
So it does not put more load on the server, but specifying the language makes adding faster, since there is one less thing to do. That said, I have never taken the time to compare the two to see whether the time saved on the user side is "observable".
The option is in fact mostly for languages that are not supported by the Google API, or that produce a lot of misdetections (like Shanghainese, Latin, etc.).

sysko August 4, 2010 at 10:02:38 AM UTC

The fact is that the "educational" use exception implies that you attribute the source. It would not be a problem if the sentences were by Tatoeba users, on Tatoeba, for Tatoeba only. The problem comes from the fact that we release those sentences under a licence which authorizes all possible uses, which includes commercial and non-educational use. Moreover, the export for the moment does not include attribution to the original author, and even for educational purposes you need to attribute.

sysko August 3, 2010 at 3:23:19 PM UTC

I agree with you, and we will still be able to judge if the system works badly in some "extreme" case (I have none in mind, but my point was that even if it's automatic, human judgement will still be there to decide), so I think sacredceltic's idea is not bad.

sysko August 3, 2010 at 3:19:30 PM UTC

sure :)

sysko August 3, 2010 at 2:22:48 PM UTC

I think it's more about having a way to clearly tell the difference between a "just added sentence / waiting for validation" and a "you can trust us, it's a natural and correct sentence", rather than discouraging non-natives from contributing in a given language.
@sacredceltic, yep, I agree with you, we do need such a system. But as with automatic correction, the main problems are the time the developers of Tatoeba have and the power of the server behind it, because such an algorithm, even if easy to write on paper and not so hard to write in code, is tricky to make without needing some computation, checks in the database, etc., and unfortunately the server, with the current features and current activity, has already reached its limit.
So unfortunately this will take some time to implement, until we find a fast/optimized way to do it.

sysko August 3, 2010 at 11:38:53 AM UTC

maybe 6 months ^^ you can view it by clicking on "view more" at the top of the "last contributions" :)

sysko August 3, 2010 at 11:08:22 AM UTC

For this, the quick and dirty solution I see is to create a homepage for moderators, which would show "last contributions in", "last comments on sentences in language" and "last link/unlink". For the moment we only have this: http://tatoeba.org/eng/contributions/index/eng
That's why this week we will discuss with Trang about starting a new version of Tatoeba, more tailored to contributor preferences (language/status etc.), to take into account the multiplication of languages and the increasing speed of contributions.

sysko August 3, 2010 at 10:50:58 AM UTC

As for the narrow no-break spaces before ?!;: etc.: so that everyone saves time, and since the rule is very often unknown to people learning French, and even to natives (myself first among them until recently), and given that on some operating systems inserting them is not trivial, I am going to try to very quickly write a script that fixes this typographical mistake in French sentences every week. And when I have more time I will start something more general and in "real time" to correct/flag this kind of small oversight.
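
A minimal sketch of what such a weekly correction pass could look like for a single sentence, assuming the target character is U+202F (NARROW NO-BREAK SPACE) and that only ? ! ; : are handled. The function name `fix_french_spacing` is made up for illustration, and this is not the script that was actually written.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Any run of plain, no-break or narrow no-break spaces (possibly empty)
# followed by one of ? ! ; : , provided the character before the run is
# neither whitespace nor another of those punctuation marks (so "?!" is
# left as one unit).
_HIGH_PUNCT = re.compile(r"(?<=[^\s?!;:])[ \u00a0\u202f]*([?!;:])")


def fix_french_spacing(sentence: str) -> str:
    """Put exactly one narrow no-break space before ? ! ; : in a sentence.

    Known limitation of this sketch: it would also touch colons inside
    URLs or times such as "12:30", so real data needs extra guards.
    """
    return _HIGH_PUNCT.sub(NNBSP + r"\1", sentence)
```

The substitution is idempotent, so running it every week over sentences that were already fixed leaves them unchanged.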


sysko's messages on the Wall (total 1397)
