sysko, January 20, 2011 at 7:37:32 PM UTC

Ludoviko asked me about the progress of the new version in a thread, and yes, it's true that I don't communicate much about it (except for always saying "well, it will be possible in the new version"). So, basically, here is where we are.

The sentence database is more or less finished and debugged (memory leaks etc.), thanks to the help of Qdii. Here is what's already possible with it:
* view all the translations of a sentence (even 20th-degree translations), so this will no longer be a problem in the next version of Tatoeba
* real-time detection of duplicates (even when correcting a sentence), so here again, that problem will be gone
* performance improvement: if we only talk about retrieving sentences and their translations (not HTML generation etc.), it's very fast now, roughly 10,000 times faster for the complex queries we run in Tatoeba. So here again, I hope it will put an end to "well, I don't plan to add this or that feature because the server is already overloaded"
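The first two points are essentially graph and indexing problems. Below is a minimal Python sketch of how they can work, assuming translations are stored as symmetric links between sentence IDs and that "duplicate" means same language and same text after whitespace normalization. It is only an illustration: the real engine is written in C++ and its internals aren't described in this post, and all names here (`SentenceStore`, `add_sentence`, `correct_sentence`, `all_translations`) are hypothetical. A breadth-first search with a visited set reaches every translation at any depth without looping on cycles, and a hash index turns each duplicate check into a single lookup, cheap enough to run on every submission and every correction.

```python
from collections import deque


class DuplicateError(ValueError):
    """Raised when a sentence with the same normalized text already exists."""


def normalize(text: str) -> str:
    # Assumption: two sentences are duplicates if they match after trimming
    # and collapsing internal whitespace. The real rule may differ.
    return " ".join(text.split())


class SentenceStore:
    """Hypothetical in-memory store: sentences are nodes, translations are edges."""

    def __init__(self) -> None:
        self.text: dict[int, str] = {}
        self.lang: dict[int, str] = {}
        self.links: dict[int, set[int]] = {}
        # (language, normalized text) -> sentence id, for O(1) duplicate checks
        self.index: dict[tuple[str, str], int] = {}
        self.next_id = 1

    def add_sentence(self, lang: str, text: str) -> int:
        key = (lang, normalize(text))
        if key in self.index:
            raise DuplicateError(f"duplicate of sentence {self.index[key]}")
        sid = self.next_id
        self.next_id += 1
        self.text[sid], self.lang[sid] = text, lang
        self.links[sid] = set()
        self.index[key] = sid
        return sid

    def correct_sentence(self, sid: int, new_text: str) -> None:
        # The duplicate check also runs on corrections, not only on creation.
        new_key = (self.lang[sid], normalize(new_text))
        owner = self.index.get(new_key)
        if owner is not None and owner != sid:
            raise DuplicateError(f"duplicate of sentence {owner}")
        del self.index[(self.lang[sid], normalize(self.text[sid]))]
        self.text[sid] = new_text
        self.index[new_key] = sid

    def link(self, a: int, b: int) -> None:
        # Translation links are treated as symmetric (assumption).
        self.links[a].add(b)
        self.links[b].add(a)

    def all_translations(self, sid: int) -> dict[int, int]:
        """Breadth-first search: maps every reachable sentence to its degree."""
        depth = {sid: 0}
        queue = deque([sid])
        while queue:
            cur = queue.popleft()
            for nxt in self.links[cur]:
                if nxt not in depth:  # visited check: no depth limit, no cycles
                    depth[nxt] = depth[cur] + 1
                    queue.append(nxt)
        del depth[sid]  # the sentence is not its own translation
        return depth


if __name__ == "__main__":
    store = SentenceStore()
    en = store.add_sentence("eng", "I love you.")
    fr = store.add_sentence("fra", "Je t'aime.")
    de = store.add_sentence("deu", "Ich liebe dich.")
    store.link(en, fr)
    store.link(fr, de)
    print(store.all_translations(en))  # {2: 1, 3: 2}
```

Here the German sentence is found as a 2nd-degree translation of the English one even though they are not linked directly; with longer chains, the same loop returns 20th-degree translations just as well.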

Framework / website itself

As I've said on my Twitter, we're moving from CakePHP to a C++ framework. For some time we thought about Django, but it seems biptaste and I are not made for it, and Django would not have solved the performance issue. It took us some time to become confident with the new framework and to set up the general architecture. But now it's starting to work well, and we're going to reimplement the pages (that's one of the reasons I didn't talk much about it: so far the progress we've made was mainly code, so nothing "visual", though it represents a huge part of the work).
For the geeks among us who are wondering whether writing Tatoeba in C++ with an "obscure" framework is a stupid decision that only increases development time, I'd say no, or at least not by much.
1 - We had to learn a new framework anyway, and I think a large part of development time is spent understanding the framework rather than "typing code". Even if the community around this framework is very small, it has one subtle advantage: the main (and only) developer is very accessible, so we can ask him directly about our specific problems. He's very responsive, and we can be sure his answers are reliable (after all, it's his project :p).
2 - As I've said, one of the problems of the current Tatoeba is performance. We're not making money on Tatoeba; I'm still a student and Trang used to be one, so we don't have money to spend on renting servers etc. Since we're developing it for free, for fun, how much time we spend developing it is not an issue; the real issue is hardware. Doubling performance means we can handle twice as many users without needing new hardware or spending more money (for the moment we're kindly hosted by the French FSF, but they don't have unlimited resources, and we don't want to abuse them). So building it in C++ now means we won't need to rewrite it for performance later, and I think in the long term it will save us time and money.

I should also add that from now on every feature we add will have an API counterpart, which will make it easier to develop third-party applications using Tatoeba.
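One common way to give every feature an API counterpart is to write each feature once as a function that returns plain data, and to put thin renderers on top: one producing the HTML page, one producing JSON for third-party clients. The Python sketch below only illustrates that idea; it is not how the new Tatoeba code is organized (the post doesn't say), and `sentence_data`, `render_html`, `render_json` and `handle` are hypothetical names.

```python
import html
import json


def sentence_data(sentences: dict[int, str], links: dict[int, list[int]], sid: int) -> dict:
    """Hypothetical feature logic: returns plain data, no presentation."""
    return {
        "id": sid,
        "text": sentences[sid],
        "translations": [{"id": t, "text": sentences[t]} for t in links.get(sid, [])],
    }


def render_json(data: dict) -> str:
    # API counterpart: the same data, serialized for third-party applications.
    return json.dumps(data, ensure_ascii=False)


def render_html(data: dict) -> str:
    # Website page: the same data, wrapped in HTML with user text escaped.
    items = "".join(f"<li>{html.escape(t['text'])}</li>" for t in data["translations"])
    return f"<p>{html.escape(data['text'])}</p><ul>{items}</ul>"


RENDERERS = {"html": render_html, "json": render_json}


def handle(sentences: dict[int, str], links: dict[int, list[int]], sid: int, fmt: str) -> str:
    """Single entry point: one feature, two output formats."""
    return RENDERERS[fmt](sentence_data(sentences, links, sid))
```

Because the page and the API share `sentence_data`, a new feature only has to be written once; adding its API version is just a matter of registering it with the JSON renderer.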

So that's where we are so far: not much "visual" stuff to show, but the engine is well on its way, and I think that was the most difficult and least rewarding part. Oh, I forgot to say that we also spent some time setting up collaborative tools on my server, such as a Redmine, a Git repository, etc.

For the moment, as I think "unlimited translation depth", "real-time duplicate detection" and "speed-up" address the 3 major current problems of Tatoeba, the first release of the new version may bring nothing else new (except small improvements here and there), so don't expect huge differences or brand-new features. But after that we'll be able to have a more frequent release cycle, introducing new features one by one and integrating all the requests you've made on the Wall or by email.
So real-time search and fixes for the tag autocompletion problem may be part of the first new release, but if they're not, they'll appear in the weeks following it.

For more technical details, I think I really need to start a "what's behind Tatoeba" blog to talk about geeky stuff ^^

contour, January 21, 2011 at 8:27:05 AM UTC

So you'll be using a newly developed database and writing the website in C++?
Not to say it won't work, but I hope you'll be keeping good backups. ^_^;

sysko, January 21, 2011 at 12:05:55 PM UTC

you'll see ;-)

debian2007, January 21, 2011 at 1:36:21 PM UTC

So every ticket about optimization and "the small things" should be fixed in the C++ version; is it still worth sending a patch for the CakePHP version? Where will the SVN repository for the new version be? >:D I like to learn new things, that's why I'm asking.

sysko, January 21, 2011 at 9:37:50 PM UTC

The CakePHP version will still live for at least a month, I think, so if you already have a patch, or if it's something not too complicated to do, yes, you can submit it.

As for the new version, just like the current one: I really believe in open source, even for website code, so yes, it will be open to everyone under the same license as the current one (AGPL). And as I really don't think we're going to move to another framework (unless an ASM framework exists :p, though I'm not sure it could be faster than a gcc-optimized binary ^^), as soon as we have something stable and documented, part of our "duty time" will move from "coding" to "setting up tools that allow collaborative work on the code itself, not only on the data". I myself wished I could have some "open website" to study, to learn how "real" websites are made. I really hope that in the near future Tatoeba will be not only a place to build an open corpus, but also a place to build open tools to exploit the corpus (the website being one of them), for the greater good of common knowledge.

ludoviko, January 23, 2011 at 9:59:38 PM UTC

Thank you very much, sysko. I am very glad that rather soon we will be able to "view all the translations of one sentence (even the 20th degree translation)" and have "real time detection of duplicate (even when correcting a sentence)". Really just marvellous!