Here you can ask general questions, such as how to use Tatoeba or how to report bugs and strange behavior, or simply socialize with the rest of the community.
Before asking a question, make sure you have read the FAQ.
Wall (645 threads)
Silja posted this message, which got deleted when I was trying to delete one of mine:
Suggestion: Please add "Not directly translated into language" drop-down to "translate the sentences of this user" (I mean this page: http://tatoeba.org/fin/activiti...ences_of/Silja)
it's been a long time since my last post here..
I have a problem: I want to browse all sentences in a language with translations, but sometimes untranslated sentences also show up. Could you tell me how to get only the translated ones to show up?
** GSOC Progress Report weeks 4-8 **
I'm sorry again that I haven't been writing more reports and keeping you
guys in the loop. But basically I was in a position where I broke everything
pretty badly and didn't just want to write a report every week that said
"I'm a complete failure and nothing works". Anyway, the backend is almost done
and I'm in a better position now, so let's talk about pytoeba's current status
and future shall we?
- User model centralization
- User private messaging
- Wall posts
- User voting
- User status/language web of trust
- python-social-auth integration
- haystack integration
- xapian-haystack indices for sentences, users, messages, comments
- tastypie integration for models
- basic tastypie integration for the python api (still needs work)
- major sql optimizations to the python api
- major numpy optimizations to the graphing backend on CPython
- alternative dependencies that are all pure python with no C required.
So we're now basically compatible with any python flavor or platform
with no extra compilation.
- iplus1 integration
- views (dropped during week 4 completely)
- js interface (there's still time for this and will be the top priority for the
rest of the time I have during gsoc)
- internationalization (well, no interface yet)
- multilingual stemming for haystack/xapian indices/queries
(unfortunately this needs an invasive patch into haystack's/xapian haystack's
code, so this won't be covered during gsoc)
So let's compare this with the original goals in my proposal which I'll copy
verbatim and mark with ✔ for complete support, ✘ for no support and ~ for partial support:
Replicate the current functionality of the website:
✔ CRUD operations on sentences, comments, and wall posts
✔ logging of all operations on sentences
~ user profiles, status, and permissions
~ browsing of sentences, tags, and users
✔ searching for sentences
✘ integration of transliteration
✘ internationalization of the entire website
Expand the website's functionality:
✔ a corrections framework
~ a corrections page with latest corrections that have been accepted, rejected, forced, or are overdue
~ uploading of audio
✔ advanced queries
✔ fetching translations up to the nth depth
✔ a user proficiency web of trust
✔ a user status web of trust
~ speed blocking/deletion of malicious users
~ a full fledged forum that can also be viewed as a wall
~ requesting words and translations
✘ sentence subscription system
✘ a user following system
✘ a whole new customisable user page that acts as a notification system for latest actions on subscribed sentences and latest actions from followed users, etc...
✘ rss and atom feeds for most pages that are essentially list views
✔ Enhance the database schema by using graph algorithms and/or integrating a graph database
Fully cache database queries and templates using memcached, and integrate an outside caching system such as varnish for server generated pages
✔ Build a set of python functions to manipulate the database and perform all functionalities that constitute an inner api, and use it in all view code and in future modules that extend the website
~ Build a RESTful api on top of the python api and django orm through tastypie
✘ Rewrite the UI to be crossplatform, dynamic, client-side, API compliant code.
~ Write a battery of tests that cover all the codebase
~ Provide administrative scripts for tasks such as importing/exporting db fixtures, importing/exporting csvs, adding new languages, extracting/updating mo/po files, search indexing, deployment on a development or production machine using vagrant/ansible
✘ Provide Help bot scripts that clean up corrections, among other things
✘ Provide an interface to all the admin scripts and help bot scripts where they can be executed manually or a cron job can be added and tweaked using the interface for them
This was done in ~8 weeks and still not close to covering everything. So I'm
essentially 2 weeks behind, actually 3 considering I'm still working on the
backend this week too. Testing stopped being a top priority, and code review
kind of died since liori was on vacation for like 2 weeks and had a family thing
for another week recently. Also, the whole "huge patch" thing just made him less likely to give me any comprehensive feedback, so the feedback I got on the recent patches was a bird's-eye view of certain things I'm doing in the codebase that he doesn't like. But to really understand what happened, I'll have to walk you through all the things I decided to do after week 3, which basically means obsessing over adding more features, optimizing everything to death, and doing more interesting things like reading the codebases of third-party libraries instead of just making a website, which brings us to...
- The Road to Hell (featuring lool0 not dante):
The road to hell is interesting. At the gates you're told to abandon all hope
and in the deepest parts you watch satan eating brutus, cassius, and judas. And
at each level a sin with a poetic punishment.
- The Road to Hell and the feature craze:
Very early on I decided testing was making everything too slow, and decided
to keep adding as many features as possible with no regard to whether anything
even worked. You can imagine this led to major breakage, and it did, to the
point that I made so many changes to the schema that every feature I developed
was not even manually testable. To manually test anything I have to isolate the
feature and use the last stable commit from week 3. I only started recovering
from this last week, but most tests still need to be readapted for them to
start passing. Only 6 of them pass as of the time of this report.
- The Road to Hell and the optimization craze:
You'd expect that having libraries that do everything for you would speed
development time so much you'd be done with everything in a week and then
you'd be sleeping for the next 11 weeks. But noooo... Enter the optimization
craze. Let's just dispel a great myth called the 'reusable app' first. In
the python/django world, there's this idea that it's great to invest time
in making these pythonic generic pluggable apps that can work with any other
django app out of the box. There's tons of these around and they can probably
do anything you can think of. You just need a few lines of configuration
and you're done, you get all of its features for free. This is the base of
the contrib package in django's core in fact. The auth system, the admin system,
- The Road to Hell and Reusability, Abstractions, and the ORM:
But let me tell you why python and django are always accused of not
being able to scale for shit. It can be summed up in one word: 'Leaky Abstractions'.
Django tries too hard to be pythonic and ends up wrapping the database in
a very pythonic abstraction known as the "ORM". What this does in the long
run is, the more generic the app is the worse the queries it will make for
you. At the heart of the problem is this thing called the ContentType system.
This basically allows things like "generic foreign keys" where you can have
a relation to a table dynamically without ever having to write and generate
the schema for it. This means more joins for every query. More work for your
database. Worse response time, and less scalability. It's the same problem
you get with multitable inheritance (which btw, some reusable apps I looked
at during the last 4 weeks actually used), death by joins.
It also happens to be the core reason behind all of tatoeba's timeouts: inefficient queries with joins that involve too many rows, and the goddamn awful mysql that sucks at using indices and sucks at joins to the point that 3 selects are better than a join (which really defeats the point of a relational database completely). But anyway, going back to the story...
Instead of just using the damn apps and going to sleep, I was very curious,
and for every app I integrated or tried to integrate, I went ahead and read all
of its source code or at least the parts I cared about the most. And of course,
you can imagine the horrors... So the end result was rip out or reimplement
core models and/or override classes if you can and include the package as is.
And that was the mantra. It took a week to look at userena, django-messages,
django-guardian, python-social-auth, etc... and adapt that and be happy with
the results, sort of. But this means anyone else trying to improve this code
will need to know the dependencies quite well. The python api will still be
usable without any of this knowledge though.
- The Road to Hell and python the turtle:
The last 2 weeks went to optimizing the scipy backend. You see, I was getting
to know the graphing code of scipy better and then learned quite a lot about
numpy. And then realized... Python gets in the way and defeats lots of the
optimizations in scipy. So I went ahead and learned numpy properly and then
rewrote all the backend using numpy structures instead of python structures.
To say the least the results were amazing. 1 ms to redraw the entire graph
(3 million nodes and 6.6 million links). And 4 µs to add or remove a link
to the graph. I don't think I was ready for this so I tried to see if it was
possible to eliminate the Link table completely and was very disappointed
that I couldn't because I still needed to bookkeep nodes and links and build
the graph which, after trying to optimize to death, took 5 secs and 70 MB of
RAM. So I concluded it was possible but I needed to make a serious design change to what I store in the Link table so I can avoid link bookkeeping entirely
because it slowed shit down too much. But I won't have time to implement this
during gsoc but it should be possible to only store information about the
connected components in a subgraph instead of links and eliminate link
bookkeeping entirely from the code. Of course, we can also eliminate the building of the graph every time we need to calculate distances by just using a graph database like tatodb or orientdb just to map out the nodes. They build the graph once and use traversal algorithms instead of what I'm doing, which is building the subgraph every time and calculating the shortest distances (a really big hack on top of a relational db).
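To make the "build the subgraph, then calculate shortest distances" step concrete, here is a minimal pure-Python sketch. It is not pytoeba's numpy/scipy code: the `(sentence_id, translation_id)` link rows and the function names `build_graph` and `distances_up_to` are assumptions made for illustration, and a plain breadth-first search stands in for whatever the real backend does.

```python
from collections import defaultdict, deque


def build_graph(links):
    """Build an undirected adjacency map from (sentence_id, translation_id) rows."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def distances_up_to(graph, start, max_depth):
    """Breadth-first search: shortest link distance from `start`, capped at `max_depth`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand nodes that already sit at the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical link rows: 1-2-3-4 form a chain, 5-6 is a separate component.
links = [(1, 2), (2, 3), (3, 4), (5, 6)]
graph = build_graph(links)
result = distances_up_to(graph, 1, 2)
```

Here `result` maps each sentence reachable within two links of sentence 1 to its distance, which is the "translations up to the nth depth" idea in its simplest form. Rebuilding `graph` on every request is exactly the cost the post complains about.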
- The Road to Hell and raw sql on django:
Another thing I worked on for about a week was bulk operations. To give you
an overview, this is where we insert/update/delete lots of data at the same
time instead of one query each. It's basically the holy grail of relational
databases as they can do this quite well. They really scale with bulk operations.
So the first obstacle is the realization that django's orm doesn't have any low level abstraction to build sql queries in a flexible way. This of course came after going through the Query, QuerySet, and sql compiling code in django's core, which also revealed another interesting problem: the bulk operations offered by the orm at the moment are broken.
.delete() on the queryset does 2 queries, a select to get the ids and a subsequent DELETE on that. It, along with the bulk_insert, is batched with a non-configurable batch size of 500 or so, so multiple queries will be issued that break your updates into 500 rows at a time.
The .update() on the queryset is broken in a more interesting way. It updates one field in the exact same way on all the rows in the queryset. So there's no way to have true bulk updates where you need to change multiple rows in multiple different ways, a change per row or whatever.
So I finally broke down and did the unthinkable: build the sql query and parametrize it dynamically in python and issue it on the raw cursor connection to the db. This means you're on your own; mapping things to objects will be hard, and django won't do it for you for sure. At the end of the week I had bulk_insert, bulk_update, and bulk_delete functions that did what I wanted and more, and took a few milliseconds to affect hundreds of thousands of rows.
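A per-row bulk update issued on a raw cursor might look roughly like the sketch below. It uses the standard-library sqlite3 module so it runs anywhere, and it is not pytoeba's actual code: the `bulk_update` signature, the `sentence` table, and its columns are assumptions made for the example.

```python
import sqlite3


def bulk_update(cursor, table, column, values_by_id):
    """Set `column` to a different value per row in a single UPDATE.

    `table` and `column` are interpolated into the SQL text, so they must
    come from trusted code; only the ids and values are parameterized.
    Databases cap the number of bound parameters, so a large input would
    need to be split into chunks (not shown here).
    """
    if not values_by_id:
        return 0
    ids = list(values_by_id)
    whens = " ".join("WHEN ? THEN ?" for _ in ids)
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        f"UPDATE {table} SET {column} = CASE id {whens} END "
        f"WHERE id IN ({placeholders})"
    )
    params = []
    for row_id in ids:
        params.extend((row_id, values_by_id[row_id]))
    params.extend(ids)
    cursor.execute(sql, params)
    return cursor.rowcount


# Hypothetical demo table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sentence (id INTEGER PRIMARY KEY, text TEXT)")
cur.executemany(
    "INSERT INTO sentence (id, text) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)
updated = bulk_update(cur, "sentence", "text", {1: "uno", 3: "tres"})
rows = cur.execute("SELECT id, text FROM sentence ORDER BY id").fetchall()
```

The `CASE id WHEN ... THEN ...` expression is what lets one statement carry a different value for each row, which is the case the post says queryset `.update()` cannot express.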
- Future Roadmap:
P.S. Almost everyone else is done with the core of their gsoc projects. I'm negotiating integration with harsh and will need to take a closer look at pallav's scripts to integrate them as well. Also, pytoeba's backend compiles on Android! So we can hope to support offline features once the angular app is ported to mobile using phonegap or appcelerator titanium, something which I'm very interested in doing. This should also work on iOS and Windows Phone, as python has been compiled successfully on these platforms as well, but I haven't actually tried. Here's to a brighter future for tatoeba, cheers :)
week 1: http://tatoeba.org/eng/wall/sho...#message_19654
week 2: http://tatoeba.org/eng/wall/sho...#message_19768
week 3: http://tatoeba.org/eng/wall/sho...#message_19821
weeks 4-8: http://tatoeba.org/eng/wall/sho...#message_20001
patches pending review:
project template for testing:
You learned the leaky abstraction problem the hard way. I think it’s a problem every developer (or even anyone creating things in order to ease others’ work) should be aware of. Here is a nicely written article about leaky abstractions: http://www.joelonsoftware.com/a...tractions.html
Good luck for the future of pytoeba.
Thank you for your work. Do you think that you will be able to fix the Bad Gateway problem?
For the current website, no. Fixing it for good involves sinking a lot of time into optimizing queries and getting the current pages to be easier to cache, but I'm focused on pytoeba atm. I wrote about what needs to be done here:
So if anyone has the time and necessary knowledge to make it happen it would be great.
Thank you. It's too bad because today Tatoeba was down for 7-8 hours (and I'm not sure it's over yet...) Edit: I see sentences from 24 hours ago at the bottom of the latest contributions, so that's about how long it has been down now.
Long timeouts are out of our control (no amount of optimization will fix this); someone will just have to talk to the FSF and convince them to move us to another machine that's not heavily used by them.
Thank you. Do you think that like maybe Google could give you a new machine? I know that they're helping you out and I think that they have a lot of servers to make Google work. Maybe they could spare one for Tatoeba?
On the database side, why still use mysql? Use Firebird; have a look at the latest release 2.5.3 or the upcoming 3.0 release. We have databases with more than 60 million rows and up to 20 joined tables, and indexes work fine too!
For local testing we loaded Tatoeba data into Firebird tables with separate views
for languages with more than 2000 sentences, and all is working nicely.
Maybe something to contemplate for the future development cycle...
For the same reason we're not using postgresql instead, and I've taken the time to look into this: we just have lots of sql that would need to be ported because it's mysql specific. The easiest thing we can do is try to make use of mariadb's BNLH joins, since we won't have to port any code. I've written migration scripts for this, but it seems no one had the time to see them through on the server or benchmark the resulting setup properly. But of course, if you have time to port things to firebird or postgresql in the meantime, that would be great too. As far as pytoeba is concerned this won't be a problem in the future: the code is portable across any database backend that django supports or can be made to support with a third party app or a custom django backend.
That's why database-engine-agnostic frameworks were invented... the framework should generate the SQL, not the programmers. I develop on MySQL and deploy on PostgreSQL. Vive la liberté !
It comes with a price though, I think rails has a less leaky ORM abstraction than django but still, there's probably gonna be better ways of writing queries that the ORM can't express. I think I might port some of the raw queries in pytoeba to peewee or sqlalchemy in the future since they have low level abstractions that tightly fit sql.
I want an Esperanto autoconverter for diacritic symbols here in Tatoeba. Few keyboards allow typing them, and installing separate software is not always possible.
The French wikipedia features something like this: http://fr.wikipedia.org/w/index...al%3ARecherche
Click on the search field, then click on the keyboard icon that drops down. I don't know anything about Esperanto, but I've been able to input diacritics with the "Esperanto q sistemo" by typing "s q" and "u q".
gleki, does something like this fulfill your needs? What about the other Esperanto keyboards available on Wikipedia?
Tatoeba.org doesn't have anything like this for any language. There is an autoconverter here: if you type in the search box using the letter X, it converts the text automatically.
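For reference, an X-based autoconverter of the kind requested in this thread is small. The sketch below follows the common Esperanto x-system convention (cx, gx, hx, jx, sx, ux); it is not code from Tatoeba or Wikipedia, and the function name `convert_x_system` is made up for the example.

```python
# Esperanto x-system: a base letter followed by "x" stands for the accented letter.
X_SYSTEM = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}


def convert_x_system(text):
    """Replace cx, gx, hx, jx, sx, ux (any case) with the accented letters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        accented = X_SYSTEM.get(ch.lower())
        if accented and nxt in ("x", "X"):
            # Preserve the case of the base letter and consume the "x".
            out.append(accented.upper() if ch.isupper() else accented)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


converted = convert_x_system("Cxiutage mi lernas antaux la vespermangxo.")
```

A real input helper would also need a way to type a literal "x" after one of these letters, which this sketch does not handle.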
Hello, I speak English and I'm interested in learning Chinese and Russian.
I just joined and I'm interested in taking a look around, perhaps posting some of my English sentences and the Lang8 corrected Chinese and Russian versions
Today it occurred to me that it might be possible to replace the current "Wall" with some kind of forum where (members can talk about whatever they like and) each discussion has a separate thread in which posts are arranged in a similar fashion to comments on sentences, i.e. one below the other, but without the width getting narrower. The Wall, as it is now, is not really suitable for long discussions, and every Internet community should have a forum. ☺
That's a good idea.
So, all the discussions will be there, not in comments to sentences.
The more I think about this, the more it seems to me like an essential feature that Tatoeba has needed more than anything for a long time. I think people involved in Tatoeba all have their own goals, things they care about and things they don’t. A forum-like structure would make it possible to bring together people with similar goals. Goals like “make more English-Japanese pairs”, “add more sentences in language X with rare words”, “grow the corpus of language X”, “fix mistakes in sentences of language Y” etc. People who want to ask specialists of a given field (like X-Y translators, natives of language X…) could post on a specific topic to efficiently reach them.
Instead, Tatoeba feels to me more like a place where everyone is working on their own, communicating mostly privately with users they got to know incidentally through sentence authorship or comments. This is not working great. (Please correct me if you don’t share that feeling.)
I wasn’t here when this Wall system was installed, but I wonder what the reasons were for crafting our own thing like this, instead of the very popular forum structure with topics, which has been implemented hundreds of times already.
> I wasn’t here when this Wall system was installed, but I wonder what the
> reasons were for crafting our own thing like this, instead of the very popular
> forum structure with topics, which has been implemented hundreds of times already.
Because we didn't have any specific categories in mind. We just wanted a place for people to communicate with each other, but we didn't really know how to separate the content. There were also very few contributors back then, so there wasn't a real need to split the threads between different categories.
Anyway, I've been thinking for a while about how to transition the Wall into something more forum-ish, because I don't think it's appropriate anymore either.
It's not too difficult, it's more or less a matter of adding a column "category" in the "wall" table, and reviewing how things are displayed. But someone needs to take the time to work on it.
But even before thinking about categories, personally, the first step I'd want to work on is to "collapse" all these threads and give the possibility to have a title for each thread (there's actually a column "title" in the "wall" table).
If you're interested in working on this, let me know. We can probably find a weekend to discuss it and code it :)
A thread for asking questions about a language, for example English, or about a category such as technical computing English, would be very useful.
People interested in helping, or in that topic, could subscribe and see it every time a question of that kind comes up.
Currently you can ask in the comments on each sentence, but you have to rely on the chance that the interested person sees it within a short time.
Many people, and more and more of them, think a forum would be tremendously useful.
Can someone translate this message if they find it useful?
AFAIK this is a problem of presentation, the database table for this would maybe just need a category field or something. It's mostly just a matter of who has time to work on this for the current website. btw, it seems like there's some terrible tree building in the php code to make this threading thing work on the wall, and it seems like everyone hates this "feature" :D
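As a rough sketch of the schema change discussed in this thread: the example below uses the standard-library sqlite3 module in place of the site's MySQL database. Only the `wall` table name and its `title` column come from the posts above; the `content` column, the default category name and the `threads_in` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal version of the existing table.
conn.execute("CREATE TABLE wall (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
conn.execute(
    "INSERT INTO wall (title, content) VALUES (?, ?)",
    ("Old thread", "Posted before categories existed"),
)

# The proposed change: one new column, with a default so existing rows stay valid.
conn.execute("ALTER TABLE wall ADD COLUMN category TEXT NOT NULL DEFAULT 'general'")
conn.execute(
    "INSERT INTO wall (title, content, category) VALUES (?, ?, ?)",
    ("Bad Gateway errors", "The site timed out again", "bugs"),
)


def threads_in(connection, category):
    """Return thread titles for one category, oldest first."""
    rows = connection.execute(
        "SELECT title FROM wall WHERE category = ? ORDER BY id", (category,)
    )
    return [title for (title,) in rows]


general = threads_in(conn, "general")
bugs = threads_in(conn, "bugs")
```

The remaining work would be on the display side, which is the part the posts above say still needs someone's time.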
** Tatoeba update (July 20, 2014) **
# Small layout change
* The logo and search bar have been redesigned a little bit. The logo does not have the "Tatoeba.org" text in it anymore, so there is now more space for the content.
* The search bar now has an adaptable height. This probably has no visible impact for most people, but for those who use certain fonts, the search bar would look a bit broken because those fonts are taller, and that's now fixed.
# Clicking on a log entry highlights the corresponding translation
* On the sentence's page (for instance http://tatoeba.org/eng/sentences/show/1), you can now click on the logs to highlight the corresponding translation. This makes it easier to identify the text of the translation, since the logs only mention the translation's ID.
# Optimization of "latest contributions" in a specific language
* You may have experienced a "Gateway" error when trying to view the latest contributions in a specific language (for instance http://tatoeba.org/contributions/latest/fra). The query has been optimized to avoid this error.
That's it for this update.
On a side note, we are aware that Tatoeba has been pretty slow these days. If you are interested in helping us optimize the website, you are always welcome.
Tatoeba keeps logging me out even though I have the "remember me" check-box selected. Is there anything in this update that could have caused this?
Highlighting doesn't work for me. The screen just scrolls up or down (maybe the clicked translation is supposed to be in the centre of the page after this, but I can't see any pattern here) and there is no highlighting. I'm using Firefox 30.0.
People often report issues with logging in, maybe your problem is related but I don't think it comes from this update, maybe from an earlier update. At the moment we don't have any other solution than telling people to clear their browser's cache.
Regarding highlight, it's also a cache issue. Your browser was using the "old" CSS, which doesn't contain the colors for the selected log and translation. I've added a fix for it, if you try again it should be fine now.
It doesn't work well if the sentence is hidden (according to the user settings), or if the sentence is not yet shown because parts of the page are cached differently (new logs, old sentences...). In that case the highlight remains on the previous sentence (if our click on the "hidden" sentence isn't the first click). So I would recommend that any click on a log first removes the previous highlight and then tries to set the right one. Otherwise it could present wrong information (the previous sentence as the correct one), which is dangerous since we might base our decisions on it.
Thanks, removing cookies seems to work within Tatoeba. But if I try to access Tatoeba through one of CK's pages (for example http://a4esl.com/temporary/tato...nslated/silja/), I'm getting thrown out when I try to translate the sentence. Even if I open one of those sentences in a new window (not through frame in CK's page), Tatoeba logs me out. Strange...
Now also highlighting works fine.
I just love that new feature about clicking on the log entry. It will help a lot when we need to split mixed translations.
Good evening! Lately, the following message has been showing up for me a lot:
504 Gateway Time-out
Latest incident: Today, 19/7/2014, 22:38 (BRT)
Would anyone know what is going on?
Sorry for the bother.
This date is very significant to me. Exactly one year ago, on July 17, 2013, I became a contributing member of Tatoeba. I consider it a special privilege to belong to this admirable community, which is growing at an ever faster pace and gaining, day by day, a truly global importance in the panorama of the wonderful adventure of languages. Here I have made good friends as I have come to know these people, some possibly driven by different interests, but all certainly united in a healthy and praiseworthy activity of inestimable educational reach. I am 81 years old, and I only hope that for the rest of my life I can go on enjoying the precious company and the enriching cultural exchange that I have been sharing with all of you, my worthy and dear companions, here on Tatoeba.
Congratulations to you, diligent contributors!
According to the statistics,
Carlos Alberto wrote 78 sentences every day,
Sacredceltic 73 sentences (but he has been doing that for four years already!).
Live a hundred years and stay faithful to us!
I never felt so confident about the Portuguese corpus before your arrival, Carlos. May this one year turn into many more.
Although I can't speak Portuguese, I can read it to some extent, and I want to thank Carlos for the effort he puts in as a contributor and for the good he brings not only to the Portuguese corpus but to the Russian corpus as well. I'm glad you are with us. Congratulations!
Ricardo14 wanted to see these stats, so I generated them.
Perhaps others would be interested in this, too.
The 132 Members Who Have Tagged Sentences
I think it would be interesting to know who 'OK' tagged sentences and in what languages...
How can there be like ~4000 different tags then? Did every user really introduce around 30 new tags?
>How can there be like ~4000 different tags then?
I'm not sure I understand this question.
However, there were 4053 tag names in last week's exported data.
>Did every user really introduce around 30 new tags?
Using the following file, I generated these stats.
94 members have created tag names that are currently in use.
Here are the Top 30.
793 - Esperantostern
552 - Scott
477 - al_ex_an_der
357 - CK
265 - sacredceltic
207 - Demetrius
102 - blay_paul
95 - FeuDRenais
94 - wallebot
81 - Pfirsichbaeumchen
66 - Pharamp
61 - arcticmonkey
58 - Swift
56 - alexmarcelo
55 - carlosalberto
54 - shanghainese
54 - marafon
43 - papabear
41 - MUIRIEL
40 - Shishir
37 - jakov
28 - Rovo
26 - tommy_san
24 - Guybrush88
23 - saeb
23 - autuno
22 - sysko
21 - fekundulo
21 - Eldad
18 - Nero
Keep in mind that the majority of the tags indicate the authors of quotes (beginning with "by").