Wall (7,146 threads)

Question: which places more strain on the server, generating sentences with a keyword query or using the random sentence generator?

Random sentences, for sure. MySQL doesn't like random at all ^^ We're working on making it faster.
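A minimal sketch of why random picks are expensive and one common workaround. It assumes a hypothetical `sentences` table with an integer primary key `id`; this is not Tatoeba's actual schema or whatever fix was eventually made. It uses Python's built-in `sqlite3` so it runs standalone: SQLite's `ORDER BY RANDOM()`, like MySQL's `ORDER BY RAND()`, has to generate a random value for every row and sort them before it can return one.

```python
import random
import sqlite3
from typing import Optional, Tuple


def random_sentence(conn: sqlite3.Connection,
                    rng: random.Random) -> Optional[Tuple[int, str]]:
    """Return one (id, text) row chosen via a random id, or None if the table is empty.

    Instead of ORDER BY RANDOM() over the whole table, draw a random id between
    MIN(id) and MAX(id) and let the primary-key index find the first row at or after it.
    """
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM sentences").fetchone()
    if lo is None:  # empty table: MIN/MAX return NULL
        return None
    target = rng.randint(lo, hi)
    return conn.execute(
        "SELECT id, text FROM sentences WHERE id >= ? ORDER BY id LIMIT 1",
        (target,),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, text TEXT)")
    # Ids 3 and 4 are missing, so id 5 is picked more often than 1 or 2.
    conn.executemany("INSERT INTO sentences (id, text) VALUES (?, ?)",
                     [(1, "Hello."), (2, "How are you?"), (5, "Good night.")])
    print(random_sentence(conn, random.Random()))
```

The trade-off: rows that follow a gap in the id sequence (deleted sentences, say) are chosen more often, so the pick is only uniform when ids are dense.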

It would be really cool if we could add audio to the example sentences someday ;-)

We plan to do so; you will have more details and maybe a proof of concept at the beginning of April :)

Looking forward to that!

Did you change something in the database dump? This Saturday's jpn_indices contain invalid UTF-8 characters, and the affected lines seem to be truncated.
The following sentence ids have problems: 83767, 91272, 140460, 146080, 152054, 190707, 195118, 199753, 205628, 211131, 213530, 235850

Ah, indeed, indeed. I had changed the 'text' field from varchar to varbinary, but kept the length at 500. That's why those entries were truncated. I've fixed it and done a new export of the jpn_indices.
> This Saturday's jpn_indices [...]
How do you know about that, by the way? I don't remember making it official yet that the download files would be updated on Saturdays (or did I? o.o)
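A small sketch of the truncation described above, assuming the affected rows held UTF-8 Japanese text. In MySQL, `VARCHAR(500)` limits a column to 500 characters, while `VARBINARY(500)` limits it to 500 bytes; most kanji and kana take 3 bytes in UTF-8, so a byte cut can land in the middle of a character and leave an invalid trailing sequence. The helper names are made up for illustration.

```python
def truncate_bytes(text: str, max_bytes: int) -> bytes:
    """Mimic a byte-limited column such as VARBINARY(max_bytes):
    keep only the first max_bytes bytes of the UTF-8 encoding."""
    return text.encode("utf-8")[:max_bytes]


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def truncate_on_char_boundary(text: str, max_bytes: int) -> str:
    """Cut to at most max_bytes of UTF-8 without splitting a character.

    errors="ignore" drops the incomplete multi-byte sequence left at the end of the slice.
    """
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    line = "日" * 167  # 167 characters x 3 bytes = 501 bytes
    stored = truncate_bytes(line, 500)
    print(len(line.encode("utf-8")), len(stored), is_valid_utf8(stored))
    # prints: 501 500 False
    print(len(truncate_on_char_boundary(line, 500)))
    # prints: 166
```

`truncate_on_char_boundary` is only a defensive option for when a hard byte limit cannot be raised; widening the column so nothing is cut at all is usually the cleaner fix.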

1000+ sentences in Arabic.
I'd like 2 thank everyone that has ever thanked everyone. On behalf of all of us you've thanked I say thank u for thanking us.
lol Dane Cook is brilliant :D

:D thank you^^.

I believe in ghosts. I believe in aliens. But there's no way u will ever persuade me into believing in alien ghosts. Ridiculous.
I believe in the sentence method. I believe in language websites. But there's no way u will ever persuade me into believing in sentence websites. Ridiculous.
Yay! The first Tatoeba joke :P (hmm, I wonder if this counts as wall abuse...)

TRANG says:
omg you're so funny, stop "abusing" the wall :D

I should just mention I never said that :P
But I do think it. Well, especially the "abusing the wall" part, because now I'm working on figuring out how to paginate this wall. Certainly there will be more abuse.

Just wanted to let everyone know that Tatoeba has been updated.
http://blog.tatoeba.org/2010/03...13th-2010.html
Enjoy :)

*shock* just figured out TRANG is a she, he he.

Ah, who told on me, that was supposed to be a secret.

lol, I never said this before, but I actually stalked this website for quite a while before I finally decided to join, and I always imagined that you'd be like those programmers who like anime and have studied Japanese for 5 years at university... you know... with a cool blog about every obsessive detail of their life... and eyeglasses... you know, the whole shebang... :D
P.S. Guys like that really do exist :D

@blay_paul
There are a lot of English sentences that are grammatically correct, but I don't think anyone will ever say them, use them, or even see them in any English media... you know, they're just "out of this world". What do you think we should do with these? Should we just ignore them for the moment and focus on those that are totally wrong?
My take: I'm going to stay away from translating these and stop reporting them as wrong. I'm just hoping native Arabic speakers can use the sentences I'm translating to learn English.
What do you guys think? trang? sysko?

> There's a lot of english sentences that are grammatically
> correct but I don't think anyone will ever say them, use
> them, or even see them in any english media.
I think that it's more correct to say "any _current_ English media". The Tanaka corpus is old, and it used even older sources of sentences. Quite a few of them would not be out of place in books published before 1940, but are rather confusing to those of us in 2010.
I think those that are old-fashioned or highly idiomatic should be kept as demonstrating historical usage but should not be used as guides to writing English (or for translating into English). I think they are good candidates for an [Old-fashioned] tag or something. ;-)
Another problem is those that are written like dictionary entries (lots of 'one' usage) and those that are not really whole sentences. I think these are worth improving, as time permits, but are probably not a high priority for translation into other languages.

I agree: in the future we should tag them as "old-fashioned", "40s English", "book style", etc., rather than just "modernizing"/"oralizing" them.

I am not sure how promising this is, but there is a Japanese-German sentence database hosted by the University of Hiroshima (Katsumi Iwasaki). It seems to have been created in 2004, without major updates since then. Maybe there could be a collaboration with Tatoeba, which would increase the number of sentence pairs. Of course, I am not sure whether they would want to publish the corpus; I am particularly unfamiliar with data policies in Japan.
Here are the links to the search engine, the data description and the researcher's website:
http://www.vu.hiroshima-u.ac.jp/deutsch/index.php
http://home.hiroshima-u.ac.jp/k...a/database.htm
http://home.hiroshima-u.ac.jp/katsuiwa/

And there's also one for Spanish; I think it's free:
http://sentences.spanish-only.com/
Maybe there could be a collaboration with tatoeba with that too :)

For the Spanish one: yep, I've been meaning to contact the guy for a long time, but hmmm, I never find the motivation to write an email :blush: I will try to do so, I promise.

you can do it sysko! :)=)

Some more suggestions.
I know that time is limited, so I shall try to keep a high ratio of usefulness to time required to implement. ;-)
1. Add a prominent link from the Tatoeba Project home page to the Tatoeba Project Blog. Actually I think it's worth adding a "Links" item to the list of headings on the top of the page. Useful links would include popular dictionaries, language sites, and sites that host collections of sentences.
2. Wish list. Maybe best as a blog article? I think it would be nice to have an idea of what features are planned, how likely they are to be implemented and how soon. Users could comment on possible features and suggest new ones.
3. Active dictionary links. This would be a long term and high effort suggestion but I think it would be useful to have active linking available from words in example sentences. Some languages (Japanese, Chinese) would require more effort than others, but I think it would be well worth it in the long run.

1. Like sysko said, the top menu has reached its limits. Someone with a 20-character username (which I think is the maximum length) using the French interface might not even "fit" up there if (s)he's a Linux user... Something we'll have to check.
What I can do for now, though, is add a link to the blog from the "What's new" section (along with the Twitter link). And the blog has a "Links" section which only lists Tatoeba, but I can add other things.
2. You will have more hints on what we are working on in the next blog post. I can't write about everything we have in mind because there are just so many things. But I can at least mention what's planned for the next few weeks :)
3. This is actually not very easy to implement because each sentence itself is already a link, and clicking on it leads to the page where you can see the comments and the logs. But I agree that it would be definitely useful.

> This is actually not very easy to implement because each
> sentence itself is already a link, and clicking on it
> leads to the page where you can see the comments and the
> logs.
Yeah, I thought about that. What you could do, though, is implement the links as tooltip-style windows. For Japanese it could look something like this:
/為る(する)
何をしていますか。
Click on 為る(する) to get the full dictionary entry in a separate window / tab.

1 - In the French version, and also for ergonomic reasons, 7 items is already the maximum. At the same time, I agree it would be better to have the links in a more visible place, but what if we add a wiki, "dialogs" and so on? So I think we need to review what is needed and where, and make it as practical as possible. I don't really want the top menu to be bloated (but who does ^^).
2 - For now, yes, it can be a temporary solution while we wait for a wiki (after finishing all the "small issues", I'll make it my first priority).
3 - For Chinese, adso (which is definitely my Swiss Army knife for Chinese) can segment a Chinese sentence into "words", or at least consistent n-grams, so it would not be "so" difficult, and I'm sure such a tool also exists for Japanese.

> I'm sure such tool exist also for japanese
It does, but it's not 100% accurate. In any case, this is the primary reason for the existence of the 'index' data for Japanese sentences.
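To illustrate the segment-then-link idea from this thread, here is a sketch of greedy forward maximum matching, a textbook baseline for segmenting Chinese (and, with a suitable lexicon, Japanese). It is not adso's algorithm or how the Japanese indices were built, and the tiny lexicon and glosses are hypothetical. Each token found could then become a tooltip link to its dictionary entry.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> short gloss.
LEXICON: Dict[str, str] = {
    "我们": "we",
    "学习": "to study",
    "学": "to learn",
    "中文": "Chinese",
}


def max_match(sentence: str, lexicon: Dict[str, str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching.

    At each position take the longest lexicon word (up to max_len characters)
    that starts there; if none matches, emit a single character and move on.
    """
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


def annotate(sentence: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each token with its gloss ("?" if unknown), e.g. for a tooltip."""
    return [(tok, lexicon.get(tok, "?")) for tok in max_match(sentence, lexicon)]


if __name__ == "__main__":
    print(annotate("我们学习中文", LEXICON))
    # prints: [('我们', 'we'), ('学习', 'to study'), ('中文', 'Chinese')]
```

Greedy matching goes wrong whenever the longest match at one position steals characters from the correct next word, which is one reason real segmenters are not 100% accurate.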