Wall (7 179 threads)
Right. So. I'm not 13 years old. It was an honest mistake. Here's the problem:
I was wondering if Tatoeba had any sort of resistance to profanity. I thought something like "damnit" would be a common enough thing. So I MEANT to SEARCH for "damn". Turns out I added it as a sentence instead. Same for "fuck" because it took me two tries to realise I was using the wrong text box.
So those can be deleted outright. I didn't see any way to do it so I abandoned the sentences instead in case there is such a way and someone else wants to adopt them to get them deleted. 380292 and 380290.
Apologies.
Hahaha, it's fine. It's also our mistake; it means we need to change the form to make it clearer that it adds a new sentence.
I will delete your entries. There's currently no way for users to delete sentences, only admins can. The only solution when you want to "delete" a sentence is to replace it by a sentence that you actually want to keep.
As for profanity, we don't have anything against it, but we'd rather avoid it until we set up a mechanism to filter out sentences that are "not safe" for kids.
Good to know. And actually it's still pretty much entirely my fault. I searched at the top but then I think I stopped paying attention so when it sent me to the page saying "Nope, but you can add a sentence:", I thought I was searching again. It's not the form's issue. It's my attention span's issue.
And as for profanity, we do have some "colorful" sentences (spoiler: "search XXX in the search engine").
Stemming should be working again for most languages when using the search engine,
i.e. searching for "think" should also return "thinking", "thought", etc. The same goes for French, Spanish, Italian, Russian, etc.
By the way, it will not work with Ukrainian, but I was wondering whether using the Russian stemmer would produce a "better than nothing" result? Demetrius, Dorenda?
Still looking for Arabic and Georgian stemmers.
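For anyone wondering what stemming means in practice, here is a minimal sketch in Python. It is a toy suffix stripper, not the Snowball stemmer the search engine actually uses; the suffix list and the names `toy_stem` and `search` are made up for illustration.

```python
# Toy illustration of stemmed search. This is NOT the Snowball algorithm;
# the suffix list below is a tiny, hypothetical one.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one known suffix, keeping at least three letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(query, sentences):
    """Return the sentences containing a word with the same stem as the query."""
    target = toy_stem(query)
    return [
        s for s in sentences
        if any(toy_stem(w.strip('.,!?"')) == target for w in s.split())
    ]


matches = search("think", ["I am thinking.", "He thinks so.", "Hello there."])
```

With these inputs, `matches` holds the first two sentences. Irregular forms such as "thought" are not reachable by suffix stripping alone, so a sketch like this would miss them.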
Probably it will. But maybe there is a way to adapt the Russian stemmer into a Ukrainian one (or at least something more fit to Ukrainian)? I have no idea how those things work or how much work it would be, but if it's feasible, I could help with that.
How the stemmer works for Russian is explained, in broad terms, here: http://snowball.tartarus.org/al...n/stemmer.html . I admit I haven't read it entirely, as I have no knowledge of Russian (and moreover they provide something that works out of the box for it).
So I dunno how "easy" it is to adapt this to Ukrainian.
It looks doable. I'd just have to adapt it to the Ukrainian alphabet, change the endings into their Ukrainian counterparts and add/remove some endings that either of the two languages doesn't have.
So I'd have to just change that piece of script on the blue background, right?
yep this one http://snowball.tartarus.org/al...em_Unicode.sbl to be more precise :) thanks
Okay, I adapted it. The results won't always be right, though, cause sometimes it's just not possible to see from the form of a word what type of word it is and thus what belongs to the ending. For example, "koromyslo" is a noun, so only "o" should be removed, but the script will think it's a past tense verb and remove "lo". I tried to choose the least bad options...
Anyway, is there some way to test it? And where should I send it?
And one more question. How can I make the thing also remove the superlative prefix '{n}{a}{i'}' from the beginning of words?
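To make the two points above concrete, here is a rough Python sketch. It is not Snowball code and not the adapted script itself; the ending list, the length guards and the name `toy_stem_uk` are assumptions chosen only to show the ambiguity and one way a superlative prefix could be stripped before the endings are handled.

```python
# Toy illustration only: a real stemmer uses far more endings and rules.
PREFIX = "най"  # Ukrainian superlative prefix
# Hypothetical ending list, longest endings first (past tense, then noun/adjective).
ENDINGS = ("ло", "ла", "ли", "о", "а")


def toy_stem_uk(word):
    """Strip the superlative prefix, then the first matching ending."""
    word = word.lower()
    # Only strip the prefix when enough of the word remains afterwards.
    if word.startswith(PREFIX) and len(word) > len(PREFIX) + 3:
        word = word[len(PREFIX):]
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) > len(ending) + 2:
            return word[: -len(ending)]
    return word


noun = toy_stem_uk("коромисло")        # a noun that looks like a past-tense verb
superlative = toy_stem_uk("найкраща")  # prefix removed before the ending
```

Here `noun` comes out as "коромис" rather than "коромисл", because the script cannot tell from the form alone that the word is a noun. `superlative` comes out as "кращ".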
Send the file to our email address, team [at] tatoeba [dot] org, and I will see how to integrate it.
To be honest, I don't really know how it works (A). At the very least I will contact the people behind this project to see what we can do :)
But it's already great that you have adapted it to Ukrainian.
Congrats on the new server! I can already feel the site is 100x faster. oh and I'm in love with the new inbox, great update!...now are we cool or are we cool :)
You're cool. :)
It's so much faster, great! :D
(And I just loved that note we got while the site didn't work. :))
*psst* Trang (or sysko)
I need to replace
それら<space>
with
其れ等{それら}<space>
but the <space> doesn't seem to work for the 'replace with' field. (At least there aren't any spaces in the preview).
There are 87 instances that need to be replaced in the index field so I don't really want to do it manually.
You have to use an actual space in the "Replace" field, not the <space> tag :)
The reason why you have to type <space> in the "Search" is because trailing spaces are not taken into account in the search, for some reason. But the "Replace" field accepts trailing spaces (normally...).
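As a rough sketch of the behaviour described here (not the site's actual code; `SPACE_TAG` and `replace_in_index` are hypothetical names), the search string has its `<space>` tag turned into a real space, while the replacement string is used as typed:

```python
SPACE_TAG = "<space>"  # tag typed in the "Search" field in place of a space


def replace_in_index(index_text, search, replace):
    """Replace every occurrence of `search` in `index_text` with `replace`.

    Trailing spaces in `search` are dropped (mimicking the search field),
    so they must be written as <space>. `replace` is used exactly as typed.
    """
    pattern = search.strip().replace(SPACE_TAG, " ")
    return index_text.replace(pattern, replace)


fixed = replace_in_index("それら は 本", "それら<space>", "其れ等{それら} ")
untouched = replace_in_index("それらしい 人", "それら<space>", "其れ等{それら} ")
```

Requiring the space in the search string keeps words that merely start with それら, such as それらしい, from being rewritten.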
> You have to use an actual space in the "Replace" field, not the <space> tag :)
I tried it both ways - no spaces in the preview display.
I've found what the problem is, though. The preview button only works ONCE. If the old preview is still displayed then it doesn't do anything when you click the preview button with a different string.
> The preview button only works ONCE.
Ah right, I forgot to warn you about this. The "preview" function may work more than once, but I have yet to figure out the conditions under which it does or doesn't work a second time. In your case, I'm guessing it didn't work because of the < and >...
About names, can we translate them if there's an obvious correspondence? I'm talking about names like Peter, Mary, etc.
Good question. Personally, I never translate them, because I think Ann should be called Ann, and not Anne, no matter if she is in France or in the UK at the moment ;).
But I often see translations of names on Tatoeba...
Hmmm. So my younger sister should change her name from Anne to Ann?
My parents got it wrong? 8-)
Actually Anne is about as common as Ann among English-speaking people. Canonical spellings are a thing of the past, and we've always had Graham and Graeme, Roger and Rodger, etc.
That's not what I meant.
I just meant that I would call your sister what your parents call her and not translate her name into my language (or into any other language).
With Japanese you should 'transliterate' to katakana so Paul becomes ポール (for instance). When going from Japanese to English there are a number of variations to consider.
You can.
As far as I'm concerned, I have the same opinion as Muiriel. But we won't forbid translations of names. I don't see any good reason to forbid it anyway.
In general I agree that a person should be called by his/her own name, no matter where he/she is, but some languages have more of a tendency to translate names (as I read somewhere lately, when they speak about George Bush in Scottish Gaelic, they call him Seòras Bush, for example, while in Dutch we would (nowadays) just leave his name the way it is), so I think you should also consider how common it is for the language you're translating into to translate names or to use the foreign version.
And then there is the next problem... Suppose an English sentence about Peter has been translated into Russian by someone who decided to translate the name. So now we have a Peter and a Pyotr. If someone translated the Russian sentence into Ukrainian, it would look silly not to make it Petro, since that's how they do it: Ukrainians use different versions of their name depending on what language they are speaking. Now if I wanted to translate any of these sentences without translating names, I'd have to make three translations. Or I could just choose one of them and link my translation to all three sentences, but it would be strange to have a Dutch sentence with Pyotr as a translation of an English sentence about Peter, for example. So I would choose a name that is common in Dutch: Peter, or maybe Pieter or even Petrus.
Long story, but what I wanted to say is: it all depends on the situation and the language you're translating into. :)
Same example for the French ^^: they pronounce George Bush as if it were a French name. Too strange for me as a German; we would never call him Georg Busch :D.
Oh god, I would never translate Peter into Arabic. The Arabic version sounds awful :P
Sentence Annotation page
Could you put up a "Changes saved" message on the page after you click the 'save' button? Otherwise it's easy to forget whether you've saved the work you've done or not.
I second that request. Also a log of changes would be really good.
> Could you put up a "Changes saved" message on the page after you click the 'save' button?
Yes, I'll take care of this after we have moved to our new server.
> Also a log of changes would be really good.
I'll try to do that for the end of the month.
MeCab dictionary usage.
I see that MeCab installs (by default) with IPADIC.
Looking at this page
http://mahoro-ba.net/e1316.html
it would seem that Unidic may give a superior result if it can be used. I plan to do a little experimentation to see if I can improve the parsing capabilities of MeCab from the default setup.
In this regard I would be grateful if someone could recommend a USER FRIENDLY free database that SUPPORTS JAPANESE CHARACTERS.
SQLite?
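For what it's worth, SQLite stores text as Unicode, so Japanese characters go in and out without special handling. A minimal sketch using Python's built-in `sqlite3` module (the table name, columns and sample row are made up for illustration):

```python
import sqlite3

# In-memory database; pass a file name instead to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, text TEXT)")

# Hypothetical sample row containing Japanese text.
conn.execute(
    "INSERT INTO sentences (id, text) VALUES (?, ?)",
    (1, "こういう風に書きます。"),
)
conn.commit()

# Substring search on the Japanese text.
rows = conn.execute(
    "SELECT id, text FROM sentences WHERE text LIKE ?",
    ("%こういう風%",),
).fetchall()
conn.close()
```

This has no graphical interface, of course, so it only answers the "supports Japanese characters" half of the question.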
> SQLite?
Sounds familiar. Actually I installed that on my previous computer (although in the end different software suited me better for what I was working on then). I gave it a try again, but it's not user friendly enough for me (I'm from the graphical interface generation ;-)
MySQL Workbench + Server looks promising, I'm giving that a try now.
Disappointed in MySQL Workbench. It's /nearly/ there, but not quite. :-( I'm seriously considering buying Access 2007 now.
I could probably use Excel 2007 for some of it - but it really isn't a good idea to 'make pretend' that a spreadsheet is a database.
mysql + phpmyadmin ?
I think I'll probably start off with Excel 2007 (because I'm very familiar with it) then gradually migrate the content to MySQL. MySQL isn't bad but there are too many gaps in the thin GUI veneer provided by Workbench. Like having to resort to command line SQL stuff to import data from text. I miss the Access wizards for that sort of thing.
google spreadsheet?
> google spreadsheet?
Sounds rather too 'spreadsheety'. ;-)
I've got Excel 2007 for that. (I'd use Access 2007, but I couldn't afford the professional version of Office)
Now if anyone feels like donating it ... :-)
plus you can get others to collaborate...in real time
It wouldn't work for what I want to do - the maximum number of rows* is too small.
* Technically maximum number of cells, as the rows allowed varies depending on how many columns are used.
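The footnote's arithmetic, as a small Python sketch. The cell limit below is a placeholder, not the real figure for any particular spreadsheet product, and `max_rows` is a made-up helper name.

```python
def max_rows(max_cells, columns):
    """Rows available when a sheet is capped by total cell count."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    return max_cells // columns


HYPOTHETICAL_CELL_LIMIT = 200_000  # placeholder value, not an actual product limit

rows_with_4_columns = max_rows(HYPOTHETICAL_CELL_LIMIT, 4)
rows_with_10_columns = max_rows(HYPOTHETICAL_CELL_LIMIT, 10)
```

The more columns a sheet uses, the fewer rows fit under the same cap.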
it's way neater :), i heart it :P
Update. I tried out Unidic and it reads こういう風 correctly.
Also note that the page linked above shows that you can get auto-generated audio for Japanese sentences. Obviously a human voice would be best, but auto-generated would be a good start (seeing we have so many sentences to deal with). The Unidic voice example is very good.
Unidic has some copyright issues. Kokken have wrapped it up in some typically stupid requirements. NAIST (from whence ChaSen and MeCab come) have frozen IPADIC (which also has copyright issues) and concentrate on NAIST-JDIC which is much more kosher freeware.
Later this year I'll be starting work on building a super-large dictionary for MeCab/Chasen for a project I'm involved in. I probably won't be able to make a public release of it as I'll be using lexical material from commercial sources and I've signed all sorts of agreements. I'll explore if I can get a copy to Tatoeba.
Missing sentence? (Possibly recently deleted)
ここは天気が良ければとても良い眺めが得られます。
Here, if the weather's good, you can get a lovely view.
Still there. Japanese is 77859 (owned by you) and the English is 325859. It can't be found by searching for the words, for some reason. Any idea why, Trang? I often can't get to Japanese sentences when using the text as a search key, and I have to go in via the number.
Because it hasn't been indexed by the search engine. I haven't launched the indexation process for a while...
The index has been updated. We've switched from Lucene to Sphinx for the search engine, and we will soon try to make it update in real time :)
There are two sets of sentences, one saying that Latin is a highly inflected language, and the other saying that Latin is a dead language, and they're linked. I think the Polish and Ukrainian sentences that link them should be unlinked, so that they become two separate sets, but neither owner of these sentences is a trusted user. Can you do that, TRANG, or someone else?
http://tatoeba.org/eng/sentences/show/352492
Okay, done. I could make zipangu a trusted user as well, but he hasn't been back in a while...
Removing sentences from my list of favourites doesn't seem to work. Is it just me, or do others have the same?