Wall (7183 threads)
Tips
Before asking a question, please read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
[not needed anymore- removed by CK]
Hmmm. "He adopted a war orphan and is bringing her up as a foster daughter."
Comment on the sentence itself and Francis will get a message.
Bump.
Examples.gz, how often is it updated?
On the Monash FTP Archive page it says:
[...] It is updated daily from the server site.
* examples.gz (8371412 bytes) the file.
But I think that is no longer accurate. The one I've just downloaded hasn't been updated since a week ago. Also, could the ID numbers given in examples.gz be from the Japanese sentences, not the English sentences? The English ones aren't unique, so the IDs are pretty much useless.
You'll have to ask Jim about this, because I don't think anyone else here has the answer ^^
It may be faster to simply send him an email...
Could you make available a download with the stuff Jim uses for WWWJDIC? i.e.
a) All Japanese sentences that have an Index field linked. (With sentence number of Japanese sentences)
b) All English fields that are mentioned in the 'Meaning' field. (With sentence number of English sentences)
c) All index fields. (With 'Meaning' field).
I haven't had time to update the "downloads" page yet, but the file data Jim uses can be downloaded here:
http://tatoeba.org/app/webroot/files/downloads/
(wwwjdic.csv)
The fields are:
jpn_sentence_id, eng_sentence_id, jpn_text, eng_text, jpn_index
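For anyone who wants to load the file from a script, here is a minimal Python sketch. It assumes the file is semicolon-separated, fully quoted and backslash-escaped (as in the sample line quoted further down this page) and that it has no header row; `read_wwwjdic` and `FIELDS` are names made up for this example.

```python
import csv
import io

# Field order as listed above.
FIELDS = ["jpn_sentence_id", "eng_sentence_id", "jpn_text", "eng_text", "jpn_index"]

def read_wwwjdic(stream):
    """Yield one dict per row, keyed by the field names listed above."""
    reader = csv.DictReader(
        stream,
        fieldnames=FIELDS,   # assumption: the file has no header row
        delimiter=";",
        quotechar='"',
        escapechar="\\",     # assumption: quotes inside a field are written as \"
        doublequote=False,
    )
    for row in reader:
        row["jpn_sentence_id"] = int(row["jpn_sentence_id"])
        row["eng_sentence_id"] = int(row["eng_sentence_id"])
        yield row

# One line in the format of the download, used here instead of the real file.
sample = io.StringIO(
    '"4923";"1512";"「信用して」と彼は言った。";"\\"Trust me,\\" he said.";'
    '"信用 為る|1(する){して} と|1 彼|2(かれ)[01] は|1 言う{言った}"\n'
)
rows = list(read_wwwjdic(sample))
```

With the real file you would pass `open("wwwjdic.csv", encoding="utf-8", newline="")` instead of the `StringIO` object; the UTF-8 encoding is also an assumption.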
Just one more question - how often do you update the files on the download page?
On this page:
http://tatoeba.org/app/webroot/files/downloads/
Once a week. On Saturdays around 9AM France time.
On the download page that you can access from the link at the bottom, never. I have to update that page though, to link to the files in http://tatoeba.org/app/webroot/files/downloads/.
That should do nicely. Thanks.
The other thing is that I'd like to do a complete check and revamp of the index field. To be certain of not losing any data I'd need you to lock the index field so I can download / fix up / upload without any changes happening on your side.
I'm still working on things so I probably won't be ready for a week or so.
Can you keep me in the loop? I change the odd index when making a correction to a Japanese sentence. Also, when I do the weekly download I run it through a utility that checks that the index and sentence agree. That way I can detect when others have changed a sentence. I usually have to update the index and occasionally add to the list of names to be ignored (e.g. ムーリエル and 赤ずきん this week.)
In addition, I have a list of words from Collin McCulley which had mismatches between the index and dictionary. I have cleaned up most of them, but still have ~100. We need some way of tracking when they get out of kilter, mainly when dictionary entries need qualifying.
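For readers wondering what an "index and sentence agree" check could look like, here is a rough Python sketch. It is not Jim's utility. The token layout (headword, optional `|n`, optional `(reading)`, optional `[nn]`, optional `{form in the sentence}`) is only inferred from index strings quoted on this page, and `surface_forms` and `index_matches` are hypothetical names.

```python
import re

# Token layout inferred from index strings on this page (an assumption):
# headword, then optional |n, (reading), [nn] and {form used in the sentence}.
TOKEN = re.compile(
    r"(?P<head>[^|(\[{]+)"
    r"(?:\|\d+)?"
    r"(?:\((?P<reading>[^)]*)\))?"
    r"(?:\[(?P<number>\d+)\])?"
    r"(?:\{(?P<surface>[^}]*)\})?"
)

def surface_forms(index):
    """Return the text each index token is expected to match in the sentence."""
    forms = []
    for token in index.split():
        m = TOKEN.fullmatch(token)
        if m is None:
            raise ValueError("unrecognised index token: " + token)
        # Use the {...} form when present, otherwise the headword itself.
        forms.append(m.group("surface") or m.group("head"))
    return forms

def index_matches(sentence, index):
    """True if every expected form occurs in the sentence, in index order."""
    pos = 0
    for form in surface_forms(index):
        found = sentence.find(form, pos)
        if found < 0:
            return False
        pos = found + len(form)
    return True

index = "信用 為る|1(する){して} と|1 彼|2(かれ)[01] は|1 言う{言った}"
ok = index_matches("「信用して」と彼は言った。", index)
changed = index_matches("「信用して」と私は言った。", index)  # 彼 edited to 私
```

A plain substring search like this is crude: it would not notice 彼 being changed to 彼女, for instance, so a real checker would need something stricter.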
@Paul, yes, I can easily block the access to the indices to everyone.
I download the file Trang sets up once a week. I check it over then set it up in the WWWJDIC system. At that stage the "examples.gz" file, etc. is rebuilt.
I missed the bit about the ID numbers. I use the English sentence number because 90%+ of the corrections coming from WWWJDIC users are to the English sentence, so it makes sense for WWWJDIC to link there.
As discussed on another forum, I could put in both - e.g.#ID=375963_12345. I have a major change half done in WWWJDIC which is blocking other changes. Once it is clear (maybe a week or so) I can make that sentence number change. I'll enable WWWJDIC users to select whether they want to link to the Japanese or English.
Example export system.
Japanese sentences with multiple English translations are (sometimes?) being exported with both versions.
For example:
4924 73899 「これが探していたものだ」と彼は叫んだ。 "This is what I was looking for," he exclaimed. 此れ[01]{これ} が 探す{探していた} 物|1(もの)[01]{もの} だ と|1 彼|2(かれ)[01] は|1[01] 叫ぶ{叫んだ}
4924 1513 「これが探していたものだ」と彼は叫んだ。 "This is what I was looking for!" he exclaimed. 此れ[01]{これ} が 探す{探していた} 物|1(もの)[01]{もの} だ と|1 彼|2(かれ)[01] は|1[01] 叫ぶ{叫んだ}
On the sentence annotations page only 73899 is given as the 'meaning' field with the Index field. So either a) The meaning field in the sentence annotations page isn't used, or b) There can be two or more 'meaning' field values, but only one is shown on the sentence annotations page.
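To see how many Japanese sentences are affected, the export rows can be grouped by Japanese sentence ID. A minimal Python sketch, assuming the rows have already been reduced to (Japanese ID, English ID) pairs; the function name is made up for this example.

```python
from collections import defaultdict

def japanese_with_multiple_english(pairs):
    """Map each Japanese ID to its English IDs, keeping only those with more than one."""
    by_jpn = defaultdict(list)
    for jpn_id, eng_id in pairs:
        if eng_id not in by_jpn[jpn_id]:
            by_jpn[jpn_id].append(eng_id)
    return {jpn_id: eng_ids for jpn_id, eng_ids in by_jpn.items() if len(eng_ids) > 1}

# The two rows quoted above, plus one sentence with a single translation.
pairs = [(4924, 73899), (4924, 1513), (4923, 1512)]
dupes = japanese_with_multiple_english(pairs)
```

Here `dupes` only contains sentence 4924, with both of its English IDs.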
I've noticed that, and I have assumed it wasn't used. When I notice cases such as that one, I have changed the English so they become identical, and hope it will lead to the removal of one of them
That's OK for WWWJDIC, and for cases where one is mistaken or they are both very similar.
It's not good 'Tatoeba practice' though.
In the case you quoted I thought it *was* the Tatoeba practice. They only differ by an exclamation mark.
Where they differ in more substantial ways, e.g. choice of personal pronoun where the Japanese has none, I guess there is a case for both being kept, and that's a situation where an index is best tied to a sentence-pair rather than just to the Japanese.
Tying to sentence pairs has another problem. The number of sentences in German and French is getting to the stage where it would be nice to include them in WWWJDIC. I'd be looking to having "examples_de" and "examples_fr" extracts along with indices.
> In the case you quoted I thought it *was* the Tatoeba
> practice. They only differ by an exclamation mark.
The case I quoted, yes. There are, however, at least a handful where both alternatives are valid, significantly different, and illustrate something about the English language.
.csv format in downloads.
Just a note for those using the downloads. You use \ as the escape character.
This is a line from your csv file:
"4923";"1512";"「信用して」と彼は言った。";"\"Trust me,\" he said.";"信用 為る|1(する){して} と|1 彼|2(かれ)[01] は|1 言う{言った}"
This is how it appears when loaded into Excel.
4923 1512 「信用して」と彼は言った。 \Trust me,\" he said." 信用 為る|1(する){して} と|1 彼|2(かれ)[01] は|1 言う{言った}
Excel uses double " marks when escaping quotes. The same line in csv for Excel would be...
"4923";"1512";"「信用して」と彼は言った。";"""Trust me,"" he said.";"信用 為る|1(する){して} と|1 彼|2(かれ)[01] は|1 言う{言った}"
Which imports to Excel as follows:
4923 1512 「信用して」と彼は言った。 "Trust me," he said. 信用 為る|1(する){して} と|1 彼|2(かれ)[01] は|1 言う{言った}
I think the 'escaping with extra quote mark' may be the more standard version ...
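Until the download changes, the file can be converted on the user's side. Below is a small Python sketch using the standard `csv` module; it assumes the input looks like the line quoted above (semicolon delimiter, every field quoted, `\"` for an embedded quote), and `to_excel_csv` is just a name chosen for this example.

```python
import csv
import io

def to_excel_csv(src, dst):
    """Rewrite backslash-escaped rows so embedded quotes are doubled instead."""
    reader = csv.reader(src, delimiter=";", quotechar='"',
                        escapechar="\\", doublequote=False)
    writer = csv.writer(dst, delimiter=";", quotechar='"',
                        quoting=csv.QUOTE_ALL, doublequote=True,
                        lineterminator="\n")
    for row in reader:
        writer.writerow(row)

# The line from the download, as quoted above.
src = io.StringIO(
    '"4923";"1512";"「信用して」と彼は言った。";"\\"Trust me,\\" he said.";'
    '"信用 為る|1(する){して} と|1 彼|2(かれ)[01] は|1 言う{言った}"\n'
)
dst = io.StringIO()
to_excel_csv(src, dst)
converted = dst.getvalue()
```

For real files, open both with `newline=""` so the `csv` module handles line endings itself. Whether Excel then picks the right character encoding is a separate question this sketch does not address.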
Right. So. I'm not 13 years old. It was an honest mistake. Here's the problem:
I was wondering if Tatoeba had any sort of resistance to profanity. I thought something like "damnit" would be a common enough thing. So I MEANT to SEARCH for "damn". Turns out I added it as a sentence instead. Same for "fuck" because it took me two tries to realise I was using the wrong text box.
So those can be deleted outright. I didn't see any way to do it so I abandoned the sentences instead in case there is such a way and someone else wants to adopt them to get them deleted. 380292 and 380290.
Apologies.
Hahaha, it's fine. It's also our mistake; it means we need to change the form to make it clearer that it adds a new sentence.
I will delete your entries. There's currently no way for users to delete sentences, only admins can. The only solution when you want to "delete" a sentence is to replace it by a sentence that you actually want to keep.
As for profanity, we don't have anything against it, but we'd rather avoid it until we set up a mechanism to filter out sentences that are "not safe" for kids.
Good to know. And actually it's still pretty much entirely my fault. I searched at the top but then I think I stopped paying attention so when it sent me to the page saying "Nope, but you can add a sentence:", I thought I was searching again. It's not the form's issue. It's my attention span's issue.
As for profanities, we have some "colorful" sentences (spoiler: "search XXX in the search engine")
Stemming should be working again for most languages when using the search engine,
i.e. a search for "think" should also return "thinking", "thought", etc. The same goes for French / Spanish / Italian / Russian etc.
By the way, it will not work with Ukrainian, but I was wondering whether using the Russian stemmer would produce a "better than nothing" result? Demetrius, Dorenda?
Still looking for Arabic and Georgian stemmers.
Probably it will. But maybe there is a way to adapt the Russian stemmer into a Ukrainian one (or at least something more fit to Ukrainian)? I have no idea how those things work or how much work it would be, but if it's feasible, I could help with that.
How the stemmer works for Russian is explained here: http://snowball.tartarus.org/al...n/stemmer.html . I admit I haven't read it all, as I don't know any Russian (and besides, they provide something that works out of the box for this).
So I dunno how "easy" it is to adapt this to Ukrainian.
It looks doable. I'd just have to adapt it to the Ukrainian alphabet, change the endings into their Ukrainian counterparts and add/remove some endings that either of the two languages doesn't have.
So I'd have to just change that piece of script on the blue background, right?
yep this one http://snowball.tartarus.org/al...em_Unicode.sbl to be more precise :) thanks
Okay, I adapted it. The results won't always be right, though, cause sometimes it's just not possible to see from the form of a word what type of word it is and thus what belongs to the ending. For example, "koromyslo" is a noun, so only "o" should be removed, but the script will think it's a past tense verb and remove "lo". I tried to choose the least bad options...
Anyway, is there some way to test it? And where should I send it?
And one more question. How can I make the thing also remove the superlative prefix '{n}{a}{i'}' from the beginning of words?
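To make the "koromyslo" problem concrete, here is a toy illustration in Python. It is not Snowball code and not the adapted script; it only assumes that verb endings are tried before noun endings, and the two ending lists contain nothing but the endings named in the post above.

```python
# Toy ending lists: only the two endings mentioned in the post above.
VERB_ENDINGS = ["lo"]   # past-tense ending
NOUN_ENDINGS = ["o"]    # noun ending

def toy_stem(word):
    """Strip the first matching ending, trying verb endings before noun endings."""
    for endings in (VERB_ENDINGS, NOUN_ENDINGS):
        for ending in sorted(endings, key=len, reverse=True):
            if word.endswith(ending) and len(word) > len(ending):
                return word[: -len(ending)]
    return word

stem = toy_stem("koromyslo")
```

Because nothing in the word's form says it is a noun, the verb rule fires first and `stem` comes out as "koromys" rather than the wanted "koromysl".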
Send the file to our email address, team [at] tatoeba [dot] org, and I will see how to integrate it.
To be honest, I don't really know how it works (A), but at least I will contact the guys from this project to see what we can do :)
but it's already great if you have adapted it to Ukrainian
Congrats on the new server! I can already feel the site is 100x faster. oh and I'm in love with the new inbox, great update!...now are we cool or are we cool :)
You're cool. :)
It's so much faster, great! :D
(And I just loved that note we got while the site didn't work. :))
*psst* Trang (or sysko)
I need to replace
それら<space>
with
其れ等{それら}<space>
but the <space> doesn't seem to work for the 'replace with' field. (At least there aren't any spaces in the preview).
There are 87 instances that need to be replaced in the index field so I don't really want to do it manually.
You have to use an actual space in the "Replace" field, not the <space> tag :)
The reason why you have to type <space> in the "Search" is because trailing spaces are not taken into account in the search, for some reason. But the "Replace" field accepts trailing spaces (normally...).
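If the index data is edited offline instead, the trailing-space problem goes away when you compare whole tokens rather than searching for text followed by a space. A small Python sketch; the function name and the sample index string are made up for illustration.

```python
def replace_index_token(index, old, new):
    """Replace every space-separated token that is exactly `old` with `new`."""
    return " ".join(new if token == old else token for token in index.split(" "))

# Made-up index string, only to show the effect.
before = "それら は|1 本 だ"
after = replace_index_token(before, "それら", "其れ等{それら}")
```

Unlike a search for それら followed by a space, this also catches the token at the very end of the field, and it leaves tokens that are already annotated, such as 其れ等{それら}, untouched.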
> You have to use an actual space in the "Replace" field, not the <space> tag :)
I tried it both ways - no spaces in the preview display.
I've found what the problem is, though. The preview button only works ONCE. If the old preview is still displayed then it doesn't do anything when you click the preview button with a different string.
> The preview button only works ONCE.
Ah right, I forgot to warn you about this. The "preview" function may work more than once, but I have yet to figure out the conditions for it to work/not work a second time. In your case, I'm guessing it didn't work because of the < and >...
About names, can we translate them if there's an obvious correspondence? I'm talking about names like Peter, Mary, etc.
Good question. Personally, I never translate them, because I think Ann should be called Ann, and not Anne, no matter if she is in France or in the UK at the moment ;).
But I often see translations of names on Tatoeba...
Hmmm. So my younger sister should change her name from Anne to Ann?
My parents got it wrong? 8-)
Actually Anne is about as common as Ann among English-speaking people. Canonical spellings are a thing of the past. and we've always had Graham and Graeme, Roger and Rodger, etc.
That's not what I meant.
I just meant that I would call your sister what your parents call her and not translate her name into my language (or into any other language).
With Japanese you should 'transliterate' to katakana so Paul becomes ポール (for instance). When going from Japanese to English there are a number of variations to consider.
You can.
As far as I'm concerned, I have the same opinion as Muiriel. But we won't forbid translations of names. I don't see any good reason to forbid it anyway.
In general I agree that a person should be called by his/her own name, no matter where he/she is, but some languages have more of a tendency to translate names (as I read somewhere lately, when they speak about George Bush in Scottish Gaelic, they call him Seòras Bush, for example, while in Dutch we would (nowadays) just leave his name the way it is), so I think you should also consider how common it is for the language you're translating into to translate names or to use the foreign version.
And then there is the next problem... Suppose an English sentence about Peter has been translated into Russian by someone who decided to translate the name. So now we have a Peter and a Pyotr. If someone translated the Russian sentence into Ukrainian, it would look silly not to make it Petro, since that's how they do it: Ukrainians use different versions of their name depending on what language they are speaking. Now if I wanted to translate any of these sentences without translating names, I'd have to make three translations. Or I could just choose one of them and link my translation to all three sentences, but it would be strange to have a Dutch sentence with Pyotr as a translation of an English sentence about Peter, for example. So I would choose a name that is common in Dutch: Peter, or maybe Pieter or even Petrus.
Long story, but what I wanted to say is: it all depends on the situation and the language you're translating into. :)
Same example for the French ^^: they pronounce George Bush as if it were a French name. Too strange for me as a German - we would never call him Georg Busch :D.
oh god I would never translate peter into arabic. The arabic version sounds awful :P
Sentence Annotation page
Could you put up a "Changes saved" message on the page after you click the 'save' button? Otherwise it's easy to forget whether you've saved the work you've done or not.
I second that request. Also a log of changes would be really good.
> Could you put up a "Changes saved" message on the page after you click the 'save' button?
Yes, I'll take care of this after we have moved to our new server.
> Also a log of changes would be really good.
I'll try to do that for the end of the month.