Wall (7,138 threads)

Quick fix idea.
This is a simple idea that should make things a little simpler. Each Japanese sentence should have zero, or one, set of index data. Spurious extra sets can be left over when duplicates are merged.
If you could generate the equivalent of the wwwjdic.csv file, including only records with more than one set of index data, the day _before_ you export the whole file (e.g. on Friday of each week), then Jim and I would have a chance to fix things before the weekly update.
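A rough sketch of that pre-export check, in Python. It assumes the index data can be read as (sentence id, index string) pairs; the real layout of wwwjdic.csv and of the database isn't shown in this thread, so the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict

def sentences_with_extra_indices(rows):
    """Group index sets by sentence id and keep only ids that have more than one."""
    by_id = defaultdict(list)
    for sentence_id, index_data in rows:
        by_id[sentence_id].append(index_data)
    return {sid: sets for sid, sets in by_id.items() if len(sets) > 1}

# Hypothetical sample: (sentence id, index data). Not real corpus records.
sample_rows = [
    (1, "index set A"),
    (2, "index set B"),
    (2, "index set C"),  # spurious second set, e.g. left over from a merge
    (3, "index set D"),
]

flagged = sentences_with_extra_indices(sample_rows)
for sid, sets in flagged.items():
    print(sid, sets)
```

Only the flagged ids would need to go into the Friday file; which of the sets is the spurious one still has to be decided by hand.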

Quick request with regard to sentence edit behaviour.
There's one thing that bothers me with how the sentence editing function works - if you look at another tab (Firefox) the edit in progress disappears.
So what happens is that I'm halfway through translating a sentence when I decide to check something, and when I go back it's all gone and I have to redo it from scratch.

[not needed anymore- removed by CK]

I'm not sure they need correcting as such. It's just a sign that a typical English bible uses 'him' more and a typical Japanese bible uses 'イエス' more. What you can do is note "[Bible, Psalms 166:68]" (or whatever) if it is an actual quote and you can work out where it's from.

[not needed anymore- removed by CK]

> If you want to do something like "[Bible, Psalms
> 166:68]", then it would probably be faster to delete
> all the existing Bible sentences and find public domain
> version of the Bible in various languages and dump lots
> of those sentences into the database.
Using PD Bibles as sources sounds like a good idea to me, but the existing sentences have the advantage of already having index data so I wouldn't just delete them.

[not needed anymore- removed by CK]

> This Google search only gets "Tanaka Corpus" results.
Yeah, that's because it should be "いうところによれば"
> Doing similar searches might be an interesting
> approach to check the accuracy and/or naturalness
> of sentences in this database.
I think that's pretty much a standard approach here. Google has quite a few quirks that you need to know to get the best results, but overall it's pretty good.

Could we have an official decision on using -1 in the meaning field to mean 'not for WWWJDIC' ?
*bump*

As I said in my email, I'm okay with it. It all depends on Jim :)

[not needed anymore- removed by CK]

Every language includes a different set of ambiguities, omitted information and conventions. If you attempt to remove these ambiguities, etc. in the sentences being translated then you are going to create bias in the vocabulary used and will also likely end up with unnatural sentences.
The typical example for this is pronouns in Japanese. If you start adding あなた to sentences that, when translated, use 'you' in the English then there will be far more あなたs used in those examples than you would ever find in natural Japanese.

When I went to the "trusted" index edit page for 97078, I found there were two sets of indices! Is this a result of automerge? I can't think it is valid.

> I found there were two sets of indices!
> Is this a result of automerge?
Probably. It's not the first time that's happened.
> I can't think it is valid.
I'm not sure it would be an easy fix - if you've got two identical sentences with an index each the computer isn't going to know which one is right and which one is spurious. What you should do is delete the index of one of the sentences you make identical in advance - but I obviously didn't remember to do it for that one.

Something is screwy with a batch of indices. For example, last week there was:
地球は球の形をしている
地球 は 球 乃{の} 形(かたち)[01] を 為る(する){している}
The latest download has it changed to:
地球 は 玉(たま) 乃{の} 形(かたち)[01] を 為る(する){している}
Presumably that 玉(たま) was meant to be 球(たま)?

> Presumably that 玉(たま) was meant to be 球(たま)?
Actually it was meant to be 玉(たま){球}. (If it was supposed to be 球(きゅう) before then, 'Oops.')
As you know, where there are multiple headwords in WWWJDIC only one is used for the indexing so that there is only one [EX] link.

Well, I changed them to 球(たま) and 弾(たま). I'll now go and do a global replace with 玉(たま){球} and 玉(たま){弾}.
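For what it's worth, a minimal sketch of that kind of global replace in Python. It assumes index tokens are separated by spaces, as in the example lines above, and it only touches tokens that match exactly; the function name is made up, and tokens that already carry a [sense] or {form} suffix are left alone for manual checking.

```python
# Map the per-kanji headwords onto the single indexed headword,
# keeping the kanji actually used in the sentence inside {...}.
REPLACEMENTS = {
    "球(たま)": "玉(たま){球}",
    "弾(たま)": "玉(たま){弾}",
}

def fix_index_line(index_line):
    """Replace whole tokens only, so an already-fixed line is not changed again."""
    tokens = index_line.split(" ")
    return " ".join(REPLACEMENTS.get(token, token) for token in tokens)

before = "地球 は 球(たま) 乃{の} 形(かたち)[01] を 為る(する){している}"
after = fix_index_line(before)
print(after)
```

Matching whole tokens rather than substrings means running it twice is harmless, and a token like 球(きゅう) is never touched.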

[not needed anymore- removed by CK]

Actually I believe at least some sentences you’ve given don’t have any spelling mistakes. I can’t find any in 2263, 18732, 18867, 20594.
And please leave British forms like ‘decentralised’ intact. I like them. :)

I agree with Demetrius. Some of the 'mistakes' are just British spelling differences. But there are some real mistakes in there, like in 2263 where it should say 'sine and cosine' instead of 'sinus and cosinus'.

[not needed anymore- removed by CK]

I don't see why not. As long as any corrections are advisory, not automatically made.

personnaly when inputing sentence I use a firefox plugin which acts as a spell checker

Ditto. We can't make that mandatory though, and it doesn't help the ones that have already got through.
Could you get a spell check to make up a list of sentences that are flagged as having spelling mistakes? If you could then it could be split into, say, batches of 500 and handed out to individual users.
One important note - we're again seeing how the 'adoption' system is greatly slowing down attempts to correct sentences on a wide scale. CK's got two or three dozen with '/alternatives' added by a well meaning user *cough*human600*cough* and he can't fix any of them himself.
I think that there's a case for a few users to be granted higher access rights when it's been shown (and agreed) that they know the system here well and would use them responsibly.
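A sketch of the flag-and-batch idea, in Python. Everything here is an assumption for illustration: the word list is a tiny stand-in for a real dictionary (ideally one with both American and British spellings), the sentence ids and texts are made up, and the result is advisory only, nothing gets corrected automatically.

```python
import re

def flag_sentences(sentences, known_words):
    """Return (id, text, unknown words) for sentences containing unknown words."""
    flagged = []
    for sentence_id, text in sentences:
        words = re.findall(r"[A-Za-z]+", text)
        unknown = [w for w in words if w.lower() not in known_words]
        if unknown:
            flagged.append((sentence_id, text, unknown))
    return flagged

def batches(items, size=500):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Stand-in dictionary that accepts both spellings of 'decentralised'.
known_words = {"sine", "and", "cosine", "it", "is", "decentralised", "decentralized"}

# Made-up ids and texts, not real corpus records.
sentences = [
    (1, "Sine and cosine"),
    (2, "Sinus and cosinus"),
    (3, "It is decentralised"),
]

flagged = flag_sentences(sentences, known_words)
for batch_number, batch in enumerate(batches(flagged), start=1):
    print(batch_number, batch)
```

Each batch could then be handed to one user to review by eye.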

> One important note - we're again seeing how the 'adoption' system is greatly slowing down attempts to correct sentences on a wide scale.
You feel it's slowing down because the status of moderator hasn't been established yet. Probably also because you were used to having full access to the corpus. I can understand it somehow frustrates you, not to be able to correct a sentence because it already belongs to someone.
But I cannot imagine Tatoeba without this adoption/ownership system. On a larger scale, it would be a terrible mess if everyone could edit anyone's mistakes. Posting comments makes people come back to Tatoeba, it makes them learn from their mistakes, it makes them responsible for what they contribute. To me, it's a very important part of the collaborative aspect.
Of course, some people rather feel annoyed by it, some people will never come back, and it blocks certain sentences. But the problem doesn't come from adoption/ownership, it comes from the way permissions are set. As you pointed out, we need more people with higher access rights, people who can edit everything, and delete sentences (i.e., we need moderators).
sysko has asked me if we could integrate the 'moderator' status last week. I said no because there were too many things to test already. But we can have that for next week (or perhaps even this week, depending on how productive we are).

[not needed anymore- removed by CK]

> For one of the admins to do it right now,
> shouldn't take too long.
I estimate 6 hours for the English. Less if they can find a dictionary that includes both American and British spellings.

> personnaly when inputing sentence I use a
> firefox plugin which acts as a spell checker
But not in that sentence, right? ;-)

[not needed anymore- removed by CK]

Personally, I can live without !! but I'm holding out for the occasional !?