Wall (7,134 threads)
Tips
Before asking a question, make sure you have read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.

Sentence 155946 has two sets of indices, which point to different English sentences. Odd.

164243 is the same. Is this because duplicates are being deleted?

Yes, if they're ones I set up to be deleted I try to delete one of them in advance, but one or two slip out.

[not needed anymore- removed by CK]

1324 is bad. I missed that one.
It probably should be one of
That's /my/ line!
That's MY line!
(as there is a valid need of represented emphasis in that sentence).
I don't think 73507 and 73508 are likely to cause any trouble, but they are in violation of the 'no annotations' guideline and aren't even in the (grandfathered-in) wwwjdic meta information format.

[not needed anymore- removed by CK]

1) Separate handling for Meta information in Tatoeba sentences is already in the todo list.
2) [M] and [F] tags were removed from the English sentences and applied to the Japanese in the last update done to the Tanaka Corpus before control was turned over to Tatoeba. Unfortunately, recent events suggest the last update didn't make it to Tatoeba. This hasn't been fixed because of 1).
3) When the meta information system is redone, the plan is to re-evaluate the basis of the [M] and [F] tags. 僕 alone will almost certainly not be worth an [M] tag (due to developments in modern Japanese). That said, I would disagree that beginners of Japanese necessarily know that 'boku' and 'kimi' are used in masculine speech.

Spurious line feeds not removed from Index data input.
See the Index data for sentence 101622.

Also, could records with a 'meaning' field of -1 be excluded from the wwwjdic.csv file?

Yes.

Yes, there were a heap of them in the last wwwjdic.csv. There were also two blank lines. The first time that has happened. (Spurious line feeds?)

Also 315382 came through with "\N" as the Japanese and English. (Just an index...)

Actually, there were about 20 with \N for the English, Japanese or both.

I think they were related to manual deletions or something. I think they were fixed in Tatoeba shortly after the index data download was updated.

Indeed... I'll trim the input before it gets saved.
Other than that there was another index with an extra new line. I corrected both.
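
A minimal sketch of the kind of trimming mentioned above, in Python. This is only an illustration, not Tatoeba's actual code: the function name `clean_index_input` is hypothetical, and joining the pieces with a single space is an assumption about how index entries are separated.

```python
def clean_index_input(raw: str) -> str:
    """Return index text with stray line breaks and edge whitespace removed."""
    # splitlines() handles \n, \r\n and \r, so mixed line endings are covered.
    pieces = (piece.strip() for piece in raw.splitlines())
    # Drop empty pieces (blank lines) and rejoin the rest with single spaces.
    # Assumption: index entries are space-separated, so a space is a safe joiner.
    return " ".join(piece for piece in pieces if piece)


# Hypothetical input with an embedded line feed and a trailing CRLF.
cleaned = clean_index_input("word1 word2\nword3\r\n")
```

Running this on the input before it is saved would also stop blank lines from reaching an export, since an input made only of line feeds comes back as an empty string.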

Great - thanks for both of those. They'll make my life easier, once a week. ;-)

Could we have a duplicate removal script run soon too, please?

Okay, it's done.
By the way, is there any reason why you add these "For duplicate removal script" comments?
When the sentences are merged, all the comments of the deleted sentence are moved to the remaining sentence...
If you are posting these comments to keep track, it is best to also indicate the IDs of the sentences that have to be merged, not just that one has to be deleted ^^

> By the way, is there any reason why you add these
> "For duplicate removal script" comments?
Only so that Jim (and other users) can see what I have changed on the basis that it is a 'near duplicate' before the merge happens. Basically it's so people have a chance to complain.
I delete the comments post merge when I come across them (which is quite often, for technical reasons).

I thought you guys checked your Facebook group (ok, I'll shut up :P). Well, 2 issues I wanna raise (I know... others already brought it up... he he, plagiarism):
http://bit.ly/cO4t8E
http://bit.ly/cg3rXJ

I proposed a long time ago that we make it possible to write something like @saeb in sentence comments, which would warn that user (through a private message or a dedicated section). That way, asking someone for help on a sentence would be easier, and so would involving someone when chatting about "how to correct this sentence". (By the way, nice pics :p)

BTW, personal messages don't attract much attention. Is it possible to change the design somehow when you have unread ones?

I agree. An email alert that there are messages would be good. And of course one for "@JimBreen" in a comment too.

It's planned for this release (I will try to do both). For users who fear "Tatoeba spam", there is an option in your profile to deactivate email (though we need to make it more precise, so that email notifications can be deactivated for each kind: PMs, comments, etc.).
Along the same lines, Pharamp would like to be warned when a translation is added to one of the sentences she likes (others, tell us what you think). So once precise filtering is possible, maybe we could add the option to send an email when someone translates a favorite sentence? (That would give favorites a reason to exist ^^)

Tribute to sysko:
http://bit.ly/aJ13uT

yeah

glad to know you have plans :)

tribute to Trang:
http://bit.ly/9rtoRS

still need to think of one for sysko...and baptiste :P

I am learning Japanese, but I still don't know how to use Tatoeba.org!
Does it all depend on asking others questions!?

I'm not sure what idea you had of Tatoeba, but perhaps reading this will clear things up:
http://blog.tatoeba.org/2009/11...-language.html

OK, I read that, thanks.
So it's a project more than a learning tool!
So far I've found it interesting =)
I will contribute to Tatoeba as much as I can ^^

In fact, I would rather say it's a learning tool: a "learning by example" and "learning by producing" tool, rather than a complete "language learning" method.
Glad to see you like it :)

Do you plan on having a learning component to Tatoeba?

I would say that asking questions about the meanings of sentences / words included in Tatoeba is a perfectly valid means of learning, and taking part in the community.

Learning component?
Why complicate the project?
I just take the sentences and use them for the 10,000 sentences method. Does a hammer come with the nail? :p

IMHO what would be useful is advanced search capabilities.
And I'm looking forward to a tagging system. I feel it should be immensely useful.

But asking questions does really help. :)
I'm really grateful to blau_paul, for he answers most of mine.

What is the official position on punctuation?
Is ‘’ better than ''?

‘’ is sucky for WWWJDIC. Jim still uses EUC-JP by default for example display and I think it has issues with smart quotes. (Or maybe Jim just hates them ;-)

I've already changed ' to ’ somewhere... -_-
Isn't it a matter of two sed commands anyway?

WWWJDIC can display in UTF8, but files are in EUC-JP internally, and "smart quotes" are not supported there. I don't have a problem if they appear, as I can switch them to regular ones as I convert the wwwjdic.csv file.
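
For anyone who wants to do the same conversion outside of sed, here is a minimal Python sketch. It is not the script actually used for wwwjdic.csv; the name `straighten_quotes` is made up for illustration, and it covers only the four most common curly quote characters.

```python
# Map the common "smart" quote characters to their straight ASCII equivalents.
SMART_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / typographic apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(SMART_TO_STRAIGHT)


# Example using one of the sentences discussed on this wall.
converted = straighten_quotes("That\u2019s MY line!")
```

The conversion is one-way: once quotes are straightened, the left/right distinction is lost, so going back to smart quotes later needs its own rules.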

zMoo has also been using smart quotes in his sentences.
We don't have a policy on this yet, but chances are we'll end up using smart quotes. For now, you can do whatever. But it's probably simpler for you to use straight quotes. They'll be converted when the time comes.

Minor list glitch.
If you look at the list page it says 'DELETE ME!' has 34 entries, but it actually only has seven. I think that deleting a sentence does not remove it from the 'count' in lists. There are probably lots of 'null entries' against IDs in lists and possibly favourites.

I noticed something similar yesterday with my "Dutch sentences to be translated into any language" list. It now says it has 102 sentences, when actually there are only 20. There aren't over a hundred sentences that are or have been in the list, though, so I don't think it has something to do with deleting sentences.

I have updated the list counts. We don't know why your list ended up with 102, dorenda... But if it happens again, let us know, and try to give as many details as possible ^^

Yes indeed, deleting sentences doesn't remove them from lists. We know this, but since it's not extremely important, we haven't fixed it yet.
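
To illustrate the count problem described in this thread, here is a rough Python sketch of recomputing list sizes while ignoring deleted sentences. It does not reflect Tatoeba's real database or code: `recount_lists`, the dictionary of list members, and the set of existing sentence IDs are all hypothetical stand-ins.

```python
from __future__ import annotations


def recount_lists(list_members: dict[str, list[int]],
                  existing_ids: set[int]) -> dict[str, int]:
    """Count, per list, only the sentence IDs that still exist."""
    counts = {}
    for name, sentence_ids in list_members.items():
        # set() guards against the same ID being stored twice in one list;
        # the intersection drops 'null entries' left behind by deletions.
        counts[name] = len(set(sentence_ids) & existing_ids)
    return counts


# Hypothetical data: four stored entries, two of which were since deleted.
counts = recount_lists({"DELETE ME!": [1, 2, 3, 4]}, existing_ids={1, 2})
```

A recount like this only fixes the displayed number. Removing the stale entries themselves when a sentence is deleted would keep the count from drifting again.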