Wall (5,770 threads)

Demetrius, 2010-04-17 11:48

I'd like to add some Belarusian sentences. Could you please add the language?

I’ll add sentences in Official Belarusian, so I suggest marking Belarusian with the current flag of Belarus.

Those who use Classical Belarusian don’t like that flag anyway. :)

TRANG, 2010-04-25 12:38

Belarusian added :)

Could you tell us more about the differences between "Official Belarusian" and "Classical Belarusian"? Is there (will there be) a need to support both someday? Should they be two distinct "languages"? Or is it more like the case of Chinese (traditional vs. simplified)? Is it possible to convert automatically from official to classical Belarusian?

Demetrius, 2010-04-26 9:37

It is called ‘Classical Orthography’, but in fact it is a different language standard. The two forms of Belarusian stem from a Soviet language reform that (arguably) brought Belarusian closer to Russian (on the other hand, academic Belarusian is, arguably, closer to Polish).

Official Belarusian is taught at schools, but those who use Belarusian every day often prefer the Classical one. Classical is rather widespread on the Internet. The laws accompanying a new (minor) reform of Official Belarusian in 2007 also, in effect, banned the use of Classical Belarusian in the press.

Apart from purely orthographic differences (сьнег/śnieh vs снег/snieh; робісся/robiśsia vs робішся/robišsia), there are lexical and grammatical ones.

A big problem is the transcription of loanwords. They aren’t simply written differently, they are pronounced differently. In Official Belarusian they are (somewhat inconsistently) borrowed from Russian or transcribed with a similar system, whilst in Classical they are borrowed directly from Western European languages: метр/mietr vs мэтр/metr for "metre", опера/opiera vs опэра/opera, сымбаль/symbal vs сімвал/simvał, Атэны/Ateny vs Афіны/Afiny.

There are also some grammatical forms that are acceptable in Classical Belarusian but considered dialectal in Official (the synthetic future tense: рабіцьму/rabićmu vs. буду рабіць/budu rabić), and different preferences in forming some inflections (the genitive plural of мова can be моў or моваў; Classical prefers the latter, Official the former).

There are also some words that are considered Russianisms/Polonisms in one variant but are widespread in the other (працэнт/pracent vs адсотак/adsotak; цячэнне/ciačeńnie vs плынь/plyń).

Automatic conversion is possible, but there is no open-source software available to do it. The only tool I know of is “Litara”, a plugin for MS Word 2000 (http://pravapis.tut.by/), which is closed-source and unlikely to be ported to newer versions of Word because of Microsoft’s new policy (permission from Microsoft is now required).

Wikipedia has 2 versions (be and be-x-old).

TRANG, 2010-04-26 18:44

Thanks for the explanation!

Dorenda, 2010-04-25 16:23

As far as I know, it's mostly a matter of spelling, but I'll leave it to Demetrius to tell you more about it. :)

What I wanted to say: Belarusian is not displayed in the list on the main page that shows the number of sentences in each language.

sysko, 2010-04-25 18:26

now it should be displayed :)

Dorenda, 2010-04-25 18:59

Yep. Yay, 2 sentences already! :p

sysko, 2010-04-18 0:22

No problem, but it will be added next week (the move to the new server is keeping us a bit busy ^^)

Pharamp, 2010-04-18 10:19

I also noticed that Icelandic isn't listed :(
I don't think I'm good enough at it to make ten sentences, but I can try :)

Swift, 2010-04-25 14:15

I was actually contemplating requesting Icelandic...

sysko, 2010-04-25 14:37

and now your dream comes true :) Icelandic is officially supported by Tatoeba :)
have fun ;-)

Swift, 2010-04-25 14:40

That was fast...!

Dorenda, 2010-04-17 23:07

Good idea. I was hoping you were going to add some Belarusian. :)

blay_paul, 2010-04-22 12:55

> http://tatoeba.org/app/webroot/files/downloads/
> [Updated] Once a week. On Saturdays around 9AM France time.

If you get this message, could you do one early? :-)

I've made quite a lot of changes on the Tatoeba side and I want to see if they came out OK.

Also, could you block changes to the Japanese index field from the next time you update the download files until you've imported an updated version that I send you?

TRANG, 2010-04-24 23:05

Actually the exports are going to be delayed to Sunday evening. Didn't have time to work on that today ^^'

JimBreen, 2010-04-26 13:21

Is the 25 April "wwwjdic.csv" OK to use? I'll download and check it, but I won't install until you or Paul say it's clear.

TRANG, 2010-04-26 13:32

Yes please check it. I did the export yesterday but I was too tired to notify you about it.

I changed the way the data is retrieved, as described in the email I sent you a few weeks ago (i.e. I'm using the meaning_id now).

Also, the double quotes are now escaped with a double quote, like Paul explained here:
http://tatoeba.org/eng/wall/sho...33#message_533
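Doubling the quote is the usual CSV escape, and it is also what Python's csv module does by default. As a minimal sketch (export_rows is a hypothetical helper, not Tatoeba's actual exporter), a ;-separated export that quotes every field, IDs included, could look like this. The English text is made up so that it contains both a quote and a semicolon:

```python
import csv
import io


def export_rows(rows):
    """Write rows as ';'-separated CSV with every field quoted.

    A double quote inside a field is escaped by doubling it ("" rather than \\"),
    which is the csv module's default behaviour (doublequote=True).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Made-up English text, chosen to contain both a quote and a semicolon.
row = ["74008", "329712", "この天気とは気長に付き合っていくしかない。",
       'He said "wait"; then he left.']
print(export_rows([row]), end="")
# "74008";"329712";"この天気とは気長に付き合っていくしかない。";"He said ""wait""; then he left."
```

Quoting every field keeps the separator sequence between fields the same throughout the line, whatever the field contains.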

If there is any problem, let me know.

JimBreen, 2010-04-26 13:53

Well, my conversion blew up. It wasn't just the double quotes you changed - you dropped the quotes around the sentence numbers.

E.g. it used to be:

"74008";"329712";"この天気とは気長に付き合っていくしかない。";"You have.....

Now it is:

74008;329712;"この天気とは気長に付き合っていくしかない。";"You have....

Can you put them back? I was relying on the sequence ";" (quote, semicolon, quote) as the field separator.

TRANG, 2010-04-26 18:37

Alright, I re-exported the WWWJDIC file with the quotes around the ids.

If you prefer the fields to be separated with a TAB, we can also do that in the future.

JimBreen, 2010-04-27 0:51

I'm happy with it the way it is.

JimBreen, 2010-04-27 1:25

OK, some feedback on the latest:

(a) you also changed the marker for empty sentences from \N to N.

(b) in a bizarre twist, sentence 249593 has been delivered with the Dutch sentence instead of the Japanese one!

A: Op onze website, http://www.example.com, staat alle informatie die je nodig hebt. Our Web site, http://www.example.com will tell you all you need to know.#ID=249593
B: 私ども 乃{の} ウェブサイト~ は 貴方(あなた)[01]{あなた} に 必要[01]{必要な} 情報 を 全て 伝える{お伝え} 為る(する)[10]{します}

Looking at these in the database I can't see a reason why it happened.

(c) the 41 with missing sentences are interesting. Most (36) have Japanese and indices but no English. Presumably they had English once. Five have English and the Japanese indices, but no Japanese sentence. E.g.

"205059";"42301";"N;"That's all right.";"其れ[01]{それ} は|1 申し分 無い{ない}

205059 doesn't exist. 42301 is there, but it is linked to 205057. This seems to be a case of broken links, perhaps during deletion of duplicates.
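A check that would catch records like these could be sketched as follows. find_broken_index_records is a hypothetical helper working on plain Python sets; it assumes the index ('meaning') pairs, the set of existing sentence IDs and the translation links have already been loaded from the database, and it is not part of Tatoeba or WWWJDIC. The data is the 205059/42301 case above:

```python
def find_broken_index_records(index_records, existing_ids, links):
    """Flag index records whose sentences or translation link no longer exist.

    index_records: iterable of (jpn_id, eng_id) pairs from the index/'meaning' data
    existing_ids:  set of sentence IDs currently in the database
    links:         set of (jpn_id, eng_id) translation links
    """
    problems = []
    for jpn_id, eng_id in index_records:
        if jpn_id not in existing_ids:
            problems.append((jpn_id, eng_id, "Japanese sentence missing"))
        elif eng_id not in existing_ids:
            problems.append((jpn_id, eng_id, "English sentence missing"))
        elif (jpn_id, eng_id) not in links:
            problems.append((jpn_id, eng_id, "sentences not linked"))
    return problems


# 205059 is gone; 42301 still exists but is linked to 205057.
existing = {42301, 205057}
links = {(205057, 42301)}
print(find_broken_index_records([(205059, 42301), (205057, 42301)], existing, links))
# [(205059, 42301, 'Japanese sentence missing')]
```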

Does anyone want to see the list of the 41?

(d) Is it OK to amend some indices? About five don't currently match the sentence text.

blay_paul, 2010-04-27 4:47

Not Trang, and commenting at 3am, so take this with a large helping of salt.

(a) Not nice, but there should be very few empty sentences (or none). This ties in with another point, so wait till the end of the post ;-)

(b) Bizarre twist is bizarre. I know why it happened, though. Look at the log at http://tatoeba.org/jpn/sentences/show/164914: Jeroen changed the Japanese sentence to a Dutch sentence (I don't know why). This left the index data intact.

(c) Yeah, I've mentioned that deleting Japanese sentences should also delete associated index data.

> Does anyone want to see the list of the 41?

I do.

(d) Fine, as long as you post the Japanese sentence IDs for those sentences here.

Now the point I promised earlier. I think the code / SQL used to generate wwwjdic.csv needs to be changed in the following way:

* Include all sentences marked Japanese that have associated index data.
* Link to the English sentence (singular) given by the 'meaning' field in the index data.
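These two points could be expressed as a single query. The sketch below uses an in-memory SQLite database with a made-up schema (the table names sentences and jpn_indices and the column meaning_id are assumptions; Tatoeba's real tables may differ), filled with sentence 4924 and its two English translations mentioned elsewhere in this thread. Because the English side comes only from the index's meaning field, 4924 is exported once, paired with 73899. The LEFT JOIN keeps Japanese sentences whose English has gone missing, so they show up with an empty English field instead of silently disappearing:

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT, text TEXT);
CREATE TABLE jpn_indices (sentence_id INTEGER, meaning_id INTEGER, text TEXT);
"""

# One row per Japanese sentence that has index data; the English sentence is
# taken only from the index's 'meaning' field.
EXPORT_SQL = """
SELECT j.id, e.id, j.text, e.text, i.text
FROM jpn_indices AS i
JOIN sentences AS j ON j.id = i.sentence_id AND j.lang = 'jpn'
LEFT JOIN sentences AS e ON e.id = i.meaning_id
ORDER BY j.id
"""


def export_wwwjdic_rows(conn):
    return conn.execute(EXPORT_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sentences VALUES (?, ?, ?)", [
    (4924, "jpn", "「これが探していたものだ」と彼は叫んだ。"),
    (73899, "eng", '"This is what I was looking for," he exclaimed.'),
    (1513, "eng", '"This is what I was looking for!" he exclaimed.'),
])
conn.execute("INSERT INTO jpn_indices VALUES (?, ?, ?)", (
    4924, 73899,
    "此れ[01]{これ} が 探す{探していた} 物|1(もの)[01]{もの} だ "
    "と|1 彼|2(かれ)[01] は|1[01] 叫ぶ{叫んだ}",
))
for row in export_wwwjdic_rows(conn):
    print(row[:2])  # (4924, 73899): only the translation named in 'meaning'
```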

I think that would eliminate many of the empty sentences and multiple outputs of the current version. NOTE that I'm currently around 3/4 finished doing a first check-up of the jpn_index.csv data so I would recommend Trang wait until I've sent that back in before doing any re-export.

JimBreen, 2010-04-27 8:14

(d) OK

85518 - dropped a trailing よ
151863 - 私達 -> 我々
152372 - 肉屋 -> 肉 や
165426 - added {町はずれ}
197934 - removed {はな} since you changed it to 鼻

TRANG, 2010-04-26 14:05

Ah yes, that too. I took them out because sysko told me it was not CSV compliant to have quotes around numbers, which makes sense because quotes indicate "this is a string" (vs. "this is an int").

But of course, if it's easier for you to have the quotes, I can put them back (when I get back home).

JimBreen, 2010-04-26 14:14

Yes, please.

It would make it a LOT simpler if there was a consistent inter-field sequence. A bare ; is a problem for me because semicolons often occur in the middle of English sentences. (I don't care whether a field is numeric or not; they are all just strings to me.)

sysko, 2010-04-26 14:17

so TAB wouldn't be easier to parse?

JimBreen, 2010-04-27 0:51

TAB is fine. In fact I convert ";" to TAB as a first step.

blay_paul, 2010-04-27 4:49

TAB is fine with me too (not that that matters much ;-)

blay_paul, 2010-04-26 14:18

In theory, in .csv format a ; inside text-field delimiters (i.e. inside quotes) should never be counted as a field separator. That said, I don't care at all whether the ID numbers are handled as numeric or text fields, and I'm sure nobody else would have trouble converting the field type if required.

I did find the \" notation fiddly though, so I'm glad that's gone.

blay_paul, 2010-04-26 13:31

You can go ahead. It's the index (and 'meaning' field) that I'm working on so just bear in mind that any changes you make to the index data will be overwritten later (unless you let me know about them).

TRANG, 2010-04-22 23:03

Okay, I've updated the jpn_indices.csv. If you needed also the wwwjdic.csv file to be updated, you'll have to wait until Saturday...

I forgot that we still haven't finished configuring everything on the new server, so I can't export to CSV from there yet. I'd have to import the database into my local copy, run the CSV export there, and repeat that every time I wanted to update a CSV file (which is what I did for jpn_indices).

Well, as a result, the usual 9AM Saturday exports might be delayed to the evening.

blay_paul, 2010-04-22 23:09

No worries. Most of what I'm doing only needs the jpn_indices.csv itself, the rest can probably wait until Saturday. I'm off to sleep myself now, so I'll give you a progress report at the end of tomorrow or something. :-)

TRANG, 2010-04-22 18:58

Yes, I'll do this some time before going to sleep (and I sleep early don't worry).

CK, 2010-04-24 4:10 (edited 2019-10-25 8:01)

[not needed anymore- removed by CK]

TRANG, 2010-04-24 9:48

It's interesting you're suggesting this now, because I'll be meeting my team in a couple of hours and this is one thing we are going to discuss :)

My current idea, instead of using votes, is to use the adoption system. There aren't enough active members for a vote system to run smoothly, so it isn't worth implementing yet.

However, we can already have a quick first stage of proofreading with the features we already have, that is, by encouraging users to adopt "orphan sentences" (those that don't belong to anyone). While adopting, of course, you would correct any mistakes. But you can also simply adopt a sentence as a way of saying "I've seen this sentence, it's correct".

To make the whole process quicker, I was thinking of using lists. I could generate lists of 200 orphan sentences each. The sentences would be automatically attributed to a given user as they are added to a list, and that user could then simply read them and remove each one from the list once it has been checked/corrected.

This would require very little coding, so it would be a good solution until we get to integrate a "vote" system of any kind.
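A rough sketch of that list-based adoption workflow, using hypothetical names (ReviewList, make_review_lists) and plain Python data rather than Tatoeba's actual lists feature. Handing successive lists to reviewers in turn is an assumption added for illustration; the proposal only says each list's sentences go to a given user:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReviewList:
    owner: str
    sentence_ids: List[int] = field(default_factory=list)

    def mark_checked(self, sentence_id: int) -> None:
        """Remove a sentence once the owner has read (and if needed corrected) it."""
        self.sentence_ids.remove(sentence_id)


def make_review_lists(orphan_ids: List[int], reviewers: List[str],
                      size: int = 200) -> Tuple[List[ReviewList], Dict[int, str]]:
    """Split orphan sentences into lists of at most `size` and adopt each
    list's sentences on behalf of one reviewer (reviewers are used in turn).

    Returns the lists and the new sentence_id -> owner mapping.
    """
    lists: List[ReviewList] = []
    owners: Dict[int, str] = {}
    for n, start in enumerate(range(0, len(orphan_ids), size)):
        owner = reviewers[n % len(reviewers)]
        chunk = orphan_ids[start:start + size]
        lists.append(ReviewList(owner, list(chunk)))
        owners.update({sid: owner for sid in chunk})
    return lists, owners


lists, owners = make_review_lists(list(range(1, 451)), ["alice", "bob"])
print([(lst.owner, len(lst.sentence_ids)) for lst in lists])
# [('alice', 200), ('bob', 200), ('alice', 50)]
```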

blay_paul, 2010-04-25 15:46

I also don't think there are enough members for a vote system to work. However, I do think there should be a way for new and changed sentences to be 'validated'.

When a sentence is changed (or added), it is marked 'unvalidated' until someone other than the person who changed it confirms the change as valid. There should also be a way to view "Unvalidated sentences (in language X)".

It would also be nice if recently validated sentences (say from the last week) in a certain language could be viewed.
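The validation rules proposed above can be sketched as a small data model. Everything here (Sentence, edit, validate, and the seven-day window standing in for "say from the last week") is hypothetical and only illustrates the rules: any change resets a sentence to unvalidated, and the person who made a change cannot validate it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Sentence:
    id: int
    lang: str
    text: str
    last_editor: str
    validated: bool = False                  # new sentences start unvalidated
    validated_at: Optional[datetime] = None


def edit(sentence: Sentence, new_text: str, user: str) -> None:
    """Any change resets the sentence to 'unvalidated'."""
    sentence.text = new_text
    sentence.last_editor = user
    sentence.validated = False
    sentence.validated_at = None


def validate(sentence: Sentence, user: str, now: datetime) -> None:
    """Confirm a change; the person who made it cannot confirm it."""
    if user == sentence.last_editor:
        raise PermissionError("a change must be validated by someone else")
    sentence.validated = True
    sentence.validated_at = now


def unvalidated(sentences: List[Sentence], lang: str) -> List[Sentence]:
    return [s for s in sentences if s.lang == lang and not s.validated]


def recently_validated(sentences: List[Sentence], lang: str, now: datetime,
                       window: timedelta = timedelta(days=7)) -> List[Sentence]:
    return [s for s in sentences
            if s.lang == lang and s.validated_at is not None
            and now - s.validated_at <= window]


now = datetime(2010, 4, 25, 15, 46)
s = Sentence(1, "eng", "Helo.", last_editor="alice")
edit(s, "Hello.", "alice")
try:
    validate(s, "alice", now)
except PermissionError:
    print("alice cannot validate her own change")
validate(s, "bob", now)
print([x.id for x in recently_validated([s], "eng", now)])  # [1]
```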

TRANG, 2010-04-24 9:49

> I love what you are doing with this website

Thanks =) We love what we are doing as well ;)

blay_paul, 2010-04-23 21:29

Work in progress

There are currently lots of index records that need altering because words have been added / removed / changed in Edict.

For example
* headwords that were unique may no longer be unique (so readings need to be added).
* Entries that were once two separate dictionary records may have been merged into one (so indexing needs to be changed to one keyword only as well)
* Adding, removing and merging words may leave the |1, |2, ... notations needing an update.
* There is also a great deal of general checking to do.
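For the first point, a rough automated check could flag index tokens whose headword is now shared by several dictionary entries but which carry neither a |N marker nor a reading. The entry counts would have to come from the current Edict/JMdict; here they are passed in as a plain dict, and needs_disambiguation is a hypothetical name. The example uses 前, which is shared by three entries (さき, ぜん, まえ):

```python
import re

# Headword = everything before the first |, (, [, { or ~ in an index token.
_HEADWORD = re.compile(r"[^|(\[{~]+")


def needs_disambiguation(index_line, entry_counts):
    """Return tokens whose headword matches several dictionary entries
    but which carry neither a |N marker nor a (reading).

    entry_counts maps headword -> number of dictionary entries using it;
    headwords not in the map are treated as unique.
    """
    flagged = []
    for token in index_line.split():
        m = _HEADWORD.match(token)
        headword = m.group(0) if m else token
        qualified = "|" in token or "(" in token
        if entry_counts.get(headword, 1) > 1 and not qualified:
            flagged.append(token)
    return flagged


print(needs_disambiguation("前 前(まえ) 前|3(まえ)", {"前": 3}))
# ['前']
```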

JimBreen, 2010-04-24 0:46

The first two of these can be checked automatically, although it's not a small task. Re the "|1, |2, ... notations", are they still needed?

blay_paul, 2010-04-24 3:46

> The first two of these can be checked automatically,
> although it's not a small task.

Probably best not done automatically. Checking like that always turns up a bunch of things to check/correct in Edict (cf. today's and yesterday's submissions ;-)

It also familiarises me with the words/readings used in the examples (all 28,000-odd of them :-P)

> Re the "|1, |2, ... notations", are they still needed?
I still need them (or something equivalent). For sanity checking, as much as anything. Perfectionists (and people using Microsoft products) still need them. (No comments about the intersect being a null set, please)

blay_paul, 2010-04-23 21:25

Unique identification of a JMDICT entry.

This is technical stuff, only really of interest for people who want to deal with the 'index' field of Japanese sentences.

The obvious way to identify a dictionary entry as used in WWWJDIC + Edict is by the entry number (duh!). However that's not how it's done in the index field. Why? Because 2147630 is not 'human friendly' for whoever is creating and editing the index fields. (i.e. you can't look at 2147630 and know what word it refers to)

You could identify by the headword of the dictionary entry - あっという間に will only match one record. However there are around 3,000 dictionary entries where that will not be enough. 前(まえ) is not the same entry as 前(ぜん). So, for ambiguous kanji headwords, you include the reading of the word as well.

You've now reached the basic method used by Jim when he links WWWJDIC to example records. However that is not enough to identify 100% of the entries in JMdict uniquely. There are kana headwords like 'は' that are present in more than one entry.

はあ; は (int) (1) yes; indeed; well; (2) ha!; (3) what?; huh?; (4) sigh

は (prt) (1) (pronounced わ in modern Japanese) topic marker particle; (2) indicates contrast with another option (stated or unstated); (3) adds emphasis; (P)

Note that the second [EX] link in the はあ; は entry is actually for the particle は. There are a few like this that can only be fixed by re-doing the indexing system that Jim uses (in a way that would be more complicated and slower than he wants to deal with).

Now we come to the notation used in Tatoeba (and that I used 'at home' when maintaining the Tanaka example collection). Every headword that is not unique has a |1, |2, |3, ... suffix added.

So instead of 前(さき) 前(ぜん) 前(まえ)
you might have 前|1(さき) 前|2(ぜん) 前|3(まえ)

Some points to note:
* The numbers are assigned in order by JMDict entry ID. So 前|1(さき) (Entry 1387210) comes before 前|2(ぜん) (Entry 1392570) and 前|3(まえ) (Entry 1392580)
* Because I used Access (and Excel), some characters that are not actually identical are treated as 'equivalent' by Microsoft's software. So ヽ|1 ヾ|2 ゝ|3 ゞ|4 and 々|5 all need to be distinguished by the numerical notation.
* The order of headwords / readings is significant in JMDict - most common headwords are supposed to come first, least common last. When indexing the preference is to use the first headword / reading when possible.
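A minimal sketch of the numbering rule in the first point, assuming each dictionary entry is reduced to a single (entry ID, headword, reading) triple. homograph_labels is hypothetical; it ignores the Access/Excel character-equivalence issue and JMdict entries that have several headwords. With the three 前 entries above:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def homograph_labels(entries: Iterable[Tuple[int, str, str]]) -> Dict[int, str]:
    """Label headwords shared by several entries as headword|N(reading).

    entries: (entry_id, headword, reading) triples. N counts from 1 in
    ascending JMdict entry-ID order. Simplifications: one headword per
    entry, and no Access-style 'equivalent character' folding.
    """
    by_headword = defaultdict(list)
    for entry_id, headword, reading in entries:
        by_headword[headword].append((entry_id, reading))

    labels = {}
    for headword, items in by_headword.items():
        if len(items) < 2:
            continue  # unique headwords get no |N
        for n, (entry_id, reading) in enumerate(sorted(items), start=1):
            labels[entry_id] = f"{headword}|{n}({reading})"
    return labels


print(homograph_labels([(1392580, "前", "まえ"), (1387210, "前", "さき"), (1392570, "前", "ぜん")]))
# {1387210: '前|1(さき)', 1392570: '前|2(ぜん)', 1392580: '前|3(まえ)'}
```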

NOTE that the point of the index data is to uniquely identify a dictionary entry, NOT to reflect 100% accurately the dictionary form and reading of the word being indexed.

e.g. If you used 挙がります in a sentence the index would include
上がる{挙がります}
it would not include
挙がる{挙がります}

Apart from anything else this ensures that there is only one [EX] link from dictionary entries.

Finally, square brackets indicate the sense of the word
上がる[02] = Second sense of the word 上がる
and the curly brackets show the /exact match/ for the indexed word in the sentence.

僕は学校の成績が上がった。
僕|2(ぼく)[01] は|1 学校 乃{の} 成績 が 上がる{上がった}
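Putting the pieces together, an index token reads headword|N(reading)[sense]{form}, with every part after the headword optional. Below is a minimal parser sketch; the regex, IndexToken and parse_index are assumptions, not Tatoeba or WWWJDIC code. A trailing ~ also appears in some index data (e.g. ウェブサイト~); its meaning isn't explained in this post, so the sketch only records it. Parsing the 僕は学校の成績が上がった。 example:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

TOKEN_RE = re.compile(
    r"(?P<headword>[^|(\[{~]+)"      # dictionary headword
    r"(?:\|(?P<homograph>\d+))?"      # |N homograph number
    r"(?:\((?P<reading>[^)]*)\))?"    # (reading)
    r"(?:\[(?P<sense>\d+)\])?"        # [NN] sense number
    r"(?:\{(?P<form>[^}]*)\})?"       # {form as it appears in the sentence}
    r"(?P<tilde>~)?"                  # trailing ~, recorded but not interpreted
)


@dataclass
class IndexToken:
    headword: str
    homograph: Optional[int]
    reading: Optional[str]
    sense: Optional[int]
    form: Optional[str]
    tilde: bool

    @property
    def surface(self) -> str:
        """Text expected in the sentence: the {form} if given, else the headword."""
        return self.form if self.form is not None else self.headword


def parse_index(line: str) -> List[IndexToken]:
    tokens = []
    for raw in line.split():
        m = TOKEN_RE.fullmatch(raw)
        if m is None:
            raise ValueError(f"cannot parse index token {raw!r}")
        tokens.append(IndexToken(
            headword=m["headword"],
            homograph=int(m["homograph"]) if m["homograph"] else None,
            reading=m["reading"],
            sense=int(m["sense"]) if m["sense"] else None,
            form=m["form"],
            tilde=m["tilde"] is not None,
        ))
    return tokens


for tok in parse_index("僕|2(ぼく)[01] は|1 学校 乃{の} 成績 が 上がる{上がった}"):
    print(tok.headword, tok.homograph, tok.reading, tok.sense, tok.surface)
```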

*Psst* Trang - maybe this should be on a page somewhere?

blay_paul, 2010-04-23 20:44

Busy signal

Just to note that I'm going to be really busy with fixing lots and lots of stuff with the Japanese Index field for the next few days.

If you left a comment on a sentence you'd like me to read, please PM me because I won't have time to deal with it now.

CK, 2010-04-22 15:58 (edited 2018-12-03 23:18)

[removed by CK, since this no longer applies]

blay_paul, 2010-04-22 16:17

Just looking at a Google search on
"と言った(の)も同然だ"
it looks like the version with の (the second one) should be removed.

CK, 2010-04-22 15:14 (edited 2019-10-25 8:02)

[not needed anymore- removed by CK]

JimBreen, 2010-04-22 15:22

Hmmm. "He adopted a war orphan and is bringing her up as a foster daughter."

Comment on the sentence itself and Francis will get a message.

blay_paul, 2010-04-18 15:48

Bump.

Examples.gz, how often is it updated?

On the Monash FTP Archive page it says:

[...] It is updated daily from the server site.
* examples.gz (8371412 bytes) the file.

But I think that is no longer accurate. The one I've just downloaded hasn't been updated in a week. Also, could the ID numbers given in examples.gz be those of the Japanese sentences rather than the English ones? The English ones aren't unique, so the IDs are pretty much useless.

TRANG, 2010-04-18 17:06

You'll have to ask Jim about this, because I don't think anyone else here has the answer ^^
It may be faster to simply send him an email...

blay_paul, 2010-04-18 17:14

Could you make available a download with the stuff Jim uses for WWWJDIC? i.e.
a) All Japanese sentences that have an Index field linked. (With sentence number of Japanese sentences)
b) All English fields that are mentioned in the 'Meaning' field. (With sentence number of English sentences)
c) All index fields. (With 'Meaning' field).

TRANG, 2010-04-18 17:20

I haven't had time to update the "downloads" page yet, but the data file Jim uses can be downloaded here:

http://tatoeba.org/app/webroot/files/downloads/
(wwwjdic.csv)

The fields are:
jpn_sentence_id, eng_sentence_id, jpn_text, eng_text, jpn_index
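A sketch of reading that file into named records with Python's csv module. It assumes ; as the separator, " as the quote character with "" for an embedded quote, and \N for an empty field, as discussed in the later posts; an actual export may differ. read_wwwjdic is a hypothetical name, and real files should be opened with newline='' as the csv documentation recommends. The sample row is sentence 4924 as exported earlier in the thread, re-quoted:

```python
import csv
import io
from typing import Dict, Iterator, Optional, TextIO

FIELDS = ("jpn_sentence_id", "eng_sentence_id", "jpn_text", "eng_text", "jpn_index")


def read_wwwjdic(stream: TextIO, null_marker: str = "\\N") -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per row of a ';'-separated, '"'-quoted wwwjdic.csv.

    A field equal to null_marker (assumed to be \\N) becomes None.
    """
    for row in csv.reader(stream, delimiter=";", quotechar='"'):
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}: {row!r}")
        yield {name: (None if value == null_marker else value)
               for name, value in zip(FIELDS, row)}


sample = io.StringIO(
    '"4924";"73899";"「これが探していたものだ」と彼は叫んだ。";'
    '"""This is what I was looking for,"" he exclaimed.";'
    '"此れ[01]{これ} が 探す{探していた} 物|1(もの)[01]{もの} だ と|1 彼|2(かれ)[01] は|1[01] 叫ぶ{叫んだ}"\n'
)
for record in read_wwwjdic(sample):
    print(record["eng_sentence_id"], record["eng_text"])
    # 73899 "This is what I was looking for," he exclaimed.
```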

blay_paul, 2010-04-19 20:57

Just one more question - how often do you update the files on the download page?

TRANG, 2010-04-20 9:46

On this page:
http://tatoeba.org/app/webroot/files/downloads/
Once a week. On Saturdays around 9AM France time.

On the download page that you can access from the link at the bottom, never. I have to update that page though, to link to the files in http://tatoeba.org/app/webroot/files/downloads/.

blay_paul, 2010-04-18 18:05

That should do nicely. Thanks.

The other thing is that I'd like to do a complete check and revamp of the index field. To be certain of not losing any data I'd need you to lock the index field so I can download / fix up / upload without any changes happening on your side.

I'm still working on things so I probably won't be ready for a week or so.

JimBreen, 2010-04-19 7:48

Can you keep me in the loop? I change the odd index when making a correction to a Japanese sentence. Also, when I do the weekly download I run it through a utility that checks that the index and sentence agree. That way I can detect when others have changed a sentence. I usually have to update the index and occasionally add to the list of names to be ignored (e.g. ムーリエル and 赤ずきん this week).

In addition, I have a list of words from Collin McCulley which had mismatches between the index and dictionary. I have cleaned up most of them, but still have ~100. We need some way of tracking when they get out of kilter, mainly when dictionary entries need qualifying.
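A check of the kind Jim describes (index and sentence agreeing) could be sketched like this: take each token's {form}, or its headword when there is no form, and look for it in the sentence, in order. The function names are hypothetical and this is not Jim's actual utility; it only checks in one direction, so it won't notice words in the sentence that have no index entry (such as names). The examples are 僕は学校の成績が上がった。 and the Dutch sentence that turned up in place of a Japanese one:

```python
import re

# headword, optional |N, (reading), [sense], {form}, trailing ~
_TOKEN = re.compile(r"([^|(\[{~]+)(?:\|\d+)?(?:\([^)]*\))?(?:\[\d+\])?(?:\{([^}]*)\})?~?")


def surface_forms(index_line):
    """For each index token, the text expected in the sentence."""
    forms = []
    for token in index_line.split():
        m = _TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unparseable index token: {token!r}")
        headword, form = m.groups()
        forms.append(form if form is not None else headword)
    return forms


def index_mismatches(sentence, index_line):
    """Return the surface forms that cannot be found, in order, in the sentence."""
    missing = []
    pos = 0
    for form in surface_forms(index_line):
        found = sentence.find(form, pos)
        if found == -1:
            missing.append(form)
        else:
            pos = found + len(form)
    return missing


print(index_mismatches("僕は学校の成績が上がった。",
                       "僕|2(ぼく)[01] は|1 学校 乃{の} 成績 が 上がる{上がった}"))  # []
print(index_mismatches("Op onze website, http://www.example.com, staat alle informatie die je nodig hebt.",
                       "私ども 乃{の} ウェブサイト~"))  # ['私ども', 'の', 'ウェブサイト']
```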

TRANG, 2010-04-19 19:14

@Paul, yes, I can easily block everyone's access to the indices.

JimBreen, 2010-04-19 7:38

I download the file Trang sets up once a week. I check it over then set it up in the WWWJDIC system. At that stage the "examples.gz" file, etc. is rebuilt.

JimBreen, 2010-04-19 7:59

I missed the bit about the ID numbers. I use the English sentence number because 90%+ of the corrections coming from WWWJDIC users are to the English sentence, so it makes sense for WWWJDIC to link there.

As discussed on another forum, I could put in both, e.g. #ID=375963_12345. I have a major change half done in WWWJDIC which is blocking other changes. Once it is clear (maybe a week or so), I can make that sentence-number change. I'll let WWWJDIC users choose whether they want to link to the Japanese or the English sentence.

blay_paul, 2010-04-18 21:22

Example export system.

Japanese sentences with multiple English translations are (sometimes?) being exported with both versions.

For example:

4924 73899 「これが探していたものだ」と彼は叫んだ。 "This is what I was looking for," he exclaimed. 此れ[01]{これ} が 探す{探していた} 物|1(もの)[01]{もの} だ と|1 彼|2(かれ)[01] は|1[01] 叫ぶ{叫んだ}
4924 1513 「これが探していたものだ」と彼は叫んだ。 "This is what I was looking for!" he exclaimed. 此れ[01]{これ} が 探す{探していた} 物|1(もの)[01]{もの} だ と|1 彼|2(かれ)[01] は|1[01] 叫ぶ{叫んだ}

On the sentence annotations page only 73899 is given as the 'meaning' field with the Index field. So either a) The meaning field in the sentence annotations page isn't used, or b) There can be two or more 'meaning' field values, but only one is shown on the sentence annotations page.

JimBreen, 2010-04-19 7:52

I've noticed that, and I assumed it wasn't used. When I notice cases such as that one, I change the English so that they become identical, in the hope that it will lead to one of them being removed.

blay_paul, 2010-04-19 9:13

That's OK for WWWJDIC, and for cases where one is mistaken or they are both very similar.

It's not good 'Tatoeba practice' though.

JimBreen, 2010-04-19 9:27

In the case you quoted I thought it *was* the Tatoeba practice. They only differ by an exclamation mark.

Where they differ in more substantial ways, e.g. choice of personal pronoun where the Japanese has none, I guess there is a case for both being kept, and that's a situation where an index is best tied to a sentence-pair rather than just to the Japanese.

Tying to sentence pairs has another problem. The number of sentences in German and French is getting to the stage where it would be nice to include them in WWWJDIC. I'd be looking at having "examples_de" and "examples_fr" extracts along with indices.

blay_paul, 2010-04-19 9:38

> In the case you quoted I thought it *was* the Tatoeba
> practice. They only differ by an exclamation mark.

The case I quoted, yes. There are, however, at least a handful where both alternatives are valid, significantly different, and illustrate something about the English language.