A question about punctuation ... not relevant to "Tatoeba day"
I actually have a few questions:
1. The "em dash"
In typesetting (and in HTML) there are two types of dash: the "en dash", which usually appears between numbers (e.g., 1–9), and the "em dash", which is longer and usually appears between words, generally to indicate a pause (like a colon or semicolon) or to set off a parenthetical clause. For example: "late againーwhat a surprise!".
When I use HTML/XML for this kind of thing, I just use the &mdash; HTML entity, but here I've ended up switching to Japanese input mode and typing a hyphen, which gets rendered as in my example above.
So my question is: is this a good/correct way to get that symbol into a sentence?
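One way to check which character actually ended up in a sentence is to look up its code point and Unicode name. The Python sketch below does that with the standard unicodedata module. The helper names describe and dash_like are made up for illustration, and the assumption that Japanese input mode turns the hyphen key into U+30FC reflects typical IME behaviour rather than a guarantee.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


def dash_like(sentence: str) -> list[str]:
    """List every dash-like character in a sentence.

    Dash punctuation (Unicode category Pd) covers the hyphen-minus, en dash
    and em dash. U+30FC is added explicitly because it is a modifier letter
    (category Lm) that merely looks like a dash.
    """
    return [describe(ch) for ch in sentence
            if unicodedata.category(ch) == "Pd" or ch == "\u30fc"]


if __name__ == "__main__":
    # En dash, em dash, and the mark typically produced by the hyphen key
    # in Japanese input mode (an assumption about common IME behaviour).
    for ch in ["\u2013", "\u2014", "\u30fc"]:
        print(describe(ch))
    print(dash_like("late again\u30fcwhat a surprise!"))
```

Run on the example sentence, dash_like reports only U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, not U+2014 EM DASH, so the two are different characters even if they look alike in some fonts.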
2. Localisation of quotation marks
In English and Irish (my main languages), it's simply a case of using the double-quote key, whereas other languages (e.g., Spanish, German) use different characters to delimit quotations.
My question is: are English-style quotes acceptable in general, or should localised text always follow local rules (e.g., 「」 for Japanese)?
This is more a question about any automatic tools that process the data than about what I should be typing in (i.e., I assume I should always use the correct language-specific punctuation).
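As a sketch of what such an automatic tool might check, the Python snippet below flags quotation marks that don't match the convention listed for a sentence's language. The QUOTE_PAIRS table, its ISO 639-3 keys and the function name unexpected_quotes are all hypothetical; the pairs are common examples for each language, not an authoritative or complete style guide.

```python
# Hypothetical table of (opening, closing) quote pairs per language.
# Keys are ISO 639-3 codes; the pairs are illustrative, not exhaustive.
QUOTE_PAIRS = {
    "eng": [("\u201c", "\u201d"), ('"', '"')],            # “ ” and ASCII "
    "jpn": [("\u300c", "\u300d"), ("\u300e", "\u300f")],  # 「 」 and 『 』
    "deu": [("\u201e", "\u201c"), ("\u00bb", "\u00ab")],  # „ “ and » «
    "spa": [("\u00ab", "\u00bb"), ("\u201c", "\u201d")],  # « » and “ ”
}

# Every quote character that appears anywhere in the table.
ALL_QUOTES = {q for pairs in QUOTE_PAIRS.values() for pair in pairs for q in pair}


def unexpected_quotes(sentence: str, lang: str) -> set[str]:
    """Return quote characters in `sentence` not listed for `lang`.

    Only characters present in ALL_QUOTES are considered; a language code
    missing from the table allows no quotes, so all of them are flagged.
    """
    allowed = {q for pair in QUOTE_PAIRS.get(lang, []) for q in pair}
    return {ch for ch in sentence if ch in ALL_QUOTES and ch not in allowed}
```

Because unknown language codes have no allowed pairs, every quote character in such a sentence is flagged; a real tool would have to decide how to treat that case.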
3. Double-width space characters
If I make a mistake and include a double-width space character (due to being in Japanese input mode) instead of the usual space, will it mess up the value stored in the corpus? Or are these automatically converted to regular spaces (and coalesced, should I hit space twice)?
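If a processing tool wanted to normalize these spaces itself, a minimal sketch could look like the following. It assumes that every U+3000 IDEOGRAPHIC SPACE (the double-width space) should become an ordinary space and that runs of spaces should be collapsed into one; normalize_spaces is a hypothetical helper, and applying it blindly to Japanese sentences, where a double-width space may be intentional, would be questionable.

```python
import re

IDEOGRAPHIC_SPACE = "\u3000"  # the double-width space typed in Japanese input mode


def normalize_spaces(text: str) -> str:
    """Replace double-width spaces with ASCII spaces, then collapse runs of spaces."""
    text = text.replace(IDEOGRAPHIC_SPACE, " ")
    # Coalesce two or more consecutive ASCII spaces into a single one.
    return re.sub(r" {2,}", " ", text)
```

unicodedata.normalize("NFKC", text) also maps U+3000 to an ordinary space, but it converts other full-width characters (such as full-width Latin letters and digits) as well, so the targeted replace above is the more conservative choice.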
1. I'd say this is not the correct way. Even though they look similar, they are different Unicode characters, so they are not treated as the same symbol, and they might look quite different in other fonts.
- http://unicode-table.com/en/30FC/ (the katakana-hiragana prolonged sound mark, which is what you entered in "late againーwhat a surprise!")
- http://unicode-table.com/en/2014/ (the em dash)
2. In theory, each language uses its own standard quotation marks. In practice, there may be English quotes in non-English sentences, because we don't pre-process or standardize punctuation.
3. As said above, we currently don't standardize punctuation, so if you enter a double-width space it will be saved as a double-width space.
Thanks, I'll have to be careful with points 1 and 3 then.