Tatoeba
kemushi69, 2016-05-22 02:08:51 UTC

A question about punctuation ... not relevant to "Tatoeba day"

I actually have a few questions:

1. The "em dash"

In typesetting (and in HTML) there are two types of dash: the "en dash", which usually appears between numbers (eg, 1-9), and the "em dash", which is longer and usually appears between words, generally to indicate a pause (like a colon or semicolon) or a parenthetical remark. I mean, for example, "late againーwhat a surprise!".

When I use HTML/XML for this kind of thing, I just use the "mdash" HTML entity, but here I've ended up just switching over to Japanese input mode and entering a hyphen, which gets rendered as in my example above.

So my question is whether this is a good/correct way to get that symbol into a sentence?

2. Localisation of quotation marks

In English and Irish (which are my main languages), it's simply a case of using the double-apostrophe key, whereas other languages (eg, Spanish, German) use different characters to delimit quotations.

My question is whether English-style quotes are acceptable in general, or whether the localised text should always follow local rules (eg, 「」 for Japanese)?

This is more a question about any automatic tools that process the data rather than what I should be typing in (ie, I guess I should always use the correct language-specific punctuation).

3. Double-width space characters

If I make a mistake and include a double-width space character (due to being in Japanese input mode) instead of the usual space, will it mess up the value stored in the corpus? Or are these automatically converted to regular spaces (and coalesced, should I hit space twice)?

TRANG, 2016-05-22 18:05:01 UTC

1. I'd say this is not the correct way. Even though they look similar, they are different characters, so they are not detected as the same symbol, and they might look different in other fonts.

- http://unicode-table.com/en/30FC/ (the Japanese dash that you've entered in "late againーwhat a surprise!")
- http://unicode-table.com/en/2014/ (the em dash)
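
To check which character actually ended up in a sentence, something like the following Python sketch may help. It only uses the standard `unicodedata` module; the helper name `describe_non_ascii` is made up for this example and is not part of any Tatoeba tooling.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (code point, Unicode name) for every non-ASCII character in text."""
    result = []
    for ch in text:
        if ord(ch) > 127:
            # unicodedata.name() falls back to the given default for unnamed characters
            name = unicodedata.name(ch, "UNKNOWN")
            result.append((f"U+{ord(ch):04X}", name))
    return result


# The sentence typed in Japanese input mode contains U+30FC, not an em dash.
print(describe_non_ascii("late again\u30fcwhat a surprise!"))
# The same sentence with a real em dash (U+2014).
print(describe_non_ascii("late again\u2014what a surprise!"))
```

The first call reports the KATAKANA-HIRAGANA PROLONGED SOUND MARK and the second reports EM DASH, which is why a search or tool looking for one will not find the other.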


2. In theory, each language uses its own standard quotes. In practice, there could be English quotes in non-English sentences because we don't pre-process and standardize punctuation.


3. As said above, we currently don't standardize punctuation, so if you enter a double-width space it will be saved as a double-width space.
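
Since nothing is converted on the server side, a contributor could clean a sentence locally before submitting it. Below is a minimal Python sketch, assuming the only goal is to turn double-width (ideographic, U+3000) spaces into regular spaces and collapse repeated spaces; the function name `normalize_spaces` is hypothetical, and this is not something Tatoeba itself runs.

```python
import re

IDEOGRAPHIC_SPACE = "\u3000"  # the double-width space produced in Japanese input mode


def normalize_spaces(sentence):
    """Replace double-width spaces with regular ones and collapse runs of spaces."""
    # Turn every ideographic space into an ordinary ASCII space.
    cleaned = sentence.replace(IDEOGRAPHIC_SPACE, " ")
    # Collapse two or more consecutive spaces into a single space.
    cleaned = re.sub(r" {2,}", " ", cleaned)
    # Drop leading and trailing spaces left over from the replacement.
    return cleaned.strip()


print(repr(normalize_spaces("late\u3000again,  what a surprise!")))
```

This is meant for sentences in languages that use the regular space; for Japanese text, where a double-width space may be intentional, it is probably better left alone.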

kemushi69, 2016-05-22 18:36:04 UTC

Thanks, I'll have to be careful with points 1 and 3 then.