
Forgive me if this has already been discussed.
I just happened to see these two French phrases:
http://tatoeba.org/deu/sentences/show/1048946 (Viens ici !)
http://tatoeba.org/deu/sentences/show/1275472 (Viens ici !)
The only difference is the width of the space before the exclamation mark: in the first sentence it is a narrower space, in the second a wider one. How visible this is depends on the font.
The problem with these sentences is that the automatic merging script won't merge them, because to the script they are different. A similar problem of course occurs with typographic vs. plain apostrophes (’ vs. ') and with different kinds of quotation marks ("example" vs. „example“ in German). The same goes for dashes, as in Russian: the long — vs. the short -.
I have heard that French keyboards (or text editors?) sometimes automatically insert the obligatory space before ! and ?. Not everyone writes French with a French keyboard or text editor, though, which might explain why these spaces are sometimes wider and sometimes narrower.
I wonder whether it would be a good idea either to have a script automatically change one kind of space into the other, or to teach the merging script to treat them as the same.
For (German) "quotations" vs. „quotations“ this is perhaps more difficult to implement, also because some people may prefer the plain " key, while others insist on using the orthographically correct „“.
But I suspect there are a bunch of sentence twins out there that differ only in this way.
What's your opinion?
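One way to "teach the merging script to treat them the same" without touching the stored sentences would be to compare a normalized key instead of the raw text. Below is a minimal Python sketch, not Tatoeba's actual merge script. It assumes the two French spaces are U+202F (narrow no-break space) and U+00A0 (no-break space); the fold tables, the names `merge_key` and `are_twins`, and the use of codes like "deu" and "spa" are all hypothetical. Dashes are deliberately left out, since -, – and — can have different uses.

```python
import unicodedata

# Assumed safe for every language: spacing variants and the typographic apostrophe.
COMMON_FOLDS = {
    "\u202f": " ",  # NARROW NO-BREAK SPACE -> ordinary space
    "\u00a0": " ",  # NO-BREAK SPACE -> ordinary space
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (typographic apostrophe) -> '
}

# Hypothetical per-language folds; no dashes, because -, – and — are not interchangeable.
LANGUAGE_FOLDS = {
    "deu": {"\u201e": '"', "\u201c": '"'},  # „ and “ -> "
    "spa": {"\u00ab": '"', "\u00bb": '"'},  # « and » -> "
}


def merge_key(sentence: str, lang: str) -> str:
    """Build a comparison key for duplicate detection; the sentence itself is not changed."""
    folds = {**COMMON_FOLDS, **LANGUAGE_FOLDS.get(lang, {})}
    # NFC first, so precomposed and decomposed accented letters also compare equal.
    text = unicodedata.normalize("NFC", sentence)
    return text.translate(str.maketrans(folds))


def are_twins(a: str, b: str, lang: str) -> bool:
    """True if two sentences in the same language differ only by folded characters."""
    return merge_key(a, lang) == merge_key(b, lang)


print(are_twins("Viens ici\u202f!", "Viens ici\u00a0!", "fra"))  # True
print(are_twins("\u201eBeispiel\u201c", '"Beispiel"', "deu"))    # True
print(are_twins("a - b", "a \u2014 b", "rus"))                   # False
```

Because the folding happens only in the comparison key, both preferences stay possible: someone who types plain " and someone who insists on „“ keep their sentences as written, while the merge check still sees them as twins.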

About substituting French spaces for the standard ones before question marks: it was done once, and I guess it could be done again.
http://tatoeba.org/spa/wall/show_message/4586
As for punctuation in other languages, I guess the same could be done, although I'd recommend doing so only where the current punctuation is plainly wrong. For example, in Spanish it's acceptable to use "" instead of «», but not - instead of the long — (although in this case they couldn't be changed automatically, since both are used in Spanish, each with a different function).

Ah, thanks!
Yes, I think it's a bit more difficult for -, – and —. One would perhaps have to decide case by case and language by language.
As for French: nice, I didn't know such a script had already been used. The sentence pair above is actually the first time I've come across these different spaces in otherwise identical sentences.
By the way, a totally different question:
How often is the merging script run? And how long does it take to go through the whole database (or just through English, German, Klingon or Esperanto, for instance)? Also, how many sentence pairs usually get merged in one go? Just asking out of curiosity.