Tatoeba
Vortarulo, February 18, 2013 at 10:04:43 PM UTC

Forgive me if this has already been discussed.
I just happened to see these two French sentences:

http://tatoeba.org/deu/sentences/show/1048946 (Viens ici !)
http://tatoeba.org/deu/sentences/show/1275472 (Viens ici !)

The only difference is the width of the space before the exclamation mark: the first sentence has a narrower space, the second a wider one. Whether you can see this depends on the font.
The problem with these sentences is that the automatic merging script won't merge them, because to the script they are different strings. The same problem of course occurs with typographic vs. plain apostrophes (’ vs. '), with different kinds of quotation marks ("example" vs. „example“ in German), and with dashes, as in Russian, where the long — differs from the short -.
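To see why the script treats the pair as different, it helps to look at the actual characters. The Python sketch below assumes, without having checked the two sentences, that the narrower space is U+202F NARROW NO-BREAK SPACE and the wider one is U+00A0 NO-BREAK SPACE; the helper `describe_differences` is hypothetical.

```python
import unicodedata

# Assumed reconstruction of the two sentences: the real ones may use
# different space characters.
first = "Viens ici\u202f!"   # U+202F NARROW NO-BREAK SPACE (assumed)
second = "Viens ici\u00a0!"  # U+00A0 NO-BREAK SPACE (assumed)


def describe_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (index, name in a, name in b) for each differing code point.

    Only positions present in both strings are compared (zip stops at the
    shorter one), which is enough for two strings of equal length.
    """
    diffs = []
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            diffs.append((i, unicodedata.name(ca), unicodedata.name(cb)))
    return diffs


print(first == second)  # False
print(describe_differences(first, second))
# [(9, 'NARROW NO-BREAK SPACE', 'NO-BREAK SPACE')]
```

The two strings look almost identical on screen, but an exact string comparison sees two different code points at position 9.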

I have heard that French keyboards (or text editors?) sometimes automatically insert the obligatory space before ! and ?, but not everyone writes French with a French keyboard or text editor, which might explain why these spaces are sometimes wider and sometimes narrower.
I wonder whether it would be a good idea either to have a script automatically change one kind of space into the other, or to teach the merging script to treat them as the same.
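The second option, teaching the merging script to treat such variants as equal, could look roughly like this. It is a Python sketch, not the actual Tatoeba merging script (whose code isn't shown here): `merge_key`, `find_twins` and the character table are hypothetical. The idea is to compare a normalized copy of each sentence while leaving the stored text untouched.

```python
from collections import defaultdict

# Hypothetical equivalence table, used ONLY to build a comparison key.
# The stored sentences keep whatever characters their authors typed.
EQUIVALENT_CHARS = str.maketrans({
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
    "\u2009": " ",  # THIN SPACE
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (typographic apostrophe)
})


def merge_key(lang: str, text: str) -> tuple[str, str]:
    """Return the key under which two sentences would count as duplicates."""
    return (lang, text.translate(EQUIVALENT_CHARS))


def find_twins(sentences: list[tuple[int, str, str]]) -> list[list[int]]:
    """Group (id, lang, text) records whose merge keys collide."""
    groups: dict[tuple[str, str], list[int]] = defaultdict(list)
    for sentence_id, lang, text in sentences:
        groups[merge_key(lang, text)].append(sentence_id)
    # Only keys shared by two or more sentences are merge candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


# The French IDs come from the links above; their exact space characters
# are assumed. IDs 1 and 2 are made up to illustrate the apostrophe case.
sentences = [
    (1048946, "fra", "Viens ici\u202f!"),
    (1275472, "fra", "Viens ici\u00a0!"),
    (1, "eng", "It\u2019s here."),
    (2, "eng", "It's here."),
]
print(find_twins(sentences))  # [[1048946, 1275472], [1, 2]]
```

Python's `unicodedata.normalize("NFKC", ...)` would also fold U+00A0, U+202F and U+2009 into a plain space, but NFKC changes many other characters as well (ligatures, full-width forms), so an explicit table is easier to keep under control.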

For German "quotations" vs. „quotations“ this is perhaps harder to implement, partly because some people prefer to type the plain " key, while others insist on using the orthographically correct „“ marks.
But I suspect there are quite a few sentence twins out there that differ only in this way.

What's your opinion?

Shishir, February 18, 2013 at 10:28:57 PM UTC

Converting the French spaces before question marks into the standard ones was done once already, and I guess it could be done again.

http://tatoeba.org/spa/wall/show_message/4586

As for punctuation in other languages, I guess the same could be done, although I'd recommend doing so only where the current punctuation is plainly wrong. For example, in Spanish it's acceptable to use "" instead of «», but not to use - instead of the long — (although, since both are used in Spanish, each with its own purpose, it wouldn't be possible to change them automatically in this case).

Vortarulo, February 18, 2013 at 10:39:02 PM UTC

Ah, thanks!
Yes, I think for -, – and — it's a bit more difficult. One would probably have to decide case by case and language by language.
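Deciding per language could be expressed as a small rule table. This is again a hypothetical Python sketch: the tables only encode the examples from this thread (French spaces, apostrophes, German „“, Spanish «»), and dashes are left out on purpose, following the point above that they can't be changed automatically. The names `COMMON`, `PER_LANGUAGE` and `merge_key` are illustrative, not part of any real Tatoeba code.

```python
# Hypothetical rules: which characters to fold together when comparing
# sentences, decided per language. Nothing here rewrites stored text.
COMMON = {
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
    "\u2019": "'",  # typographic apostrophe
}

PER_LANGUAGE = {
    # German low-high quotes treated like plain double quotes
    "deu": {"\u201e": '"', "\u201c": '"'},
    # Spanish guillemets treated like plain double quotes (both acceptable)
    "spa": {"\u00ab": '"', "\u00bb": '"'},
    # Hyphen, en dash and em dash are deliberately absent: where a language
    # gives them different uses, folding them could merge real differences.
}


def merge_key(lang: str, text: str) -> str:
    """Normalize text for duplicate detection using the rules for lang."""
    rules = {**COMMON, **PER_LANGUAGE.get(lang, {})}
    return text.translate(str.maketrans(rules))


print(merge_key("deu", "Er sagte \u201eHallo\u201c."))  # Er sagte "Hallo".
print(merge_key("spa", "Dijo \u00abhola\u00bb."))       # Dijo "hola".
print(merge_key("spa", "\u2014Hola") == merge_key("spa", "-Hola"))  # False
```

Keeping the rules in data rather than code means a new language, or a new decision about dashes, is a one-line change to the table.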

As for French: nice, I didn't know such a script had already been run. The sentence pair above is actually the first time I've come across these different spaces in otherwise identical sentences.

By the way, a totally different question:
How often is the merging script run? And how long does it take to go through the whole database (or just through English, German, Klingon or Esperanto, for instance)? Also, how many sentence pairs usually get merged in one go? Just asking out of curiosity.