Tatoeba
Vortarulo, 18 February 2013, 22:04:43 UTC

Forgive me if this has already been discussed.
I just happened to see these two French phrases:

http://tatoeba.org/deu/sentences/show/1048946 (Viens ici !)
http://tatoeba.org/deu/sentences/show/1275472 (Viens ici !)

The difference is the width of the space before the exclamation mark: it is narrower in the first sentence and wider in the second. How visible this is depends on the font.
The problem with these sentences is that the automatic merging script won't merge them, because to the script they are different. A similar problem occurs, of course, with typographic vs. simple apostrophes (’ vs. '), with different kinds of quotation marks ("example" vs. „example“ in German), and with dashes, as in Russian: the long — vs. the short -.

I have heard that French keyboards (or text editors?) sometimes automatically insert the obligatory space before ! and ?, but not everyone writes French with a French keyboard or text editor, which might explain why these spaces are sometimes wider and sometimes narrower.
I wonder whether it would be a good idea for a script either to automatically change one of those spaces to match the other, or to teach the merging script to treat them as the same.

For (German) "quotations" vs. „quotations“ it's perhaps more difficult to implement, partly because some people prefer to use the simple " key, while others insist on using the orthographically correct „“.
But I suspect that there are a bunch of sentence twins out there, with just this difference.
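
To make the second option a bit more concrete, here is a rough sketch of how a merging script could treat such sentences as the same without touching the stored text. I don't know how the actual Tatoeba script works, so everything here is an assumption: the function names `comparison_key` and `find_twins` are made up, and the choice of which characters to fold (no-break space, narrow no-break space, thin space, the typographic apostrophe, and optionally the German quotation marks) is only my guess at a reasonable starting set.

```python
import unicodedata
from collections import defaultdict

# Assumed set of space-like characters that may appear before ! and ? in French.
SPACE_VARIANTS = {
    "\u00a0": " ",  # no-break space
    "\u202f": " ",  # narrow no-break space
    "\u2009": " ",  # thin space
}
# Typographic apostrophe folded to the simple one.
APOSTROPHE_VARIANTS = {"\u2019": "'"}
# German-style quotation marks; folded only on request, since opinions differ.
QUOTE_VARIANTS = {"\u201e": '"', "\u201c": '"', "\u201d": '"'}


def comparison_key(sentence, fold_quotes=False):
    """Return a key used only for comparing sentences, not for display."""
    table = {**SPACE_VARIANTS, **APOSTROPHE_VARIANTS}
    if fold_quotes:
        table.update(QUOTE_VARIANTS)
    # NFC keeps the characters themselves but unifies composed/decomposed forms.
    text = unicodedata.normalize("NFC", sentence)
    return text.translate(str.maketrans(table))


def find_twins(sentences, fold_quotes=False):
    """Group (id, text) pairs whose comparison keys are identical."""
    groups = defaultdict(list)
    for sentence_id, text in sentences:
        groups[comparison_key(text, fold_quotes)].append(sentence_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical data: the ids and the exact space characters are invented.
sample = [
    (1, "Viens ici\u202f!"),
    (2, "Viens ici\u00a0!"),
    (3, "Viens ici!"),
]
print(find_twins(sample))
```

In this sketch the sentence without any space stays separate, because deciding whether a missing space is an error is a per-language question. Dashes are left out on purpose for the same reason.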

What's your opinion?

Shishir, 18 February 2013, 22:28:57 UTC

As for replacing the French spaces before question marks with standard ones: it was done once, and I guess it could be done again.

http://tatoeba.org/spa/wall/show_message/4586

As for punctuation in other languages... I guess the same thing could be done, although I'd recommend doing so only where the current punctuation is plainly wrong. For example, in Spanish it's acceptable to use "" instead of «», but not to use - instead of the long — (although, since the two have different uses and both are used in Spanish, it wouldn't be possible to change them automatically in this case).

Vortarulo, 18 February 2013, 22:39:02 UTC

Ah, thanks!
Yes, I think for -, – and — it's a bit more difficult. One would perhaps have to decide separately for each case and each language.

As for French: nice, I didn't know such a script had already been used. The sentence pair above is actually the first time that I came across these different spaces in otherwise identical sentences.

By the way, a totally different question:
How often is the merger script used? And how long does it take for it to go through the whole database (or just through English or German or Klingon or Esperanto, for instance)? Also, how many sentence pairs get merged in one go, usually? Just asking out of curiosity.