FeuDRenais, January 5, 2013 at 8:45:26 PM UTC

--- Arabic Fonts on Tatoeba ---

I don't know if this is relevant to all languages written in Arabic script, but I've noticed that certain letters aren't properly connected in the Trebuchet font Tatoeba uses when displaying Uyghur.

Would it be possible (I guess I'm asking sysko here) to add a conditional statement in the code to display Uyghur (and maybe other languages if needed) in other font families? For Uyghur, I've known it to be displayed fine in sans-serif.

halfb1t, January 6, 2013 at 2:38:24 AM UTC

Does your browser not allow you to specify the fonts it uses? If you tell us which browser you use, perhaps someone will be able to offer a suggestion. If you give us a sentence number and tell us what to look for, you may reap a crowd-source solution (with participation by some who would not recognize a Uyghur sentence without a flag).

From a small acquaintance with Persian I have the impression that the ligature rules for alphabets derived from Arabic differ from language to language. Intuition suggests that the defect you describe is unlikely to be font related. (What I'm thinking is that ligature formation occurs before glyph selection, and fonts don't matter until processing arrives at glyph selection.)

In the same vein I expect Tatoeba's database to be language transparent (except for an occasional inside-or-outside-RTL terminal punctuation mark). If this is the case, then the problem likely arises when sentences are input. It might be useful for you to try contributing a sentence that exhibits the defect and observing whether the problem occurs as you are typing it in.

FeuDRenais, January 6, 2013 at 8:37:57 AM UTC

Hello.

It's true that browsers have an effect, and that the problem is partially solved for some fonts in some browsers. Some fonts are, however, more robust than others. Sans-serif, for example, tends to display Uyghur without issue in most, if not all, modern browsers. I suppose my suggestion would be to just use sans-serif for Uyghur since this avoids a lot of issues (perhaps other font-families are even better though).

Perhaps a nicer and easier fix would be to specify the fonts of the different textarea elements instead of using the defaults. Drafting a message in Uyghur is a total nightmare here, as it comes out looking very ugly while being typed.

An example that doesn't work in Firefox would be here: http://tatoeba.org/eng/sentences/show/2125035

In Chrome, the same example comes out fine (though ugly) for the main sentence but is still displayed wrong on the right-hand-side logs.

sharptoothed, January 6, 2013 at 9:45:43 AM UTC

It seems that Tatoeba uses the Georgia font (a serif face) to display the main sentence. I'm not sure whether this particular font contains all Unicode characters, so browsers may fall back to a font of last resort when displaying non-ASCII characters. As different browsers use different font rendering engines, the effect will differ from one browser family to another; in particular, the same text will look different in WebKit-based and Gecko-based browsers.
The only way out I see is to use different font families to display sentences in different alphabets, though it may be a kinda tricky task.

FeuDRenais, January 6, 2013 at 10:34:28 AM UTC

>> The only way out I see is to use different font families to display sentences in different alphabets, though it may be a kinda tricky task.

Right, exactly. I've done this for a site before, and it requires (at least for a novice like me) conditionals in the CSS. In my case, I just have a database table for the languages, where each one can have an appropriate font family defined. Conceptually, it's easy, but it would probably require going through the code a bit to set appropriate families. It might also require database calls, which could be tricky depending on how the Tatoeba database is organized (I think I remember Trang telling me that Tatoeba was just one huge table, so maybe that would be tricky...)
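A minimal sketch of that per-language table, written in Python. The language codes, the font stacks, and the `font_family_value` helper are hypothetical, not Tatoeba's actual data or code; each stack ends in a CSS generic family so the browser always has a last resort.

```python
from typing import Dict, List

# Hypothetical table: language code -> ordered font families.
# In practice this could be a database table with one row per language.
FONT_STACKS: Dict[str, List[str]] = {
    "uig": ["Arial", "sans-serif"],  # Uyghur (stack chosen for illustration)
}

# Used for every language without an entry of its own (assumed default).
DEFAULT_STACK: List[str] = ["Georgia", "serif"]

# CSS generic families must not be quoted.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def font_family_value(lang_code: str) -> str:
    """Return a CSS font-family value for the given language code."""
    stack = FONT_STACKS.get(lang_code, DEFAULT_STACK)
    # Quote named fonts so that names containing spaces parse correctly.
    return ", ".join(f if f in GENERIC_FAMILIES else f'"{f}"' for f in stack)


print(font_family_value("uig"))
print(font_family_value("eng"))
```

The returned string would then be placed in whatever per-language CSS rule the site generates.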

The textarea fix would be nice, though, and could probably be done by setting the font family in the general css file (to sans-serif - my recommendation ;-)

(if you have time, of course, sysko)

FeuDRenais, January 6, 2013 at 10:52:18 AM UTC

This is what I mean by textarea things being ugly the way it's set up now:

http://www.ccapprox.info/fontscreen1.png

and the difference if you override the textarea font family to make it sans-serif:

http://www.ccapprox.info/fontscreen2.png

Of course, these things might vary with browsers/computers, but just as a suggestion...

sysko, January 6, 2013 at 12:59:15 PM UTC

Actually, that can be done with something like this (on my side, I'll run some tests on the new Tatoeba code):


<span class="xxx">I'm eating an apple</span>


with xxx being the language's ISO code, so that the font could be overridden on a per-language basis. I think that's the most elegant solution.
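A sketch of the markup side of this idea in Python, assuming hypothetical helpers `sentence_span` and `class_rule` (not part of Tatoeba's code). The sentence is HTML-escaped so that characters such as `<` or `&` in a sentence cannot break the markup.

```python
import html


def sentence_span(text: str, lang_code: str) -> str:
    """Wrap a sentence in a <span> whose class is its language code (hypothetical helper)."""
    # quote=False: quotes in the sentence need no escaping inside element content.
    return f'<span class="{html.escape(lang_code)}">{html.escape(text, quote=False)}</span>'


def class_rule(lang_code: str, font_family: str) -> str:
    """Build a CSS rule that overrides the font for one language's class."""
    return f".{lang_code} {{ font-family: {font_family}; }}"


print(sentence_span("I'm eating an apple", "eng"))
print(class_rule("uig", '"Arial", sans-serif'))
```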

FeuDRenais, January 6, 2013 at 1:04:02 PM UTC

True. That would work nicely.

Thanks!

sharptoothed, January 6, 2013 at 5:07:39 PM UTC

I've created a simple test page that displays http://tatoeba.org/eng/sentences/show/2125035 using different fonts:
http://jplangtools.com/tatoeba/tatoeba.html

To my untrained eye, there aren't many differences :-)

Btw, as halfb1t noticed, sans-serif (like serif, monospace, etc.) is not a font itself but rather a generic font family. Usually, the font it maps to can be changed in the application settings to whichever font you like best.

FeuDRenais, January 6, 2013 at 5:38:14 PM UTC

Quite a noticeable difference, if you ask me. Arial and sans-serif are the only ones that display the sentence correctly.

Thank you for doing the test, by the way.

sharptoothed, January 6, 2013 at 5:43:51 PM UTC

UKIJ fonts were taken from here
http://www.ukij.org/fonts/
There are a lot of other fonts there.

FeuDRenais, January 6, 2013 at 5:47:16 PM UTC

(Well, in Firefox/IE. In Chrome/Opera, Arial and sans-serif are still good, but a few of the other fonts are good/acceptable as well).

Anyway, Arial or sans-serif for Uyghur would be my request if you're going to implement this, sysko.

halfb1t, January 7, 2013 at 2:05:31 AM UTC

The test page can be made to look either all good or mostly bad by browser font configuration.

sharptoothed, January 7, 2013 at 7:33:31 AM UTC

You're absolutely right. But we can't and shouldn't rely upon browser font configuration, whether default or user-made. Default configurations may not be optimal (read "ugly"), and a user may have no idea how to change them (in fact, many if not most users don't even suspect that font configuration exists). So, as far as Tatoeba is concerned, it's necessary to determine which font family is best for displaying each alphabet and explicitly instruct browsers through CSS to use it.

sacredceltic, January 6, 2013 at 2:55:26 PM UTC

Could this also fix the fact that thin spaces are still invisible to me, which is incorrect and drives me up the wall?

halfb1t, January 7, 2013 at 12:30:30 AM UTC

Can you give us the number of a sentence that incorporates a thinspace?

sacredceltic, January 7, 2013 at 7:30:28 AM UTC

http://tatoeba.org/epo/sentences/show/2113836

Here, I don't see the thin spaces at all with Chrome on Mac OS X.
With Chrome on Windows, I see ugly squares in place of the thin spaces when the sentence is the main one, and I don't see anything when it is a linked sentence...

sacredceltic, January 7, 2013 at 7:50:27 AM UTC

...and they also appear as ugly squares on iOS.

sysko, January 7, 2013 at 2:07:57 PM UTC

On Android too, because the operating system has a much more limited set of fonts, and as they are made by US companies, they didn't take care of that...

sacredceltic, January 7, 2013 at 2:24:22 PM UTC

I wish I could boycott anything that imposes local standards on the world... the time will come!

sacredceltic, January 7, 2013 at 2:28:11 PM UTC

That said, I use plenty of other apps on iOS that handle French text and correctly display the spaces before double punctuation marks. For example: Le Monde on iOS.
Does that mean they use regular (full) spaces?
If so, Tatoeba should do the same...

sysko, January 7, 2013 at 2:37:04 PM UTC

If those are native apps, they can certainly embed their own fonts.

Otherwise, as was pointed out above, with CSS3 (which was not yet a standard back when we developed Tatoeba's code) we can now also "embed" the font in the CSS code. That should make it possible to solve the problem, by embedding for French a font capable of rendering this space.
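A sketch of what such an embedded font declaration could look like, generated from Python. The family name `FrenchSpaces` and the font URL are hypothetical. The optional `unicode-range` descriptor tells the browser to use this face only for the listed code points, so the rest of the French text would keep its usual font; browser support for `unicode-range` should be checked before relying on it.

```python
from typing import Optional


def font_face_rule(family: str, woff_url: str, unicode_range: Optional[str] = None) -> str:
    """Build a CSS3 @font-face rule so the browser downloads the font itself."""
    lines = [
        "@font-face {",
        f'  font-family: "{family}";',
        f'  src: url("{woff_url}") format("woff");',
    ]
    if unicode_range:
        # Restricts this face to the listed code points; all other characters
        # fall through to the next family in the font-family stack.
        lines.append(f"  unicode-range: {unicode_range};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical family name and font URL, for illustration only.
# U+2009 is THIN SPACE, U+202F is NARROW NO-BREAK SPACE.
print(font_face_rule("FrenchSpaces", "/fonts/french-spaces.woff", "U+2009, U+202F"))
print(':lang(fr) { font-family: "FrenchSpaces", Georgia, serif; }')
```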

sacredceltic, January 7, 2013 at 2:42:51 PM UTC

Yes, but then how come it doesn't work either when I force the font in my browser on Windows (I forced Arial), where I also get the ugly squares?

sharptoothed, January 7, 2013 at 3:25:18 PM UTC

Please take a look at this test page:
http://jplangtools.com/tatoeba/thinspace.html
What do you see in your browsers?

sacredceltic, January 7, 2013 at 3:34:44 PM UTC

Thanks, Sharptoothed

On the first 3 lines, I see ugly squares in front of the question mark.
The 4th line is correct.

sharptoothed, January 7, 2013 at 4:00:18 PM UTC

Now I clearly see that the problem is more complex than it seemed. We have to take into account not only the browser type but also the operating system type and version.
The most universal solution is to use HTML entities instead of raw Unicode characters. The problem is that this requires on-the-fly conversion inside the Tatoeba engine.
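A minimal Python sketch of such an on-the-fly conversion, assuming it runs just before a sentence is written into the page; the character set and the `escape_spaces` name are illustrative. Note that a numeric reference only changes how the character is encoded in the HTML source: the browser decodes `&#8239;` back to U+202F before choosing a glyph, so the same font lookup still applies.

```python
# Characters to emit as numeric references (an illustrative selection).
SPACES_TO_ESCAPE = {
    "\u2009",  # THIN SPACE
    "\u202f",  # NARROW NO-BREAK SPACE
}


def escape_spaces(text: str) -> str:
    """Replace each selected character with its decimal &#NNNN; reference."""
    return "".join(f"&#{ord(ch)};" if ch in SPACES_TO_ESCAPE else ch for ch in text)


print(escape_spaces("Pourquoi\u202f?"))
```

To convert every non-ASCII character rather than a chosen few, Python's built-in `text.encode("ascii", "xmlcharrefreplace").decode("ascii")` produces the same `&#NNNN;` form.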

sacredceltic, January 7, 2013 at 4:03:39 PM UTC

to me, it clearly points to the fact that the W3C hasn't done its job properly...

sharptoothed, January 7, 2013 at 4:11:41 PM UTC

hmmm... actually, the W3C did its best to provide various types of HTML entities (see http://www.w3schools.com/tags/ref_symbols.asp ), but all that stuff is more for coders than for end users. Older operating systems seem to have limited support for Unicode and, unfortunately, the W3C can't help that.

sysko, January 9, 2013 at 4:52:22 AM UTC

Actually, your example with the HTML entity is not the same as the previous ones: you use a thin space (U+2009), while the other sentences use a narrow no-break space (U+202F).
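To check which kind of space a sentence actually contains, a short Python sketch using the standard `unicodedata` module can list every space separator (Unicode general category `Zs`) with its code point and name. The sample sentence is made up for illustration.

```python
import unicodedata
from typing import List, Tuple


def describe_spaces(text: str) -> List[Tuple[int, str, str]]:
    """List (index, code point, Unicode name) for each space separator in text."""
    found = []
    for i, ch in enumerate(text):
        # Category "Zs" covers U+0020, U+00A0, U+2009, U+202F and other spaces.
        if unicodedata.category(ch) == "Zs":
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found


for entry in describe_spaces("Vraiment\u202f? Oui\u2009!"):
    print(entry)
```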

sharptoothed, January 9, 2013 at 6:58:27 AM UTC

Thanks for pointing that out, sysko! I suspected something was wrong :-)
Unfortunately, it seems there's no named HTML entity for the narrow no-break space, so we have to use its numeric code instead (&#8239;) and pray that browsers display it correctly.
I've updated the test page.
http://jplangtools.com/tatoeba/thinspace.html
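A quick Python check, using only the standard library's `html.unescape`, that the references mentioned here decode to the characters sysko identified. A browser performs the same decoding when it parses the page, so a reference and the raw character should render identically in a given font.

```python
import html

# Each reference on the left should decode to the character on the right.
EQUIVALENTS = {
    "&thinsp;": "\u2009",   # named reference for THIN SPACE
    "&#8201;": "\u2009",    # the same character as a decimal reference
    "&#8239;": "\u202f",    # NARROW NO-BREAK SPACE, decimal reference
    "&#x202F;": "\u202f",   # the same character as a hexadecimal reference
}

for ref, char in EQUIVALENTS.items():
    decoded = html.unescape(ref)
    print(ref, "->", f"U+{ord(decoded):04X}", decoded == char)
```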

Amastan, January 7, 2013 at 3:37:48 PM UTC

I can see every font correctly (Tahoma, Arial, Georgia). I use Google Chrome.

sacredceltic, January 7, 2013 at 3:46:05 PM UTC

That's really puzzling, since we use the same browser and we don't see the same result...
Is this on Mac OSX ?

sacredceltic, January 7, 2013 at 3:41:21 PM UTC

that's for Chrome on Windows XP

Shishir, January 7, 2013 at 3:51:24 PM UTC

I see the same thing you describe, squares on the first three lines but not on the fourth, when I use Chrome on Windows XP.
But I don't see any squares when I use Firefox, or when I use Chrome on Windows 7.

sacredceltic, January 7, 2013 at 3:59:49 PM UTC

what a mess...

sharptoothed, January 7, 2013 at 4:04:01 PM UTC

Browsers running on Windows 7 seem to work like a charm. My Dolphin browser (WebKit-based), which runs on my Android 4 device, displays lines 1, 2 and 4 correctly, too.

Amastan, January 7, 2013 at 3:55:32 PM UTC

Sacredceltic:
I use Windows XP.

sacredceltic, January 7, 2013 at 4:01:10 PM UTC

even more puzzling : same browser, same OS, different effect...

sharptoothed, January 7, 2013 at 4:16:07 PM UTC

It's possible that the localization of a particular OS version also matters. And possibly the installed service packs, too.

sacredceltic, January 7, 2013 at 4:32:30 PM UTC

@Amastan

which localization of your browser / OS do you use ?

Amastan, January 7, 2013 at 4:54:07 PM UTC

Sacredceltic:

My OS uses France (Location: France).

My Google Chrome also uses France as a localization.

sacredceltic, January 7, 2013 at 4:56:23 PM UTC

same as mine...I don't understand the different effect for you and me, then...

Amastan, January 7, 2013 at 5:06:22 PM UTC

Have you installed something new on your Mac? :-p Maybe some new program has disrupted your font system :p

sacredceltic, January 7, 2013 at 5:12:20 PM UTC

Here I'm working on a French XP PC machine (I hate it...) with a French up-to-date Chrome.
I use different OS all the time...

halfb1t, January 7, 2013 at 11:32:43 PM UTC

On Windows 7, the thinspaces, including &thinsp;, render as normal spaces in Firefox, Chrome, and Opera.

&thinsp; renders correctly on all three browsers in this page: http://www.robinlionheart.com/stds/html4/spchars.

sacredceltic, January 7, 2013 at 11:40:29 PM UTC

How is it dependent on the OS, rather than on the browser fonts ?

halfb1t, January 8, 2013 at 1:30:44 AM UTC

Browser access to fonts is via the OS (if only through the file system), and the names by which fonts are known to applications are under OS control.

sharptoothed, January 8, 2013 at 8:13:00 AM UTC

Nice page, by the way!
I've made a test page based on it.
http://jplangtools.com/tatoeba/spaces.html

halfb1t, January 8, 2013 at 8:25:43 AM UTC

Cool.

Since we started with FeuDRenais's troublesome Uyghur, perhaps it would be nice to include the sample sentence he pointed us to.

A freely available glyph-rich serif font is Charis SIL.

Arial and Times New Roman can be expected to be different on Windows than on other OSes. After releasing them to the world, Microsoft updated them, adding many more glyphs. This, I expect, is why SacredCeltic gets boxes for thinspaces on his Mac, while thinspaces display fine for me.

sharptoothed, January 8, 2013 at 9:06:21 AM UTC

This is the page that demonstrates Uyghur sentence rendered using different fonts:
http://jplangtools.com/tatoeba/national.html
The UKIJ fonts are loaded on the fly via CSS, and the other fonts are the ones the OS contains. There are also "sans" and "sans-serif" lines, so you can see which font your browser actually uses for those generic families.

The Charis SIL font is really great, but it's too big to be downloaded on the fly via CSS. So one has to install it manually, and this may be a problem.

halfb1t, January 8, 2013 at 10:00:08 AM UTC

The reason I suggested the combination is that the thinspace problem SC has and the bad-glyph problem FeuDRenais has are the same: font-substitution breakdowns. Web fonts mask the problem without really solving it, since, as you point out, they are impractical as a general solution.

Charis SIL may be (a good portion of) the serif part of a complete solution. One way or another, users need to have large fonts on their machines to display Chinese and Kanji. The problem is to find relatively painless, fool-proof recipes that users can execute.

Can you tell if Tatoeba's thinspaces are really thinspaces or the narrow-no-break-spaces they really ought to be?

sharptoothed, January 8, 2013 at 11:51:47 AM UTC

Will this test page help?
http://jplangtools.com/tatoeba/fonts.html

> Can you tell if Tatoeba's thinspaces are really thinspaces or the narrow-no-break-spaces they really ought to be?

Absolutely not. To tell the truth, I don't really care if some space is thin or not-so-thin or if some dash is long or not-so-long. It doesn't prevent me from understanding the sentence. :-)

halfb1t, January 8, 2013 at 12:03:36 PM UTC

It's too busy for me to say for sure right away, but yes, I think it will be very helpful, and I plan to look at it in a variety of environments.

Thanks.

alexmarcelo, January 8, 2013 at 12:05:28 PM UTC

Have you tried Calibri?
http://www.lucasfonts.com/filea...ri-charset.gif

sharptoothed, January 8, 2013 at 12:15:06 PM UTC

Calibri seems to look nice though I prefer Arial.
I've added Calibri to the page. (Maybe it's worth writing a script that changes fonts on the fly, since the page is getting too long. :-))

alexmarcelo, January 8, 2013 at 12:23:02 PM UTC

Thanks! I love that font. You are using justified text, aren't you? Maybe it would be better not to, because spaces are usually stretched or compressed...

sharptoothed, January 8, 2013 at 12:27:18 PM UTC

> You are using justified text, aren't you?

ummm, nope, no justified alignment is used. The English example just contains different types of spaces. :-)

alexmarcelo, January 8, 2013 at 12:32:52 PM UTC

Oh, I see! Thanks! :-)

halfb1t, January 8, 2013 at 3:20:37 AM UTC

With Windows 7, Chrome, Firefox, Opera, and my own HTML/CSS, I see (1) No difference between &thinsp; and the Unicode thinspace character. (2) Differences between the thinness of the thinspace character from font to font--with some fonts it's quite difficult to see the difference between thinspaces and spaces--and of course the difference is diminished at smaller sizes. (3) Differences, even with the same font specified, from browser to browser. I attribute this to differing font substitutions. To test the thesis, I specified Lucida Sans Unicode--which includes the thinspace character--and the thinspaces displayed the same on all three browsers.

The appearance of a square indicates the browser found no thinspace character in any font. Where you have the option of installing Lucida Sans Unicode, that may help.

When I examine the source delivered from http://www.lemonde.fr/ I find ASCII spaces before and after colons. I don't know what to make of that.

sacredceltic, January 8, 2013 at 7:32:54 AM UTC

Thank you for the detailed explanation.
Alas, Lucida Sans Unicode doesn't seem standard (if anything is across platforms...)
But I think Tatoeba should take charge of that, not the user.

Spaces are an ESSENTIAL part of the reading experience, and in French an essential part of reading double punctuation marks (: ; ? !), whether the spaces are thin or plain (as is the case in other languages such as Romanian, and as was also the case in English, German, etc. before the advent of computers and the squeezing-out of spaces: look at old book editions if in doubt...)

In French, one often cannot apply the correct intonation to the start of a question if one's eyes have not quickly spotted a question mark at the end of the sentence, whatever the sentence's length, and this quick spotting is not possible if the question mark is stuck to the last word, as it may be confused with another tall letter, in a grotesque, unreadable combination that makes one's reading experience just miserable.

That is why professional publishers of ALL French books, magazines and newspapers insert spaces (thin or NOT) before double punctuation marks, and have been doing so for centuries, so French readers do NOT know any other way of reading, since they have grown up with it.

I think Tatoeba should reflect French standards when displaying French sentences, regardless of the standards applied to other languages.
It pains me that it doesn't, and I have been complaining about it ever since I joined.
If thin spaces cannot be rendered properly on Tatoeba, then standard spaces should be displayed instead, rather than being automatically replaced by thin spaces afterwards; that is what professional French publishers do when confronted with this situation.
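A minimal Python sketch of that fallback, assuming it is applied only to the displayed copy of a French sentence while the stored text keeps its narrow spaces; `widen_french_spaces` is a hypothetical helper. It substitutes U+00A0 (no-break space), which normally has the width of an ordinary space, is part of Latin-1 and so is very likely present in any font, and also keeps the punctuation mark from wrapping to the next line. Substituting a plain U+0020 instead would be a one-character change.

```python
import re

# A thin space (U+2009) or narrow no-break space (U+202F) directly before : ; ? !
NARROW_BEFORE_PUNCT = re.compile("[\u2009\u202f](?=[:;?!])")


def widen_french_spaces(text: str) -> str:
    """Replace narrow spaces before French double punctuation with U+00A0."""
    return NARROW_BEFORE_PUNCT.sub("\u00a0", text)


print(repr(widen_french_spaces("Tu viens\u202f?")))
```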

liori, January 6, 2013 at 3:26:11 PM UTC

It would be better to use the “lang” attribute instead, which was designed for this purpose. It is possible to apply CSS rules by matching an arbitrary html attribute, including “lang”. Example: https://dl.dropbox.com/u/52886258/lang.html
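For comparison, a Python sketch of the `lang`-based variant. The helpers are hypothetical; `ug` is the BCP 47 tag for Uyghur, so Tatoeba's own language codes would need to be mapped to such tags first (not shown). The `:lang()` pseudo-class matches an element whose language, declared on it or inherited from an ancestor, is `ug` or a subtag of it such as `ug-CN`.

```python
import html


def sentence_span(text: str, lang: str) -> str:
    """Mark up a sentence with the standard HTML lang attribute (hypothetical helper)."""
    # Escape the attribute value and the text so neither can break the markup.
    return f'<span lang="{html.escape(lang)}">{html.escape(text, quote=False)}</span>'


def lang_rule(lang: str, font_family: str) -> str:
    """Build a CSS rule keyed on the :lang() pseudo-class."""
    return f":lang({lang}) {{ font-family: {font_family}; }}"


print(sentence_span("Bonjour", "fr"))
print(lang_rule("ug", '"Arial", sans-serif'))
```

The attribute-selector form `[lang|="ug"]` also works, but it only matches elements that carry the attribute themselves.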

sysko, January 6, 2013 at 3:38:00 PM UTC

Sure, that slipped my mind while I was posting the answer :)

halfb1t, January 8, 2013 at 7:44:45 AM UTC

Lucida Grande on the Mac is reputed to be equivalent to Lucida Sans Unicode.

As I've argued elsewhere, Tatoeba _can't_ resolve these issues: all it can do is ship characters, an encoding tag, and CSS. What happens after that depends on the rendering that takes place on our machines; and if we can't completely control it, we sure can mess it up.

liori, January 8, 2013 at 8:35:30 AM UTC

Now that you say this… there is a way to control rendering completely: delivering text as images. Done in addition to normal text, it could be useful at least to let users double-check that everything's OK on their side.

Well, just throwing an idea into the discussion. It seems unfeasible as a short-term solution, and for a long-term one, getting the CSS and font stuff right on the user side would be better…

halfb1t, January 8, 2013 at 8:39:22 AM UTC

Right you are. I didn't think of that. And if we eventually end up with a How-To, text-as-images will be very helpful.

halfb1t, January 8, 2013 at 12:46:57 PM UTC

Running OS X on a Hackintosh in VirtualBox under Windows 7. Nothing got Safari to display thinspaces, but Chrome did it for me, and maybe it'll do it for you if you (1) disable the Georgia font family, and (2) set Chrome's Standard Font to Microsoft Sans Serif.

This illustrates my thesis that specifying fonts gets in the way in some environments. OS X's text rendering environment seems to be one of those.

halfb1t, January 6, 2013 at 11:18:29 AM UTC

The problem is not letter connection (ligatures), but glyph selection (terminal instead of medial forms).
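To illustrate the distinction, a Python sketch with the standard `unicodedata` module: Unicode still carries compatibility code points for the contextual forms of Arabic letters (here the four forms of HEH), and NFKC normalization folds all of them back into the single base letter U+0647. Stored text normally contains only the base letter; the browser's shaping engine chooses the isolated, initial, medial, or final form at display time, usually from the font's own tables rather than from these code points, so a wrong form points to the rendering step rather than to the stored sentence.

```python
import unicodedata

# Compatibility code points for the four contextual forms of ARABIC LETTER HEH.
HEH_FORMS = ["\ufee9", "\ufeea", "\ufeeb", "\ufeec"]

for form in HEH_FORMS:
    base = unicodedata.normalize("NFKC", form)
    # Each line shows a contextual form's name and the base letter it folds to.
    print(unicodedata.name(form), "->", f"U+{ord(base):04X}", unicodedata.name(base))
```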

Can't reproduce the good/bad effect in Chrome. In Firefox (on Windows 7), I can switch between good/bad by changing the default font.

Bad are: Trebuchet, Verdana, MS Sans Serif.
Good are: Segoe UI, Arial, Tahoma.

(Tatoeba should switch the default CSS from Trebuchet to one of the good ones. Tahoma is very heavy.)

Georgia has no Arabic letters; sans-serif is not a font: it's a CSS generic fallback. Every browser must do something for sans-serif, but what it does is up to the browser.

sacredceltic, January 7, 2013 at 11:26:08 AM UTC

I tried to change the fonts in the advanced parameters of Chrome. I switched all to Arial. But the squares for thin spaces are still there...

liori, January 8, 2013 at 12:24:37 PM UTC

One more thought/idea: it would probably be worth checking how Wikipedia deals with the problem.

halfb1t, January 8, 2013 at 1:18:44 PM UTC

Wikipedia's position is that fonts are your problem. They say on many pages something like "This page contains characters in.... If you don't see them, it's because you don't have the necessary fonts."

liori, January 8, 2013 at 1:48:44 PM UTC

Unless I'm mistaken, in the CSS they just say “font-family: sans-serif” or “font-family: serif”, depending on the place (with a few additional cases for TeX and source code examples, irrelevant to us). They also provide a way for users to check whether the characters displayed on their machine are correct, at least for CJKV [1]. However, I couldn't find anything similar for Arabic, except for short notes about a few fonts [2].

[1] http://en.wikipedia.org/wiki/He...t_(East_Asian)
[2] http://en.wikipedia.org/wiki/He...l_support#Font

So, Wikipedia opted for default fonts.

halfb1t, January 8, 2013 at 2:16:58 PM UTC

But consider this, from the Uyghur Wikipedia:

<div >

This suggests that thoroughly internationalizing Tatoeba's interface may be significantly more difficult than the problems we're wrestling with now.

The links you posted are useful, but the level of detail does not descend to thinspaces, versions of Microsoft Core Fonts, details of browser substitution algorithms, or OS text-rendering environments.

halfb1t, January 8, 2013 at 2:21:24 PM UTC

Whoops. The div tag I was trying to quote has this style string: "direction:rtl; font-family: 'UKIJ inchike', 'Alpida_Unicode System', 'UKIJ Tuz Tom', 'Microsoft Uighur', 'uyghur ekran', 'Segoe UI', 'Tahoma'".

liori, January 8, 2013 at 2:30:36 PM UTC

Ah, and this is what I was actually looking for. Tatoeba could probably use it blindly for Uyghur. Great find!
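If Tatoeba did reuse the Uyghur Wikipedia's stack, the rule could be generated rather than typed by hand. The TypeScript below is a minimal sketch, not Tatoeba's code: it assumes each sentence element carries a `lang` attribute (e.g. `lang="ug"`), and the helper names `fontFamilyValue` and `langRule` are hypothetical. It quotes every family name except the CSS generic keywords, because a quoted `'sans-serif'` would be read as a font literally named sans-serif.

```typescript
// CSS generic family keywords. They must stay unquoted: a quoted 'sans-serif'
// would name a font literally called "sans-serif".
const GENERIC_FAMILIES = new Set(["serif", "sans-serif", "monospace", "cursive", "fantasy"]);

// Hypothetical helper: builds a font-family value, quoting every non-generic name
// so spaces and underscores in names like 'Alpida_Unicode System' survive intact.
function fontFamilyValue(families: string[]): string {
  return families
    .map((name) =>
      GENERIC_FAMILIES.has(name.toLowerCase()) ? name : `'${name.replace(/'/g, "\\'")}'`
    )
    .join(", ");
}

// Hypothetical helper: scopes the value to one language with the :lang() selector.
// Assumes sentence elements carry a lang attribute such as lang="ug".
function langRule(lang: string, families: string[], rtl: boolean): string {
  const direction = rtl ? "direction: rtl; " : "";
  return `:lang(${lang}) { ${direction}font-family: ${fontFamilyValue(families)}; }`;
}

// The stack quoted from the Uyghur Wikipedia, in the same order.
const uyghurWikipediaStack = [
  "UKIJ inchike",
  "Alpida_Unicode System",
  "UKIJ Tuz Tom",
  "Microsoft Uighur",
  "uyghur ekran",
  "Segoe UI",
  "Tahoma",
];

console.log(langRule("ug", uyghurWikipediaStack, true));
```

Run as is, it prints a `:lang(ug)` rule whose font-family list matches the quoted style string, Tahoma included. The Wikipedia stack ends without a generic family; appending `sans-serif` would be one way to fold in FeuDRenais's suggestion, though what that keyword resolves to still depends on each reader's system.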

halfb1t, January 8, 2013 at 2:45:12 PM UTC

The potential fly in the ointment is that this font spec may be aimed at machines set up to use Uyghur by default; and such machines (and OSes and browsers) may not span the same range as those used by the whole Tatoeba community.

FeuDRenais, January 9, 2013 at 7:11:59 AM UTC

I fear that such machines may not really exist...

halfb1t, January 9, 2013 at 9:56:26 AM UTC

From a little Web research, it seems you're right. Although a few translation initiatives are in the works, they seem not to be progressing apace.

halfb1t, January 9, 2013 at 9:26:22 AM UTC

Here's the font string from the WordPress Uyghur translation effort: 'UKIJ Tuz Tom','Alpida Unicode System','Alkatip Tor','Alp Ekran','Microsoft Uighur',Tahoma,Verdana,Arial,Helvetica.

FeuDRenais, January 8, 2013 at 7:35:23 PM UTC

All those families, and yet the whole page looks so much nicer if you just put "font-family:sans-serif".

Then again, it's probably worth noting that most Uyghurs are from China, and more than just a few people in China still use IE6. And who knows what looks best when it comes to that mess of a relic...

halfb1t, January 9, 2013 at 12:12:01 AM UTC

> the whole page looks so much nicer if you just put "font-family:sans-serif"

> who knows what looks best when it comes to that

These statements are inconsistent, and the first is wrong. What happens "if you just put font-family:sans-serif" _depends_. This is the crucial point: the effect of any font specification depends on factors forever beyond Tatoeba's control, and understanding that is crucial to understanding what can be done, which in turn is basic to discussing what should be done.

FeuDRenais, January 9, 2013 at 6:03:56 AM UTC

I'm sorry, halfb1t. I suppose that I should have stated "the whole page looks so much nicer FOR ME... ON MY COMPUTER... WITH MY BROWSER..."

(I thought those were implicit)

Next time I'll write a wall of text à la you.

My only request here is that Tatoeba try something like sans-serif for all of its Uyghur sentences, since I suspect that this will be better (perhaps much better) than doing nothing (across many computers/browsers).

halfb1t, January 9, 2013 at 6:15:29 AM UTC

> I suspect

That's fine; but more helpful would be a little data. Do you have the font Georgia on your computer? What happens if you disable or rename it? What happens if you set your browser's serif font to sans-serif?

FeuDRenais, January 9, 2013 at 7:05:11 AM UTC

No clue, haven't tried.

What would you do with this data, though? How could it be made into something implementable?

halfb1t, January 9, 2013 at 7:28:29 AM UTC

My experience suggests that there are strict limits on what Tatoeba can accomplish from its end, because the renderings that appear in users' browsers are strongly dependent on the fonts they have installed and the settings they put into their browsers.

The question is "Is my experience too narrow?" Careful reports by others with accurate, relevant detail help answer that question; and when they appear here, they contribute to the community's understanding of the nature of the problem, which of course has important implications for rational solutions.

halfb1t, January 8, 2013 at 2:24:47 PM UTC

If Tatoeba opts for specifying fonts per-language, we may be able to crib such specs from Wikipedia, as was suggested by your original remark.

Balamax, January 8, 2013 at 8:38:26 PM UTC

Shouldn't these three scripts (Uyghur Arabic, Latin, Cyrillic) appear together automatically for Uyghur, as in the sentences for Uzbek, Japanese or Chinese?

halfb1t, January 9, 2013 at 12:18:22 AM UTC

If you're thinking of transliteration you might want to start with the Wikipedia article on Uyghur alphabets.

Balamax, January 9, 2013 at 3:35:27 AM UTC

Indeed, these three alphabets have equal status for every entry here: http://www.uyghurdictionary.org. See also http://www.omniglot.com/writing/uyghur.htm

halfb1t, January 9, 2013 at 3:53:18 AM UTC

The questions that come to mind are: (1) Which of the five alphabets in the Russian Wikipedia article are to be chosen? (2) Is simple transliteration between all pairs accurate in every case? On the face of it, it seems that transliteration into the Arabic script involves at least the usual initial-medial-final-solitary glyph selection.

These are not insoluble problems, but they need careful identification before they have any chance of being solved. A related and more fundamental issue is support for multiple scripts for a single language in general. This issue has arisen before. I believe it had to do with an Indian language, but I forget which.

The important thing to recognize in this connection is that no solution is likely to be simple.

sacredceltic, January 9, 2013 at 9:41:18 PM UTC

It's Marathi, which can be written in either the Modi or the Devanagari script.
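halfb1t's remark about initial-medial-final-solitary glyph selection can be made concrete. In Unicode text the stored characters are normally the nominal letters, and the renderer picks each letter's positional form from its neighbours. The TypeScript sketch below shows only that selection rule, for a handful of letters; it is a simplification that ignores combining marks, the lam-alef ligature, ZWJ/ZWNJ and most Uyghur-specific letters, and `positionalForms` is a hypothetical name.

```typescript
// Simplified joining classes (after Unicode's ArabicShaping data) for a few letters.
// "dual": connects to both neighbours; "right": connects only to the preceding letter.
type Joining = "dual" | "right";
type Form = "isolated" | "initial" | "medial" | "final";

const JOINING: { [ch: string]: Joining | undefined } = {
  "\u0628": "dual", // beh
  "\u062A": "dual", // teh
  "\u0643": "dual", // kaf
  "\u0644": "dual", // lam
  "\u0645": "dual", // meem
  "\u0646": "dual", // noon
  "\u064A": "dual", // yeh
  "\u0627": "right", // alef
  "\u062F": "right", // dal
  "\u0631": "right", // reh
  "\u0632": "right", // zain
  "\u0648": "right", // waw
};

// Hypothetical helper: chooses the positional form of each letter, in logical
// (reading) order. Characters missing from the table get null and break joining.
function positionalForms(word: string): (Form | null)[] {
  const chars = Array.from(word);
  return chars.map((ch, i): Form | null => {
    const self = JOINING[ch];
    if (self === undefined) return null;
    const prev = i > 0 ? JOINING[chars[i - 1]] : undefined;
    const next = i + 1 < chars.length ? JOINING[chars[i + 1]] : undefined;
    // Only a dual-joining letter can reach forward to connect with this one.
    const joinsPrev = prev === "dual";
    // This letter reaches forward only if it is dual-joining and a joining letter follows.
    const joinsNext = self === "dual" && next !== undefined;
    if (joinsPrev && joinsNext) return "medial";
    if (joinsPrev) return "final";
    if (joinsNext) return "initial";
    return "isolated";
  });
}

// kaf, teh, alef, beh: alef cannot join forward, so the last beh is isolated.
console.log(positionalForms("\u0643\u062A\u0627\u0628"));
```

For كتاب the alef cannot connect to what follows it, so the closing beh takes its isolated form even though it ends the word.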

halfb1t, January 7, 2013 at 2:07:17 AM UTC

The root of the problem is font substitution. Font substitution is also the reason Tatoeba's many scripts display as well as they do.

As I see it, there are two very different approaches to ameliorating the problem: (1) use a glyph-rich font as the default; (2) use a default font (and other fonts) that inject as little information as possible into browsers' font-substitution algorithms.

Option (1) has numerous problems, which I won't try to enumerate. Option (2) amounts to specifying no fonts at all.

With (2), feuDRenais should get just what suits him by specifying sans-serif as his default font.

What are the issues? What are the impacts on the internationalization of Tatoeba's interface? On support for additional languages?

In addition to (2), Tatoeba's users would benefit from information: how do I get browser A to correctly display language B? I suggest that such information, which could readily be crowd-sourced from our community, is best organized by browser within language. Sorting first by language also organizes the what-fonts-do-I-need-and-where-can-I-get-them data.

It is conceivable that sans-serif is a better choice than no font specification at all. What might make that true? It would be true if some (or most or all) browsers' font-substitution algorithms were simpler in that case, or more predictable, or more similar to one another.
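The by-browser-within-language organization suggested above maps naturally onto a nested lookup table. The sketch below is hypothetical throughout: the type and function names, the `ug` language code, the `Firefox` key and the sample tip (which merely restates halfb1t's own guess further down the thread) are placeholders. It shows only the shape of the data, with font requirements stored once per language.

```typescript
// Hypothetical shape for crowd-sourced display help: language first, then browser.
interface LanguageEntry {
  fontsNeeded: string[]; // fonts readers of this language may need, stored once
  tipsByBrowser: Map<string, string[]>; // browser name -> setup tips
}

type DisplayGuide = Map<string, LanguageEntry>;

// Adds a tip, creating the language and browser entries on first use.
function addTip(guide: DisplayGuide, lang: string, browser: string, tip: string): void {
  let entry = guide.get(lang);
  if (entry === undefined) {
    entry = { fontsNeeded: [], tipsByBrowser: new Map<string, string[]>() };
    guide.set(lang, entry);
  }
  let tips = entry.tipsByBrowser.get(browser);
  if (tips === undefined) {
    tips = [];
    entry.tipsByBrowser.set(browser, tips);
  }
  tips.push(tip);
}

// Looks up tips for one browser within one language; unknown keys yield [].
function tipsFor(guide: DisplayGuide, lang: string, browser: string): string[] {
  const entry = guide.get(lang);
  if (entry === undefined) return [];
  const tips = entry.tipsByBrowser.get(browser);
  return tips === undefined ? [] : tips;
}

const guide: DisplayGuide = new Map();
// Placeholder tip restating halfb1t's guess; not a verified fix.
addTip(guide, "ug", "Firefox", "Set the browser's serif font to a font with good Uyghur glyphs.");
console.log(tipsFor(guide, "ug", "Firefox"));
```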

FeuDRenais, January 7, 2013 at 7:41:04 PM UTC

>>> With (2), feuDRenais should get just what suits him by specifying sans-serif as his default font.

Not sure if I understood everything you wrote, but if you're implying that a user should just specify the fonts themselves to find what suits them, I disagree. It's not the job of the website user to have to do these things. I'm not so much bothered by Uyghur displaying incorrectly for me personally (since I can still read and understand it) as I am by people who are learning it (in whatever capacity) being forced to read a corrupted version of the script.

A solution that I would propose is to use the specifications that are the most robust over different systems/browsers (sans-serif clearly seems the best for Uyghur so far). If that's what you're saying, too, then cool, we agree.

sacredceltic, January 7, 2013 at 8:11:30 PM UTC

The problem is, the best solution for one writing system/language is probably not the best for all of them.
So we need to find a system that applies the best option to EVERY SINGLE writing system/language combination...

I'm positively fed up with systems that fit the majority and just neglect all the minorities. They just are NOT acceptable. Democracy is definitely not the rule of the majority only!
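One way to reconcile per-language rules with halfb1t's option (2) is a table that holds a font-family value only for languages where someone has checked that it helps, and emits nothing at all for every other language. A minimal TypeScript sketch, assuming `lang` attributes on sentence elements; `fontStacks` and `stylesheetFor` are hypothetical names, and the only entry is the Uyghur Wikipedia stack quoted earlier in this thread.

```typescript
// Hypothetical per-language table of ready-made font-family values.
// Only languages someone has checked get an entry; the rest get no rule at all,
// leaving fonts to the browser's defaults and substitution (option (2)).
const fontStacks = new Map<string, string>([
  [
    "ug", // stack quoted from the Uyghur Wikipedia earlier in this thread
    "'UKIJ inchike', 'Alpida_Unicode System', 'UKIJ Tuz Tom', 'Microsoft Uighur', 'uyghur ekran', 'Segoe UI', 'Tahoma'",
  ],
]);

// Hypothetical helper: emits one :lang() rule per language that has an entry.
function stylesheetFor(langs: string[], stacks: Map<string, string>): string {
  const rules: string[] = [];
  for (const lang of langs) {
    const stack = stacks.get(lang);
    if (stack === undefined) continue; // no entry: say nothing about this language
    rules.push(`:lang(${lang}) { font-family: ${stack}; }`);
  }
  return rules.join("\n");
}

console.log(stylesheetFor(["en", "ug", "he"], fontStacks));
```

The design choice is that absence means "don't interfere": adding a language to the table is a deliberate, reviewable step, so the majority default never silently overrides a minority script.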

FeuDRenais, January 7, 2013 at 8:39:53 PM UTC

I think that's what both sysko and liori said, more or less. You just define style rules for each language individually.

It would also be nice to right-align languages that are read right-to-left, but that's probably asking for too much, eh?

sharptoothed, January 7, 2013 at 9:07:45 PM UTC

> It would also be nice to right-align languages that are read right-to-left, but that's probably asking for too much, eh?

Actually, in Gecko-based browsers (Firefox, SeaMonkey, etc.), RTL sentences are displayed right-aligned. But for some reason this doesn't seem to work in Chrome.

MrShoval, January 7, 2013 at 10:57:39 PM UTC

In my Chrome, Hebrew is right-aligned just fine.

sharptoothed, January 8, 2013 at 8:35:11 AM UTC

As for my Chrome on Windows 7, Hebrew sentences look like this:
http://dl.dropbox.com/u/42772287/Hebrew.png
The same happens with Arabic.
What a mess! :-)

MrShoval, January 8, 2013 at 9:39:11 AM UTC

The same page looks neat for me:
https://www.dropbox.com/s/xt2dm...%202130724.jpg

sharptoothed, January 8, 2013 at 9:52:02 AM UTC

Isn't it strange? :-)

AlanF_US, January 9, 2013 at 11:30:42 PM UTC

When I use Firefox on English-language Windows 7, I see the same behavior as you, sharptoothed. The period incorrectly clings to the right of the Hebrew sentence rather than the left, though only in certain contexts. (If you look at the .png file you posted, you'll see that the period appears correctly in the Hebrew sentence that is indirectly linked to the main one.) I've discussed this with Eldad, but he (like MrShoval) doesn't see the same behavior. Perhaps it depends on the language of the OS.
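Both symptoms, left-aligned RTL sentences and a period on the wrong side, are consistent with a sentence being laid out with a left-to-right base direction. That is one likely explanation, not a diagnosis of Tatoeba's markup. In an LTR paragraph, a trailing period after Hebrew text takes the paragraph's direction and is drawn to the right of the Hebrew run; in a block marked `dir="rtl"` it is drawn on the left, and the block is right-aligned by default unless a stylesheet forces `text-align: left`. The TypeScript sketch below derives `dir` from a language tag. The RTL language list is illustrative and incomplete, and `directionFor` and `sentenceHtml` are hypothetical names. An explicit script subtag is treated as decisive, so Latin- or Cyrillic-script Uyghur (`ug-Latn`, `ug-Cyrl`) stays left-to-right.

```typescript
// Illustrative, incomplete list of languages normally written right-to-left.
const RTL_LANGS = new Set(["ar", "fa", "he", "ug", "ur", "yi"]);
// Scripts (ISO 15924 codes, lowercased) that are written right-to-left.
const RTL_SCRIPTS = new Set(["arab", "hebr"]);

// Hypothetical helper: base direction for a BCP 47 tag such as "he" or "ug-Latn".
// An explicit four-letter script subtag decides; otherwise the language list does.
function directionFor(tag: string): "rtl" | "ltr" {
  const parts = tag.toLowerCase().split("-");
  const script = parts.slice(1).find((p) => /^[a-z]{4}$/.test(p));
  if (script !== undefined) return RTL_SCRIPTS.has(script) ? "rtl" : "ltr";
  return RTL_LANGS.has(parts[0]) ? "rtl" : "ltr";
}

// Escapes the characters that matter inside element content and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps one sentence with lang and dir attributes. With
// dir="rtl" the block is right-aligned by default and a trailing period is
// placed at the paragraph's end, i.e. on the left of the sentence.
function sentenceHtml(tag: string, text: string): string {
  return `<div lang="${escapeHtml(tag)}" dir="${directionFor(tag)}">${escapeHtml(text)}</div>`;
}

console.log(sentenceHtml("he", "\u05E9\u05DC\u05D5\u05DD."));
```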

Eldad, January 7, 2013 at 11:14:04 PM UTC

In mine, too.

halfb1t, January 8, 2013 at 1:36:53 AM UTC

Font substitution can be avoided today only by specifying download-on-the-fly Web fonts. Until glyph-on-demand (on the Ajax model) appears, that remains impractical.

With font substitution in the mix, the rational immediate goal is classical: get to a known state. This is easier said than done; HTML/CSS is inadequate; and no knowable state is independent of OS, browser, and available font set.

The inescapable conclusion is that it is simply not possible to get the ball out of the user's court. It follows that specifying any fonts at all muddies the water.

I'd like to use FeuDRenais's situation as an example. That means I have to do some guessing about his system, so I may be wrong. I take the chance in hope of guessing right and so lending weight to my argument; but in any case the situation I'm going to describe will apply to some systems, even if it fails to apply to FeuDRenais's.

According to sharptoothed, the main sentence calls for the font Georgia. Two facts are salient: (1) Georgia lacks Arabic glyphs, and (2) Georgia is a serif font. I'm guessing that all the fonts with good Uyghur glyphs that FeuDRenais has installed are sans-serif fonts, and that he has installed a serif font, like Times New Roman, whose Uyghur glyphs are bad. So what happens? His browser looks for Uyghur glyphs in Georgia and doesn't find them. Since Georgia is a serif font, the browser looks for the missing glyphs _in a serif font_ and finds the bad ones in Times New Roman.

If I'm lucky enough to be right, FeuDRenais has (at least) two effective options: (1) He can set his browser's serif font to a sans-serif font. (2) He can install a serif font with good Uyghur glyphs, such as Arabic Typesetting, and set his browser's serif font to that.

The point I'm trying to make is that in a situation like this, specifying any particular font complicates the font substitution issue, because attributes of the specified font affect the substitution process.
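halfb1t's hypothesis can be written down as a toy model to see how the two options change the outcome. This is emphatically not any browser's real fallback algorithm, which differs between browsers and operating systems; the TypeScript below only encodes the steps described above. The font data is assumed for illustration: which installed fonts cover Arabic script, and the premise that Times New Roman's Uyghur joins are the bad ones, are halfb1t's guesses, not measurements. `resolveFont`, `Font` and `Prefs` are hypothetical names.

```typescript
type Generic = "serif" | "sans-serif";

interface Font {
  name: string;
  generic: Generic; // the generic class the font belongs to
  scripts: Set<string>; // ISO 15924 codes it has glyphs for (glyph quality not modeled)
}

// The browser's generic-family settings, e.g. which font "serif" means.
interface Prefs {
  serif: string;
  "sans-serif": string;
}

// Toy model of halfb1t's hypothesis, NOT a real browser algorithm:
// 1. try each listed family (generic keywords map to the user's preference);
// 2. failing that, try the user's preferred font for the generic class of the
//    first listed installed font, then any installed font of that class;
// 3. finally accept any installed font covering the script.
function resolveFont(stack: string[], script: string, installed: Font[], prefs: Prefs): string | null {
  const byName = new Map<string, Font>();
  for (const f of installed) byName.set(f.name, f);
  const covers = (f: Font | undefined): f is Font => f !== undefined && f.scripts.has(script);

  let firstClass: Generic | undefined;
  for (const family of stack) {
    let font: Font | undefined;
    if (family === "serif" || family === "sans-serif") {
      if (firstClass === undefined) firstClass = family;
      font = byName.get(prefs[family]);
    } else {
      font = byName.get(family);
      if (firstClass === undefined && font !== undefined) firstClass = font.generic;
    }
    if (covers(font)) return font.name;
  }

  const cls = firstClass;
  if (cls !== undefined) {
    const preferred = byName.get(prefs[cls]);
    if (covers(preferred)) return preferred.name;
    const sameClass = installed.find((f) => f.generic === cls && covers(f));
    if (sameClass !== undefined) return sameClass.name;
  }
  const anyFont = installed.find((f) => covers(f));
  return anyFont !== undefined ? anyFont.name : null;
}

// Assumed setup (halfb1t's guesses): Georgia lacks Arabic, Times New Roman
// covers it with bad Uyghur joins, Tahoma covers it well.
const installed: Font[] = [
  { name: "Georgia", generic: "serif", scripts: new Set(["Latn"]) },
  { name: "Times New Roman", generic: "serif", scripts: new Set(["Latn", "Arab"]) },
  { name: "Tahoma", generic: "sans-serif", scripts: new Set(["Latn", "Arab"]) },
];
const prefs: Prefs = { serif: "Times New Roman", "sans-serif": "Tahoma" };
const withSerifArabic: Font[] = [
  ...installed,
  { name: "Arabic Typesetting", generic: "serif", scripts: new Set(["Latn", "Arab"]) },
];

console.log(resolveFont(["Georgia"], "Arab", installed, prefs)); // prints Times New Roman
console.log(resolveFont(["Georgia"], "Arab", installed, { ...prefs, serif: "Tahoma" })); // option (1): prints Tahoma
console.log(resolveFont(["Georgia"], "Arab", withSerifArabic, { ...prefs, serif: "Arabic Typesetting" })); // option (2): prints Arabic Typesetting
```

Within the model, the page never changes; only the reader's installed fonts and preferences do, and the outcome follows them. That is the sense in which the ball stays in the user's court.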