FeuDRenais, January 5, 2013, 20:45:26 UTC

--- Arabic Fonts on Tatoeba ---

I don't know if this is relevant to all languages written in the Arabic script, but I've noticed that certain letters aren't properly connected in the Trebuchet font used by Tatoeba when it comes to displaying Uyghur.

Would it be possible (I guess I'm asking sysko here) to add a conditional statement in the code to display Uyghur (and maybe other languages if needed) in other font families? For Uyghur, I've known it to be displayed fine in sans-serif.

halfb1t, January 6, 2013, 2:38:24 UTC

Does your browser not allow you to specify the fonts it uses? If you tell us which browser you use, perhaps someone will be able to offer a suggestion. If you give us a sentence number and tell us what to look for, you may reap a crowd-source solution (with participation by some who would not recognize a Uyghur sentence without a flag).

From a small acquaintance with Persian I have the impression that the ligature rules for alphabets derived from Arabic differ from language to language. Intuition suggests that the defect you describe is unlikely to be font related. (What I'm thinking is that ligature formation occurs before glyph selection, and fonts don't matter until processing arrives at glyph selection.)

In the same vein I expect Tatoeba's database to be language transparent (except for an occasional inside-or-outside-RTL terminal punctuation mark). If this is the case, then the problem likely arises when sentences are input. It might be useful for you to try contributing a sentence that exhibits the defect and observing whether the problem occurs as you are typing it in.

FeuDRenais, January 6, 2013, 8:37:57 UTC

Hello.

It's true that browsers have an effect, and that the problem is partially solved for some fonts in some browsers. Some fonts are, however, more robust than others. Sans-serif, for example, tends to display Uyghur without issue in most, if not all, modern browsers. I suppose my suggestion would be to just use sans-serif for Uyghur since this avoids a lot of issues (perhaps other font-families are even better though).

Perhaps a nicer and easier fix would be to specify the fonts of the different textarea elements instead of using the defaults. Drafting a message in Uyghur is a total nightmare here, as it comes out looking very ugly while being typed.

An example that doesn't work in Firefox would be here: http://tatoeba.org/eng/sentences/show/2125035

In Chrome, the same example comes out fine (though ugly) for the main sentence but is still displayed wrong on the right-hand-side logs.

sharptoothed, January 6, 2013, 9:45:43 UTC

It seems that Tatoeba uses the Georgia font (a serif typeface) to display the main sentence. I'm not sure whether this particular font contains all Unicode characters, so browsers may fall back to a font of last resort when displaying non-ASCII characters. As different browsers use different font rendering engines, the effect will differ from one browser family to another. In particular, the same text will look different in WebKit-based browsers and in Gecko-based ones.
The only way out I see is to use different font families to display sentences in different alphabets, though it may be kinda tricky task.

FeuDRenais, January 6, 2013, 10:34:28 UTC

>> The only way out I see is to use different font families to display sentences in different alphabets, though it may be kinda tricky task.

Right, exactly. I've done this for a site before, and it requires (at least for a novice like me) conditionals in the css commands. In my case, I just have a database table for the languages where each one can have an appropriate font family defined. Conceptually, it's easy, but it would probably require a bit of going through the code to set appropriate families. Might also require database calls which might be tricky depending on how the Tatoeba database is organized (I think I remember Trang telling me that Tatoeba was just one huge table, so maybe that would be tricky...)
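
The lookup-table idea can be sketched roughly as follows. This is only an illustration in Python, not Tatoeba's code: the `FONT_STACKS` table, the `font_family_for` helper, the language codes, and the font choices are all assumptions made for the example, and a plain dictionary stands in for the database table.

```python
# Hypothetical per-language font table; a dict stands in for the database table.
# Keys are assumed to be ISO 639-3 codes; the stacks are illustrative choices.
FONT_STACKS = {
    "uig": ["Arial", "Tahoma", "sans-serif"],
}

# Assumed site-wide default, used when a language has no entry of its own.
DEFAULT_STACK = ["Trebuchet MS", "sans-serif"]

# CSS generic family keywords, which must not be quoted.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def font_family_for(lang_code: str) -> str:
    """Return a CSS font-family value for the given language code."""
    stack = FONT_STACKS.get(lang_code, DEFAULT_STACK)
    # Named fonts are quoted; generic keywords are left bare.
    return ", ".join(
        name if name in GENERIC_FAMILIES else f'"{name}"' for name in stack
    )


print(font_family_for("uig"))
print(font_family_for("fra"))
```

Keeping the table in one place means a language can be given a better stack later without touching the templates.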

The textarea fix would be nice, though, and could probably be done by setting the font family in the general css file (to sans-serif - my recommendation ;-)

(if you have time, of course, sysko)

FeuDRenais, January 6, 2013, 10:52:18 UTC

This is what I mean by textarea things being ugly the way it's set up now:

http://www.ccapprox.info/fontscreen1.png

and the difference if you override the textarea font family to make it sans-serif:

http://www.ccapprox.info/fontscreen2.png

Of course, these things might vary with browsers/computers, but just as a suggestion...

sysko, January 6, 2013, 12:59:15 UTC

Actually, that can be done with something like this (on my side, I'll run some tests on the new Tatoeba code):


<span class="xxx">I'm eating an apple</span>


with xxx being the language's ISO code, so that the font can be overridden on a per-language basis. I think that's the most elegant solution.

FeuDRenais, January 6, 2013, 13:04:02 UTC

True. That would work nicely.

Thanks!

sharptoothed, January 6, 2013, 17:07:39 UTC

I've created a simple test page that displays http://tatoeba.org/eng/sentences/show/2125035 using different fonts:
http://jplangtools.com/tatoeba/tatoeba.html

To my untrained eye, there isn't much difference :-)

Btw, as halfb1t noted, sans-serif (like serif, monospace, etc.) is not a font itself but rather a generic font family. Usually, the font it maps to can be changed in the application settings to whichever you like best.

FeuDRenais, January 6, 2013, 17:38:14 UTC

Quite a noticeable difference, if you ask me. Arial and sans-serif are the only ones that display the sentence correctly.

Thank you for doing the test, by the way.

sharptoothed, January 6, 2013, 17:43:51 UTC

UKIJ fonts were taken from here
http://www.ukij.org/fonts/
There are a lot of other fonts there.

FeuDRenais, January 6, 2013, 17:47:16 UTC

(Well, in Firefox/IE. In Chrome/Opera, Arial and sans-serif are still good, but a few of the other fonts are good/acceptable as well).

Anyway, Arial or sans-serif for Uyghur would be my request if you're going to implement this, sysko.

halfb1t, January 7, 2013, 2:05:31 UTC

The test page can be made to look either all good or mostly bad by browser font configuration.

sharptoothed, January 7, 2013, 7:33:31 UTC

You're absolutely right. But we can't and shouldn't rely on browser font configuration, neither the default nor a user-made one. Default configurations may not be optimal (read "ugly"), and a user may have no idea how to change them (actually, many if not most users don't even suspect that font configuration exists). That is, for Tatoeba, it's necessary to determine which font family is best for displaying each alphabet and to explicitly instruct browsers through CSS to use it.

sacredceltic, January 6, 2013, 14:55:26 UTC

Could this also fix the fact that thin spaces are still invisible to me, which is incorrect and annoys me no end?

halfb1t, January 7, 2013, 0:30:30 UTC

Can you give us the number of a sentence that incorporates a thinspace?

sacredceltic, January 7, 2013, 7:30:28 UTC

http://tatoeba.org/epo/sentences/show/2113836

Here, I don't see the thin spaces at all with Chrome on MacOSX.
With Chrome on Windows, I see ugly squares in place of the thin spaces when the sentence is the main, and I don't see anything when it is a linked sentence...

sacredceltic, January 7, 2013, 7:50:27 UTC

...and they also appear as ugly squares on iOS.

sysko, January 7, 2013, 14:07:57 UTC

On Android too, because the operating system has a much more limited set of fonts, and as they are made by US companies, they didn't take care of that...

sacredceltic, January 7, 2013, 14:24:22 UTC

I wish I could boycott anything that imposes local standards on the world... the time will come!

sacredceltic, January 7, 2013, 14:28:11 UTC

That said, I use plenty of other apps that handle French text on iOS and correctly display the spaces before double punctuation marks. For example: Le Monde on iOS.
Does that mean they use full spaces?
If so, Tatoeba should do the same...

sysko, January 7, 2013, 14:37:04 UTC

If they are native apps, they can certainly embed their own fonts.

Otherwise, as was pointed out above, with CSS3 (which was not yet a standard when Tatoeba's code was developed) it is now also possible to "embed" the font in the CSS code. That should solve the problem, by embedding for French a font capable of rendering this space.

sacredceltic, January 7, 2013, 14:42:51 UTC

Yes, but then how come it doesn't work either when I force the font in my browser (I forced Arial) on Windows, where I also get ugly squares?

sharptoothed, January 7, 2013, 15:25:18 UTC

Please take a look at this test page:
http://jplangtools.com/tatoeba/thinspace.html
What do you see in your browsers?

sacredceltic, January 7, 2013, 15:34:44 UTC

Thanks, Sharptoothed.

On the first 3 lines, I see ugly squares in front of the ?
The 4th line is correct

sharptoothed, January 7, 2013, 16:00:18 UTC

Now I clearly see that the problem is more complex than it seemed. We have to take into account not only the browser type but also the operating system type and version.
The most universal solution is to use HTML entities instead of Unicode symbols. The problem is that this requires on the fly conversion inside the Tatoeba engine.

sacredceltic, January 7, 2013, 16:03:39 UTC

to me, it clearly points to the fact that the W3C hasn't done its job properly...

sharptoothed, January 7, 2013, 16:11:41 UTC

Hmmm... actually, the W3C did its best to provide various kinds of HTML entities (see http://www.w3schools.com/tags/ref_symbols.asp ), but all that stuff is more for coders than for end users. Older operating systems seem to have limited support for Unicode and, unfortunately, the W3C can't help that.

sysko, January 9, 2013, 4:52:22 UTC

Actually, your example with the HTML entity is not the same as the previous ones, as you use a thin space (U+2009) while the other sentences use a narrow no-break space (U+202F).

sharptoothed, January 9, 2013, 6:58:27 UTC

Thanks for pointing that out, sysko! I suspected that something was wrong :-)
Unfortunately, it seems that there's no separate HTML entity for Narrow No-Break Space, so we have to use its numeric code instead (&#8239;) and pray that browsers display it right.
I've updated the test page.
http://jplangtools.com/tatoeba/thinspace.html
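
Replacing the space characters with numeric references on the fly might look roughly like this in Python. It is a sketch only: `encode_special_spaces` and `SPACE_REFERENCES` are hypothetical names, not part of Tatoeba's engine. Note that a browser decodes each reference back into the same character, so the font still has to supply a glyph for it.

```python
# Space characters discussed in this thread, mapped to numeric character references.
SPACE_REFERENCES = {
    "\u2009": "&#8201;",  # THIN SPACE (also available as &thinsp;)
    "\u202f": "&#8239;",  # NARROW NO-BREAK SPACE
}


def encode_special_spaces(text: str) -> str:
    """Replace thin and narrow no-break spaces with numeric character references."""
    return "".join(SPACE_REFERENCES.get(ch, ch) for ch in text)


# Made-up sample: a narrow no-break space before the question mark.
print(encode_special_spaces("Quoi\u202f?"))
```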

Amastan, January 7, 2013, 15:37:48 UTC

I can see every font correctly (Tahoma, Arial, Georgia). I use Google Chrome.

Yal tasefsit zemreɣ ad tt-waliɣ akken iwuta (Tahoma, Arial, Georgia). Sseqdaceɣ Google Chrome.

Je peux voir chaque police correctement affichée (Tahoma, Arial, Georgia). J'utilise Google Chrome.

sacredceltic, January 7, 2013, 15:46:05 UTC

That's really puzzling, since we use the same browser and we don't see the same result...
Is this on Mac OSX ?

sacredceltic, January 7, 2013, 15:41:21 UTC

that's for Chrome on Windows XP

Shishir, January 7, 2013, 15:51:24 UTC

I see the same thing that you describe, squares on the first three lines but not on the fourth, when I use Chrome on Windows XP.
But I don't see any squares when I use Firefox, or when I use Chrome on Windows 7.

sacredceltic, January 7, 2013, 15:59:49 UTC

what a mess...

sharptoothed, January 7, 2013, 16:04:01 UTC

Browsers running on Windows 7 seem to work like a charm. My Dolphin browser (WebKit-based), which runs on my Android 4 device, displays lines 1, 2 and 4 correctly, too.

Amastan, January 7, 2013, 15:55:32 UTC

Sacredceltic:
J'utilise Windows XP.

I use Windows XP.

sacredceltic, January 7, 2013, 16:01:10 UTC

even more puzzling : same browser, same OS, different effect...

sharptoothed, January 7, 2013, 16:16:07 UTC

It's possible that the localization of a particular OS version also matters. And, possibly, the installed service packs, too.

sacredceltic, January 7, 2013, 16:32:30 UTC

@Amastan

which localization of your browser / OS do you use ?

Amastan, January 7, 2013, 16:54:07 UTC

Sacredceltic:

My OS uses France (Emplacement: France).

My Google Chrome also uses France as a localization.

sacredceltic, January 7, 2013, 16:56:23 UTC

same as mine...I don't understand the different effect for you and me, then...

Amastan, January 7, 2013, 17:06:22 UTC

Have you installed something new on your Mac? :-p Maybe some new program has disrupted your font system :p

sacredceltic, January 7, 2013, 17:12:20 UTC

Here I'm working on a French XP PC machine (I hate it...) with a French up-to-date Chrome.
I use different OS all the time...

halfb1t, January 7, 2013, 23:32:43 UTC

On Windows 7, the thinspaces, including &thinsp; render as normal spaces on Firefox, Chrome, and Opera.

&thinsp; renders correctly on all three browsers in this page: http://www.robinlionheart.com/stds/html4/spchars.

sacredceltic, January 7, 2013, 23:40:29 UTC

How is it dependent on the OS, rather than on the browser fonts ?

halfb1t, January 8, 2013, 1:30:44 UTC

Browser access to fonts is via the OS (if only through the file system), and the names by which fonts are known to applications are under OS control.

sharptoothed, January 8, 2013, 8:13:00 UTC

Nice page, by the way!
I've made a test page based on it.
http://jplangtools.com/tatoeba/spaces.html

halfb1t, January 8, 2013, 8:25:43 UTC

Cool.

Since we started with FeuDRenais's troublesome Uyghur, perhaps it would be nice to include the sample sentence he pointed us to.

A freely available glyph-rich serif font is Charis SIL.

Arial and Times New Roman can be expected to be different on Windows than on other OSs. After releasing them to the world, Microsoft updated them, adding many more glyphs. This, I expect, is why SacredCeltic gets boxes for thinspaces on his Mac, while thinspaces display fine for me.

sharptoothed, January 8, 2013, 9:06:21 UTC

This is the page that demonstrates Uyghur sentence rendered using different fonts:
http://jplangtools.com/tatoeba/national.html
The UKIJ fonts are loaded on the fly via CSS, and the other fonts are those the OS contains. There are also "sans" and "sans-serif" lines included, so one can see which font the browser actually uses to display those generic families.

The Charis SIL font is really great, but it's too big to be downloaded on the fly via CSS. So one has to install it manually, and this may be a problem.

halfb1t, January 8, 2013, 10:00:08 UTC

The reason I suggested the combination is that both the thinspace problem SC has and the bad-glyph problem FeuDRenais has are the same: font-substitution breakdowns. Web fonts mask the problem without really helping; since, as you point out, they are impractical as a general solution.

Charis SIL may be (a good portion of) the serif part of a complete solution. One way or another, users need to have large fonts on their machines to display Chinese and Kanji. The problem is to find relatively painless, fool-proof recipes that users can execute.

Can you tell if Tatoeba's thinspaces are really thinspaces or the narrow-no-break-spaces they really ought to be?
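
One way to answer that question for a given sentence is to inspect its code points directly. The following is a minimal Python sketch; `describe_spaces` is a hypothetical helper, and the sample string is made up rather than taken from Tatoeba's database.

```python
import unicodedata


def describe_spaces(text: str) -> list[tuple[int, str, str]]:
    """List (index, code point, Unicode name) for every non-ASCII space in text."""
    found = []
    for index, ch in enumerate(text):
        # Category "Zs" covers space separators such as U+00A0, U+2009, U+202F.
        if unicodedata.category(ch) == "Zs" and ch != " ":
            found.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found


# Made-up sample containing one narrow no-break space and one thin space.
for entry in describe_spaces("Quoi\u202f? Oui\u2009!"):
    print(entry)
```

Running it over a sentence copied from the site would show whether the space before the punctuation is U+2009 or U+202F.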

sharptoothed, January 8, 2013, 11:51:47 UTC

Will this test page help?
http://jplangtools.com/tatoeba/fonts.html

> Can you tell if Tatoeba's thinspaces are really thinspaces or the narrow-no-break-spaces they really ought to be?

Absolutely not. To tell the truth, I don't really care if some space is thin or not-so-thin or if some dash is long or not-so-long. It doesn't prevent me from understanding the sentence. :-)

halfb1t, January 8, 2013, 12:03:36 UTC

It's too busy for me to say for sure right away; but yes, I think it will be very helpful, and I plan to look at it in a variety of environments.

Thanks.

alexmarcelo, January 8, 2013, 12:05:28 UTC

Have you tried Calibri?
http://www.lucasfonts.com/filea...ri-charset.gif

sharptoothed, January 8, 2013, 12:15:06 UTC

Calibri seems to look nice though I prefer Arial.
I've added Calibri to the page. (Maybe it's worth writing a script that changes fonts on the fly, since the page is getting too long. :-))

alexmarcelo, January 8, 2013, 12:23:02 UTC

Thanks! I love that font. You are using justified text, aren't you? Maybe it would be better not to, because spaces are usually stretched or compressed...

sharptoothed, January 8, 2013, 12:27:18 UTC

> You are using justified text, aren't you?

Ummm, nope, no justified alignment is used. The English example just contains different types of spaces. :-)

alexmarcelo, January 8, 2013, 12:32:52 UTC

Oh, I see! Thanks! :-)

halfb1t, January 8, 2013, 3:20:37 UTC

With Windows 7, Chrome, Firefox, Opera, and my own HTML/CSS, I see (1) No difference between &thinsp; and the Unicode thinspace character. (2) Differences between the thinness of the thinspace character from font to font--with some fonts it's quite difficult to see the difference between thinspaces and spaces--and of course the difference is diminished at smaller sizes. (3) Differences, even with the same font specified, from browser to browser. I attribute this to differing font substitutions. To test the thesis, I specified Lucida Sans Unicode--which includes the thinspace character--and the thinspaces displayed the same on all three browsers.

The appearance of a square indicates the browser found no thinspace character in any font. Where you have the option of installing Lucida Sans Unicode, that may help.

When I examine the source delivered from http://www.lemonde.fr/ I find ASCII spaces before and after colons. I don't know what to make of that.

sacredceltic, January 8, 2013, 7:32:54 UTC

Thank you for the detailed explanation.
Alas, Lucida Sans Unicode doesn't seem standard (if anything is across platforms...)
But I think Tatoeba should take that in charge, not the user.

Spaces are an ESSENTIAL part of the reading experience, and in French, an essential part of reading double punctuation marks, whether the spaces are thin or plain (as is the case in other languages such as Romanian, and as was also the case in English, German, ... before the advent of computers and the crunching of spaces: look at old book editions if in doubt...)

In French, one often cannot apply the correct intonation to the start of a question if one's eyes have not quickly spotted a question mark at the end of the sentence, regardless of the length of the sentence, and this quick spotting is not possible if the question mark is stuck to the last word, as it might be confused with another tall letter, in a grotesque, unreadable combination that makes one's reading experience just miserable.

That is why professional publishers of ALL French books, magazines and newspapers insert spaces (thin or NOT) before double punctuation marks, and have been doing so for centuries, so that French readers do NOT know any other way of reading, since they have grown up with it.

I think Tatoeba should reflect French standards when displaying French sentences, regardless of the standards applied to other languages.
It pains me that it doesn't and I have been complaining about it ever since I joined.
If thin spaces cannot be properly rendered on Tatoeba, then standard spaces should be displayed instead and not automatically substituted afterwards for thinspaces, the same way that professional publishers in French use when confronted with the situation.
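
A fallback of that kind could be sketched in Python as below. The helper name `use_plain_nbsp` is hypothetical, and choosing the ordinary no-break space (U+00A0) as the "standard space" is an assumption made here so that the punctuation mark does not wrap onto the next line; a regular U+0020 space would be the other obvious choice.

```python
import re

# A thin space (U+2009) or narrow no-break space (U+202F) that sits
# directly before one of the punctuation marks : ; ! ?
_THIN_BEFORE_PUNCT = re.compile("[\u2009\u202f](?=[:;!?])")


def use_plain_nbsp(text: str) -> str:
    """Swap thin spaces before : ; ! ? for an ordinary no-break space (U+00A0)."""
    return _THIN_BEFORE_PUNCT.sub("\u00a0", text)


# Made-up sample; repr() makes the otherwise invisible replacement visible.
print(repr(use_plain_nbsp("Quoi\u202f?")))
```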

liori, January 6, 2013, 15:26:11 UTC

It would be better to use the “lang” attribute instead, which was designed for this purpose. It is possible to apply CSS rules by matching an arbitrary html attribute, including “lang”. Example: https://dl.dropbox.com/u/52886258/lang.html
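
As a rough illustration of that approach, the Python sketch below generates the markup and a matching CSS rule. The helper names `sentence_span` and `font_rule` are hypothetical, and the language tag "ug" for Uyghur and the font stack are assumptions for the example; the site may well use different codes internally.

```python
from html import escape


def sentence_span(text: str, lang: str) -> str:
    """Wrap a sentence in a span that carries its language in the lang attribute."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text, quote=False)}</span>'


def font_rule(lang: str, font_family: str) -> str:
    """Build a CSS rule that matches elements by their lang attribute value."""
    return f'[lang="{lang}"] {{ font-family: {font_family}; }}'


print(sentence_span("I'm eating an apple", "en"))
print(font_rule("ug", "Arial, sans-serif"))
```

The attribute selector only matches elements that carry the attribute themselves; the CSS `:lang()` pseudo-class also matches descendants that inherit the language.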

sysko, January 6, 2013, 15:38:00 UTC

Sure, that slipped my mind while posting the answer :)

halfb1t, January 8, 2013, 7:44:45 UTC

Lucida Grande on the Mac is reputed to be equivalent to Lucida Sans Unicode.

As I've argued elsewhere, Tatoeba _can't_ resolve these issues: all it can do is ship characters, an encoding tag, and CSS. What happens after that depends on the rendering that takes place on our machines; and if we can't completely control it, we sure can mess it up.

liori, January 8, 2013, 8:35:30 UTC

Now that you say this… there is a way to completely control rendering—delivering text as images. If done in addition to normal text, could be useful to at least make it possible for users to double-check if everything's OK on their side.

Well, just throwing an idea into the discussion. It seems unfeasible as a short-term solution, and in the long term, getting the CSS and font setup right on the user's side would be better…

halfb1t, January 8, 2013, 8:39:22 UTC

Right you are. I didn't think of that. And if we eventually end up with a How-To, text-as-images will be very helpful.

halfb1t, January 8, 2013, 12:46:57 UTC

Running OS X on a Hackintosh in a VirtualBox under Windows 7. Nothing got Safari to display thinspaces; but Chrome did it for me, and maybe it'll do it for you if you (1) disable the Georgia font family, and (2) set Chrome's Standard Font to Microsoft Sans Serif.

This illustrates my thesis that specifying fonts gets in the way in some environments. OS X's text rendering environment seems to be one of those.

halfb1t, January 6, 2013, 11:18:29 UTC

The problem is not letter connection (ligatures), but glyph selection (terminal instead of medial forms).

Can't reproduce the good/bad effect in Chrome. In Firefox (on Windows 7), I can switch between good/bad by changing the default font.

Bad are: Trebuchet, Verdana, MS Sans Serif.
Good are: Segoe UI, Arial, Tahoma.

(Tatoeba should switch the default CSS from Trebuchet to one of the good ones. The Tahoma is very heavy.)

Georgia has no Arabic letters; sans-serif is not a font: it's an HTML fallback. Every browser must do something for sans-serif, but what it does is up to the browser.

sacredceltic, January 7, 2013, 11:26:08 UTC

I tried to change the fonts in the advanced parameters of Chrome. I switched all to Arial. But the squares for thin spaces are still there...

liori, January 8, 2013, 12:24:37 UTC

One more thought/idea: it would probably be worth checking how Wikipedia deals with the problem.

halfb1t, January 8, 2013, 13:18:44 UTC

Wikipedia's position is that fonts are your problem. They say on many pages something like "This page contains characters in.... If you don't see them, it's because you don't have the necessary fonts."

liori, January 8, 2013, 13:48:44 UTC

Unless I'm mistaken, in the CSS they just say “font-family: sans-serif” or “font-family: serif”, depending on the place (with a few additional cases for TeX and source code examples, irrelevant to us). They also provide a means for users to check whether the characters displayed on their machine are correct, at least for CJKV [1]. However, I couldn't find anything similar for Arabic, except for short notes about a few fonts [2].

[1] http://en.wikipedia.org/wiki/He...t_(East_Asian)
[2] http://en.wikipedia.org/wiki/He...l_support#Font

So, Wikipedia opted for default fonts.

halfb1t, January 8, 2013, 14:16:58 UTC

But consider this, from the Uyghur Wikipedia:

<div >

This suggests that thoroughly internationalizing Tatoeba's interface may be significantly more difficult than the problems we're wrestling with now.

The links you posted are useful, but the level of detail does not descend to thinspaces, versions of Microsoft Core Fonts, details of browser substitution algorithms, or OS text-rendering environments.

halfb1t, January 8, 2013, 14:21:24 UTC

Whoops. The div tag I was trying to quote has this style string: "direction:rtl; font-family: 'UKIJ inchike', 'Alpida_Unicode System', 'UKIJ Tuz Tom', 'Microsoft Uighur', 'uyghur ekran', 'Segoe UI', 'Tahoma'".

liori, January 8, 2013, 14:30:36 UTC

Ah, and this is what I was actually looking for. Tatoeba could probably use it blindly for Uyghur. Great find!

halfb1t, January 8, 2013, 14:45:12 UTC

The potential fly in the ointment is that this font spec may be aimed at machines set up to use Uyghur by default; and such machines (and OS's and browsers) may not span the same range as those used by the whole Tatoeba community.

FeuDRenais, January 9, 2013, 7:11:59 UTC

I fear that such machines may not really exist...

halfb1t, January 9, 2013, 9:56:26 UTC

From a little Web research, it seems you're right. Although a few translation initiatives are in the works, they seem not to be progressing apace.

halfb1t, January 9, 2013, 9:26:22 UTC

Here's the font string from the WordPress Uyghur translation effort: 'UKIJ Tuz Tom','Alpida Unicode System','Alkatip Tor','Alp Ekran','Microsoft Uighur',Tahoma,Verdana,Arial,Helvetica.

FeuDRenais, January 8, 2013, 19:35:23 UTC

All those families, and yet the whole page looks so much nicer if you just put "font-family:sans-serif".

Then again, it's probably worth noting that most Uyghurs are from China, and more than just a few people in China still use IE6. And who knows what looks best when it comes to that mess of a relic...

halfb1t, January 9, 2013, 0:12:01 UTC

> the whole page looks so much nicer if you just put "font-family:sans-serif"

> who knows what looks best when it comes to that

These statements are inconsistent, and the first is wrong. What happens if you "just put font-family:sans-serif" _depends_. This is the crucial point: the effect of a font specification depends on factors forever beyond Tatoeba's control. Understanding that is essential to understanding what can be done, which is basic to discussing what should be done.

FeuDRenais, January 9, 2013, 6:03:56 UTC

I'm sorry, halfb1t. I suppose that I should have stated "the whole page looks so much nicer FOR ME... ON MY COMPUTER... WITH MY BROWSER..."

(I thought those were implicit)

Next time I'll write a wall of text à la you.

My only request here is that Tatoeba try something like sans-serif for all of its Uyghur sentences, since I suspect that this will be better (perhaps much better) than doing nothing (across many computers/browsers).

halfb1t, January 9, 2013, 6:15:29 UTC

> I suspect

That's fine; but more helpful would be a little data. Do you have the font Georgia on your computer? What happens if you disable or rename it? What happens if you set your browser's serif font to sans-serif?

FeuDRenais, January 9, 2013, 7:05:11 UTC

No clue, haven't tried.

What would you do with this data, though? How could it be made into something implementable?

halfb1t, January 9, 2013, 7:28:29 UTC

My experience suggests that there are strict limits on what Tatoeba can accomplish from its end, because the renderings that appear in users' browsers are strongly dependent on the fonts they have installed and the settings they put into their browsers.

The question is "Is my experience too narrow?" Careful reports by others with accurate relevant detail contribute to answering that question; and when they appear here, they contribute to the community's understanding of the nature of the problem, which of course has important implications for rational solutions.

halfb1t, January 8, 2013, 14:24:47 UTC

If Tatoeba opts for specifying fonts per-language, we may be able to crib such specs from Wikipedia, as was suggested by your original remark.
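
As a rough illustration of what per-language font specs could look like, here is a minimal Python sketch that turns a table of font stacks into CSS rules. Only the Uyghur stack is taken from this thread (the Uyghur Wikipedia style string quoted above). Everything else is an assumption made for the example: the `css_rule` helper, the `RTL_LANGS` set, the trailing generic family, and the idea that each sentence carries a `lang` attribute the `:lang()` selector can match. It does not describe how Tatoeba's templates actually work.

```python
# Font stacks per language code. The "ug" entry is the stack quoted in this thread.
FONT_STACKS = {
    "ug": ["UKIJ inchike", "Alpida_Unicode System", "UKIJ Tuz Tom",
           "Microsoft Uighur", "uyghur ekran", "Segoe UI", "Tahoma"],
}

# Illustrative only, not a complete list of right-to-left languages.
RTL_LANGS = {"ug", "ar", "he"}

def css_rule(lang, generic="sans-serif"):
    """Build one CSS rule for a language (hypothetical helper)."""
    fonts = ", ".join(f"'{name}'" for name in FONT_STACKS.get(lang, []))
    # Always end with a generic family so the browser has a last resort.
    family = f"{fonts}, {generic}" if fonts else generic
    decls = [f"font-family: {family};"]
    if lang in RTL_LANGS:
        decls.insert(0, "direction: rtl;")
    return f":lang({lang}) {{ {' '.join(decls)} }}"

print(css_rule("ug"))
print(css_rule("fr"))
```

A language with no entry in the table falls back to the generic family alone, which is the "specify as little as possible" option discussed elsewhere in this thread.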

Balamax, January 8, 2013, 20:38:26 UTC

Shouldn't these three scripts (Uyghur Arabic, Latin, Cyrillic) appear together automatically for Uyghur, as in the sentences for Uzbek, Japanese or Chinese?

halfb1t, January 9, 2013, 0:18:22 UTC

If you're thinking of transliteration you might want to start with the Wikipedia article on Uyghur alphabets.

Balamax, January 9, 2013, 3:35:27 UTC

Indeed, these three alphabets have equal standing for every entry here: http://www.uyghurdictionary.org. See also http://www.omniglot.com/writing/uyghur.htm

halfb1t, January 9, 2013, 3:53:18 UTC

The questions that come to mind are: (1) Which of the five alphabets in the Russian Wikipedia article are to be chosen? (2) Is simple transliteration between all pairs accurate in every case? On the face of it, it seems that transliteration into the Arabic script involves at least the usual initial-medial-final-solitary glyph selection.

These are not insoluble problems, but they need careful identification before they have any chance of being solved. A related and more fundamental issue is support for multiple scripts for a single language in general. This issue has arisen before. I believe it had to do with an Indian language, but I forget which.

The important thing to recognize in this connection is that no solution is likely to be simple.
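
To make the initial-medial-final-solitary point concrete, the Python sketch below prints the four contextual forms Unicode records for one Arabic letter, BEH (U+0628), chosen only as an example. Normal text stores just the base letter, and the text-shaping layer chooses the form at display time. The presentation-form code points used here exist mainly for compatibility, and the helper names are made up for this illustration.

```python
import unicodedata

# Arabic Presentation Forms-B code points for BEH: isolated, final, initial, medial.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

def describe(chars):
    """Return (code point, Unicode name) pairs for the given characters."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]

def base_letter(char):
    """Fold a presentation form back to the letter that text normally stores."""
    return unicodedata.normalize("NFKC", char)

for code, name in describe(BEH_FORMS):
    print(code, name)

# All four forms fold back to the same stored letter, U+0628.
print({f"U+{ord(base_letter(c)):04X}" for c in BEH_FORMS})
```

A transliterator into the Arabic script would therefore only need to emit base letters, provided the rendering side does the joining correctly, which is exactly the part that seems to fail with some fonts.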

sacredceltic, January 9, 2013, 21:41:18 UTC

It's Marathi, which can be written in either the Modi or the Devanagari script.

halfb1t, January 7, 2013, 2:07:17 UTC

The root of the problem is font substitution. The reason Tatoeba's many scripts display as well as they do is font substitution.

There are--as I see it--two very different approaches to ameliorating the problem: (1) Use a glyph-rich font as default. (2) Use a default font (and other fonts) that inject as little information as possible into browsers' font-substitution algorithms.

(1) has numerous problems, which I won't try to enumerate. (2) amounts to specifying no fonts at all.

With (2), FeuDRenais should get just what suits him by specifying sans-serif as his default font.

What are the issues? What are the impacts on the internationalization of Tatoeba's interface? On support for additional languages?

In addition to (2), Tatoeba's users would benefit from information: how do I get browser A to correctly display language B? I suggest that such information--readily crowd-sourced from our community--is best organized by browser within language. Sorting first by language also organizes what-fonts-do-I-need-and-where-can-I-get-them data.

It is conceivable that sans-serif is a better choice than no font specification at all. What might make that true? If some (or most or all) browsers' font-substitution algorithms were simpler in that case, or more predictable, or more similar to one another.

FeuDRenais, January 7, 2013, 19:41:04 UTC

>>> With (2), FeuDRenais should get just what suits him by specifying sans-serif as his default font.

Not sure if I understood everything you wrote, but if you're implying that a user should just specify the fonts themselves to find what suits them, I disagree. It's not the job of the website user to have to do these things. I'm not so much bothered by Uyghur displaying incorrectly for me personally (since I can still read and understand it) as I am by people who are learning it (in whatever capacity) being forced to read a corrupted version of the script.

A solution that I would propose is to use the specifications that are the most robust over different systems/browsers (sans-serif clearly seems the best for Uyghur so far). If that's what you're saying, too, then cool, we agree.

sacredceltic, January 7, 2013, 20:11:30 UTC

The problem is, the best solution for one writing system/language is probably not the best for all of them.
So we need to find a system that applies the best option to EVERY SINGLE writing system/language combination...

I'm positively fed up with systems that fit the majority and just neglect all the minorities. They just are NOT acceptable. Democracy is definitely not the rule of the majority only!

FeuDRenais, January 7, 2013, 20:39:53 UTC

I think that's what both sysko and liori said, more or less. You just define style rules for each language individually.

It would also be nice to right-align languages that are read right-to-left, but that's probably asking for too much, eh?

sharptoothed, January 7, 2013, 21:07:45 UTC

> It would also be nice to right-align languages that are read right-to-left, but that's probably asking for too much, eh?

Actually, in Gecko-based browsers (Firefox, Seamonkey, etc.) RTL sentences are displayed right-aligned. But it seems this doesn't work for Chrome for some reason.

MrShoval, January 7, 2013, 22:57:39 UTC

In my Chrome, Hebrew is right-aligned just fine.

sharptoothed, January 8, 2013, 8:35:11 UTC

As for my Chrome on Windows 7, Hebrew sentences look like this:
http://dl.dropbox.com/u/42772287/Hebrew.png
The same is with Arabic.
What a mess! :-)

MrShoval, January 8, 2013, 9:39:11 UTC

Same page looks neat:
https://www.dropbox.com/s/xt2dm...%202130724.jpg

sharptoothed, January 8, 2013, 9:52:02 UTC

Isn't it strange? :-)

AlanF_US, January 9, 2013, 23:30:42 UTC

When I use Firefox on English-language Windows 7, I see the same behavior as you, sharptoothed. The period incorrectly clings to the right of the Hebrew sentence rather than the left, though only in certain contexts. (If you look at the .png file you posted, you'll see that the period appears correctly in the Hebrew sentence that is indirectly linked to the main one.) I've discussed this with Eldad, but he (like MrShoval) doesn't see the same behavior. Perhaps it depends on the language of the OS.
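
One possible explanation for the wandering period, offered only as a guess and not verified against Tatoeba's markup, is that a period has no strong direction of its own in Unicode. Its side is decided by the direction of the surrounding paragraph, so the same Hebrew sentence can look different depending on whether its container is treated as left-to-right or right-to-left. The Python sketch below just prints the bidirectional class of each character in a made-up sample. The helper names are hypothetical.

```python
import unicodedata

# Classes with an inherent direction: left-to-right, right-to-left, Arabic letter.
STRONG = {"L", "R", "AL"}

def bidi_classes(text):
    """Return (character, bidirectional class) pairs."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def ends_without_strong_direction(text):
    """True if the last character takes its direction from context."""
    return unicodedata.bidirectional(text[-1]) not in STRONG

# Hypothetical sample: the Hebrew word "shalom" followed by a period.
sample = "\u05E9\u05DC\u05D5\u05DD."
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X}", cls)
print(ends_without_strong_direction(sample))
```

If this is what is happening, the contexts where the period looks right would be the ones whose container is marked (or detected) as right-to-left.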

Eldad, January 7, 2013, 23:14:04 UTC

In mine, too.

halfb1t, January 8, 2013, 1:36:53 UTC

Font substitution can be avoided today only by specifying download-on-the-fly Web fonts. Until glyph-on-demand (on the Ajax model) appears, that remains impractical.

With font substitution in the mix, the rational immediate goal is classical: get to a known state. This is easier said than done; HTML/CSS is inadequate; and no knowable state is independent of OS, browser, and available font set.

The inescapable conclusion is that it is simply not possible to get the ball out of the user's court. It follows that specifying any fonts at all muddies the water.

I'd like to use FeuDRenais's situation as an example. That means I have to do some guessing about his system, so I may be wrong. I take the chance in hope of guessing right and so lending weight to my argument; but in any case the situation I'm going to describe will apply to some systems, even if it fails to apply to FeuDRenais's.

According to sharptoothed, the main sentence calls for the font Georgia. Two facts are salient: (1) Georgia lacks Arabic glyphs, and (2) Georgia is a serif font. I'm guessing that all the fonts with good Uyghur glyphs that FeuDRenais has installed are sans-serif fonts and that he has installed a serif font--like Times New Roman--whose Uyghur glyphs are bad. So what happens? His browser looks for Uyghur glyphs in Georgia and doesn't find them. Since Georgia is a serif font, the browser looks for the missing glyphs _in a serif font_ and finds the bad ones in Times New Roman.

If I'm lucky enough to be right, FeuDRenais has (at least) two effective options: (1) He can set his browser's serif font to sans-serif. (2) He can install a serif font with good Uyghur glyphs--like Arab Typesetting--and set his browser's serif font to that.

The point I'm trying to make is that in a situation like this, specifying any particular font complicates the font substitution issue, because attributes of the specified font affect the substitution process.
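
The guessed scenario can be written down as a toy model in Python. This is a sketch of the reasoning in this post, not of any real browser's substitution algorithm, which is more involved and differs between browsers and operating systems. The `Font` class, the `pick_font` function and the script coverage assigned to each font are all assumptions made for the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Font:
    name: str
    generic: str        # "serif" or "sans-serif"
    scripts: frozenset  # scripts the font is assumed to have glyphs for

def pick_font(requested, script, installed):
    """Toy fallback: the requested font if it covers the script, else the first
    installed font of the same generic class that does, else any that does."""
    if script in requested.scripts:
        return requested
    same_class = [f for f in installed
                  if f.generic == requested.generic and script in f.scripts]
    if same_class:
        return same_class[0]
    any_class = [f for f in installed if script in f.scripts]
    return any_class[0] if any_class else None

# Coverage below is assumed for the sake of the example.
georgia = Font("Georgia", "serif", frozenset({"Latin"}))
times = Font("Times New Roman", "serif", frozenset({"Latin", "Arabic"}))
tahoma = Font("Tahoma", "sans-serif", frozenset({"Latin", "Arabic"}))

installed = [tahoma, times]
print(pick_font(georgia, "Arabic", installed).name)
```

In this model, asking for Georgia steers the fallback toward the serif font even though a sans-serif font with the needed glyphs is installed, which is the sense in which attributes of the specified font affect the outcome.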