A simple blog on Web, Media, Mobile n' everything related.

I18Nized domain names n IE7

Posted: July 31st, 2006 | Author: dotblack | Filed under: Web-Browsers, i18n | Comments Off

I usually post this kind of information in the inloop section but this is a big one. IE7 Beta 2 came with Internationalized Domain Name (IDN) support. IE7 Beta 3 just gets it better.

Remember when we talked about i18n and true localization and discussed the findability problem caused by non-localized domain names? Well, it’s on the way. IE7 now even supports mixing scripts, allowing ASCII and other characters in the same domain name. Does that include the whole URL? I’ll leave that question at IE Blog.

Update: IEBlog answered: @dotone: The IDN standard only applies to domain names. For the rest of the URL, IE by default uses UTF-8 for the path, and either UTF-8 or codepaged text for the query string.
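
To make the split in that answer concrete, here is a rough sketch using the URL parser built into current browsers and Node.js. It is not IE7's behavior, just an illustration of the same idea: the host goes through IDN conversion, while the path and query are percent-encoded as UTF-8. The address and the function name splitEncodedUrl are made up for the example.

```javascript
// Sketch only: the host name and URL below are made-up examples.
// The standard URL parser applies IDN (Punycode) conversion to the host
// only, and percent-encodes the path and query as UTF-8 bytes.
function splitEncodedUrl(input) {
  const url = new URL(input);
  return {
    host: url.hostname,  // IDN label converted to its ASCII "xn--" form
    path: url.pathname,  // non-ASCII characters become %XX UTF-8 bytes
    query: url.search,   // same treatment for the query string here
  };
}

const parts = splitEncodedUrl("http://bücher.example/übersicht?q=é");
console.log(parts.host);  // xn--bcher-kva.example
console.log(parts.path);  // /%C3%BCbersicht
console.log(parts.query); // ?q=%C3%A9
```

Note that this parser always uses UTF-8 for the query string, whereas the IEBlog answer says IE may use codepaged text there.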



Episode #2: i18n and Localization, News buzz.

Posted: July 15th, 2006 | Author: dotblack | Filed under: dotShow, i18n | 2 Comments »

Here comes the second episode of your dotShow. It’s a short one: 15 minutes, which I hope is not too long but long enough.

I18N, and a short News buzz.

Here you go:
Running time: 00:15:17
File size: 13.9 MB
Download: Episode #2, i18n



Arabic and Farsi OL tags

Posted: July 10th, 2006 | Author: dotblack | Filed under: Web-Design, i18n | 7 Comments »

UL tags are widely used in standards-based website designs, for navigation menus or any kind of list that has no order. OLs are supported in different formats: Roman, Latin, and Greek numerals and alphabets. Arabic or Farsi built-in support? Let’s explore that.

As seen on W3

OL and UL tags fall under lists and generated content. The CSS property that controls the markers of lists and auto-generated (populated) content is list-style-type. That’s something simple that you’ve seen and dealt with before. Here comes the difference: ULs naturally and semantically use either an image bullet or a glyph such as disc, square, or circle.

Now our friend OL. In the CSS2.1 spec, OLs can be controlled with list-style-type using the following numbering system values:

  • decimal (beginning with 1)
  • decimal-leading-zero (01, 02, 03, …, 98, 99)
  • lower-roman (i, ii, iii, iv, v, etc.)
  • upper-roman (I, II, III, IV, V, etc.)
  • georgian (an, ban, gan, …, he, tan, in, in-an, …)
  • armenian (traditional Armenian numbering)

And alphabetic system values:

  • lower-latin or lower-alpha (a, b, c, … z)
  • upper-latin or upper-alpha (A, B, C, … Z)
  • lower-greek (α, β, γ, …)

No Arabic or Farsi yet. Right? Disappointing. Let’s take a look at the CSS3 definitions.

The CSS3 definitions (not final yet) have more than the Glyph/Numeric/Alphabetic systems, adding Algorithmic, Symbolic, and Non-Repeating ones. The Numeric and Alphabetic systems are also richer than in CSS2.1.

Even though not many browsers have implemented the new CSS3 definitions completely, some of the new Alphabetic and Numeric systems work in Firefox and Opera according to my testing.

One significant system that is implemented is Hebrew, which is a right-to-left (rtl) system like Arabic and Farsi. So what’s with Arabic and Farsi? Not supported yet. Let’s work around that problem until they are fully supported.

Note: You can view Arabic/Farsi numeric/alphabetic lists if your regional settings are set accordingly, but that is neither a CSS setting nor controlled by your browser; it is controlled by the OS!

Let’s get visual

Let’s play with a simple ordered list like:

<ol>
  <li>Item One</li>
  <li>Item Two</li>
  <li>Item Three</li>
  <li>Item Four</li>
</ol>

Numeric formats:

[Image: Numeric Ordered Lists]

Alphabetic formats:

[Image: Alphabetic Ordered Lists]

Arabic formats:

[Image: Arabic Ordered Lists]

How do we achieve the Arabic format?

We can get the Arabic format by forcing it on the document with a little JavaScript. I wrote a little class to do that job and give you back Arabic OLs.

The class is called dot1_ol, which you can download here.

What dot1_ol does

dot1_ol is a simple JavaScript class that takes an OL element of your choice by Id or Class, or all the OLs in the document, and:

  1. Changes their list-style-type to none
  2. Gets all the LI elements in the OL and attaches proper numbering to them
  3. Finally replaces the old OL
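
The three steps above could be sketched roughly like this. This is not the dot1_ol source; the names toLocalDigits and arabizeOl are invented for the illustration, and the code assumes a browser DOM for the second function.

```javascript
// Hypothetical sketch, not the dot1_ol source: names and details are assumed.

// Map each Western digit of a non-negative integer to a local digit set.
// zero is the code point of the digit zero: 0x0660 for Arabic-Indic,
// 0x06F0 for the Persian (Farsi) digits.
function toLocalDigits(n, zero = 0x0660) {
  return String(n).replace(/[0-9]/g, (d) =>
    String.fromCharCode(zero + Number(d))
  );
}

// Apply the three steps to one <ol> element (browser only).
function arabizeOl(ol) {
  const copy = ol.cloneNode(true);
  copy.style.listStyleType = "none"; // step 1: hide the built-in markers
  // Only direct <li> children, so items of nested lists are left alone.
  const items = Array.from(copy.children).filter((el) => el.tagName === "LI");
  items.forEach((li, i) => {
    // step 2: put the generated number in front of the item's content
    const marker = ol.ownerDocument.createTextNode(toLocalDigits(i + 1) + ". ");
    li.insertBefore(marker, li.firstChild);
  });
  ol.replaceWith(copy); // step 3: swap the old list for the new one
  return copy;
}
```

In a page you would call something like document.querySelectorAll("ol").forEach(arabizeOl). Nested lists are not renumbered by this sketch.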

Let’s do that

All you have to do is download the dot1_ol class file and embed it in the head section of your document.

By default dot1_ol reformats all your OL tags to numeric Arabic lists. You can make it show alphabetic lists by changing the parameter.

Demo

There are four demo files, listed below, that you can view, play with, and even download:

  1. Convert all OLs to Numeric Arabic OLs
  2. Convert all OLs to Alphabetic Arabic (Alpha order) OLs
  3. Convert all OLs to Alphabetic Arabic (Abjad order) OLs
  4. Convert OLs by Id or Class to Arabic formatted OLs
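
For the two alphabetic orders mentioned in the demos, here is a small sketch of how a marker could be picked. The function and option names are my own assumptions, not the dot1_ol parameters, and falling back to a plain number after the 28th item is just one arbitrary choice.

```javascript
// Illustrative only: the function and option names are assumptions,
// not the dot1_ol API.
const ARABIC_ORDERS = {
  // Modern alphabetical order
  alpha: "ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي".split(" "),
  // Traditional Abjad order
  abjad: "ا ب ج د ه و ز ح ط ي ك ل م ن س ع ف ص ق ر ش ت ث خ ذ ض ظ غ".split(" "),
};

// Return the marker for a 1-based item position in the chosen order.
// Past the last letter this sketch simply falls back to the plain number.
function arabicLetterLabel(position, order = "alpha") {
  const letters = ARABIC_ORDERS[order];
  if (!letters) throw new RangeError("unknown order: " + order);
  return position >= 1 && position <= letters.length
    ? letters[position - 1]
    : String(position);
}
```

The first two items get the same letters in both orders; the difference only shows from the third item on.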

Need to play with these simple lists? Download the demo

This is a first attempt, so the code might need a little reviewing and more tweaking for cases such as nested lists and complex lists. Review it, check it out and let me know if it was of any use to you.

So, next time you have an Arabic website to design, use OLs, I got your back ;)



i18n and true localization Pt1

Posted: June 4th, 2006 | Author: dotblack | Filed under: Web-Design, i18n | 3 Comments »

With localization and internationalization going mainstream and being more widely supported on systems, apps and websites, there are hurdles to overcome and bridges to be built. Problems with localizing filenames and domain names lead directly to problems with findability, on which search-bot rankings depend. On the other hand, date handling, system-wide Unicode support on operating systems with standardized sets of language elements, and support for localized numbering are the blockers for truly localized websites and web services.

Localized domain names & filenames; servers vs. browsers

Recall when you last registered a domain name. Remember the allowed characters? It’s a range of A-Z, 0-9, and “-”. Any characters of another language? Eastern languages? Farsi, Arabic, Hebrew, Chinese, or Japanese? Not even accented European characters, right?

While domain names are nothing but alphanumeric representations of IP addresses, we’re still stuck with only one way of doing it: in English. A number of labs have tried to implement non-English domain systems, which are in beta or testing, such as the attempt by the U.A.E.’s Etisalat to bring them to developers, but it is still going slow. You ask why?

While web servers do not accept anything other than the allowed, standardized characters for domain names, there is also the client-side problem: the browser. Web browsers tend to let you type Unicode or any non-English characters in the address bar, but do they send them the right way? When a request is made to a server using a non-English set of characters, such as an Arabic or Farsi name, is it really taken the way it is? No, it’s URL-encoded, meaning each character is converted to the hexadecimal values of its bytes and then sent over to the server.
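
To see what “converted to hexadecimal” means in practice, here is a minimal sketch that percent-encodes a word byte by byte, assuming UTF-8 as the encoding. The function name percentEncodeUtf8 and the sample word are mine.

```javascript
// Sketch: percent-encode a string byte by byte, the way a URL-encoded
// request carries non-ASCII text. "سلام" is just a sample word.
// Unlike a real encoder, this one escapes every byte, ASCII included.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(bytes, (b) =>
    "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

const encoded = percentEncodeUtf8("سلام");
console.log(encoded);                     // %D8%B3%D9%84%D8%A7%D9%85
console.log(decodeURIComponent(encoded)); // سلام
```

Four Arabic letters turn into eight escaped bytes, which is what the server actually receives in the request line.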

Which browsers let you type Unicode in the address bar, and which ones really send and accept a non-English domain name or file name? IE6 does not! FF does, and Mozilla obviously does too; Opera and Safari do not. So much for domain names; file names still hang out there waiting for an explanation. File names are another level of difficulty for true i18n. Some servers do not accept anything other than Latin characters in file names, while others do. Windows servers tend to handle them well in most cases, but Linux servers have somewhat lower compatibility with file name character encodings, which causes hiccups such as question marks in place of characters. All of that causes download/upload problems for files with non-English names.

The locale problem

Previously I’ve written about usability for non-English websites and applications, and how confusing it is to have an app in one language but its numbers and date formats in another format and language. That’s what happens online on non-English websites, particularly Arabic and Farsi ones. Unless you have your OS settings set to a particular locale, you’re not going to get the right formatting for dates, numbers, and currencies.

On the web, in web pages and web apps, it is confusing and weird to see different sets of characters, on top of addresses and domain names that still don’t match the language of the site. So we witness an Arabic-language web page that still shows dates, numbers, currencies, and numbered lists with English digits (note: you don’t get that if you’ve set your locale to an Arabic region). If you haven’t, you face a variety of languages in one page, which is not true localization.

The problem again resides on two sides: the server side and the client side. The server has to know how to speak a certain language and how to output its locale data. Speaking of Apache, you could have the locale installed on its OS, but the tables differ between OSes. The client side is the same story; it’s the OS again. The third element, and the one that plays the big role, is the middleware, which in our case is the web page. How does it handle the communication and the handshake between the languages spoken by the server OS and the client OS?

There are a number of techniques for that, which force the server to talk a certain language and output the proper locale, and the client to accept and talk the same language. I’ll elaborate on this in my next post, on how to solve this issue and output the right locale formatting for numbers, dates, and currencies. So I’ll leave it out for now.
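
As a rough idea of what page-level control over the locale can look like, here is a sketch using the standard ECMAScript Intl API found in current browsers and Node.js. This is an assumption on my part, not the technique the post has in mind; the helper names and locale tags are sample choices.

```javascript
// Sketch using the standard Intl API; locale tags and helper names are
// sample choices. The numberingSystem option pins the digit set
// explicitly, so the result does not depend on the visitor's OS
// regional settings.
function formatNumberFor(localeTag, numberingSystem, value) {
  return new Intl.NumberFormat(localeTag, {
    numberingSystem,
    useGrouping: false,
  }).format(value);
}

function formatDateFor(localeTag, numberingSystem, date) {
  return new Intl.DateTimeFormat(localeTag, {
    numberingSystem,
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  }).format(date);
}

console.log(formatNumberFor("ar", "arab", 2006));    // Arabic-Indic digits
console.log(formatNumberFor("fa", "arabext", 2006)); // Persian digits
console.log(formatDateFor("ar", "arab", new Date(Date.UTC(2006, 5, 4))));
```

The point of the design is that the page, not the OS, decides which digits and date format the reader sees.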

Findability and SEO issues

Search engines rely heavily on the page title, domain name, file name, and the copy to build an index that serves their search algorithms. Simply put, if you have a blog that resides on an English domain name and your file names (even if they are all rewritten URLs) are all in English but your page title and copy are in Arabic, you’re actually going to be losing search-engine-optimized results. To be precise, in the practical world it should not affect you much, since all the other websites served in Arabic have the same problem. So it does not hurt at the moment, but fixing it could improve the way searches are made. Or do search engines handle non-English websites in a different way? Do tell if you know.

Related readings:
* SEO
* i18n and l10n
* Unicode
* Arabic
* Farsi



Usability for non-English websites?

Posted: December 4th, 2005 | Author: dotblack | Filed under: Web-Design, i18n | 1 Comment »

I guess this will be one of the two or three entries for this month, since this month’s going to be a little busier (I’ll tell you about it in the next posts). So I picked a hot subject that’s been puzzling me for a while and has kept me busy for some time: Information Architecture & Usability for non-English websites.

Is language the dividing line?

Internationalization, i18n, L10n, and localization are terms you’ll come across if you look for multi-language software or web applications. Language pack is another term you’ll find as a feature of many applications. So is that it? Turning left-to-right into right-to-left and changing the character set or using Unicode; that’s it? Although that has been appealing and a huge help in removing the language barrier to using applications, it’s not enough. It’s not enough because changing the interface language does not change the IA, nor the UI in most cases.

In our case, Arabic (or Farsi) would be the secondary language.
Logically thinking, the whole ball game is different for the information architecture and the user interface of the application or website because, no matter how westernized we are or how much English we read, we still have our own mindset for the mother tongue and culture. When you think in English you even answer yourself in English. So, to relate it to our concern here, when a user switches the interface to Farsi/Arabic mode, he/she starts skimming through the interface differently, and thus different controls and texts take his/her focus. If he/she is used to the Arabic keyboard shortcuts of, let’s say, Arabic Windows XP, then the app should apply those.

How is UI different for Persian or any non-English language?

Reading direction is one of the elements that define how a UI should be constructed. What happens is that the reader/web surfer first identifies the language, and then starts skimming and following word tails according to the ltr/rtl direction of that language. To clarify, if the language is English the user starts skimming from the top-left all the way to the bottom-right of the page. When reading or looking for a particular thing rather than skimming, it still goes left to right, but in rows. So what’s catchy and notable for an English reader is totally different in a right-to-left language. A Farsi/Arabic web page makes the reader start from the top-right and go all the way to the bottom-left.

What should the links or the navigation layout look like in an Arabic/Farsi website? This is a very tough question. Localized computing in this region (the Middle East) is not that mature, so to speak; it is only translated and right-to-left oriented, following all the rules that English computing methods follow. So usually all the links follow the global (English) look and feel (bold or underlined, which I understand are the global standards for hyperlinks, depending on the user’s defined CSS on the browser side).

Keyboard’s awkward layout. Even though the language is different, Arabic/Farsi keyboards still use the same key layout used in English. My concerns are the numbers, the Tab key, the Caps Lock key, the function keys (F1-F12), and Enter. In Arabic, numbers are read from right to left, unlike Farsi and English where they are read from left to right, so the order of the number keys is not what one learns at school, which makes the computing experience a little harder when you start striking keys at an early age. Tab is very tricky. Take the example of a data entry application in Arabic: Tab moves from right to left across the screen, while the Tab key sits on the top-left side of the keyboard. People get past that contrast by habit, even though it is a major abnormality and a bump in the input curve.

Localization: Many Arabic/Farsi sites fail!

So let’s get back to UI. Once the text direction is defined, either with CSS (direction: ltr or rtl) or with the dir attribute of HTML tags, every element on the page is oriented according to that direction.
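
If the language is switched at runtime, the direction can be set from script as well. A minimal sketch, assuming a short sample list of right-to-left languages and made-up function names:

```javascript
// Sketch: pick a text direction from a language tag. The list of
// right-to-left languages here is a small sample, not a complete one.
const RTL_LANGUAGES = new Set(["ar", "fa", "he", "ur"]);

function directionFor(langTag) {
  const primary = langTag.toLowerCase().split("-")[0];
  return RTL_LANGUAGES.has(primary) ? "rtl" : "ltr";
}

// In a browser, setting dir on the root element orients the whole page,
// the same as writing <html dir="rtl"> in the markup.
function applyDirection(doc, langTag) {
  doc.documentElement.lang = langTag;
  doc.documentElement.dir = directionFor(langTag);
}
```

In a page, applyDirection(document, "fa") would flip the whole layout to right-to-left.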

What happens to numbers and dates? Unless you’re on an Arabic-locale box, or the regional settings of your OS are set to Arabic or Farsi, you’re going to get English digits or whatever you’ve set to display in your regional settings. The whole website is in Farsi/Arabic, but the numbers and dates are still shown in English. Imagine clicking on a date control and getting a month’s days all in English while everything else is in Arabic. Isn’t that at least a little confusing? Of course it’s usable, but it is not optimized for non-English users. gettext?

Special characters such as !, ?, ;, :, ., etc. are left-to-right characters, so they garble Farsi and Arabic text to the point where you can’t make out where a sentence begins or where it ends.

UTF-8 please! What’s worse than opening a web page address that shows you weird question marks and other characters? That’s what happens when a page is made using a character set that doesn’t exist on your computer. Even if that character set is installed, you have to select it manually and then reload the whole page, which takes 2-5 clicks. Is that something a potential prospect would do on a website? Isn’t that risky for any web presence and all the investment made in the website?
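
To show where the garbage characters come from, here is a small Node.js sketch that reads the same bytes with the wrong character set. The function name misdecode is made up, and latin1 simply stands in for whatever mismatched character set the browser picks.

```javascript
// Node.js sketch: the same bytes read with the wrong character set.
// "سلام" is a sample word; latin1 stands in for any mismatched charset.
function misdecode(text) {
  const bytes = Buffer.from(text, "utf8"); // what the server sent
  return {
    wrong: bytes.toString("latin1"),       // what a mismatched client shows
    right: bytes.toString("utf8"),         // what a UTF-8 client shows
  };
}

const result = misdecode("سلام");
console.log(result.wrong); // unreadable accented Latin characters
console.log(result.right); // سلام
```

The bytes never change; only the guess about their character set does, which is why declaring UTF-8 on the page removes the guessing.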

Typography

A couple of days ago AMEInfo (a leading online news provider for the Middle East region) released an Arabic version of their news website. While I’m sure the content will be great, I find their use of type goes in an unwanted direction. With English text, if you select Arial for instance, it’s all right. Arabic fonts, however, tend to render smaller and are hard to read. Many websites use bold for all kinds of text as a solution. Other, smarter designers use Tahoma, as it is a better and more readable font. Yes, there really is a limited number of fonts available for web documents in Arabic.

So what’s the solution? While some websites use images, others use proper sizing and weight for text. CSS is really enough, but font selection is a massive limiter. No, not even sIFR can help! Check out some good usage at Al-Bayan.

I’m still wondering why AMEInfo uses non-bold styling for their navigational links. It was really confusing for me the first time I opened their homepage. I hope they’ll bold them up and use a non-black color too. AMEInfo, it’s a web page! Care about your users and make them feel at home on the web. White space please!

Headers, titles, links, paragraphs, quotes, etc… follow the same rules used in English writing since they all come down from traditional writing methods used in print.

Read enough, show me the money!

No greens out here. The bottom line: changing the text direction and translating the whole copy of an English website to Arabic/Farsi is not enough. Just about everything else is involved. I believe research and further development will answer and iron out many of these concerns, but just a reminder again: a language pack and rtl are not all there is to it.
