Internationalization (often abbreviated i18n) refers to the process of planning and designing products and services so that they can easily be adapted to various local languages and cultures without engineering changes. The internationalization process is sometimes called translation or localization enablement. For Web applications, this is especially important because potential users may be located anywhere in the world. Enablement can include:
- Allowing space in user interfaces (for example, help pages, hardware labels, and online menus) for translation into languages that require more characters
- Developing with products (such as Web editors or authoring tools) that can support international character sets (Unicode)
- Creating print or Web site graphic images so that their text labels can be translated inexpensively
- Using written examples that have global meaning
- For software, ensuring data space so that messages can be translated from languages with single-byte character codes (such as English) into languages requiring multiple-byte character codes (such as Japanese Kanji)
There are many benefits to i18n, including:
- Easier adaptation of software applications (or other content) to multiple locales
- Reduced time and cost for localization
- Single, internationalized source code for all versions of the product
- Simpler maintenance
- Improved quality and code architecture
- Reduced overall cost of ownership of the multiple versions of the product
- Adherence to international standards
The main goal of the internationalization (I18N) process is to separate all text that appears to the user, as well as locale-specific features (such as date and time formats, currency, and decimal separators), from the main product features. This should take place during the development cycle of any application (traditional or web-driven), because changes to features and design can be implemented more cost-effectively at that stage. Performing internationalization up front minimizes development costs.
Most internationalization issues directly affect the localization process, which inevitably follows any internationalization effort. Below you will find a list of some of the more common internationalization (I18N) issues that should be addressed:
- hard-coded text strings
- enablement of different character sets (for example, double-byte and bidirectional)
- input methods and keyboard layouts
- manually generated tables of contents and indexes
- text within graphics
Internationalization Service Providers each have their own methodology for providing internationalization (I18N) services. The basic phases should include:
- Discovery: an introduction to I18N and L10N development issues; Q&A sessions with all software internationalization stakeholders; internationalization kit preparation; and a review of current software internationalization readiness
- Assessment: review and analysis of the application; of global marketing plans and requirements; of design, development, and build processes; of current I18N and L10N strategies; and of the source code
- Implementation: externalization of hard-coded strings for ease of localization; resolution of currency, time, date, and number issues; double-byte enabling; an I18N-friendly build methodology; I18N test plan preparation; localization kit preparation; knowledge transfer and I18N education; and recommended I18N tools with any required tool training
Internationalization of HTML Documents
First, remember that the language declaration is meant to cover the page as a whole. If you write Web pages that contain more than one language, define the base language of the page, and then mark each passage in another language with its own language attribute on the element that contains it.
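As a rough illustration of the base-language idea, the sketch below declares English for the whole page and marks a single French word separately. The page content, the use of a span element, and the language codes are illustrative assumptions, not taken from any particular site.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!-- Base language for the whole page: U.S. English -->
<html lang="en-US">
  <head>
    <title>Greetings</title>
  </head>
  <body>
    <p>
      In French, people greet each other with
      <!-- This word overrides the base language for its own content only -->
      <span lang="fr">bonjour</span>.
    </p>
  </body>
</html>
```

Any element can carry the lang attribute, so a whole paragraph or section in another language can be marked the same way.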
Then, you need to know that declaring the language is separate from declaring the character encoding of the document. Your server might define the character encoding automatically, but to be sure, it's a good idea to include the following meta tag in your XHTML and HTML documents:
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
Finally, the text direction is also not specified by the language declaration. For example, in languages such as Hebrew and Arabic, text is displayed and read from right to left, but numbers and text from other scripts are displayed (and read) from left to right. To set the text direction for the entire page, use the dir attribute on the html element.
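A minimal sketch of a right-to-left page might look like the following. The Hebrew word, the embedded English phrase, and the choice to mark that phrase with its own dir attribute are assumptions made for illustration.

```html
<!-- Hebrew page: right-to-left base direction for the whole document -->
<html lang="he" dir="rtl">
  <head>
    <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
    <title>שלום</title>
  </head>
  <body>
    <p>
      שלום
      <!-- An embedded English phrase, explicitly marked left-to-right -->
      <span lang="en" dir="ltr">Hello, world</span>
    </p>
  </body>
</html>
```

Note that lang and dir are set independently here: the first names the language, the second sets the direction.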
There are several ways you can define the language of your document in HTML and XHTML:
- Setting the lang attribute on the html tag defines the language for the entire document. This attribute can then be overridden within the HTML for content areas that contain text in another language.
- You can use a "Content-language" meta tag or a Dublin Core "language" meta tag to define the language used by the document.
- You can set up your Web server to send the language information in the HTTP headers.
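The meta tag and HTTP header options listed above could look roughly like this. The en-US value is an assumed example, and how the HTTP header gets configured depends on the Web server in use, so only the resulting header is shown in a comment.

```html
<head>
  <!-- Option: "Content-language" meta tag -->
  <meta http-equiv="Content-language" content="en-US">

  <!-- Option: Dublin Core "language" meta tag, with its schema declaration -->
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
  <meta name="DC.language" content="en-US">
</head>

<!-- Option: HTTP response header sent by the server:
     Content-Language: en-US -->
```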
What is the Best Way to Define the Language of the Web Page Content?
According to the W3C best practices document, the best way to define the language of your HTML and XHTML documents is with an attribute on the html tag. It states:
Always declare the default language for text in the page using attributes on the html tag, unless the document contains content aimed at speakers of more than one language.
To define the language in an HTML 4.01 document:
<html lang="en-US">
This defines the language as being U.S. English.
If you're writing XHTML that is delivered as type "text/html", you should use both the lang attribute and the xml:lang attribute:
<html lang="en-US" xml:lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
And if you're serving XHTML pages as XML (for example, XHTML 1.1 documents), use the xml:lang attribute alone:
<html xml:lang="en-US" xmlns="http://www.w3.org/1999/xhtml">