Internationalization Frequently Asked Questions

This page answers common questions about internationalization of the Java 2 platform, Standard Edition, version 1.3.1, and of Sun's Java 2 Runtime Environments, Standard Edition, version 1.3.1. For more information, see the Internationalization home page.

General Questions
Locales
Resource Bundles
Text Processing
Character Encodings
Text Input
Text Rendering
Component Orientation
Miscellaneous

General Questions

What is internationalization?

Internationalization allows software to be adapted to any language and cultural convention. During the internationalization process, the programmer isolates the parts of a program that are dependent on language and culture. For example, the programmer will isolate error messages because they must be translated during localization.

What is localization?

Localization is the process of adapting a program for use in a specific locale. A locale is a geographic or political region that shares the same language and customs. Localization includes the translation of text such as GUI labels, error messages, and online help. It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.

How do I go about internationalizing an existing program?

See the steps outlined in the Checklist section of The Java Tutorial.

Locales

What is a locale?

A locale is a geographic or political region that shares the same language and customs. In the Java programming language, a locale is represented by a Locale object. Locale-sensitive operations, such as collation and date formatting, vary according to locale.
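As a minimal sketch of what a Locale object carries, the following reads the language and country codes from two of the predefined constants. The class and method names (LocaleSketch, describe) are made up for this illustration and are not part of the platform.

```java
import java.util.Locale;

// Hypothetical helper class for illustration only.
final class LocaleSketch {

    // Builds a short description from the language and country codes of a locale,
    // followed by the language name displayed in English.
    static String describe(Locale locale) {
        return locale.getLanguage() + "_" + locale.getCountry()
                + " (" + locale.getDisplayLanguage(Locale.ENGLISH) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(Locale.CANADA_FRENCH));
        System.out.println(describe(Locale.JAPAN));
        // The default locale depends on the host environment.
        System.out.println(describe(Locale.getDefault()));
    }
}
```

Locale-sensitive classes such as formatters and collators accept a Locale like these when they are created.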

Where can I find some coding examples that use `Locale` objects?

See the Setting the Locale section of The Java Tutorial.

Which locales are supported?

The supported locales vary between different implementations of the Java 2 platform and between areas of functionality. Information about the supported locales in Sun's Java 2 Runtime Environments is provided by the Supported Locales document.

Can a Java application use multiple locales?

Yes. This capability allows you to create multilingual applications.
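One way to picture this: the same value can be formatted for several locales within a single run, because each formatter is created for an explicit Locale. This is a sketch only; MultiLocaleSketch and formatNumber are hypothetical names.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper class for illustration only.
final class MultiLocaleSketch {

    // Formats a number using the conventions of the given locale,
    // independently of the default locale of the runtime.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.5;
        System.out.println("US:      " + formatNumber(value, Locale.US));
        System.out.println("Germany: " + formatNumber(value, Locale.GERMANY));
    }
}
```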

Resource Bundles

What is a resource bundle?

A ResourceBundle object allows you to isolate localizable elements from the rest of the application. With all resources separated into a bundle, the application simply loads the appropriate bundle for the active locale. If the user switches locales, the application just loads a different bundle.
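The idea can be sketched with two small ListResourceBundle subclasses. The keys, texts, and class names below are invented for illustration. To keep the sketch in one file, the bundles are chosen by a simple hypothetical helper (bundleFor); an application would normally call ResourceBundle.getBundle with a base name and a locale and let the runtime locate the matching class or properties file.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical example; names and texts are assumptions, not from the FAQ.
final class BundleSketch {

    // English labels.
    static final class LabelsEn extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] { {"greeting", "Hello"}, {"farewell", "Goodbye"} };
        }
    }

    // German labels with the same keys.
    static final class LabelsDe extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] { {"greeting", "Hallo"}, {"farewell", "Auf Wiedersehen"} };
        }
    }

    // Stand-in for ResourceBundle.getBundle: picks a bundle by language code
    // and falls back to English for every other locale.
    static ResourceBundle bundleFor(Locale locale) {
        if ("de".equals(locale.getLanguage())) {
            return new LabelsDe();
        }
        return new LabelsEn();
    }

    // The calling code only knows the key, never the translated text.
    static String greeting(Locale locale) {
        return bundleFor(locale).getString("greeting");
    }

    public static void main(String[] args) {
        System.out.println(greeting(Locale.US));
        System.out.println(greeting(Locale.GERMANY));
    }
}
```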

Where can I find some coding examples that use `ResourceBundle` objects?

See the Isolating Locale-Specific Data section of The Java Tutorial.

How do I specify non-ASCII strings in a properties file?

You can specify any Unicode character with the \uXXXX notation, where XXXX stands for the four hexadecimal digits of the character's Unicode value. For example, a properties file might have the following entries:

s1=hello there
s2=\uff2d\uff33\u30b4

If you have edited and saved the file in a non-ASCII encoding, you can convert it to ASCII with the native2ascii tool. For example, you might want to do this when editing a properties file in Shift-JIS, a popular Japanese encoding.
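To see the escapes being decoded, the sketch below feeds the two sample entries to java.util.Properties from an in-memory stream rather than a file. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical helper class for illustration only.
final class PropertiesEscapeSketch {

    // Loads the two sample entries as if they had been read from a properties file.
    static Properties loadSample() {
        // The doubled backslashes keep the escapes literal in the Java string,
        // so Properties.load sees them exactly as they would appear in a file.
        String fileContent = "s1=hello there\n" + "s2=\\uff2d\\uff33\\u30b4\n";
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileContent.getBytes("ISO-8859-1")));
        } catch (IOException e) {
            throw new IllegalStateException("unexpected I/O failure: " + e);
        }
        return props;
    }

    public static void main(String[] args) {
        String s2 = loadSample().getProperty("s2");
        // Three characters are expected, one per escape sequence.
        System.out.println("s2 has " + s2.length() + " characters");
    }
}
```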

How do I compile a non-ASCII `ListResourceBundle`?

If your source file is in a non-ASCII encoding, you can direct the compiler to convert it into Unicode. For example, you would compile a Japanese resource bundle written in the Shift-JIS encoding as follows:

javac -encoding SJIS LabelsResource_ja.java

Text Processing

How do I format a date?

You can use the SimpleDateFormat class to format and parse dates in a locale-sensitive manner. See the Dates and Times section of The Java Tutorial.
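A minimal sketch, assuming an explicit pattern and a fixed GMT time zone so that the result does not depend on the host settings. DateFormatSketch and its methods are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration only.
final class DateFormatSketch {

    // Creates a formatter for the pattern and locale, pinned to GMT.
    private static SimpleDateFormat formatter(String pattern, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format;
    }

    // Formats a date; month names come from the given locale.
    static String format(Date date, String pattern, Locale locale) {
        return formatter(pattern, locale).format(date);
    }

    // Parses text that was written with the same pattern and locale.
    static Date parse(String text, String pattern, Locale locale) {
        try {
            return formatter(pattern, locale).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse: " + text);
        }
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        System.out.println(format(epoch, "d MMMM yyyy", Locale.ENGLISH));
        System.out.println(format(epoch, "d MMMM yyyy", Locale.FRENCH));
    }
}
```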

How does setting the default locale affect the results of sorting?

The Collator class and its subclasses are used for building sorting routines. These classes are locale-sensitive: a Collator obtained from the no-argument getInstance() method uses the collating sequence of the default locale.
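As a rough sketch, the following sorts a small array with a Collator for an explicit locale and, for comparison, with the natural String order, which compares character codes. The class name and sample words are made up for this illustration.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper class for illustration only.
final class CollatorSortSketch {

    // Returns a copy of the words sorted by the collating sequence of the locale.
    static String[] sorted(String[] words, Locale locale) {
        String[] copy = (String[]) words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    // Returns a copy sorted by String.compareTo, which is not locale-sensitive.
    static String[] sortedByCodeValue(String[] words) {
        String[] copy = (String[]) words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"banana", "apple", "Cherry"};
        System.out.println(Arrays.asList(sorted(words, Locale.US)));
        System.out.println(Arrays.asList(sortedByCodeValue(words)));
    }
}
```

Passing Locale.getDefault(), or calling Collator.getInstance() without arguments, makes the order follow the default locale.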

The Collator object supports different levels of decomposition and strength. How do I choose the right decomposition and strength in a locale?

Since decomposing takes time, turning decomposition off makes comparisons go faster. However, for Latin languages the NO_DECOMPOSITION mode is not useful if the text contains accents. You should use the default decomposition unless you really know what you're doing.

The strength property you choose depends on what your application is trying to accomplish. For example, when performing a text search you may allow a "weak" match, in which accents and differences in case (upper vs. lower) are ignored. This type of search employs the PRIMARY strength. If you are sorting a list of words, you might want to use the TERTIARY strength. In this mode the properties that must match are the base character, accent, and case.
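The effect of the strength setting can be sketched as follows, using a US English collator and sample words chosen for this illustration. CollatorStrengthSketch and compare are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class for illustration only.
final class CollatorStrengthSketch {

    // Compares two strings with a US English collator at the given strength.
    // A result of 0 means the strings are considered equal at that strength.
    static int compare(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String plain = "resume";
        String accented = "R\u00e9sum\u00e9"; // same word with accents and a capital R

        // PRIMARY ignores accents and case, which suits a "weak" search match.
        System.out.println("PRIMARY:   " + compare(plain, accented, Collator.PRIMARY));
        // SECONDARY ignores case but not accents.
        System.out.println("SECONDARY: " + compare("abc", "ABC", Collator.SECONDARY));
        // TERTIARY distinguishes base character, accent, and case.
        System.out.println("TERTIARY:  " + compare(plain, accented, Collator.TERTIARY));
    }
}
```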

Character Encodings

What is a character encoding?

A character encoding is a mapping between characters and code values.

What is Unicode?

In the Java programming language, char values represent Unicode characters. Unicode is a 16-bit character encoding that supports the world's major languages. You can learn more about the Unicode standard at the Unicode Consortium web site.

How do I convert data between Unicode and other character encodings?

The Converting Non-Unicode Text section of The Java Tutorial explains how to perform the conversions within an application. To convert data files, use the native2ascii tool.
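For conversions inside an application, a minimal sketch using the String class might look like this. The helper names are hypothetical, and the encoding names are only examples; which encodings are available depends on the runtime.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class for illustration only.
final class EncodingConversionSketch {

    // Converts Unicode text to bytes in the named encoding.
    static byte[] encode(String text, String encoding) {
        try {
            return text.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported encoding: " + encoding);
        }
    }

    // Converts bytes in the named encoding back to Unicode text.
    static String decode(byte[] bytes, String encoding) {
        try {
            return new String(bytes, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported encoding: " + encoding);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        // The same text occupies a different number of bytes in each encoding.
        System.out.println("ISO-8859-1 bytes: " + encode(text, "ISO-8859-1").length);
        System.out.println("UTF-8 bytes:      " + encode(text, "UTF-8").length);
    }
}
```

For streams, InputStreamReader and OutputStreamWriter accept an encoding name in the same way.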

Which character encodings are supported when converting text to and from Unicode?

See the Supported Encodings web page.

How do I create my own character converters?

Version 1.3 of the Java 2 platform does not provide public interfaces that let application developers create their own character converters. There is a project underway that will define such public interfaces as part of the New I/O APIs. Licensees that create their own Java 2 runtime environments can use the internal interfaces in the sun.io package to create their own character converters.

What is the default encoding?

The default encoding is selected by the Java runtime based on the host operating system and its locale. For example, in the US locale on Windows, Cp1252 is used. In the Simplified Chinese locale on Solaris, either EUC_CN or GBK can be the default encoding, depending on the selection made when logging into Solaris.

The default encoding is significant because the Java programming language uses Unicode to represent characters, but the file system of the host operating system usually uses some other encoding. The default encoding has to match the encoding used by the host operating system to ensure correct interaction.
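One way to find out which default encoding a runtime has selected is to create a stream writer without naming an encoding and ask it. This is a sketch; the class and method names are hypothetical, and the name reported varies by host and runtime.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

// Hypothetical helper class for illustration only.
final class DefaultEncodingSketch {

    // An OutputStreamWriter created without an explicit encoding
    // uses the default encoding and can report its name.
    static String defaultEncodingName() {
        return new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("Default encoding: " + defaultEncodingName());
    }
}
```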

What is the UTF-8 encoding?

UTF-8 stands for UCS Transformation Format, 8-bit form, where UCS is the Universal Character Set of ISO/IEC 10646. It is a transmission format for Unicode that is suitable for use with many network protocols and UNIX file systems.
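UTF-8 uses a variable number of bytes per character, which the following sketch makes visible by printing the encoded bytes in hexadecimal. Utf8HexSketch and utf8Hex are hypothetical names.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class for illustration only.
final class Utf8HexSketch {

    private static final String DIGITS = "0123456789ABCDEF";

    // Returns the UTF-8 bytes of the text as space-separated hexadecimal pairs.
    static String utf8Hex(String text) {
        byte[] bytes;
        try {
            bytes = text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every Java platform");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            int value = bytes[i] & 0xFF; // treat the byte as unsigned
            out.append(DIGITS.charAt(value >> 4)).append(DIGITS.charAt(value & 0x0F));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("A"));      // ASCII letter: one byte
        System.out.println(utf8Hex("\u00e9")); // e with acute accent: two bytes
        System.out.println(utf8Hex("\u20ac")); // Euro sign: three bytes
    }
}
```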

Are the Cp1252 and ISO8859_1 encodings identical?

No. Cp1252 contains some additional characters in the range from 0x80 to 0x9F. See the Microsoft documentation for more information.
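The difference shows up when a byte in that range is decoded. The sketch below assumes that the runtime provides both encodings under the names Cp1252 and ISO8859_1; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class for illustration only.
final class Cp1252Sketch {

    // Decodes a single byte in the named encoding and returns the resulting character.
    static char decodeSingleByte(int byteValue, String encoding) {
        try {
            return new String(new byte[] {(byte) byteValue}, encoding).charAt(0);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported encoding: " + encoding);
        }
    }

    public static void main(String[] args) {
        // 0x80 is the Euro sign in Cp1252 but a control character in ISO8859_1.
        System.out.println("Cp1252:    U+"
                + Integer.toHexString(decodeSingleByte(0x80, "Cp1252")).toUpperCase());
        System.out.println("ISO8859_1: U+"
                + Integer.toHexString(decodeSingleByte(0x80, "ISO8859_1")).toUpperCase());
    }
}
```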

Text Input

What is the Input Method Framework?

The input method framework enables all text editing components to receive Japanese, Chinese, or Korean text input through input methods. An input method lets users enter thousands of different characters using keyboards with far fewer keys. Typically a sequence of several characters needs to be typed and then converted to create one or more characters. For specifications and examples see the web page, Input Method Framework.

What does it mean to switch input methods?

A user may have multiple input methods available. For example, the user may have input methods for different languages or input methods that accept various types of input. Such a user must be able to select the input method used for a particular language or the input method that provides the fastest input.

Can an input method be selected and activated programmatically?

An application can request an input method that supports a specific locale using the InputContext.selectInputMethod method, but it cannot select a specific input method - that selection is up to the user.

An application can activate an input method using the InputContext.setCompositionEnabled method.

Do the AWT and Swing (JFC) text components work with input methods?

See the Input Methods section of the JDK Software Internationalization Overview.

Text Rendering

What choices does an application have in selecting fonts?

An application can select fonts in three different ways:

Using logical font names: The Java 2 platform defines five logical font names that every implementation must support: Serif, SansSerif, Monospaced, Dialog, and DialogInput. These logical font names are mapped to physical fonts in implementation dependent ways. Typically one logical font name maps to several physical fonts in order to cover a large range of characters.
Using physical fonts: The Java 2 platform provides APIs that let an application determine which fonts are available to a given runtime and which characters these fonts can handle, and request these fonts using their real name (for example, "Times Roman" or "Helvetica"). The application can either let the user choose fonts or programmatically determine the fonts to be used.
Using the Lucida fonts: Sun's Java 2 Runtime Environments contain this family of physical fonts, which is also licensed for use in other implementations of the Java 2 platform. These fonts are physical fonts, but don't depend on the host operating system.
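The first of these approaches can be sketched as follows: a Font is requested by logical name and asked whether it can display a sample string. FontSelectionSketch, canDisplayAll, and the sample text are assumptions made for this illustration, and the results depend entirely on the fonts available to the runtime.

```java
import java.awt.Font;

// Hypothetical helper class for illustration only.
final class FontSelectionSketch {

    // The five logical font names that every implementation must support.
    static final String[] LOGICAL_NAMES = {
        "Serif", "SansSerif", "Monospaced", "Dialog", "DialogInput"
    };

    // Returns true if the font behind the logical name can display every character of the text.
    static boolean canDisplayAll(String logicalName, String text) {
        Font font = new Font(logicalName, Font.PLAIN, 12);
        // canDisplayUpTo returns -1 when no undisplayable character is found.
        return font.canDisplayUpTo(text) == -1;
    }

    public static void main(String[] args) {
        String sample = "\u65e5\u672c\u8a9e"; // the word for "Japanese", written in Japanese
        for (int i = 0; i < LOGICAL_NAMES.length; i++) {
            System.out.println(LOGICAL_NAMES[i] + " can display the sample: "
                    + canDisplayAll(LOGICAL_NAMES[i], sample));
        }
    }
}
```

For the physical-font approach, GraphicsEnvironment.getAvailableFontFamilyNames lists the font families a runtime can see.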

What are the advantages and disadvantages of these three approaches?

Here's a brief summary:

Using logical font names:
- Advantages: These font names are guaranteed to work anywhere, and they enable text rendering in at least the language that the host operating system is localized for (often a much larger range of languages).
- Disadvantages: The physical fonts used for rendering the text vary between different implementations, host operating systems, and locales, so an application cannot achieve the same look everywhere. Also, the mapping mechanisms often limit the range of characters that can be rendered. For example, in Sun's Java 2 Runtime Environments Japanese text can only be rendered on Japanese localized host operating systems, not on other localized systems even if Japanese fonts have been installed.
Using physical fonts:
- Advantages: This approach lets an application take full advantage of all available fonts, to accomplish both different text appearances and maximum language coverage.
- Disadvantages: This approach is substantially harder to program.
Using the Lucida fonts:
- Advantages: Applications using these fonts can achieve the same look everywhere. Also, these fonts cover a large range of languages (especially European and Middle Eastern), so you can create fully multilingual applications for the supported languages.
- Disadvantages: These fonts may not be available in all Java 2 runtime environments. Also, they currently do not cover the complete Unicode character set; in particular, Chinese, Japanese, and Korean are not supported.

Why doesn't my application display any Chinese, Japanese, or Korean characters even though I have fonts for these languages installed?

The answer depends on how your application selects fonts - see above.

Using logical font names: To use a physical font, it must be selected by the mapping mechanism. In Sun's Java 2 Runtime Environments, fonts for Chinese, Japanese, or Korean are only selected when running on host operating systems localized for these specific languages. To change the mapping, you need to modify a font.properties file - see below.
Using physical fonts: Your application may not be selecting the fonts correctly, or the font may be using an encoding that's not supported by the Java 2 Runtime Environment.
Using the Lucida fonts: The Lucida fonts included in Sun's Java 2 Runtime Environments do not support Chinese, Japanese, or Korean.

What is a font.properties file?

The font.properties files are used in Sun's Java 2 Runtime Environments to map logical font names to physical fonts. There are several files to support different mappings depending on host operating system version and locale. The files are located in the lib directory within the J2RE installation.

Note that font.properties files are implementation dependent. Not all implementations of the Java 2 platform use them, and the format and content vary between different runtime environments as well as between releases.

How do I add a physical font to the mapping of a logical font?

Since the mapping from logical fonts to physical fonts is implementation dependent, the answer varies. For Sun's Java 2 Runtime Environments, you need to create or modify a font.properties file - see the web page Editing the font.properties Files. Note however that this is a modification of the J2RE, and Sun does not support modified J2REs. For other implementations, see their respective documentation.

Why can I see some characters in Swing components, but not in peered AWT components?

Swing user interface components use a different mechanism to render text than peered AWT components. The Swing components use the Graphics.drawString method, typically specifying a logical font name. The logical font name is then mapped to a set of physical fonts to cover a large range of characters. AWT components on the other hand are implemented using host operating system components. These host operating system components often do not support Unicode, so the text gets converted to some other character encoding, depending on the host operating system and locale. These encodings often cover a smaller range of characters than the physical fonts used to implement logical font names. For example, on a Japanese Windows system, many European accented characters are mapped to the Arial font for Swing components, but get lost when converting the text to the Shift-JIS encoding for peered AWT components.
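The loss that occurs when text passes through a narrower encoding can be reproduced in miniature. In the sketch below US-ASCII stands in for the host encoding, which is an assumption made for illustration; it does not show what an AWT peer does internally. The sketch uses String.getBytes(Charset), which is available in current JDKs rather than in version 1.3.1, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
final class LossyConversionSketch {

    // Encodes the text in the named encoding and decodes it again.
    // Characters the encoding cannot represent are replaced by its
    // default replacement, which is a question mark for US-ASCII.
    static String roundTrip(String text, String encoding) {
        Charset charset = Charset.forName(encoding);
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        System.out.println(roundTrip(text, "US-ASCII")); // the accented letter is lost
        System.out.println(roundTrip(text, "UTF-8").equals(text)); // UTF-8 keeps it
    }
}
```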

Why can't my application display all Unicode characters even though I have a Unicode font installed?

As in the Chinese/Japanese/Korean case above, this may be because text is not rendered using the Unicode font at all or only for some characters. If your application selects the Unicode font using its physical font name, and it still cannot render all characters, it could be that the Unicode font doesn't in fact cover the entire Unicode character set - sometimes a font is called a Unicode font if it just provides the tables that support the Unicode character encoding.

What font types do Sun's Java 2 Runtime Environments support?

Sun's Java 2 Runtime Environment for Windows supports TrueType and Type1 fonts. Sun's Java 2 Runtime Environment for Solaris supports outline fonts that can be handled by an X11 server, such as F3, Type1, and TrueType.

Is it possible to display more than one language in Sun's Java 2 Runtime Environments?

The short answer is yes. The long answer needs to look at which languages you want to display at the same time, and how your application selects fonts.

It is quite common for a group of languages to share a small common character set - for example, the Western European languages can be written in the ISO 8859-1 character set. If you only need to display languages within such a group, you usually don't need to do anything - it will just work.
If the languages you need to display are all supported by the Lucida font family, and your application only needs to run on Java 2 runtime environments that contain this font family, you can simply use fonts from that family.
If you need to support languages using separate character ranges, and your application selects fonts using logical font names, you need to create a font.properties file that supports all the languages. See the web page, Editing the font.properties Files, for details.
If you need to support languages using separate character ranges, and your application selects fonts using physical names, you need to select the fonts using information about the range of characters that they support.

Can Sun's Java 2 Runtime Environment render text in Thai, Lao, Burmese, or any of the Indic scripts?

No, the font rendering system in version 1.3.1 of Sun's Java 2 Runtime Environments cannot handle the complex layout rules of these scripts. There's work underway to add support for Thai and Hindi in a future release. Also, there may be other Java 2 runtime environments that do support these scripts.

Why do I see question marks and illegible text in Traditional Chinese?

Sun's Java 2 SDK and Runtime Environment version 1.3.0 for Windows had two serious bugs affecting Traditional Chinese:

The Swing dialogs and the output of several tools, including the javac compiler, show up as a mix of Chinese characters and question marks (bug 4339627).
In the Plugin control panel, Chinese characters are rendered in such poor quality that the text is illegible (bug 4346273).

These bugs were fixed in version 1.3.0_01 and remain fixed in later releases.

Component Orientation

Which user interface components implement component orientation in Sun's Java 2 Runtime Environments?

See the Supported Locales document.

Miscellaneous

Do Sun's Java 2 Runtime Environments support the Euro currency?

Yes, Sun's Java 2 Runtime Environments let you type the Euro character, render it, convert it to and from numerous character encodings, and use it when formatting numeric values as currency. For text input and rendering, you need the appropriate support in the host operating system - see the documentation for Windows and Solaris (general information and patches). For formatting, you just need to request a locale with the "EURO" variant.

Please send comments to: java-intl@java.sun.com
