This page answers common questions about internationalization of
the Java 2 platform, Standard Edition, version 1.3.1, and of Sun's
Java 2 Runtime Environments, Standard Edition, version 1.3.1. For
more information, see the
Internationalization home page.
What is internationalization?
Internationalization allows software to be adapted to any language
and cultural convention. During the internationalization process, the
programmer isolates the parts of a program that are dependent on
language and culture. For example, the programmer will isolate error
messages because they must be translated during localization.
What is localization?
Localization is the process of adapting a program for use in a
specific locale. A locale is a geographic or political region that
shares the same language and customs. Localization includes the
translation of text such as GUI labels, error messages, and online
help. It also includes the culture-specific formatting of data items
such as monetary values, times, dates, and numbers.
How do I go about internationalizing an existing program?
What is a locale?
A locale is a geographic or political region that shares the same
language and customs. In the Java programming language, a locale is
represented by a Locale object. Locale-sensitive
operations, such as collation and date formatting, vary according to
locale.
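As a small illustration of a locale-sensitive operation, the sketch below formats one number for two different Locale objects in the same program. The class name LocaleDemo and the helper formatNumber are invented for this example; Locale and NumberFormat are the standard classes.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleDemo {
    // Formats the same number according to the conventions of the given locale.
    static String formatNumber(double value, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        return format.format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        System.out.println(formatNumber(value, Locale.US));      // 1,234,567.891
        System.out.println(formatNumber(value, Locale.GERMANY)); // 1.234.567,891
    }
}
```

Passing the Locale explicitly, rather than relying on the default locale, is what lets a single application serve several locales at once.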
Where can I find some coding examples that use
Locale objects?
Which locales are supported?
The supported locales vary between different implementations of
the Java 2 platform and between areas of functionality. Information
about the supported locales in Sun's Java 2 Runtime Environments is
provided by the Supported Locales
document.
Can a Java application use multiple locales?
Yes. This capability allows you to create multilingual
applications.
Resource Bundles
What is a resource bundle?
A ResourceBundle object allows you to isolate
localizable elements from the rest of the application. With all
resources separated into a bundle, the application simply loads the
appropriate bundle for the active locale. If the user switches
locales, the application just loads a different bundle.
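The loading step described above might look like the following sketch. It assumes two hypothetical bundles, Labels (the base bundle) and Labels_fr (French), written as nested ListResourceBundle classes so that the example fits in one file; a real application would more likely keep each bundle in its own class or properties file.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleDemo {
    // Base bundle: used when no more specific bundle matches the locale.
    public static class Labels extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "greeting", "Hello" } };
        }
    }

    // French bundle: the "_fr" suffix ties it to the French language.
    public static class Labels_fr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "greeting", "Bonjour" } };
        }
    }

    // Looks up the greeting in the bundle that best matches the locale.
    static String greeting(Locale locale) {
        // The base name is the class name of the base bundle.
        ResourceBundle labels = ResourceBundle.getBundle(Labels.class.getName(), locale);
        return labels.getString("greeting");
    }

    public static void main(String[] args) {
        System.out.println(greeting(Locale.FRENCH)); // Bonjour
        System.out.println(greeting(Locale.ROOT));   // Hello
    }
}
```

The application code only ever asks for the key "greeting"; which text it gets back is decided entirely by the locale passed to getBundle.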
Where can I find some coding examples that use
ResourceBundle objects?
How do I specify non-ASCII strings in a properties file?
You can specify any Unicode character with the \uXXXX
notation. (The XXXX denotes the 4 hexadecimal digits that
comprise the Unicode value of a character.) For example, a properties
file might have the following entries:
s1=hello there
s2=\uff2d\uff33\u30b4
If you have edited and saved the file in a non-ASCII encoding, you
can convert it to ASCII with the
native2ascii
tool. For example, you might want to do this when editing a
properties file in Shift-JIS, a popular Japanese encoding.
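To see how the \uXXXX notation is decoded, the sketch below loads the two entries shown above through java.util.Properties. It reads them from an in-memory string rather than a real file, and the class name EscapedPropertiesDemo and the helper load are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EscapedPropertiesDemo {
    // Loads properties text the same way the contents of a properties file are read.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            // Properties.load(InputStream) reads ISO 8859-1 bytes and decodes Unicode escapes.
            props.load(new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the properties text", e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Doubled backslashes keep the escapes literal, as they would appear in the file.
        String text = "s1=hello there\n" + "s2=\\uff2d\\uff33\\u30b4\n";
        Properties props = load(text);
        System.out.println(props.getProperty("s1"));          // hello there
        System.out.println(props.getProperty("s2").length()); // 3
    }
}
```

Each escape becomes a single char, so the value of s2 is three characters long even though it occupies eighteen ASCII characters in the file.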
How do I compile a non-ASCII ListResourceBundle?
If your source file is in a non-ASCII encoding, you can direct the
compiler to convert it into Unicode. For example, you would compile a
Japanese resource bundle written in the Shift-JIS encoding as
follows:
javac -encoding SJIS LabelsResource_ja.java
Text Processing
How do I format a date?
You can use the SimpleDateFormat class to format and parse
dates in a locale-sensitive manner. See the section on formatting
dates and times in The Java Tutorial.
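As a minimal sketch of locale-sensitive date formatting, the example below formats one fixed date for two locales. The pattern, the UTC time zone, and the names DateFormatDemo and formatDay are choices made for this illustration so that the output is reproducible; they are not prescribed by the API.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatDemo {
    // Formats the date with a day name that follows the given locale.
    static String formatDay(Date date, Locale locale) {
        // "EEEE" is the locale-sensitive part: the full name of the day of the week.
        SimpleDateFormat format = new SimpleDateFormat("EEEE, yyyy-MM-dd", locale);
        // A fixed time zone keeps the result independent of the host settings.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // midnight UTC on 1 January 1970
        System.out.println(formatDay(epoch, Locale.US));     // Thursday, 1970-01-01
        System.out.println(formatDay(epoch, Locale.FRANCE)); // jeudi, 1970-01-01
    }
}
```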
How does setting the default locale affect the results of
sorting?
The Collator class and its subclasses are used for
building sorting routines. These classes are locale-sensitive: a
Collator obtained from the no-argument Collator.getInstance factory
method uses the collating sequence of the default locale, so changing
the default locale changes the sort order.
The Collator object supports different levels of decomposition
and strength. How do I choose the right decomposition and strength in
a locale?
Since decomposing takes time, turning decomposition off makes
comparisons go faster. However, for Latin languages the
NO_DECOMPOSITION mode is not useful if the text contains
accents. You should use the default decomposition unless you really
know what you're doing.
The strength property you choose depends on what your application
is trying to accomplish. For example, when performing a text search
you may allow a "weak" match, in which accents and differences in
case (upper vs. lower) are ignored. This type of search employs the
PRIMARY strength. If you are sorting a list of words,
you might want to use the TERTIARY strength. In this
mode the properties that must match are the base character, accent,
and case.
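The difference between the two strengths can be sketched as follows. The example uses the US locale and leaves the decomposition mode at its default; the class name CollatorDemo and the helpers matches and sorted are invented for the illustration.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollatorDemo {
    // Returns true when the two strings compare as equal at the given strength.
    static boolean matches(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    // Returns a copy of the words sorted with a TERTIARY-strength collator.
    static String[] sorted(String[] words) {
        String[] copy = words.clone();
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.TERTIARY);
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // A "weak" search match: case differences are ignored at PRIMARY strength.
        System.out.println(matches("abc", "ABC", Collator.PRIMARY));  // true
        // At TERTIARY strength the same pair is treated as different.
        System.out.println(matches("abc", "ABC", Collator.TERTIARY)); // false
        System.out.println(Arrays.toString(sorted(new String[] { "banana", "Apple", "apple" })));
    }
}
```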
Character Encodings
What is a character encoding?
A character encoding is a mapping between characters and code
values.
What is Unicode?
In the Java programming language, char values are 16 bits wide and
represent Unicode characters. Unicode is a character encoding standard
that supports the world's major languages. You can learn more about
the Unicode standard at the Unicode
Consortium web site.
How do I convert data between Unicode and other character
encodings?
Can I create my own character converters?
Version 1.3 of the Java 2 platform does not provide public
interfaces that let application developers create their own character
converters. There is a project underway that will define such public
interfaces as part of the
New
I/O APIs. Licensees that create their own Java 2 runtime
environments can use the internal interfaces in the sun.io package to
create their own character converters.
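Although applications cannot add converters, the converters built into the runtime can be reached through public classes such as InputStreamReader and OutputStreamWriter. The sketch below is one way to convert between Unicode strings and bytes in a named encoding; the class name ConversionDemo and the helpers encode and decode are made up for the example, and the encoding names used are the historical "ISO8859_1" and "UTF8".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class ConversionDemo {
    // Encodes Unicode text into bytes of the named encoding.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(bytes, encoding);
            writer.write(text);
            writer.close(); // flushes the converted bytes into the stream
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot convert using " + encoding, e);
        }
    }

    // Decodes bytes of the named encoding back into Unicode text.
    static String decode(byte[] data, String encoding) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(data), encoding);
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot convert using " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        System.out.println(encode(text, "ISO8859_1").length);              // 4
        System.out.println(decode(encode(text, "UTF8"), "UTF8").length()); // 4
    }
}
```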
What is the default encoding?
The default encoding is selected by the Java runtime based on the
host operating system and its locale. For example, in the US locale
on Windows, Cp1252 is used. In the Simplified Chinese locale on
Solaris, either EUC_CN or GBK can be the default encoding, depending
on the selection made when logging into Solaris.
The default encoding is significant because the Java programming
language uses Unicode to represent characters, but the file system of
the host operating system usually uses some other encoding. The
default encoding has to match the encoding used by the host operating
system to ensure correct interaction.
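If you want to check which default encoding a runtime has picked, a sketch like the following may help. It relies on the fact that a writer created without an explicit encoding uses the default one. The file.encoding system property is printed only for comparison and is an assumption here, since it is commonly available but may not be set or documented on every runtime. The class name DefaultEncodingDemo is invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

public class DefaultEncodingDemo {
    // Returns the name of the encoding used when none is specified.
    static String defaultEncoding() {
        // A writer created without an explicit encoding uses the default one.
        return new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("Default encoding: " + defaultEncoding());
        // Assumption: this property is present on the runtime in use.
        System.out.println("file.encoding:    " + System.getProperty("file.encoding"));
    }
}
```

The output varies with the host operating system and locale, so no expected value is shown.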
What is the UTF-8 encoding?
UTF-8 stands for UCS Transformation Format, 8-bit encoding
form. It is a transmission format for Unicode that is suitable for
use with many network protocols and UNIX file systems.
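One property worth seeing in practice is that UTF-8 uses a different number of bytes for different characters. The sketch below counts the bytes for three sample characters; the class name Utf8Demo and the helper utf8Length are made up for the example.

```java
import java.io.UnsupportedEncodingException;

public class Utf8Demo {
    // Returns the number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        try {
            return text.getBytes("UTF-8").length;
        } catch (UnsupportedEncodingException e) {
            // Not expected: UTF-8 support is required of every Java platform.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1 (ASCII letter)
        System.out.println(utf8Length("\u00e9")); // 2 (e with acute accent)
        System.out.println(utf8Length("\u20ac")); // 3 (Euro sign)
    }
}
```

ASCII text is unchanged by the encoding, which is one reason it works well with protocols and file systems designed around 8-bit characters.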
Are the Cp1252 and ISO8859_1 encodings identical?
No. Cp1252 contains some additional characters in the range from
0x80 to 0x9F. See the
Microsoft
documentation for more information.
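The difference is easy to observe by decoding the same byte with both encodings. The sketch below decodes the byte 0x80 and assumes that the runtime includes the Cp1252 converter; the class name Cp1252Demo and the helper decodeByte are invented for the example.

```java
import java.io.UnsupportedEncodingException;

public class Cp1252Demo {
    // Decodes a single byte with the named encoding and returns the resulting character.
    static char decodeByte(int value, String encoding) {
        try {
            return new String(new byte[] { (byte) value }, encoding).charAt(0);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Encoding not available: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // In Cp1252 the byte 0x80 is the Euro sign.
        System.out.println(Integer.toHexString(decodeByte(0x80, "Cp1252")));    // 20ac
        // In ISO8859_1 the same byte is a control character with the same value.
        System.out.println(Integer.toHexString(decodeByte(0x80, "ISO8859_1"))); // 80
    }
}
```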
Text Input
What is the Input Method Framework?
The input method framework enables all text editing components to
receive Japanese, Chinese, or Korean text input through input
methods. An input method lets users enter thousands of different
characters using keyboards with far fewer keys. Typically a sequence
of several characters needs to be typed and then converted to create
one or more characters. For specifications and examples see the web
page, Input Method Framework.
What does it mean to switch input methods?
A user may have multiple input methods available. For example, the
user may have input methods for different languages or input methods
that accept various types of input. Such a user must be able to
select the input method used for a particular language or the input
method that provides the fastest input.
Can an input method be selected and activated programmatically?
An application can request an input method that supports a
specific locale using the
InputContext.selectInputMethod
method, but it cannot select a specific input method - that selection
is up to the user.
What choices does an application
have in selecting fonts?
An application can select fonts in three different ways:
Using logical font names: The Java 2 platform defines five
logical font names that every implementation must support: Serif,
SansSerif, Monospaced, Dialog, and DialogInput. These logical font
names are mapped to physical fonts in implementation-dependent
ways. Typically one logical font name maps to several physical
fonts in order to cover a large range of characters.
Using physical fonts: The Java 2 platform provides APIs that
let an application determine which fonts are available to a given
runtime and which characters these fonts can handle, and request
these fonts using their real name (for example, "Times Roman" or
"Helvetica"). The application can either let the user choose fonts
or programmatically determine the fonts to be used.
Using the Lucida fonts: Sun's Java 2 Runtime Environments
contain this family of physical fonts, which is also licensed for
use in other implementations of the Java 2 platform. These fonts
are physical fonts, but don't depend on the host operating system.
What are the advantages and disadvantages of these three
approaches?
Here's a brief summary:
Using logical font names:
Advantages: These font names are guaranteed to work
anywhere, and they enable text rendering in at least the
language that the host operating system is localized for (often
a much larger range of languages).
Disadvantages: The physical fonts used for rendering the
text vary between different implementations, host operating
systems, and locales, so an application cannot achieve the
same look everywhere. Also, the mapping mechanisms often limit
the range of characters that can be rendered. For example, in
Sun's Java 2 Runtime Environments Japanese text can only be
rendered on Japanese localized host operating systems, not on
other localized systems even if Japanese fonts have been
installed.
Using physical fonts:
Advantages: This approach lets an application take full
advantage of all available fonts, to accomplish both different
text appearances and maximum language coverage.
Disadvantages: This approach is substantially harder to
program.
Using the Lucida fonts:
Advantages: Applications using these fonts can achieve the
same look everywhere. Also, these fonts cover a large range of
languages (especially European and Middle Eastern), so you can
create fully multilingual applications for the supported
languages.
Disadvantages: These fonts may not be available in all Java
2 runtime environments. Also, they currently do not cover the
complete Unicode character set; in particular, Chinese,
Japanese, and Korean are not supported.
Why doesn't my application display any Chinese,
Japanese, or Korean characters even though I have fonts for these
languages installed?
The answer depends on how your application selects fonts - see
above.
Using logical font names: To use a physical font, it must be
selected by the mapping mechanism. In Sun's Java 2 Runtime
Environments, fonts for Chinese, Japanese, or Korean are only
selected when running on host operating systems localized for
these specific languages. To change the mapping, you need to
modify a font.properties file - see below.
Using physical fonts: Your application may not be selecting
the fonts correctly, or the font may be using an encoding that's
not supported by the Java 2 Runtime Environment.
Using the Lucida fonts: The Lucida fonts included in Sun's
Java 2 Runtime Environments do not support Chinese, Japanese, or
Korean.
What is a font.properties file?
The font.properties files are used in Sun's Java 2
Runtime Environments to map logical font names to physical fonts.
There are several files to support different mappings depending on
host operating system version and locale. The files are located in
the lib directory within the J2RE installation.
Note that font.properties files are implementation dependent. Not
all implementations of the Java 2 platform use them, and the format
and content vary between different runtime environments as well as
between releases.
How do I add a physical font to the mapping of a logical font?
Since the mapping from logical fonts to physical fonts is
implementation dependent, the answer varies. For Sun's Java 2 Runtime
Environments, you need to create or modify a font.properties
file - see the web page Editing the
font.properties Files. Note however that this is a modification
of the J2RE, and Sun does not support modified J2REs. For other
implementations, see their respective documentation.
Why can I see some characters in Swing components, but not in
peered AWT components?
Swing user interface components use a different mechanism to
render text than peered AWT components. The Swing components use the
Graphics.drawString method, typically specifying a logical font name.
The logical font name is then mapped to a set of physical fonts to
cover a large range of characters. AWT components on the other hand
are implemented using host operating system components. These host
operating system components often do not support Unicode, so the text
gets converted to some other character encoding, depending on the
host operating system and locale. These encodings often cover a
smaller range of characters than the physical fonts used to implement
logical font names. For example, on a Japanese Windows system, many
European accented characters are mapped to the Arial font for Swing
components, but get lost when converting the text to the Shift-JIS
encoding for peered AWT components.
Why can't my application display all Unicode characters even
though I have a Unicode font installed?
As in the Chinese/Japanese/Korean case above,
this may be because the Unicode font is not used to render the text at
all, or is used for only some characters. If your application selects the
Unicode font using its physical font name, and it still cannot render
all characters, it could be that the Unicode font doesn't in fact
cover the entire Unicode character set - sometimes a font is called a
Unicode font if it just provides the tables that support the Unicode
character encoding.
What font types do Sun's Java 2 Runtime Environments support?
Sun's Java 2 Runtime Environment for Windows supports TrueType and
Type1 fonts. Sun's Java 2 Runtime Environment for Solaris supports
outline fonts that can be handled by an X11 server, such as F3,
Type1, and TrueType.
Is it possible to display more than one language in Sun's Java 2
Runtime Environments?
The short answer is yes. The long answer depends on which
languages you want to display at the same time, and on how your
application selects fonts.
It is quite common for a group of languages to share a small
common character set - for example, the Western European languages
can be written in the ISO 8859-1 character set. If you only need
to display languages within such a group, you usually don't need
to do anything - it will just work.
If the languages you need to display are all supported by the
Lucida font family, and your application only needs to run on Java
2 runtime environments that contain this font family, you can
simply use fonts from that family.
If you need to support languages using separate character
ranges, and your application selects fonts using logical font
names, you need to create a font.properties file that supports all
the languages. See the web page, Editing
the font.properties Files, for details.
If you need to support languages using separate character
ranges, and your application selects fonts using physical names,
you need to select the fonts using information about the range of
characters that they support.
Can Sun's Java 2 Runtime Environment render text in Thai, Lao,
Burmese, or any of the Indic scripts?
No, the font rendering system in version 1.3.1 of Sun's Java 2
Runtime Environments cannot handle the complex layout rules of these
scripts. There's work underway to add support for Thai and Hindi in a
future release. Also, there may be other Java 2 runtime environments
that do support these scripts.
Why do I see question marks and illegible text in Traditional
Chinese?
Sun's Java 2 SDK and Runtime Environment version 1.3.0 for Windows
had two serious bugs affecting Traditional Chinese:
The Swing dialogs and the output of several tools, including
the javac compiler, show up as a mix of Chinese characters and
question marks (bug
4339627).
In the Plugin control panel, Chinese characters are rendered
in such poor quality that the text is illegible (bug
4346273).
These bugs were fixed in version 1.3.0_01.
Component Orientation
Which user interface components implement component orientation
in Sun's Java 2 Runtime Environments?
Do Sun's Java 2 Runtime Environments support the Euro currency?
Yes, Sun's Java 2 Runtime Environments let you type the Euro
character, render it, convert it from and to numerous character
encodings, and use it when formatting numeric values as currency. For
text input and rendering, you need the appropriate support in the
host operating system - see the documentation for Windows and
Solaris (general information and patches).
For formatting, you just need to request a locale with the "EURO"
variant.
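As a rough illustration of requesting the "EURO" variant, the sketch below builds a Locale with that variant and passes it to NumberFormat.getCurrencyInstance. The class name EuroDemo, the helper formatCurrency, and the sample amount are made up for the example. The currency symbol and its placement depend on the locale data of the runtime in use, so no expected output is shown.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class EuroDemo {
    // Formats the amount as currency using the conventions of the given locale.
    static String formatCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        // Language "de", country "DE", variant "EURO": German conventions with the Euro.
        Locale germanyEuro = new Locale("de", "DE", "EURO");
        System.out.println(formatCurrency(1234.56, germanyEuro));
    }
}
```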