GNU Info

Info Node: (gettext.info)Aspects

Aspects in Native Language Support
==================================

   For a totally multi-lingual distribution, there are many things to
translate beyond output messages.

   * As of today, GNU `gettext' offers a complete toolset for
     translating messages output by C programs.  Perl scripts and shell
     scripts will also need to be translated.  Even if there are today
     some hooks by which this can be done, these hooks are not
     integrated as well as they should be.

   * Some programs, like `autoconf' or `bison', are able to produce
     other programs (or scripts).  Even if the generating programs
     themselves are internationalized, the generated programs they
     produce may need internationalization on their own, and this
     indirect internationalization could be automated right from the
     generating program.  In fact, quite usually, generating and
     generated programs could be internationalized independently, as
     the effort needed is fairly orthogonal.

   * A few programs include textual tables which might need translation
     themselves, independently of the strings contained in the program
     itself.  For example, RFC 1345 gives an English description for
     each character which the `recode' program is able to reconstruct
     at execution.  Since these descriptions are extracted from the RFC
     by mechanical means, translating them properly would require a
     prior translation of the RFC itself.

   * Almost all programs accept options, which are often spelled out so
     as to be descriptive for English readers; one might want to
     consider offering translated versions of program options as well.

   * Many programs read, interpret, compile, or are somewhat driven by
     input files which are texts containing keywords, identifiers, or
     replies which are inherently translatable.  For example, one may
     want `gcc' to allow diacriticized characters in identifiers or use
     translated keywords; `rm -i' might accept something other than `y'
     or `n' for replies, etc.  Even if the program eventually produces
     most of its output in foreign languages, one has to decide
     whether the input syntax, option values, etc., are to be localized
     or not.

   * The manual accompanying a package, as well as all documentation
     files in the distribution, could surely be translated, too.
     Translating a manual, with the intent of later keeping up with
     updates, is generally a major undertaking in itself.


   As we already stressed, translation is only one aspect of locales.
Other internationalization aspects are system services and are handled
in GNU `libc'.  There are many attributes that are needed to define a
country's cultural conventions.  These attributes include, besides the
country's native language, the formatting of the date and time, the
representation of numbers, the symbols for currency, etc.  These local
"rules" are termed the country's locale.  The locale represents the
knowledge needed to support the country's native attributes.

   There are a few major areas which may vary between countries and
hence define what a locale must describe.  The following list helps
put multi-lingual messages into the proper context of other tasks
related to locales.  See the GNU `libc' manual for details.

_Characters and Codesets_
     The codeset most commonly used throughout the USA and most
     English-speaking parts of the world is the ASCII codeset.  However,
     there are many characters needed by various locales that are not
     found within this codeset.  The 8-bit ISO 8859-1 codeset has most
     of the special characters needed to handle the major European
     languages.  However, in many cases, the ISO 8859-1 codeset is not
     adequate.  Hence each locale will need to specify which codeset it
     uses and will need to have the appropriate character handling
     routines to cope with that codeset.
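
     As a rough illustration of why the codeset matters, the following
     Python sketch checks whether a string can be represented in a
     given codeset.  It is not part of GNU `gettext'; the helper name
     `encodable' and the sample word are assumptions made for this
     example only.

```python
def encodable(text, codeset):
    """Return True if every character of text exists in codeset."""
    try:
        text.encode(codeset)
        return True
    except UnicodeEncodeError:
        return False


# Sample word chosen for illustration; it contains one non-ASCII letter.
WORD = "café"

for codeset in ("ascii", "iso-8859-1"):
    # Prints "ascii False" and then "iso-8859-1 True".
    print(codeset, encodable(WORD, codeset))
```

     A word such as "Łódź" fails the same check for ISO 8859-1 but
     passes for ISO 8859-2, which is one reason a locale has to name
     its codeset explicitly.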

_Currency_
     The symbols used vary from country to country, as does the position
     of the symbol.  Software needs to be able to transparently
     display currency figures in the native mode for each locale.
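
     The idea of symbol and position can be sketched as follows.  The
     parameter names loosely follow the `currency_symbol',
     `p_cs_precedes' and `p_sep_by_space' members of the ISO C
     `struct lconv'; the function itself and the sample values are
     hypothetical and are not taken from any real locale database.

```python
def format_currency(amount, symbol, cs_precedes, sep_by_space):
    """Place a currency symbol before or after a two-decimal amount."""
    number = f"{amount:.2f}"
    sep = " " if sep_by_space else ""
    if cs_precedes:
        return symbol + sep + number
    return number + sep + symbol


# Symbol first, no space (illustrative values).
print(format_currency(1234.5, "$", True, False))    # $1234.50
# Symbol last, separated by a space (illustrative values).
print(format_currency(1234.5, "kr", False, True))   # 1234.50 kr
```

     A real program would normally obtain these settings from the
     locale at run time rather than pass them by hand.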

_Dates_
     The format of dates varies between locales.  For example, Christmas
     day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in
     Australia.  Other countries might use ISO 8601 dates, etc.

     Time of day may be written as HH:MM, HH.MM, or otherwise.  Some
     locales require time to be specified in 24-hour mode rather than
     as AM or PM.  Further, the nature and yearly extent of the
     Daylight Saving correction vary widely between countries.
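
     The Christmas 1994 example can be reproduced with explicit format
     strings.  This is only a sketch in Python: the variable names are
     assumptions, and the formats are hard-coded here to show the
     difference, whereas a localized program would let the locale
     choose them (for instance through the `%x' conversion of
     `strftime').

```python
from datetime import date

christmas = date(1994, 12, 25)

us = christmas.strftime("%m/%d/%y")   # month first: 12/25/94
au = christmas.strftime("%d/%m/%y")   # day first:   25/12/94
iso = christmas.isoformat()           # ISO 8601:    1994-12-25

print(us, au, iso)
```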

_Numbers_
     Numbers can be represented differently in different locales.  For
     example, the following numbers are all written correctly for their
     respective locales:

          12,345.67       English
          12.345,67       French
          1,2345.67       Asia

     Some programs could go further and use different unit systems, like
     English units or Metric units, or even take into account variations
     in how numbers are spelled out in full.
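
     The three renderings in the table above differ only in the
     grouping separator, the decimal point and the group size.  A
     minimal sketch, assuming a hypothetical helper `format_number'
     that takes these three settings as arguments (in C they would
     come from `localeconv'):

```python
def format_number(value, thousands_sep, decimal_point, group_size=3):
    """Format value with two decimals and the given grouping rules."""
    sign = "-" if value < 0 else ""
    whole, frac = f"{abs(value):.2f}".split(".")
    groups = []
    # Peel off group_size digits at a time, starting from the right.
    while len(whole) > group_size:
        groups.insert(0, whole[-group_size:])
        whole = whole[:-group_size]
    groups.insert(0, whole)
    return sign + thousands_sep.join(groups) + decimal_point + frac


print(format_number(12345.67, ",", "."))      # 12,345.67
print(format_number(12345.67, ".", ","))      # 12.345,67
print(format_number(12345.67, ",", ".", 4))   # 1,2345.67
```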

_Messages_
     The most obvious area is the language support within a locale.
     This is where GNU `gettext' provides the means for developers and
     users to easily change the language that the software uses to
     communicate with the user.
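
     For a feel of what message lookup looks like in practice, here is
     a small sketch using Python's standard `gettext' module, which
     reads GNU `gettext' message catalogs.  The text domain `hello',
     the catalog directory `locale' and the message itself are
     hypothetical.  In a C program the corresponding calls would be
     `setlocale', `bindtextdomain', `textdomain' and `gettext'.

```python
import gettext

# "hello" and "locale" are assumed names; no such catalog needs to exist.
# With fallback=True a missing catalog yields untranslated messages
# instead of raising an error.
translation = gettext.translation(
    "hello", localedir="locale", languages=["fr"], fallback=True
)
_ = translation.gettext

# The source string doubles as the lookup key and as the default text.
greeting = _("Hello, world!")
print(greeting)
```

     If no French catalog is installed, the original English string
     is returned unchanged, so the program keeps working untranslated.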

   Components of locale outside of message handling are standardized in
the ISO C standard and the SUSv2 specification.  GNU `libc' fully
implements these, and most other modern systems provide more or less
reasonable support for at least some of the missing components.

