GNU Info

Info Node: (libc.info)The message catalog files

Format of the message catalog files
-----------------------------------

   The only reasonable way to translate all the messages of a program
and store the result in a message catalog file which can be read by the
`catopen' function is to hand all the message text to the translator
and let her/him translate it.  I.e., we must have a file with
entries which associate the set/message tuple with a specific
translation.  This file format is specified in the X/Open standard and
is as follows:

   * Lines containing only whitespace characters or empty lines are
     ignored.

   * Lines which contain as the first non-whitespace character a `$'
     followed by a whitespace character are comments and are also
     ignored.

   * If a line contains as the first non-whitespace characters the
     sequence `$set' followed by a whitespace character, an additional
     argument is required to follow.  This argument can either be:

        - a number.  In this case the value of this number determines
          the set to which the following messages are added.

        - an identifier consisting of alphanumeric characters plus the
          underscore character.  In this case the set is automatically
          assigned a number.  This value is one more than the largest
          set number which has appeared so far.

          How to use the symbolic names is explained in the section
          Common Usage.

          It is an error if a symbol name appears more than once.  All
          following messages are placed in a set with this number.

   * If a line contains as the first non-whitespace characters the
     sequence `$delset' followed by a whitespace character, an
     additional argument is required to follow.  This argument can
     either be:

        - a number.  In this case the value of this number determines
          the set which will be deleted.

        - an identifier consisting of alphanumeric characters plus the
          underscore character.  This symbolic identifier must match a
          name for a set which previously was defined.  It is an error
          if the name is unknown.

     In both cases all messages in the specified set will be removed.
     They will not appear in the output.  But if this set is later
     selected again with a `$set' command, messages can be added to it
     again and these messages will appear in the output.

   * If a line contains after leading whitespace the sequence
     `$quote', the quoting character used for this input file is
     changed to the first non-whitespace character following the
     `$quote'.  If no non-whitespace character is present before the
     line ends, quoting is disabled.

     By default no quoting character is used.  In this mode strings are
     terminated with the first unescaped line break.  If there is a
     `$quote' sequence present, newlines need not be escaped.  Instead a
     string is terminated with the first unescaped appearance of the
     quote character.

     A common usage of this feature would be to set the quote character
     to `"'.  Then any appearance of the `"' in the strings must be
     escaped using the backslash (i.e., `\"' must be written).

   * Any other line must start with a number or an alphanumeric
     identifier (with the underscore character included).  The
     following characters (starting after the first whitespace
     character) will form the string which gets associated with the
     currently selected set and the message number represented by the
     number or identifier, respectively.

     If the start of the line is a number the message number is
     obvious.  It is an error if the same message number already
     appeared for this set.

     If the leading token was an identifier the message number gets
     automatically assigned.  The value is the current maximum message
     number for this set plus one.  It is an error if the identifier was
     already used for a message in this set.  It is OK to reuse the
     identifier for a message in another set.  How to use the
     symbolic identifiers will be explained below (see Common
     Usage).  There is one limitation with the identifier: it must
     not be `Set'.  The reason will be explained below.

     The text of the messages can contain escape sequences.  The usual
     sequences known from the ISO C language are recognized
     (`\n', `\t', `\v', `\b', `\r', `\f', `\\', and `\NNN', where NNN
     is the octal coding of a character code).
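
   The rules in the list above can be summarized as a small line
classifier.  The following is only a sketch and not the code used by
`gencat': the names `line_kind', `has_directive' and `classify_line'
are made up for this illustration, arguments are not validated, and
treating a lone `$' at the end of the input as a comment is an
assumption.

```c
#include <ctype.h>
#include <string.h>

/* Kinds of lines in a message catalog source file (hypothetical names).  */
enum line_kind
{
  LINE_BLANK,    /* empty or whitespace only */
  LINE_COMMENT,  /* `$' followed by whitespace */
  LINE_SET,      /* `$set' directive */
  LINE_DELSET,   /* `$delset' directive */
  LINE_QUOTE,    /* `$quote' directive */
  LINE_MESSAGE,  /* starts with a number or an identifier */
  LINE_INVALID   /* anything else */
};

/* Nonzero if S begins with WORD and WORD is followed by whitespace
   or by the end of the string.  */
static int
has_directive (const char *s, const char *word)
{
  size_t n = strlen (word);

  if (strncmp (s, word, n) != 0)
    return 0;
  return s[n] == '\0' || isspace ((unsigned char) s[n]);
}

/* Decide what kind of line LINE is.  Only the leading token is
   inspected; the arguments of the directives are not checked.  */
enum line_kind
classify_line (const char *line)
{
  /* Leading whitespace never matters for the classification.  */
  while (isspace ((unsigned char) *line))
    line++;

  if (*line == '\0')
    return LINE_BLANK;

  if (*line == '$')
    {
      /* Assumption: a `$' at the very end counts as a comment too.  */
      if (line[1] == '\0' || isspace ((unsigned char) line[1]))
        return LINE_COMMENT;
      if (has_directive (line, "$set"))
        return LINE_SET;
      if (has_directive (line, "$delset"))
        return LINE_DELSET;
      if (has_directive (line, "$quote"))
        return LINE_QUOTE;
      return LINE_INVALID;
    }

  /* A message line starts with a number or an identifier.  */
  if (isalnum ((unsigned char) *line) || *line == '_')
    return LINE_MESSAGE;

  return LINE_INVALID;
}
```

   A real parser would go on to read the argument of each directive and
to apply the quoting and escape rules to the message text.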

   *Important:* The handling of identifiers instead of numbers for the
set and messages is a GNU extension.  Systems strictly following the
X/Open specification do not have this feature.

   An example of a message catalog file is this:

     $ This is a leading comment.
     $quote "
     
     $set SetOne
     1 Message with ID 1.
     two "   Message with ID \"two\", which gets the value 2 assigned"
     
     $set SetTwo
     $ Since the last set got the number 1 assigned this set has number 2.
     4000 "The numbers can be arbitrary, they need not start at one."

   This small example shows various aspects:

   * Lines 1 and 9 are comments since they start with `$' followed by
     whitespace.

   * The quoting character is set to `"'.  Otherwise the quotes in the
     message definition would have to be omitted, and in this case the
     message with the identifier `two' would lose its leading
     whitespace.

   * Mixing numbered messages with messages having symbolic names is no
     problem and the numbering happens automatically.
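
   The automatic numbering seen in the example can be modeled in a few
lines.  This is a sketch of the rule "largest number so far plus one"
and not the `gencat' implementation; `struct numbering' and
`assign_number' are hypothetical names, the counter is assumed to start
at zero so that the first symbolic name receives the number 1, and the
duplicate checks described above are left out.

```c
#include <ctype.h>
#include <stdlib.h>

/* Largest number which has appeared so far in one numbering space
   (the sets of a file, or the messages of one set).  Start at 0.  */
struct numbering
{
  long max_seen;
};

/* Return the number for TOKEN.  A token starting with a digit is
   taken as an explicit number; anything else is treated as a symbolic
   name and receives the largest number seen so far plus one.  */
long
assign_number (struct numbering *state, const char *token)
{
  long number;

  if (isdigit ((unsigned char) token[0]))
    number = strtol (token, NULL, 10);
  else
    number = state->max_seen + 1;

  /* Remember the maximum so later symbolic names continue after it.  */
  if (number > state->max_seen)
    state->max_seen = number;

  return number;
}
```

   With one `struct numbering' for the sets and a fresh one for the
messages of each set, the tokens of the example yield 1 for `SetOne',
1 and 2 for the messages `1' and `two', and 2 for `SetTwo'.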

   While this file format is pretty easy it is not the best possible for
use in a running program.  The `catopen' function would have to parse
the file and handle syntactic errors gracefully.  This is not so easy
and the whole process is pretty slow.  Therefore the `catgets'
functions expect the data in another more compact and ready-to-use file
format.  There is a special program `gencat' which creates files in
this format; it is explained in detail in the next section.
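
   From the point of view of a program, a compiled catalog is only
accessed through `catopen', `catgets' and `catclose'.  The following
sketch shows how the message with the identifier `two' from the example
above (set 1, message 2) might be fetched.  The helper name
`fetch_message' is hypothetical, and so is any catalog file name passed
to it.

```c
#include <nl_types.h>
#include <stdio.h>

/* Copy message number MSG of set SET from the compiled catalog
   CATALOG into BUF, which has room for SIZE bytes.  Returns 1 if the
   catalog could be opened and 0 otherwise; in the latter case BUF
   receives FALLBACK.  (Hypothetical helper for illustration.)  */
int
fetch_message (const char *catalog, int set, int msg,
               const char *fallback, char *buf, size_t size)
{
  nl_catd catd = catopen (catalog, NL_CAT_LOCALE);
  int opened = catd != (nl_catd) -1;

  /* `catgets' itself returns FALLBACK if the message is missing.  */
  const char *text = opened ? catgets (catd, set, msg, fallback) : fallback;

  /* Copy before closing: the text may belong to the catalog.  */
  snprintf (buf, size, "%s", text);

  if (opened)
    catclose (catd);
  return opened;
}
```

   A call such as `fetch_message ("./example.cat", 1, 2, "default text",
buf, sizeof buf)' leaves the translated text in `buf' if the catalog
exists and contains the message, and the default text otherwise.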

   Files in this other format are not human readable.  To be easy to
use by programs the format is binary.  But it is byte order
independent so translation files can be shared by systems of arbitrary
architecture (as long as they use the GNU C Library).

   Details about the binary file format are not important to know since
these files are always created by the `gencat' program.  The sources of
the GNU C Library also provide the sources for the `gencat' program and
so the interested reader can look through these source files to learn
about the file format.

