GNU Info

Info Node: (libc.info)Keeping the state

Representing the state of the conversion
----------------------------------------

   The introduction of this chapter said that certain character sets
use a "stateful" encoding.  That is, the meaning of a byte sequence
depends on the bytes that precede it in the text.

   Since the conversion functions allow converting a text in more than
one step, there must be a way to pass this state information from one
call of the functions to the next.

 - Data type: mbstate_t
     A variable of type `mbstate_t' can contain all the information
     about the "shift state" needed from one call to a conversion
     function to another.

     `mbstate_t' is defined in `wchar.h'.  It was introduced in
     Amendment 1 to ISO C90.

   To use objects of type `mbstate_t' the programmer has to define such
objects (normally as local variables on the stack) and pass a pointer to
the object to the conversion functions.  This way the conversion
function can update the object if the current multibyte character set
is stateful.

   There is no dedicated function or initializer that puts the state
object into a particular state.  The rule is that the object must
represent the initial state before its first use; this is achieved by
clearing the whole variable with code such as the following:

     {
       mbstate_t state;
       memset (&state, '\0', sizeof (state));
       /* from now on STATE can be used.  */
       ...
     }
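
   As an illustration of passing the state from one call to the next,
the following is a minimal sketch rather than a definitive
implementation.  The helper name `convert_chunk' and its parameters are
hypothetical and do not come from the library; only `mbrtowc',
`memset' and `mbstate_t' are standard.  The sketch assumes that a null
byte occupies exactly one byte in the multibyte encoding.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the N bytes at SRC to wide characters,
   storing at most DSTLEN of them in DST.  The conversion state is
   carried in *PS, so a character split across two calls is completed
   by the second call.  Returns the number of wide characters stored,
   or (size_t) -1 if an invalid sequence is found.  */
size_t
convert_chunk (wchar_t *dst, size_t dstlen,
               const char *src, size_t n, mbstate_t *ps)
{
  size_t stored = 0;

  assert (dst != NULL && src != NULL && ps != NULL);
  while (n > 0 && stored < dstlen)
    {
      size_t len = mbrtowc (&dst[stored], src, n, ps);
      if (len == (size_t) -1)
        return (size_t) -1;     /* Invalid multibyte sequence.  */
      if (len == (size_t) -2)
        break;                  /* Incomplete character; *PS remembers it.  */
      if (len == 0)
        len = 1;                /* Null byte, assumed to be one byte long.  */
      src += len;
      n -= len;
      ++stored;
    }
  return stored;
}
```

   A caller would clear one `mbstate_t' object with `memset' as shown
above and then pass a pointer to that same object for every chunk of
the same text.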

   When using the conversion functions to generate output it is often
necessary to test whether the current state corresponds to the initial
state.  This is necessary, for example, to decide whether to emit
escape sequences to set the state to the initial state at certain
sequence points.  Communication protocols often require this.

 - Function: int mbsinit (const mbstate_t *PS)
     The `mbsinit' function determines whether the state object pointed
     to by PS is in the initial state.  If PS is a null pointer or the
     object is in the initial state the return value is nonzero.
     Otherwise it is zero.

     `mbsinit' was introduced in Amendment 1 to ISO C90 and is declared
     in `wchar.h'.
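
   The return value can be observed with a small sketch such as the
following.  The helper `initial_after' is hypothetical and exists only
for illustration; it feeds some bytes to `mbrtowc', starting from a
cleared state object, and then asks `mbsinit' whether the state is
initial again.  As above, a null byte is assumed to be one byte long.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: feed the N bytes at S to mbrtowc, starting from
   the initial state, and report whether the state is initial again
   afterwards.  Returns 1 if it is, 0 if it is not, and -1 if an
   invalid sequence is found.  */
int
initial_after (const char *s, size_t n)
{
  mbstate_t state;
  wchar_t wc;

  assert (s != NULL);
  memset (&state, '\0', sizeof (state));
  while (n > 0)
    {
      size_t len = mbrtowc (&wc, s, n, &state);
      if (len == (size_t) -1)
        return -1;              /* Invalid multibyte sequence.  */
      if (len == (size_t) -2)
        break;                  /* All bytes used, character incomplete.  */
      if (len == 0)
        len = 1;                /* Null byte, assumed to be one byte long.  */
      s += len;
      n -= len;
    }
  return mbsinit (&state) != 0;
}
```

   In the default "C" locale a call such as `initial_after ("abc", 3)'
should report the initial state.  In a locale with a multibyte
encoding, stopping in the middle of a character would be expected to
leave the state object in a non-initial state, but this depends on the
locales available on the system.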

   Code using `mbsinit' often looks similar to this:

     {
       mbstate_t state;
       memset (&state, '\0', sizeof (state));
       /* Use STATE.  */
       ...
       if (! mbsinit (&state))
         {
           /* Emit code to return to initial state.  */
           const wchar_t empty[] = L"";
           const wchar_t *srcp = empty;
           wcsrtombs (outbuf, &srcp, outbuflen, &state);
         }
       ...
     }

   The code that emits the escape sequence to return to the initial
state deserves a closer look.  Converting an empty wide character
string with `wcsrtombs' stores the bytes needed to reach the initial
state, followed by the terminating null byte (see Converting Strings).
Note that on GNU systems this extra step is not needed when converting
from multibyte text to wide character text, since the wide character
encoding is not stateful.  No standard, however, prohibits a stateful
encoding for `wchar_t'.
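
   The fragment above can be wrapped into a function, sketched here
under stated assumptions.  The name `emit_reset' and its parameters are
hypothetical; the caller is assumed to supply an output buffer large
enough for the shift sequence and the terminating null byte.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: if *PS is not the initial state, store in
   OUTBUF (at most OUTBUFLEN bytes) the sequence that returns the
   output to the initial state, followed by a null byte.  Returns the
   number of bytes stored, not counting the null byte; 0 if *PS is
   already initial; or (size_t) -1 on error.  */
size_t
emit_reset (char *outbuf, size_t outbuflen, mbstate_t *ps)
{
  const wchar_t empty[] = L"";
  const wchar_t *srcp = empty;

  assert (outbuf != NULL && ps != NULL);
  if (mbsinit (ps))
    return 0;                   /* Nothing to emit.  */
  /* Converting the empty string emits only the shift sequence and
     the terminating null byte.  */
  return wcsrtombs (outbuf, &srcp, outbuflen, ps);
}
```

   With a non-stateful encoding, such as the one used in the default
"C" locale, the state is always initial at a character boundary and the
function has nothing to emit.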

