GNU Info

Info Node: (libc.info)Collation Functions

Collation Functions
===================

   In some locales, the conventions for lexicographic ordering differ
from the strict numeric ordering of character codes.  For example, in
Spanish most glyphs with diacritical marks such as accents are not
considered distinct letters for the purposes of collation.  On the
other hand, the two-character sequence `ll' is treated as a single
letter that is collated immediately after `l'.

   You can use the functions `strcoll' and `strxfrm' (declared in the
header file `string.h') and `wcscoll' and `wcsxfrm' (declared in the
header file `wchar.h') to compare strings using a collation ordering
appropriate for the current locale.  The locale these functions use
is selected by setting the locale for the `LC_COLLATE' category; see
Locales.

   In the standard C locale, the collation sequence for `strcoll' is
the same as that for `strcmp'.  Similarly, `wcscoll' and `wcscmp' are
the same in this situation.
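
   As a small illustration of the statement above, the following sketch
selects the `C' locale for `LC_COLLATE' and checks that `strcoll' and
`strcmp' order two strings the same way.  The helper names `sign' and
`same_order_in_c_locale' are invented for this example and are not part
of the library.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1, since only the sign
   of the value returned by `strcmp' and `strcoll' is meaningful. */
static int
sign (int value)
{
  return (value > 0) - (value < 0);
}

/* Hypothetical helper: return nonzero if `strcoll' and `strcmp'
   agree on the order of A and B in the standard C locale.
   Note that this changes the program's `LC_COLLATE' setting. */
int
same_order_in_c_locale (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);

  if (setlocale (LC_COLLATE, "C") == NULL)
    return 0;

  return sign (strcoll (a, b)) == sign (strcmp (a, b));
}
```

   In a locale other than `C' the two functions may well disagree, which
is the reason `strcoll' exists.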

   Effectively, the way these functions work is by applying a mapping to
transform the characters in a string to a byte sequence that represents
the string's position in the collating sequence of the current locale.
Comparing two such byte sequences in a simple fashion is equivalent to
comparing the strings with the locale's collating sequence.

   The functions `strcoll' and `wcscoll' perform this translation
implicitly, in order to do one comparison.  By contrast, `strxfrm' and
`wcsxfrm' perform the mapping explicitly.  If you are making multiple
comparisons using the same string or set of strings, it is likely to be
more efficient to use `strxfrm' or `wcsxfrm' to transform all the
strings just once, and subsequently compare the transformed strings
with `strcmp' or `wcscmp'.
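
   The relationship described above can be sketched as follows.  The
function transforms both strings with `strxfrm' and compares the results
with `strcmp'; the sign of the result should match what `strcoll' would
report for the original strings.  The name `compare_transformed' and the
fixed buffer size `XFRM_BUFSIZE' are assumptions made for this sketch; a
real program would size the buffers dynamically.

```c
#include <assert.h>
#include <string.h>

/* Arbitrary buffer size chosen for this sketch. */
enum { XFRM_BUFSIZE = 256 };

/* Hypothetical helper: compare A and B by way of their transformed
   forms.  On success, store the `strcmp' result in *RESULT and
   return 0.  Return -1 if a transformed string does not fit. */
int
compare_transformed (const char *a, const char *b, int *result)
{
  char ta[XFRM_BUFSIZE];
  char tb[XFRM_BUFSIZE];

  assert (a != NULL && b != NULL && result != NULL);

  /* A return value greater than or equal to the buffer size
     means the transformed string was truncated. */
  if (strxfrm (ta, a, sizeof ta) >= sizeof ta
      || strxfrm (tb, b, sizeof tb) >= sizeof tb)
    return -1;

  *result = strcmp (ta, tb);
  return 0;
}
```

   This helper transforms both strings on every call, so it saves
nothing by itself; the gain comes from keeping the transformed strings
and reusing them across many comparisons.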

 - Function: int strcoll (const char *S1, const char *S2)
     The `strcoll' function is similar to `strcmp' but uses the
     collating sequence of the current locale for collation (the
     `LC_COLLATE' locale).

 - Function: int wcscoll (const wchar_t *WS1, const wchar_t *WS2)
     The `wcscoll' function is similar to `wcscmp' but uses the
     collating sequence of the current locale for collation (the
     `LC_COLLATE' locale).

   Here is an example of sorting an array of strings, using `strcoll'
to compare them.  The actual sort algorithm is not written here; it
comes from `qsort' (see Array Sort Function).  The job of the code
shown here is to say how to compare the strings while sorting them.
(Later on in this section, we will show a way to do this more
efficiently using `strxfrm'.)

     #include <stdlib.h>
     #include <string.h>
     
     /* This is the comparison function used with `qsort'.
        `qsort' passes pointers to the array elements as
        `const void *', so convert them back first. */
     
     int
     compare_elements (const void *v1, const void *v2)
     {
       char *const *p1 = v1;
       char *const *p2 = v2;
     
       return strcoll (*p1, *p2);
     }
     
     /* This is the entry point--the function to sort
        strings using the locale's collating sequence. */
     
     void
     sort_strings (char **array, int nstrings)
     {
       /* Sort `array' by comparing the strings. */
       qsort (array, nstrings,
              sizeof (char *), compare_elements);
     }
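
   A wide character counterpart of the example above might look like
the following sketch, which uses `wcscoll' as the comparison.  The names
`compare_wide_elements' and `sort_wide_strings' are invented for this
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Comparison function for `qsort': each argument points to an
   array element, that is, to a pointer to a wide string. */
static int
compare_wide_elements (const void *v1, const void *v2)
{
  const wchar_t *const *p1 = v1;
  const wchar_t *const *p2 = v2;

  return wcscoll (*p1, *p2);
}

/* Hypothetical entry point: sort ARRAY of NSTRINGS wide strings
   using the collating sequence of the current locale. */
void
sort_wide_strings (const wchar_t **array, size_t nstrings)
{
  assert (array != NULL || nstrings == 0);

  qsort (array, nstrings,
         sizeof (const wchar_t *), compare_wide_elements);
}
```

   As with `strcoll', the resulting order depends on the `LC_COLLATE'
setting in effect when the sort runs.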

 - Function: size_t strxfrm (char *restrict TO, const char *restrict
          FROM, size_t SIZE)
     The function `strxfrm' transforms the string FROM using the
     collation transformation determined by the locale currently
     selected for collation, and stores the transformed string in the
     array TO.  Up to SIZE characters (including a terminating null
     character) are stored.

     The behavior is undefined if the strings TO and FROM overlap; see
     Copying and Concatenation.

     The return value is the length of the entire transformed string.
     This value is not affected by the value of SIZE, but if it is
     greater than or equal to SIZE, it means that the transformed
     string did not entirely fit in the array TO.  In this case, only
     as much of the string as actually fits was stored.  To get the
     whole transformed string, call `strxfrm' again with a bigger
     output array.

     The transformed string may be longer than the original string, and
     it may also be shorter.

     If SIZE is zero, no characters are stored in TO.  In this case,
     `strxfrm' simply returns the number of characters that would be
     the length of the transformed string.  This is useful for
     determining what size the allocated array should be.  It does not
     matter what TO is if SIZE is zero; TO may even be a null pointer.
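
   The size query described above leads to a common two-call idiom:
ask for the length first, then allocate exactly that much and transform
for real.  The following is a minimal sketch; the helper name
`xfrm_alloc' is hypothetical, and the code assumes the locale does not
change between the two calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated transformed copy
   of FROM, or a null pointer if memory is exhausted.  The caller
   is responsible for freeing the result. */
char *
xfrm_alloc (const char *from)
{
  size_t needed;
  char *to;

  assert (from != NULL);

  /* With SIZE zero nothing is stored, so TO may be a null pointer.
     The return value excludes the terminating null character,
     hence the +1. */
  needed = strxfrm (NULL, from, 0) + 1;

  to = malloc (needed);
  if (to == NULL)
    return NULL;

  /* The buffer is now known to be large enough. */
  (void) strxfrm (to, from, needed);
  return to;
}
```

   This costs two passes over the string but never has to guess a
buffer size and then grow it.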

 - Function: size_t wcsxfrm (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM, size_t SIZE)
     The function `wcsxfrm' transforms wide character string WFROM
     using the collation transformation determined by the locale
     currently selected for collation, and stores the transformed
     string in the array WTO.  Up to SIZE wide characters (including a
     terminating null character) are stored.

     The behavior is undefined if the strings WTO and WFROM overlap;
     see Copying and Concatenation.

     The return value is the length of the entire transformed wide
     character string.  This value is not affected by the value of
     SIZE, but if it is greater than or equal to SIZE, it means that
     the transformed wide character string did not entirely fit in the
     array WTO.  In this case, only as much of the wide character
     string as actually fits was stored.  To get the whole transformed
     wide character string, call `wcsxfrm' again with a bigger output
     array.

     The transformed wide character string may be longer than the
     original wide character string, and it may also be shorter.

     If SIZE is zero, no wide characters are stored in WTO.  In this
     case, `wcsxfrm' simply returns the number of wide characters that
     would be the length of the transformed wide character string.
     This is useful for determining what size the allocated array
     should be (remember to multiply by `sizeof (wchar_t)').  It does
     not matter what WTO is if SIZE is zero; WTO may even be a null
     pointer.
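
   The same size query works for wide strings, with the byte count
scaled by `sizeof (wchar_t)' as noted above.  This is only a sketch; the
helper name `wxfrm_alloc' is hypothetical, and the code assumes the
locale does not change between the two calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return a newly allocated transformed copy
   of WFROM, or a null pointer if memory is exhausted.  The caller
   is responsible for freeing the result. */
wchar_t *
wxfrm_alloc (const wchar_t *wfrom)
{
  size_t needed;
  wchar_t *wto;

  assert (wfrom != NULL);

  /* The return value counts wide characters, not bytes, and
     excludes the terminating null wide character. */
  needed = wcsxfrm (NULL, wfrom, 0) + 1;

  /* Convert the wide character count to a byte count. */
  wto = malloc (needed * sizeof (wchar_t));
  if (wto == NULL)
    return NULL;

  /* SIZE is again given in wide characters. */
  (void) wcsxfrm (wto, wfrom, needed);
  return wto;
}
```

   Note that only the allocation is measured in bytes; the SIZE
argument passed to `wcsxfrm' stays in units of wide characters.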

   Here is an example of how you can use `strxfrm' when you plan to do
many comparisons.  It does the same thing as the previous example, but
much faster, because it has to transform each string only once, no
matter how many times it is compared with other strings.  Even the time
needed to allocate and free storage is much less than the time we save,
when there are many strings.

     struct sorter { char *input; char *transformed; };
     
     /* This is the comparison function used with `qsort'
        to sort an array of `struct sorter'. */
     
     int
     compare_elements (const void *v1, const void *v2)
     {
       const struct sorter *p1 = v1;
       const struct sorter *p2 = v2;
     
       return strcmp (p1->transformed, p2->transformed);
     }
     
     /* This is the entry point--the function to sort
        strings using the locale's collating sequence. */
     
     void
     sort_strings_fast (char **array, int nstrings)
     {
       struct sorter temp_array[nstrings];
       int i;
     
       /* Set up `temp_array'.  Each element contains
          one input string and its transformed string. */
       for (i = 0; i < nstrings; i++)
         {
           size_t length = strlen (array[i]) * 2;
           char *transformed;
           size_t transformed_length;
     
           temp_array[i].input = array[i];
     
           /* First try a buffer perhaps big enough.  */
           transformed = (char *) xmalloc (length);
     
           /* Transform `array[i]'.  */
           transformed_length = strxfrm (transformed, array[i], length);
     
           /* If the buffer was not large enough, resize it
              and try again.  */
           if (transformed_length >= length)
             {
               /* Allocate the needed space. +1 for terminating
                  `NUL' character.  */
               transformed = (char *) xrealloc (transformed,
                                                transformed_length + 1);
     
               /* The return value is not interesting because we know
                  how long the transformed string is.  */
               (void) strxfrm (transformed, array[i],
                               transformed_length + 1);
             }
     
           temp_array[i].transformed = transformed;
         }
     
       /* Sort `temp_array' by comparing transformed strings. */
       qsort (temp_array, nstrings,
              sizeof (struct sorter), compare_elements);
     
       /* Put the elements back in the permanent array
          in their sorted order. */
       for (i = 0; i < nstrings; i++)
         array[i] = temp_array[i].input;
     
       /* Free the strings we allocated. */
       for (i = 0; i < nstrings; i++)
         free (temp_array[i].transformed);
     }

   The interesting part of this code for the wide character version
would look like this:

     void
     sort_strings_fast (wchar_t **array, int nstrings)
     {
       ...
           /* Transform `array[i]'.  */
           transformed_length = wcsxfrm (transformed, array[i], length);
     
           /* If the buffer was not large enough, resize it
              and try again.  */
           if (transformed_length >= length)
             {
               /* Allocate the needed space. +1 for terminating
                  `NUL' character.  */
               transformed = (wchar_t *) xrealloc (transformed,
                                                   (transformed_length + 1)
                                                   * sizeof (wchar_t));
     
               /* The return value is not interesting because we know
                  how long the transformed string is.  */
               (void) wcsxfrm (transformed, array[i],
                               transformed_length + 1);
             }
       ...

Note the additional multiplication by `sizeof (wchar_t)' in the
`xrealloc' call.

   *Compatibility Note:* The string collation functions are a new
feature of ISO C90.  Older C dialects have no equivalent feature.  The
wide character versions were introduced in Amendment 1 to ISO C90.

