GNU Info

Info Node: (libc.info)Copying and Concatenation

Copying and Concatenation
=========================

   You can use the functions described in this section to copy the
contents of strings and arrays, or to append the contents of one string
to another.  The `str' and `mem' functions are declared in the header
file `string.h' while the `wcs' and `wmem' functions are declared in
the file `wchar.h'.

   A helpful way to remember the ordering of the arguments to the
functions in this section is that it corresponds to an assignment
expression, with the destination array specified to the left of the
source array.  Most of these functions return the address of the
destination array; the exceptions are noted in the individual
descriptions.

   Most of these functions do not work properly if the source and
destination arrays overlap.  For example, if the beginning of the
destination array overlaps the end of the source array, the original
contents of that part of the source array may get overwritten before it
is copied.  Even worse, in the case of the string functions, the null
character marking the end of the string may be lost, and the copy
function might get stuck in a loop trashing all the memory allocated to
your program.

   All functions that have problems copying between overlapping arrays
are explicitly identified in this manual.  In addition to functions in
this section, there are a few others like `sprintf' (Note: Formatted
Output Functions) and `scanf' (Note: Formatted Input Functions).

 - Function: void * memcpy (void *restrict TO, const void *restrict
          FROM, size_t SIZE)
     The `memcpy' function copies SIZE bytes from the object beginning
     at FROM into the object beginning at TO.  The behavior of this
     function is undefined if the two arrays TO and FROM overlap; use
     `memmove' instead if overlapping is possible.

     The value returned by `memcpy' is the value of TO.

     Here is an example of how you might use `memcpy' to copy the
     contents of an array:

          struct foo *oldarray, *newarray;
          int arraysize;
          ...
          memcpy (newarray, oldarray, arraysize * sizeof (struct foo));

 - Function: wchar_t * wmemcpy (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM, size_t SIZE)
     The `wmemcpy' function copies SIZE wide characters from the object
     beginning at WFROM into the object beginning at WTO.  The behavior
     of this function is undefined if the two arrays WTO and WFROM
     overlap; use `wmemmove' instead if overlapping is possible.

     The following is a possible implementation of `wmemcpy' but there
     are more optimizations possible.

          wchar_t *
          wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
                   size_t size)
          {
            return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
          }

     The value returned by `wmemcpy' is the value of WTO.

     This function was introduced in Amendment 1 to ISO C90.

 - Function: void * mempcpy (void *restrict TO, const void *restrict
          FROM, size_t SIZE)
     The `mempcpy' function is nearly identical to the `memcpy'
     function.  It copies SIZE bytes from the object beginning at
     FROM into the object pointed to by TO.  But instead of returning
     the value of TO it returns a pointer to the byte following the
     last written byte in the object beginning at TO.  I.e., the value
     is `((void *) ((char *) TO + SIZE))'.

     This function is useful in situations where a number of objects
     shall be copied to consecutive memory positions.

          void *
          combine (void *o1, size_t s1, void *o2, size_t s2)
          {
            void *result = malloc (s1 + s2);
            if (result != NULL)
              mempcpy (mempcpy (result, o1, s1), o2, s2);
            return result;
          }

     This function is a GNU extension.

 - Function: wchar_t * wmempcpy (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM, size_t SIZE)
     The `wmempcpy' function is nearly identical to the `wmemcpy'
     function.  It copies SIZE wide characters from the object
     beginning at WFROM into the object pointed to by WTO.  But
     instead of returning the value of WTO it returns a pointer to the
     wide character following the last written wide character in the
     object beginning at WTO.  I.e., the value is `WTO + SIZE'.

     This function is useful in situations where a number of objects
     shall be copied to consecutive memory positions.

     The following is a possible implementation of `wmempcpy' but there
     are more optimizations possible.

          wchar_t *
          wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
                    size_t size)
          {
            return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
          }

     This function is a GNU extension.

 - Function: void * memmove (void *TO, const void *FROM, size_t SIZE)
     `memmove' copies the SIZE bytes at FROM into the SIZE bytes at TO,
     even if those two blocks of space overlap.  In the case of
     overlap, `memmove' is careful to copy the original values of the
     bytes in the block at FROM, including those bytes which also
     belong to the block at TO.

     The value returned by `memmove' is the value of TO.
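
     As a minimal sketch of a case where the blocks overlap, the
     following hypothetical helper `drop_prefix' (not part of the
     library) discards the first N bytes of a buffer by sliding the
     remaining bytes down to the start.  Source and destination share
     memory here, so `memmove' is used rather than `memcpy'.

```c
#include <string.h>

/* Hypothetical helper: remove the first N bytes of the LEN-byte
   buffer BUF by moving the remaining bytes to the start of BUF.
   The two blocks may overlap, so `memmove' is required.
   Returns the number of bytes that remain.  */
size_t
drop_prefix (char *buf, size_t len, size_t n)
{
  if (n > len)
    n = len;
  memmove (buf, buf + n, len - n);
  return len - n;
}
```

     Calling `drop_prefix (buf, 8, 2)' on a buffer holding `abcdefgh'
     leaves `cdefgh' in its first six bytes.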

 - Function: wchar_t * wmemmove (wchar_t *WTO, const wchar_t *WFROM,
          size_t SIZE)
     `wmemmove' copies the SIZE wide characters at WFROM into the SIZE
     wide characters at WTO, even if those two blocks of space overlap.
     In the case of overlap, `wmemmove' is careful to copy the original
     values of the wide characters in the block at WFROM, including
     those wide characters which also belong to the block at WTO.

     The following is a possible implementation of `wmemmove' but there
     are more optimizations possible.

          wchar_t *
          wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
          {
            return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
          }

     The value returned by `wmemmove' is the value of WTO.

     This function was introduced in Amendment 1 to ISO C90.

 - Function: void * memccpy (void *restrict TO, const void *restrict
          FROM, int C, size_t SIZE)
     This function copies no more than SIZE bytes from FROM to TO,
     stopping if a byte matching C is found.  The return value is a
     pointer into TO one byte past where C was copied, or a null
     pointer if no byte matching C appeared in the first SIZE bytes of
     FROM.
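
     A short sketch of how the return value might be used: the
     hypothetical helper `copy_through' below (the name is an
     assumption for illustration) copies bytes up to and including the
     first occurrence of C and reports how many bytes that was, or 0
     if C did not appear within SIZE bytes.  `_GNU_SOURCE' is defined
     only to make sure the declaration of `memccpy' is visible.

```c
#define _GNU_SOURCE
#include <string.h>

/* Hypothetical helper: copy bytes from FROM to TO through the first
   byte equal to C, looking at no more than SIZE bytes.  Returns the
   number of bytes copied including C, or 0 if C was not found (in
   which case all SIZE bytes have been copied).  */
size_t
copy_through (char *to, const char *from, int c, size_t size)
{
  char *end = memccpy (to, from, c, size);
  return end != NULL ? (size_t) (end - to) : 0;
}
```

     For instance, copying from `key=value' with C set to `=' stores
     `key=' in the destination and yields 4.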

 - Function: void * memset (void *BLOCK, int C, size_t SIZE)
     This function copies the value of C (converted to an `unsigned
     char') into each of the first SIZE bytes of the object beginning
     at BLOCK.  It returns the value of BLOCK.

 - Function: wchar_t * wmemset (wchar_t *BLOCK, wchar_t WC, size_t SIZE)
     This function copies the value of WC into each of the first SIZE
     wide characters of the object beginning at BLOCK.  It returns the
     value of BLOCK.
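
     The sketch below contrasts the two functions.  The helper names
     `blank_line' and `blank_wline' are hypothetical; the point is that
     SIZE counts bytes for `memset' but wide characters for `wmemset'.
     Both helpers assume the caller provides room for WIDTH elements
     plus a terminator.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: make LINE a string of WIDTH spaces.
   LINE must have room for WIDTH + 1 bytes.  */
void
blank_line (char *line, size_t width)
{
  memset (line, ' ', width);
  line[width] = '\0';
}

/* Hypothetical helper: the wide character counterpart.
   LINE must have room for WIDTH + 1 wide characters.  */
void
blank_wline (wchar_t *line, size_t width)
{
  wmemset (line, L' ', width);
  line[width] = L'\0';
}
```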

 - Function: char * strcpy (char *restrict TO, const char *restrict
          FROM)
     This copies characters from the string FROM (up to and including
     the terminating null character) into the string TO.  Like
     `memcpy', this function has undefined results if the strings
     overlap.  The return value is the value of TO.

 - Function: wchar_t * wcscpy (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM)
     This copies wide characters from the string WFROM (up to and
     including the terminating null wide character) into the string
     WTO.  Like `wmemcpy', this function has undefined results if the
     strings overlap.  The return value is the value of WTO.

 - Function: char * strncpy (char *restrict TO, const char *restrict
          FROM, size_t SIZE)
     This function is similar to `strcpy' but always copies exactly
     SIZE characters into TO.

     If the length of FROM is more than SIZE, then `strncpy' copies
     just the first SIZE characters.  Note that in this case there is
     no null terminator written into TO.

     If the length of FROM is less than SIZE, then `strncpy' copies all
     of FROM, followed by enough null characters to add up to SIZE
     characters in all.  This behavior is rarely useful, but it is
     specified by the ISO C standard.

     The behavior of `strncpy' is undefined if the strings overlap.

     Using `strncpy' as opposed to `strcpy' is a way to avoid bugs
     relating to writing past the end of the allocated space for TO.
     However, it can also make your program much slower in one common
     case: copying a string which is probably small into a potentially
     large buffer.  In this case, SIZE may be large, and when it is,
     `strncpy' will waste a considerable amount of time copying null
     characters.
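
     Because `strncpy' may leave TO without a null terminator, callers
     usually have to add one themselves.  The following is a minimal
     sketch of that pattern; `copy_truncating' is a hypothetical helper
     name, not a library function.

```c
#include <string.h>

/* Hypothetical helper: copy FROM into the SIZE-byte buffer TO,
   truncating if necessary, and always leave TO null-terminated
   (provided SIZE is not zero).  Returns 1 if FROM had to be
   truncated and 0 otherwise.  */
int
copy_truncating (char *to, size_t size, const char *from)
{
  if (size == 0)
    return 1;
  strncpy (to, from, size);
  /* If FROM filled the whole buffer, the last byte is not a null
     character and the string did not fit.  */
  if (to[size - 1] != '\0')
    {
      to[size - 1] = '\0';
      return 1;
    }
  return 0;
}
```

     With a four-byte buffer, copying `hello' stores `hel' and reports
     truncation, while copying `abc' fits exactly.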

 - Function: wchar_t * wcsncpy (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM, size_t SIZE)
     This function is similar to `wcscpy' but always copies exactly
     SIZE wide characters into WTO.

     If the length of WFROM is more than SIZE, then `wcsncpy' copies
     just the first SIZE wide characters.  Note that in this case there
     is no null terminator written into WTO.

     If the length of WFROM is less than SIZE, then `wcsncpy' copies
     all of WFROM, followed by enough null wide characters to add up to
     SIZE wide characters in all.  This behavior is rarely useful, but
     it is specified by the ISO C standard.

     The behavior of `wcsncpy' is undefined if the strings overlap.

     Using `wcsncpy' as opposed to `wcscpy' is a way to avoid bugs
     relating to writing past the end of the allocated space for WTO.
     However, it can also make your program much slower in one common
     case: copying a string which is probably small into a potentially
     large buffer.  In this case, SIZE may be large, and when it is,
     `wcsncpy' will waste a considerable amount of time copying null
     wide characters.

 - Function: char * strdup (const char *S)
     This function copies the null-terminated string S into a newly
     allocated string.  The string is allocated using `malloc'; see
     Note: Unconstrained Allocation.  If `malloc' cannot allocate
     space for the new string, `strdup' returns a null pointer.
     Otherwise it returns a pointer to the new string.

 - Function: wchar_t * wcsdup (const wchar_t *WS)
     This function copies the null-terminated wide character string WS
     into a newly allocated string.  The string is allocated using
     `malloc'; see Note: Unconstrained Allocation.  If `malloc'
     cannot allocate space for the new string, `wcsdup' returns a null
     pointer.  Otherwise it returns a pointer to the new wide character
     string.

     This function is a GNU extension.

 - Function: char * strndup (const char *S, size_t SIZE)
     This function is similar to `strdup' but always copies at most
     SIZE characters into the newly allocated string.

     If the length of S is more than SIZE, then `strndup' copies just
     the first SIZE characters and adds a closing null terminator.
     Otherwise all characters are copied and the string is terminated.

     This function is different to `strncpy' in that it always
     terminates the destination string.

     `strndup' is a GNU extension.
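
     As an illustration, `strndup' is convenient for extracting a
     leading part of a string whose length is already known.  The
     helper `directory_part' below is a hypothetical name chosen for
     this sketch; it returns the text before the last `/' of a path.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the part of
   PATH that precedes its last '/', or a copy of "." if PATH contains
   no '/'.  The result is a null pointer if allocation fails; the
   caller must release it with `free'.  */
char *
directory_part (const char *path)
{
  const char *slash = strrchr (path, '/');
  if (slash == NULL)
    return strdup (".");
  return strndup (path, (size_t) (slash - path));
}
```

     Note that in this sketch a path such as `/file' yields the empty
     string, since nothing precedes its only `/'.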

 - Function: char * stpcpy (char *restrict TO, const char *restrict
          FROM)
     This function is like `strcpy', except that it returns a pointer to
     the end of the string TO (that is, the address of the terminating
     null character `to + strlen (from)') rather than the beginning.

     For example, this program uses `stpcpy' to concatenate `foo' and
     `bar' to produce `foobar', which it then prints.

          #include <string.h>
          #include <stdio.h>
          
          int
          main (void)
          {
            char buffer[10];
            char *to = buffer;
            to = stpcpy (to, "foo");
            to = stpcpy (to, "bar");
            puts (buffer);
            return 0;
          }

     This function is not part of the ISO or POSIX standards, and is not
     customary on Unix systems, but we did not invent it either.
     Perhaps it comes from MS-DOG.

     Its behavior is undefined if the strings overlap.  The function is
     declared in `string.h'.

 - Function: wchar_t * wcpcpy (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM)
     This function is like `wcscpy', except that it returns a pointer to
     the end of the string WTO (that is, the address of the terminating
     null wide character `wto + wcslen (wfrom)') rather than the beginning.

     This function is not part of ISO or POSIX but was found useful
     while developing the GNU C Library itself.

     The behavior of `wcpcpy' is undefined if the strings overlap.

     `wcpcpy' is a GNU extension and is declared in `wchar.h'.

 - Function: char * stpncpy (char *restrict TO, const char *restrict
          FROM, size_t SIZE)
     This function is similar to `stpcpy' but always copies exactly
     SIZE characters into TO.

     If the length of FROM is more than SIZE, then `stpncpy' copies
     just the first SIZE characters and returns a pointer to the
     character directly following the one which was copied last.  Note
     that in this case there is no null terminator written into TO.

     If the length of FROM is less than SIZE, then `stpncpy' copies all
     of FROM, followed by enough null characters to add up to SIZE
     characters in all.  This behavior is rarely useful, but it is
     provided for contexts that rely on the same behavior of
     `strncpy'.  `stpncpy' returns a pointer to the _first_
     written null character.

     This function is not part of ISO or POSIX but was found useful
     while developing the GNU C Library itself.

     Its behavior is undefined if the strings overlap.  The function is
     declared in `string.h'.
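
     The return value makes it easy to learn how many characters of
     FROM actually ended up in TO.  The following sketch shows this
     for a fixed-width record field; `fill_field' is a hypothetical
     helper name introduced only for this example.

```c
#define _GNU_SOURCE
#include <string.h>

/* Hypothetical helper: store FROM in the fixed-width FIELD of WIDTH
   bytes, padding with null characters if FROM is shorter.  Returns
   the number of non-null characters stored, which equals WIDTH when
   FROM was truncated.  */
size_t
fill_field (char *field, size_t width, const char *from)
{
  char *end = stpncpy (field, from, width);
  return (size_t) (end - field);
}
```

     Storing `abc' in an eight-byte field yields 3 and pads the
     remaining five bytes with null characters.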

 - Function: wchar_t * wcpncpy (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM, size_t SIZE)
     This function is similar to `wcpcpy' but always copies exactly
     SIZE wide characters into WTO.

     If the length of WFROM is more than SIZE, then `wcpncpy' copies
     just the first SIZE wide characters and returns a pointer to the
     wide character directly following the one which was copied last.
     Note that in this case there is no null terminator written into
     WTO.

     If the length of WFROM is less than SIZE, then `wcpncpy' copies
     all of WFROM, followed by enough null wide characters to add up to
     SIZE wide characters in all.  This behavior is rarely useful, but
     it is provided for contexts that rely on the same behavior of
     `wcsncpy'.  `wcpncpy' returns a pointer to the _first_ written
     null wide character.

     This function is not part of ISO or POSIX but was found useful
     while developing the GNU C Library itself.

     Its behavior is undefined if the strings overlap.

     `wcpncpy' is a GNU extension and is declared in `wchar.h'.

 - Macro: char * strdupa (const char *S)
     This macro is similar to `strdup' but allocates the new string
     using `alloca' instead of `malloc' (Note: Variable Size
     Automatic).  This means of course the returned string has the
     same limitations as any block of memory allocated using `alloca'.

     For obvious reasons `strdupa' is implemented only as a macro; you
     cannot get the address of this function.  Despite this limitation
     it is a useful function.  The following code shows a situation
     where using `malloc' would be a lot more expensive.

          #include <paths.h>
          #include <string.h>
          #include <stdio.h>
          
          const char path[] = _PATH_STDPATH;
          
          int
          main (void)
          {
            char *wr_path = strdupa (path);
            char *cp = strtok (wr_path, ":");
          
            while (cp != NULL)
              {
                puts (cp);
                cp = strtok (NULL, ":");
              }
            return 0;
          }

     Please note that calling `strtok' using PATH directly is invalid.
     It is also not allowed to call `strdupa' in the argument list of
     `strtok' since `strdupa' uses `alloca' (Note: Variable Size
     Automatic), which can interfere with the parameter passing.

     This function is only available if GNU CC is used.

 - Macro: char * strndupa (const char *S, size_t SIZE)
     This macro is similar to `strndup' but like `strdupa' it
     allocates the new string using `alloca' (Note: Variable Size
     Automatic).  The same advantages and limitations of `strdupa' are
     valid for `strndupa', too.

     This function is implemented only as a macro, just like `strdupa'.
     Just like `strdupa', this macro must not be used inside the
     argument list of a function call.

     `strndupa' is only available if GNU CC is used.

 - Function: char * strcat (char *restrict TO, const char *restrict
          FROM)
     The `strcat' function is similar to `strcpy', except that the
     characters from FROM are concatenated or appended to the end of
     TO, instead of overwriting it.  That is, the first character from
     FROM overwrites the null character marking the end of TO.

     An equivalent definition for `strcat' would be:

          char *
          strcat (char *restrict to, const char *restrict from)
          {
            strcpy (to + strlen (to), from);
            return to;
          }

     This function has undefined results if the strings overlap.

 - Function: wchar_t * wcscat (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM)
     The `wcscat' function is similar to `wcscpy', except that the
     characters from WFROM are concatenated or appended to the end of
     WTO, instead of overwriting it.  That is, the first character from
     WFROM overwrites the null character marking the end of WTO.

     An equivalent definition for `wcscat' would be:

          wchar_t *
          wcscat (wchar_t *wto, const wchar_t *wfrom)
          {
            wcscpy (wto + wcslen (wto), wfrom);
            return wto;
          }

     This function has undefined results if the strings overlap.

   Programmers using the `strcat' or `wcscat' function (or the
following `strncat' or `wcsncat' functions for that matter) can easily
be recognized as lazy and reckless.  In almost all situations the
lengths of the participating strings are known (they had better be,
since how can one otherwise ensure the allocated size of the buffer is
sufficient?)  Or at least, one could know them if one keeps track of the
results of the various function calls.  But then it is very inefficient
to use `strcat'/`wcscat'.  A lot of time is wasted finding the end of
the destination string so that the actual copying can start.  This is a
common example:

     /* This function concatenates arbitrarily many strings.  The last
        parameter must be `NULL'.  */
     char *
     concat (const char *str, ...)
     {
       va_list ap, ap2;
       size_t total = 1;
       const char *s;
       char *result;
     
       va_start (ap, str);
       /* Actually `va_copy', but this is the name more gcc versions
          understand.  */
       __va_copy (ap2, ap);
     
       /* Determine how much space we need.  */
       for (s = str; s != NULL; s = va_arg (ap, const char *))
         total += strlen (s);
     
       va_end (ap);
     
       result = (char *) malloc (total);
       if (result != NULL)
         {
           result[0] = '\0';
     
           /* Copy the strings.  */
           for (s = str; s != NULL; s = va_arg (ap2, const char *))
             strcat (result, s);
         }
     
       va_end (ap2);
     
       return result;
     }

   This looks quite simple, especially the second loop where the strings
are actually copied.  But these innocent lines hide a major performance
penalty.  Just imagine that ten strings of 100 bytes each have to be
concatenated.  For the second string we search the already stored 100
bytes for the end of the string so that we can append the next string.
For all strings in total, the comparisons necessary to find the end of
the intermediate results sum to roughly 4500 (100 + 200 + ... + 900)!
If we combine the copying with the computation of the required
allocation, we can write this function more efficiently:

     char *
     concat (const char *str, ...)
     {
       va_list ap;
       size_t allocated = 100;
       char *result = (char *) malloc (allocated);
       char *wp;

       if (result != NULL)
         {
           const char *s;
           char *newp;

           va_start (ap, str);

           wp = result;
           for (s = str; s != NULL; s = va_arg (ap, const char *))
             {
               size_t len = strlen (s);

               /* Resize the allocated memory if necessary.  */
               if (wp + len + 1 > result + allocated)
                 {
                   /* Remember the write offset before `result' may
                      be invalidated by `realloc'.  */
                   size_t used = wp - result;

                   allocated = (allocated + len) * 2;
                   newp = (char *) realloc (result, allocated);
                   if (newp == NULL)
                     {
                       va_end (ap);
                       free (result);
                       return NULL;
                     }
                   wp = newp + used;
                   result = newp;
                 }

               wp = mempcpy (wp, s, len);
             }

           /* Terminate the result string.  */
           *wp++ = '\0';

           /* Resize memory to the optimal size.  */
           newp = realloc (result, wp - result);
           if (newp != NULL)
             result = newp;

           va_end (ap);
         }

       return result;
     }

   With a bit more knowledge about the input strings one could fine-tune
the memory allocation.  The difference we are pointing to here is that
we don't use `strcat' anymore.  We always keep track of the length of
the current intermediate result so we can save ourselves the search for
the end of the string and use `mempcpy'.  Please note that we also don't
use `stpcpy', which might seem more natural since we are handling
strings.  But this is not necessary since we already know the length of
the string and therefore can use the faster memory copying function.
The example would work for wide characters the same way.

   Whenever a programmer feels the need to use `strcat' she or he
should think twice and check whether the code can be rewritten to take
advantage of already calculated results.  Again: it
is almost always unnecessary to use `strcat'.
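
   For the simple two-string case, the same idea can be sketched as
follows.  The helper `join_path' is a hypothetical name used only for
illustration; it measures each input once and then copies with
`mempcpy' and `memcpy' instead of calling `strcat'.  `_GNU_SOURCE' is
defined because `mempcpy' is a GNU extension.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string consisting of
   DIR, a '/', and NAME, or a null pointer if allocation fails.  Each
   input is measured exactly once and the end of the result is never
   searched for.  */
char *
join_path (const char *dir, const char *name)
{
  size_t dirlen = strlen (dir);
  size_t namelen = strlen (name);
  char *result = malloc (dirlen + 1 + namelen + 1);

  if (result != NULL)
    {
      char *wp = mempcpy (result, dir, dirlen);
      *wp++ = '/';
      /* Copy NAME together with its terminating null character.  */
      memcpy (wp, name, namelen + 1);
    }
  return result;
}
```

   A call such as `join_path ("/usr", "lib")' produces `/usr/lib',
which the caller eventually releases with `free'.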

 - Function: char * strncat (char *restrict TO, const char *restrict
          FROM, size_t SIZE)
     This function is like `strcat' except that not more than SIZE
     characters from FROM are appended to the end of TO.  A single null
     character is also always appended to TO, so the total allocated
     size of TO must be at least `SIZE + 1' bytes longer than its
     initial length.

     The `strncat' function could be implemented like this:

          char *
          strncat (char *to, const char *from, size_t size)
          {
            to[strlen (to) + size] = '\0';
            strncpy (to + strlen (to), from, size);
            return to;
          }

     The behavior of `strncat' is undefined if the strings overlap.

 - Function: wchar_t * wcsncat (wchar_t *restrict WTO, const wchar_t
          *restrict WFROM, size_t SIZE)
     This function is like `wcscat' except that not more than SIZE
     wide characters from WFROM are appended to the end of WTO.  A
     single null wide character is also always appended to WTO, so the
     total allocated size of WTO must be at least `SIZE + 1' wide
     characters longer than its initial length.

     The `wcsncat' function could be implemented like this:

          wchar_t *
          wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
                   size_t size)
          {
            wto[wcslen (wto) + size] = L'\0';
            wcsncpy (wto + wcslen (wto), wfrom, size);
            return wto;
          }

     The behavior of `wcsncat' is undefined if the strings overlap.

   Here is an example showing the use of `strncpy' and `strncat' (the
wide character version is equivalent).  Notice how, in the call to
`strncat', the SIZE parameter is computed to avoid overflowing the
character array `buffer'.

     #include <string.h>
     #include <stdio.h>
     
     #define SIZE 10
     
     static char buffer[SIZE];
     
     int
     main (void)
     {
       strncpy (buffer, "hello", SIZE);
       puts (buffer);
       strncat (buffer, ", world", SIZE - strlen (buffer) - 1);
       puts (buffer);
       return 0;
     }

The output produced by this program looks like:

     hello
     hello, wo

 - Function: void bcopy (const void *FROM, void *TO, size_t SIZE)
     This is a partially obsolete alternative for `memmove', derived
     from BSD.  Note that it is not quite equivalent to `memmove',
     because the arguments are not in the same order and there is no
     return value.

 - Function: void bzero (void *BLOCK, size_t SIZE)
     This is a partially obsolete alternative for `memset', derived from
     BSD.  Note that it is not as general as `memset', because the only
     value it can store is zero.
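
     When porting code away from these BSD functions, the mapping to
     the ISO C functions can be sketched as below.  The wrapper names
     `bcopy_equiv' and `bzero_equiv' are hypothetical and serve only to
     show the argument order.

```c
#include <string.h>

/* Hypothetical wrapper with the argument order of `bcopy': the
   source comes first.  Overlapping blocks are handled, as with
   `memmove'.  */
void
bcopy_equiv (const void *from, void *to, size_t size)
{
  memmove (to, from, size);
}

/* Hypothetical wrapper matching `bzero': store SIZE zero bytes
   starting at BLOCK.  */
void
bzero_equiv (void *block, size_t size)
{
  memset (block, 0, size);
}
```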

