

Non-ASCII Characters in Strings
...............................

   You can include a non-ASCII international character in a string
constant by writing it literally.  There are two text representations
for non-ASCII characters in Emacs strings (and in buffers): unibyte and
multibyte.  If the string constant is read from a multibyte source,
such as a multibyte buffer or string, or a file that would be visited as
multibyte, then the character is read as a multibyte character, and that
makes the string multibyte.  If the string constant is read from a
unibyte source, then the character is read as unibyte and that makes the
string unibyte.
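
   As a rough illustration of the paragraph above, the following sketch
checks the representation of two string constants.  It assumes the
standard predicate `multibyte-string-p', which this section does not
mention, and it assumes the forms are read from a multibyte source; the
example word is only illustrative.

```elisp
;; An ASCII-only string constant contains no non-ASCII character,
;; so nothing forces it to multibyte.
(multibyte-string-p "abc")      ; => nil

;; A literal non-ASCII character read from a multibyte source
;; (a multibyte buffer, or a file visited as multibyte) should
;; make the whole string multibyte.
(multibyte-string-p "résumé")   ; t when read from a multibyte source
```

   If the same form were read from a unibyte source, the second call
would be expected to return `nil' instead.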

   You can also represent a multibyte non-ASCII character with its
character code: use a hex escape, `\xNNNNNNN', with as many digits as
necessary.  (Multibyte non-ASCII character codes are all greater than
256.)  Any character that is not a valid hex digit terminates this
construct.  If the next character in the string could be interpreted as
a hex digit, write `\ ' (backslash and space) to terminate the hex
escape--for example, `\x8e0\ ' represents one character, `a' with grave
accent.  In a string constant, `\ ' works just like backslash-newline:
it contributes no character to the string, but it does terminate the
preceding hex escape.
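
   The termination rules above can be sketched with a few forms.  This
is a minimal illustration, not an exhaustive test; it uses the standard
functions `length' and `string-to-char', and the trailing text `abc' and
`xyz' is made up for the example.

```elisp
;; The hex escape ends at `\ ', which itself adds nothing.
(length "\x8e0\ ")            ; => 1

;; `a', `b' and `c' are all valid hex digits, so `\ ' is needed to
;; keep them out of the escape: one escaped character, then a, b, c.
(length "\x8e0\ abc")         ; => 4

;; `x' is not a hex digit, so it terminates the escape by itself.
(length "\x8e0xyz")           ; => 4

;; The escaped character has exactly the code that was written.
(string-to-char "\x8e0\ ")    ; => 2272, that is, #x8e0
```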

   Using a multibyte hex escape forces the string to multibyte.  You can
also represent a unibyte non-ASCII character with its character code,
which must be in the range from 128 (0200 octal) to 255 (0377 octal);
doing so forces the string to unibyte.
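
   A short sketch of how the choice of escape affects the
representation.  It assumes the standard functions `multibyte-string-p'
and `aref', and it assumes an octal escape (`\300', which is 192) as the
way to write a code in the unibyte range; the section itself gives the
range in octal but shows no such escape.

```elisp
;; A hex escape for a multibyte character code forces multibyte.
(multibyte-string-p "\x8e0\ ")   ; => t

;; An escape in the range 0200..0377 octal, with no other non-ASCII
;; text in the constant, leaves the string unibyte.
(multibyte-string-p "\300")      ; => nil

;; The single element of that unibyte string is the raw code 192.
(aref "\300" 0)                  ; => 192
```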

   See Text Representations, for more information about the two text
representations.
