GNU Info

Info Node: (elisp)Chars and Bytes

Next: Splitting Characters, Prev: Character Sets, Up: Non-ASCII Characters

Characters and Bytes
====================

   In multibyte representation, each character occupies one or more
bytes.  Each character set has an "introduction sequence", which is
normally one or two bytes long.  (Exception: the ASCII character set
and the EIGHT-BIT-GRAPHIC character set have a zero-length introduction
sequence.)  The introduction sequence is the beginning of the byte
sequence for any character in the character set.  The rest of the
character's bytes distinguish it from the other characters in the same
character set.  Depending on the character set, there are either one or
two distinguishing bytes; the number of such bytes is called the
"dimension" of the character set.

 - Function: charset-dimension charset
     This function returns the dimension of CHARSET; at present, the
     dimension is always 1 or 2.

 - Function: charset-bytes charset
     This function returns the number of bytes used to represent a
     character in character set CHARSET.

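   Since `charset-bytes' counts the introduction sequence together with
the distinguishing bytes, subtracting the dimension is the simplest way
to determine the byte length of a character set's introduction sequence: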

     (- (charset-bytes CHARSET)
        (charset-dimension CHARSET))
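
   As a minimal sketch, the expression above could be wrapped in a
helper function.  The name `my-charset-intro-length' is hypothetical and
not part of Emacs, and the sketch assumes an Emacs version in which
`charset-bytes' behaves as described in this node.

```elisp
;; Hypothetical helper; not part of Emacs.
(defun my-charset-intro-length (charset)
  "Return the byte length of CHARSET's introduction sequence.
Assumes `charset-bytes' counts the introduction sequence plus the
distinguishing bytes, as this node describes."
  (- (charset-bytes charset)
     (charset-dimension charset)))
```

   Under that assumption, `(my-charset-intro-length 'ascii)' would be
expected to give 0, in line with the exception noted above for the ASCII
character set.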
