GNU Info

Info Node: (elisp)Character Type

(elisp)Character Type


Next: Symbol Type Prev: Floating Point Type Up: Programming Types
Enter node , (file) or (file)node

Character Type
--------------

   A "character" in Emacs Lisp is nothing more than an integer.  In
other words, characters are represented by their character codes.  For
example, the character `A' is represented as the integer 65.

   Individual characters are not often used in programs.  It is far more
common to work with _strings_, which are sequences composed of
characters.  Note: String Type.

   Characters in strings, buffers, and files are currently limited to
the range of 0 to 524287--nineteen bits.  But not all values in that
range are valid character codes.  Codes 0 through 127 are ASCII codes;
the rest are non-ASCII (Note: Non-ASCII Characters).  Characters that
represent keyboard input have a much wider range, to encode modifier
keys such as Control, Meta and Shift.

   Since characters are really integers, the printed representation of a
character is a decimal number.  This is also a possible read syntax for
a character, but writing characters that way in Lisp programs is a very
bad idea.  You should _always_ use the special read syntax formats that
Emacs Lisp provides for characters.  These syntax formats start with a
question mark.

   The usual read syntax for alphanumeric characters is a question mark
followed by the character; thus, `?A' for the character `A', `?B' for
the character `B', and `?a' for the character `a'.

   For example:

     ?Q => 81     ?q => 113

   You can use the same syntax for punctuation characters, but it is
often a good idea to add a `\' so that the Emacs commands for editing
Lisp code don't get confused.  For example, `?\ ' is the way to write
the space character.  If the character is `\', you _must_ use a second
`\' to quote it: `?\\'.

   You can express the characters Control-g, backspace, tab, newline,
vertical tab, formfeed, return, del, and escape as `?\a', `?\b', `?\t',
`?\n', `?\v', `?\f', `?\r', `?\d', and `?\e', respectively.  Thus,

     ?\a => 7                 ; `C-g'
     ?\b => 8                 ; backspace, <BS>, `C-h'
     ?\t => 9                 ; tab, <TAB>, `C-i'
     ?\n => 10                ; newline, `C-j'
     ?\v => 11                ; vertical tab, `C-k'
     ?\f => 12                ; formfeed character, `C-l'
     ?\r => 13                ; carriage return, <RET>, `C-m'
     ?\e => 27                ; escape character, <ESC>, `C-['
     ?\\ => 92                ; backslash character, `\'
     ?\d => 127               ; delete character, <DEL>

   These sequences which start with backslash are also known as "escape
sequences", because backslash plays the role of an escape character;
this usage has nothing to do with the character <ESC>.

   Control characters may be represented using yet another read syntax.
This consists of a question mark followed by a backslash, caret, and the
corresponding non-control character, in either upper or lower case.  For
example, both `?\^I' and `?\^i' are valid read syntax for the character
`C-i', the character whose value is 9.

   Instead of the `^', you can use `C-'; thus, `?\C-i' is equivalent to
`?\^I' and to `?\^i':

     ?\^I => 9     ?\C-I => 9

   In strings and buffers, the only control characters allowed are those
that exist in ASCII; but for keyboard input purposes, you can turn any
character into a control character with `C-'.  The character codes for
these non-ASCII control characters include the 2**26 bit as well as the
code for the corresponding non-control character.  Ordinary terminals
have no way of generating non-ASCII control characters, but you can
generate them straightforwardly using X and other window systems.

   For historical reasons, Emacs treats the <DEL> character as the
control equivalent of `?':

     ?\^? => 127     ?\C-? => 127

As a result, it is currently not possible to represent the character
`Control-?', which is a meaningful input character under X, using
`\C-'.  It is not easy to change this, as various Lisp files refer to
<DEL> in this way.

   For representing control characters to be found in files or strings,
we recommend the `^' syntax; for control characters in keyboard input,
we prefer the `C-' syntax.  Which one you use does not affect the
meaning of the program, but may guide the understanding of people who
read it.

   A "meta character" is a character typed with the <META> modifier
key.  The integer that represents such a character has the 2**27 bit
set (which on most machines makes it a negative number).  We use high
bits for this and other modifiers to make possible a wide range of
basic character codes.

   In a string, the 2**7 bit attached to an ASCII character indicates a
meta character; thus, the meta characters that can fit in a string have
codes in the range from 128 to 255, and are the meta versions of the
ordinary ASCII characters.  (In Emacs versions 18 and older, this
convention was used for characters outside of strings as well.)

   The read syntax for meta characters uses `\M-'.  For example,
`?\M-A' stands for `M-A'.  You can use `\M-' together with octal
character codes (see below), with `\C-', or with any other syntax for a
character.  Thus, you can write `M-A' as `?\M-A', or as `?\M-\101'.
Likewise, you can write `C-M-b' as `?\M-\C-b', `?\C-\M-b', or
`?\M-\002'.

   The case of a graphic character is indicated by its character code;
for example, ASCII distinguishes between the characters `a' and `A'.
But ASCII has no way to represent whether a control character is upper
case or lower case.  Emacs uses the 2**25 bit to indicate that the
shift key was used in typing a control character.  This distinction is
possible only when you use X terminals or other special terminals;
ordinary terminals do not report the distinction to the computer in any
way.  The Lisp syntax for the shift bit is `\S-'; thus, `?\C-\S-o' or
`?\C-\S-O' represents the shifted-control-o character.

   The X Window System defines three other modifier bits that can be set
in a character: "hyper", "super" and "alt".  The syntaxes for these
bits are `\H-', `\s-' and `\A-'.  (Case is significant in these
prefixes.)  Thus, `?\H-\M-\A-x' represents `Alt-Hyper-Meta-x'.
Numerically, the bit values are 2**22 for alt, 2**23 for super and
2**24 for hyper.

   Finally, the most general read syntax for a character represents the
character code in either octal or hex.  To use octal, write a question
mark followed by a backslash and the octal character code (up to three
octal digits); thus, `?\101' for the character `A', `?\001' for the
character `C-a', and `?\002' for the character `C-b'.  Although this
syntax can represent any ASCII character, it is preferred only when the
precise octal value is more important than the ASCII representation.

     ?\012 => 10         ?\n => 10         ?\C-j => 10
     ?\101 => 65         ?A => 65

   To use hex, write a question mark followed by a backslash, `x', and
the hexadecimal character code.  You can use any number of hex digits,
so you can represent any character code in this way.  Thus, `?\x41' for
the character `A', `?\x1' for the character `C-a', and `?\x8e0' for the
Latin-1 character `a' with grave accent.

   A backslash is allowed, and harmless, preceding any character without
a special escape meaning; thus, `?\+' is equivalent to `?+'.  There is
no reason to add a backslash before most characters.  However, you
should add a backslash before any of the characters `()\|;'`"#.,' to
avoid confusing the Emacs commands for editing Lisp code.  Also add a
backslash before whitespace characters such as space, tab, newline and
formfeed.  However, it is cleaner to use one of the easily readable
escape sequences, such as `\t', instead of an actual whitespace
character such as a tab.


automatically generated by info2www version 1.2.2.9