GNU Info

Info Node: (emacs)Coding Systems

(emacs)Coding Systems


Next: Recognize Coding Prev: Multibyte Conversion Up: International
Enter node , (file) or (file)node

Coding Systems
==============

   Users of various languages have established many more-or-less
standard coding systems for representing them.  Emacs does not use
these coding systems internally; instead, it converts from various
coding systems to its own system when reading data, and converts the
internal coding system to other coding systems when writing data.
Conversion is possible in reading or writing files, in sending or
receiving from the terminal, and in exchanging data with subprocesses.

   Emacs assigns a name to each coding system.  Most coding systems are
used for one language, and the name of the coding system starts with the
language name.  Some coding systems are used for several languages;
their names usually start with `iso'.  There are also special coding
systems `no-conversion', `raw-text' and `emacs-mule' which do not
convert printing characters at all.

   A special class of coding systems, collectively known as
"codepages", is designed to support text encoded by MS-Windows and
MS-DOS software.  To use any of these systems, you need to create it
with `M-x codepage-setup'.  Note: MS-DOS and MULE.  After creating
the coding system for the codepage, you can use it as any other coding
system.  For example, to visit a file encoded in codepage 850, type
`C-x <RET> c cp850 <RET> C-x C-f FILENAME <RET>'.

   In addition to converting various representations of non-ASCII
characters, a coding system can perform end-of-line conversion.  Emacs
handles three different conventions for how to separate lines in a file:
newline, carriage-return linefeed, and just carriage-return.

`C-h C CODING <RET>'
     Describe coding system CODING.

`C-h C <RET>'
     Describe the coding systems currently in use.

`M-x list-coding-systems'
     Display a list of all the supported coding systems.

   The command `C-h C' (`describe-coding-system') displays information
about particular coding systems.  You can specify a coding system name
as the argument; alternatively, with an empty argument, it describes
the coding systems currently selected for various purposes, both in the
current buffer and as the defaults, and the priority list for
recognizing coding systems (Note: Recognize Coding).

   To display a list of all the supported coding systems, type `M-x
list-coding-systems'.  The list gives information about each coding
system, including the letter that stands for it in the mode line (Note:
Mode Line).

   Each of the coding systems that appear in this list--except for
`no-conversion', which means no conversion of any kind--specifies how
and whether to convert printing characters, but leaves the choice of
end-of-line conversion to be decided based on the contents of each file.
For example, if the file appears to use the sequence carriage-return
linefeed to separate lines, DOS end-of-line conversion will be used.

   Each of the listed coding systems has three variants which specify
exactly what to do for end-of-line conversion:

`...-unix'
     Don't do any end-of-line conversion; assume the file uses newline
     to separate lines.  (This is the convention normally used on Unix
     and GNU systems.)

`...-dos'
     Assume the file uses carriage-return linefeed to separate lines,
     and do the appropriate conversion.  (This is the convention
     normally used on Microsoft systems.(1))

`...-mac'
     Assume the file uses carriage-return to separate lines, and do the
     appropriate conversion.  (This is the convention normally used on
     the Macintosh system.)

   These variant coding systems are omitted from the
`list-coding-systems' display for brevity, since they are entirely
predictable.  For example, the coding system `iso-latin-1' has variants
`iso-latin-1-unix', `iso-latin-1-dos' and `iso-latin-1-mac'.

   The coding system `raw-text' is good for a file which is mainly
ASCII text, but may contain byte values above 127 which are not meant to
encode non-ASCII characters.  With `raw-text', Emacs copies those byte
values unchanged, and sets `enable-multibyte-characters' to `nil' in
the current buffer so that they will be interpreted properly.
`raw-text' handles end-of-line conversion in the usual way, based on
the data encountered, and has the usual three variants to specify the
kind of end-of-line conversion to use.

   In contrast, the coding system `no-conversion' specifies no
character code conversion at all--none for non-ASCII byte values and
none for end of line.  This is useful for reading or writing binary
files, tar files, and other files that must be examined verbatim.  It,
too, sets `enable-multibyte-characters' to `nil'.

   The easiest way to edit a file with no conversion of any kind is with
the `M-x find-file-literally' command.  This uses `no-conversion', and
also suppresses other Emacs features that might convert the file
contents before you see them.  Note: Visiting.

   The coding system `emacs-mule' means that the file contains
non-ASCII characters stored with the internal Emacs encoding.  It
handles end-of-line conversion based on the data encountered, and has
the usual three variants to specify the kind of end-of-line conversion.

   ---------- Footnotes ----------

   (1) It is also specified for MIME `text/*' bodies and in other
network transport contexts.  It is different from the SGML reference
syntax record-start/record-end format which Emacs doesn't support
directly.


automatically generated by info2www version 1.2.2.9