GNU Info

Info Node: (elisp)Coding System Basics

Basic Concepts of Coding Systems
--------------------------------

   "Character code conversion" involves conversion between the encoding
used inside Emacs and some other encoding.  Emacs supports many
different encodings, in the sense that it can convert text to and from
them.  For example, it can convert text to or from encodings such as
Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and several variants of
ISO 2022.  In some
cases, Emacs supports several alternative encodings for the same
characters; for example, there are three coding systems for the Cyrillic
(Russian) alphabet: ISO, Alternativnyj, and KOI8.
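
   As a minimal sketch of alternative encodings for the same
characters, the following Emacs Lisp encodes one Cyrillic letter with
each of the three Cyrillic coding systems.  It assumes a recent Emacs
in which these are named `cyrillic-iso-8bit', `cyrillic-alternativnyj'
and `cyrillic-koi8'.  The variable `my-cyrillic-a-bytes' is
hypothetical.

```elisp
;; Encode CYRILLIC CAPITAL LETTER A (U+0410) with each of the three
;; Cyrillic coding systems.  `my-cyrillic-a-bytes' is a hypothetical
;; name; each entry pairs a coding system with the unibyte string
;; that it produces.
(defconst my-cyrillic-a-bytes
  (mapcar (lambda (cs)
            (cons cs (encode-coding-string (string #x410) cs)))
          '(cyrillic-iso-8bit cyrillic-alternativnyj cyrillic-koi8)))

;; Show the byte values.  The same character becomes a different byte
;; in each encoding: for example 176 (#xB0) in ISO 8859-5 and
;; 225 (#xE1) in KOI8-R.
(mapcar (lambda (pair) (cons (car pair) (string-to-list (cdr pair))))
        my-cyrillic-a-bytes)
```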

   Most coding systems specify a particular character code for
conversion, but some of them leave the choice unspecified, so that it
is chosen heuristically for each file based on the data.
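
   In Emacs, the coding system that leaves the character code to be
chosen this way is `undecided'.  The sketch below, assuming a recent
Emacs, checks its coding type and asks the detection heuristics (via
`detect-coding-string') which coding system best fits some bytes.  The
variable `my-detected-coding' is hypothetical, and the detected value
varies with the language environment.

```elisp
;; `undecided' is the coding system whose character code is left to be
;; detected from the data.
(coding-system-type 'undecided)          ; => undecided

;; Ask the detection heuristics for the most likely coding system for
;; some bytes (second argument HIGHEST non-nil).  The answer depends on
;; the current language environment, so no particular value is assumed.
;; `my-detected-coding' is a hypothetical name.
(defconst my-detected-coding
  (detect-coding-string (encode-coding-string (string #xe9) 'utf-8) t))
```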

   "End of line conversion" handles three different conventions used on
various systems for representing end of line in files.  The Unix
convention is to use the linefeed character (also called newline).  The
DOS convention is to use a carriage-return and a linefeed at the end of
a line.  The Mac convention is to use just a carriage-return.
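
   The three conventions can be seen by encoding the same text with the
`-unix', `-dos' and `-mac' variants of one coding system.  This is a
minimal sketch assuming a recent Emacs; `my-eol-encodings' is a
hypothetical variable name.

```elisp
;; Encode the same two-line text with each end-of-line convention.
;; `my-eol-encodings' is a hypothetical name used for illustration.
(defconst my-eol-encodings
  (list (encode-coding-string "a\nb" 'latin-1-unix)   ; => "a\nb"   (LF)
        (encode-coding-string "a\nb" 'latin-1-dos)    ; => "a\r\nb" (CR LF)
        (encode-coding-string "a\nb" 'latin-1-mac)))  ; => "a\rb"   (CR)

;; Decoding reverses the conversion: with the DOS convention, each
;; CR LF pair in the file becomes a single newline inside Emacs.
(decode-coding-string "a\r\nb" 'latin-1-dos)         ; => "a\nb"
```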

   "Base coding systems" such as `latin-1' leave the end-of-line
conversion unspecified, to be chosen based on the data.  "Variant
coding systems" such as `latin-1-unix', `latin-1-dos' and `latin-1-mac'
specify the end-of-line conversion explicitly as well.  Most base
coding systems have three corresponding variants whose names are formed
by adding `-unix', `-dos' and `-mac'.
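
   A base coding system and its variants can be told apart with
`coding-system-eol-type', which returns 0, 1 or 2 for the Unix, DOS and
Mac conventions, and a vector of the three variants when the convention
is unspecified.  The following sketch assumes a recent Emacs and also
uses `coding-system-change-eol-conversion' to select a variant.

```elisp
;; For a variant coding system, `coding-system-eol-type' returns
;; 0 (Unix), 1 (DOS) or 2 (Mac).
(coding-system-eol-type 'latin-1-unix)   ; => 0
(coding-system-eol-type 'latin-1-dos)    ; => 1
(coding-system-eol-type 'latin-1-mac)    ; => 2

;; For a base coding system the end-of-line conversion is unspecified,
;; so the result is a vector of its three variants instead.
(vectorp (coding-system-eol-type 'latin-1))  ; => t

;; `coding-system-change-eol-conversion' picks the variant of a base
;; coding system that uses a given convention.
(coding-system-eol-type
 (coding-system-change-eol-conversion 'latin-1 'dos))  ; => 1
```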

   The coding system `raw-text' is special in that it prevents
character code conversion, and causes the buffer visited with that
coding system to be a unibyte buffer.  It does not specify the
end-of-line conversion, allowing that to be determined as usual by the
data, and has the usual three variants which specify the end-of-line
conversion.  `no-conversion' is equivalent to `raw-text-unix': it
specifies no conversion of either character codes or end-of-line.
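
   A sketch, assuming a recent Emacs, of how `raw-text', its variants
and `no-conversion' differ in end-of-line handling:

```elisp
;; `raw-text' leaves the end-of-line conversion unspecified, like any
;; base coding system, and has the usual three variants.
(vectorp (coding-system-eol-type 'raw-text))   ; => t
(coding-system-eol-type 'raw-text-dos)         ; => 1

;; `no-conversion' behaves like `raw-text-unix': end-of-line is Unix.
(coding-system-eol-type 'no-conversion)        ; => 0

;; So `raw-text-dos' drops the CR of a CR LF pair when decoding,
;; while `no-conversion' leaves every byte untouched.
(decode-coding-string "a\r\nb" 'raw-text-dos)   ; => "a\nb"
(decode-coding-string "a\r\nb" 'no-conversion)  ; => "a\r\nb"
```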

   The coding system `emacs-mule' specifies that the data is
represented in the internal Emacs encoding.  This is like `raw-text' in
that no code conversion happens, but different in that the result is
multibyte data.

 - Function: coding-system-get coding-system property
     This function returns the specified property of the coding system
     CODING-SYSTEM.  Most coding system properties exist for internal
     purposes, but one that you might find useful is `mime-charset'.
     That property's value is the name used in MIME for the character
     coding which this coding system can read and write.  Examples:

          (coding-system-get 'iso-latin-1 'mime-charset)
               => iso-8859-1
          (coding-system-get 'iso-2022-cn 'mime-charset)
               => iso-2022-cn
          (coding-system-get 'cyrillic-koi8 'mime-charset)
               => koi8-r

     The value of the `mime-charset' property is also defined as an
     alias for the coding system.
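
   Because the `mime-charset' value is also an alias, a MIME name can be
used directly as a coding system.  The sketch below, assuming a recent
Emacs, shows this and defines a hypothetical helper
`my-mime-charset-to-coding-system' (not part of Emacs) that maps a MIME
charset name, such as one taken from a `Content-Type' header, to a
coding system.

```elisp
;; Look up the MIME name of a coding system.
(coding-system-get 'iso-latin-1 'mime-charset)   ; => iso-8859-1

;; The MIME name is itself a coding system (an alias), so it can be
;; passed straight to the conversion functions.
(coding-system-p 'iso-8859-1)                    ; => t
(decode-coding-string "\xe9" 'iso-8859-1)        ; => "é" (U+00E9)

;; Hypothetical helper: map a MIME charset name (any letter case) to a
;; coding system symbol, or return nil if Emacs has no such coding
;; system.
(defun my-mime-charset-to-coding-system (name)
  (let ((cs (intern (downcase name))))
    (and (coding-system-p cs) cs)))
```

   For instance, `(my-mime-charset-to-coding-system "ISO-8859-1")'
returns `iso-8859-1', while an unknown name yields `nil'.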


automatically generated by info2www version 1.2.2.9