GNU Info

Info Node: (gnus)Charsets

Charsets
========

   People use different charsets, and we have MIME to let us know what
charsets they use.  Or rather, we wish we had.  Many people use
newsreaders and mailers that do not understand or use MIME, and just
send out messages without saying what character sets they use.  To help
a bit with this, some local news hierarchies have policies that say
what character set is the default.  For instance, the `fj' hierarchy
uses `iso-2022-jp-2'.

   This knowledge is encoded in the `gnus-group-charset-alist'
variable, which is an alist of regexps (to match group names) and
default charsets to be used when reading these groups.
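
   As a minimal sketch of what such an entry might look like, assuming
each element is a list of the form `(REGEXP CHARSET)' and that the
variable is defined by the `gnus-sum' library (both are assumptions, so
check the variable's documentation in your Gnus version), the following
makes `koi8-r' the default for a hypothetical `example' hierarchy:

     ;; Hypothetical hierarchy name.  The element format and the
     ;; `gnus-sum' feature name are assumptions, not taken from
     ;; this node.
     (with-eval-after-load 'gnus-sum
       (add-to-list 'gnus-group-charset-alist
                    '("^example\\." koi8-r)))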

   In addition, some people do use soi-disant MIME-aware agents that
aren't.  These blithely mark messages as being in `iso-8859-1' even if
they really are in `koi-8'.  To help here, the
`gnus-newsgroup-ignored-charsets' variable can be used.  The charsets
that are listed here will be ignored.  The variable can be set on a
group-by-group basis using the group parameters (see Group
Parameters).  The default value is `(unknown-8bit)', which is
something some agents insist on having in there.
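
   For illustration, a sketch along the following lines would make Gnus
disregard `iso-8859-1' labels in addition to `unknown-8bit'.  Setting
the variable globally like this also affects messages that are labelled
correctly, so it is usually better applied to the affected groups only
through the group parameters:

     ;; Illustration only: ignore the `iso-8859-1' label as well as
     ;; the default `unknown-8bit'.
     (setq gnus-newsgroup-ignored-charsets
           '(unknown-8bit iso-8859-1))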

   When posting, `gnus-group-posting-charset-alist' is used to
determine which charsets should not be encoded using the MIME
encodings.  For instance, some hierarchies discourage using
quoted-printable header encoding.

   This variable is an alist of regexps and permitted unencoded charsets
for posting.  Each element of the alist has the form `(TEST HEADER
BODY-LIST)', where:

TEST
     is either a regular expression matching the newsgroup header or a
     variable to query,

HEADER
     is the charset which may be left unencoded in the header (`nil'
     means encode all charsets),

BODY-LIST
     is a list of charsets which may be encoded using 8bit
     content-transfer encoding in the body, or one of the special
     values `nil' (always encode using quoted-printable) or `t' (always
     use 8bit).
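
   The element format above can be sketched as follows for a
hypothetical `example' hierarchy in which `koi8-r' may be left
unencoded in headers and sent as 8bit in the body.  The hierarchy name
is made up, and the `gnus-msg' feature name is an assumption about
where the variable is defined:

     ;; TEST      = regexp matched against the newsgroup header
     ;; HEADER    = charset that may stay unencoded in headers
     ;; BODY-LIST = charsets that may be sent as 8bit in the body
     (with-eval-after-load 'gnus-msg
       (add-to-list 'gnus-group-posting-charset-alist
                    '("^example\\." koi8-r (koi8-r))))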

   Other charset tricks that may be useful, although not Gnus-specific:

   If there are several MIME charsets that encode the same Emacs
charset, you can choose what charset to use by saying the following:

     (put-charset-property 'cyrillic-iso8859-5
                           'preferred-coding-system 'koi8-r)

   This means that Russian will be encoded using `koi8-r' instead of
the default `iso-8859-5' MIME charset.

   If you want to read messages in `koi8-u', you can cheat and say

     (define-coding-system-alias 'koi8-u 'koi8-r)

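   This will almost do the right thing: `koi8-u' differs from `koi8-r'
only in a few code points used for Ukrainian letters, and those
characters will be displayed incorrectly.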

   And finally, to read charsets like `windows-1251', you can say
something like

     (codepage-setup 1251)
     (define-coding-system-alias 'windows-1251 'cp1251)

   while if you use a non-Latin-1 language environment you could see the
Latin-1 subset of `windows-1252' using:

     (define-coding-system-alias 'windows-1252 'latin-1)
