GNU Info

Info Node: (web2c.info)TCX files


TCX files: Character translations
---------------------------------

  TCX (TeX character translation) files help TeX support direct input
of 8-bit international characters if fonts containing those characters
are being used.  Specifically, they map an input (keyboard) character
code to the internal TeX character code (a superset of ASCII).

  Of the various proposals for handling more than one input encoding,
TCX files were chosen because they follow Knuth's original ideas for
the use of the `xchr' and `xord' tables.  He ventured that these would
be changed in the WEB source in order to adapt each TeX implementation
to its environment.  It turned out, however, that recompiling the WEB
sources is not as simple a task as Knuth predicted; therefore, TCX
files, which allow the conversion tables to be changed on the fly,
have been implemented instead.

  This approach limits the portability of TeX documents, as some
implementations do not support it (or use a different method for
input-internal reencoding).  It may also be problematic to determine the
encoding to use for a TeX document of unknown provenance; in the worst
case, failure to do so correctly may result in subtle errors in the
typeset output.

  While TCX files can be used with any format, using them breaks the
LaTeX `inputenc' package.  For this reason, LaTeX documents should use
either a TCX file or `inputenc', but never both.

  This is entirely independent of the MLTeX extension (Note: MLTeX):
whereas a TCX file defines how an input keyboard character is mapped to
TeX's internal code, MLTeX defines substitutions for character glyphs
missing from a font, using an `\accent' construction made out of two
separate character glyphs.  TCX files involve no new primitives; it is
not possible to specify that an input (keyboard) character maps to more
than one character.

  Specifying TCX files:
   * You can select a TCX file for a particular TeX run with the
     command-line option `-translate-file=TCXFILE' or (preferably) by
     naming it explicitly in the first line of the main document:
     `%& -translate-file=TCXFILE'.

   * TCX files are searched for along the `WEB2C' path.

   * `INITEX' ignores TCX files.
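  To illustrate the first-line form, here is a minimal Python sketch
that extracts the `-translate-file=' value from a `%&' line.  The
function name `first_line_tcx' is hypothetical, and the parsing is a
simplification assumed for illustration: it splits on whitespace,
accepts one or two leading dashes, and ignores anything else (such as
a format name) that the real Web2c first-line handling may recognize.

```python
from typing import Optional


def first_line_tcx(first_line: str) -> Optional[str]:
    """Return the TCX file named on a '%&' first line, or None.

    Hypothetical helper; a simplified model, not Web2c's parser.
    """
    if not first_line.startswith("%&"):
        return None
    for token in first_line[2:].split():
        # Accept one or two leading dashes (an assumption of this sketch).
        if token.startswith("-"):
            option = token.lstrip("-")
            if option.startswith("translate-file="):
                # An empty value counts as "no TCX file named".
                return option[len("translate-file="):] or None
    return None


print(first_line_tcx("%& -translate-file=il2-t1.tcx"))  # il2-t1.tcx
print(first_line_tcx("\\documentclass{article}"))       # None
```

The command-line option and the first-line form name the same file;
the first-line form simply travels with the document.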

  The Web2c distribution comes with at least two TCX files,
`il1-t1.tcx' and `il2-t1.tcx'.  These support ISO Latin 1 and ISO Latin
2, respectively, with Cork-encoded fonts (a.k.a. the T1 encoding).  TCX
files for Czech, Polish, and Slovak are also provided.

  Syntax of TCX files:
  1. Line-oriented. Blank lines are ignored.

  2. Whitespace is ignored except as a separator.

  3. Comments start with `%' and continue to the end of the line.

  4. Otherwise, a line consists of one or two character codes:
          SRC [DEST]

  5. Each character code may be specified in octal with a leading `0',
     hexadecimal with a leading `0x', or decimal otherwise. Values must
     be between 0 and 255, inclusive (decimal).

  6. If the DEST code is not specified, it is taken to be the same as
     SRC.

  7. If the same SRC code is specified more than once, it is the last
     definition that counts.
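  The seven syntax rules can be modelled by a small parser.  The
following Python sketch is an assumption-based illustration, not the
Web2c implementation; the names `parse_code' and `parse_tcx' are
hypothetical, and rejecting lines with more than two codes is this
sketch's own reading of rule 4.  It returns a dictionary from SRC to
DEST codes.

```python
from typing import Dict


def parse_code(token: str) -> int:
    """Parse one code: leading 0x is hex, leading 0 is octal, else decimal."""
    t = token.lower()
    if t.startswith("0x"):
        value = int(t[2:], 16)
    elif len(t) > 1 and t.startswith("0"):
        value = int(t[1:], 8)
    else:
        value = int(t, 10)
    if not 0 <= value <= 255:          # rule 5: 0..255 inclusive
        raise ValueError(f"character code out of range: {token}")
    return value


def parse_tcx(text: str) -> Dict[int, int]:
    """Return a {SRC: DEST} mapping from the text of a TCX file."""
    mapping: Dict[int, int] = {}
    for line in text.splitlines():
        line = line.split("%", 1)[0]   # rule 3: drop comments
        fields = line.split()          # rule 2: whitespace only separates
        if not fields:                 # rule 1: skip blank lines
            continue
        if len(fields) > 2:            # rule 4 (this sketch rejects extras)
            raise ValueError(f"too many codes on line: {line!r}")
        src = parse_code(fields[0])
        dest = parse_code(fields[1]) if len(fields) == 2 else src  # rule 6
        mapping[src] = dest            # rule 7: last definition wins
    return mapping


sample = """\
% hypothetical example, not taken from a distributed TCX file
0xE4 0344    % hex SRC, octal DEST (both 228)
233          % decimal SRC; DEST defaults to SRC

200 201
200 202      % same SRC again: this later line wins
"""
print(parse_tcx(sample))  # {228: 228, 233: 233, 200: 202}
```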

  Finally, here's what happens: when TeX sees an input character with
code SRC, it 1) changes SRC to DEST; and 2) makes the code DEST
"printable", i.e., printed as-is in diagnostics and the log file
instead of in `^^' notation.
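  The two effects of a TCX entry can be modelled with a 256-entry
translation table plus a set of printable codes.  In this Python
sketch the table name `xord' is borrowed from Knuth's terminology, but
the functions `build_tables' and `translate_input' are hypothetical
and only model the behaviour described here; they are not Web2c's
code.  The mapping 0xB1 -> 0xA1 is likewise a made-up example.

```python
from typing import Dict, List, Set, Tuple

# Codes 32..126 (decimal) are printable by default.
DEFAULT_PRINTABLE = set(range(32, 127))


def build_tables(mapping: Dict[int, int]) -> Tuple[List[int], Set[int]]:
    """Turn a {SRC: DEST} mapping into a translation table and printable set."""
    xord = list(range(256))            # identity table: nothing translated yet
    printable = set(DEFAULT_PRINTABLE)
    for src, dest in mapping.items():
        xord[src] = dest               # step 1: input code SRC becomes DEST
        printable.add(dest)            # step 2: DEST is printed as-is
    return xord, printable


def translate_input(data: bytes, xord: List[int]) -> List[int]:
    """Map raw input bytes to internal character codes."""
    return [xord[b] for b in data]


# Usage with a hypothetical one-entry mapping (0xB1 -> 0xA1).
xord, printable = build_tables({0xB1: 0xA1})
print(translate_input(b"a\xb1", xord))       # [97, 161]
print(0xA1 in printable, 0xB1 in printable)  # True False
```

Note that it is DEST, the internal code, that becomes printable; the
original input code SRC is unaffected unless it appears as some other
entry's DEST.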

  By default, no characters are translated, and character codes between
32 and 126 inclusive (decimal) are printable.  It is not possible to
make these (or any) characters unprintable.
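  For codes that are not printable, TeX uses `^^' notation.  The Python
sketch below follows the rule in Knuth's `tex.web' (a control code k
below 64 becomes `^^' plus the character k+64, code 127 becomes `^^?',
and codes 128-255 become `^^' plus two lowercase hex digits).  The
function `tex_print_form' is hypothetical, and treating the extra
printable codes as a plain set is an assumption of this model.

```python
from typing import Set

DEFAULT_PRINTABLE = set(range(32, 127))  # always printable (32..126)


def tex_print_form(code: int, extra_printable: Set[int]) -> str:
    """Render a character code the way it would appear in a diagnostic."""
    if code in DEFAULT_PRINTABLE or code in extra_printable:
        return chr(code)               # printed as-is
    if code < 64:
        return "^^" + chr(code + 64)   # control codes: 13 -> ^^M
    if code < 128:
        return "^^" + chr(code - 64)   # only 127 reaches here: ^^?
    return "^^" + format(code, "02x")  # upper half: 233 -> ^^e9


print(tex_print_form(13, set()))           # ^^M
print(tex_print_form(127, set()))          # ^^?
print(tex_print_form(233, set()))          # ^^e9
print(ascii(tex_print_form(233, {233})))   # '\xe9', i.e. code 233 as-is
```

Because the default printable range is always honoured, passing an
empty set cannot make codes 32-126 unprintable, matching the rule
stated above.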

  Specifying translations for the printable ASCII characters (codes
32-127) will yield unpredictable results.  Additionally, you should not
make the following characters printable: `^^I' (TAB), `^^J' (line
feed), `^^M' (carriage return), and `^^?' (delete), since TeX uses them
in various ways.

  Thus, the idea is to specify the input (keyboard) character code for
SRC, and the output (font) character code for DEST.
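  The cautions about printable ASCII and the TAB, line feed, carriage
return, and delete codes can be checked mechanically.  The following
Python sketch is a hypothetical lint (`lint_tcx' is not part of
Web2c); it takes a {SRC: DEST} mapping and, as an assumption of this
sketch, flags only entries in 32-127 that actually change the code.

```python
from typing import Dict, List

# Codes that should not be made printable, with their ^^ names.
RESERVED = {9: "^^I (TAB)", 10: "^^J (line feed)",
            13: "^^M (carriage return)", 127: "^^? (delete)"}


def lint_tcx(mapping: Dict[int, int]) -> List[str]:
    """Return warnings for TCX entries the cautions advise against."""
    warnings = []
    for src, dest in sorted(mapping.items()):
        if 32 <= src <= 127 and src != dest:
            warnings.append(f"{src}: translates a printable ASCII code")
        if dest in RESERVED:
            warnings.append(f"{src}: makes {RESERVED[dest]} printable")
    return warnings


# A lone "9" line (DEST defaults to SRC) still makes TAB printable.
for message in lint_tcx({9: 9, 65: 97, 233: 233}):
    print(message)
```

The first warning shows a subtle case: a line containing only `9'
translates nothing, yet it still makes TAB printable.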


automatically generated by info2www version 1.2.2.9