GNU Info

Info Node: (textutils.info)Character sets


Specifying sets of characters
-----------------------------

   The format of the SET1 and SET2 arguments resembles the format of
regular expressions; however, they are not regular expressions, only
lists of characters.  Most characters simply represent themselves in
these strings, but the strings can contain the shorthands listed below,
for convenience.  Some of them can be used only in SET1 or SET2, as
noted below.
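
   As a minimal sketch of the difference from regular expressions, the
following assumes a POSIX shell and GNU `tr'; the variable name
`literal' is only illustrative.  The characters `.' and `*' are
translated as themselves, not treated as pattern operators.

```shell
# SET1 is the two-character list '.' and '*'; SET2 is '_' and '+'.
# Each character in SET1 maps to the character at the same position in SET2.
literal=$(printf 'a.b*c\n' | tr '.*' '_+')
printf '%s\n' "$literal"    # prints: a_b+c
```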

Backslash escapes
     A backslash followed by a character not listed below causes an
     error message.

    `\a'
          Control-G.

    `\b'
          Control-H.

    `\f'
          Control-L.

    `\n'
          Control-J.

    `\r'
          Control-M.

    `\t'
          Control-I.

    `\v'
          Control-K.

    `\OOO'
          The character with the value given by OOO, which is 1 to 3
          octal digits.

    `\\'
          A backslash.
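
     The escapes above might be used as follows.  This is a sketch
     assuming a POSIX shell and GNU `tr'; the variable names are only
     illustrative.  The sets are single-quoted so that the shell
     passes the backslashes to `tr' unchanged.

```shell
# Turn every tab into a newline using the \t and \n escapes.
lines=$(printf 'one\ttwo\tthree\n' | tr '\t' '\n')

# \011 is the octal value of tab (Control-I), so this is equivalent.
lines_octal=$(printf 'one\ttwo\tthree\n' | tr '\011' '\n')

# Delete carriage returns (Control-M) with the --delete (-d) option.
unix_line=$(printf 'dos line\r\n' | tr -d '\r')

printf '%s\n' "$lines"
printf '%s\n' "$unix_line"    # prints: dos line
```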

Ranges
     The notation `M-N' expands to all of the characters from M through
     N, in ascending order.  M should collate before N; if it doesn't,
     an error results.  As an example, `0-9' is the same as
     `0123456789'.  Although GNU `tr' does not support the System V
     syntax that uses square brackets to enclose ranges, translations
     specified in that format will still work as long as the brackets
     in SET1 correspond to identical brackets in SET2.
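
     A short sketch of both forms, assuming a POSIX shell and GNU
     `tr' (the variable names are only illustrative).  In the
     bracketed form the brackets are ordinary characters that happen
     to map to themselves, which is why the result is unchanged.

```shell
# Plain ranges: a-z in SET1 maps position by position onto A-Z in SET2.
upper=$(printf 'hello\n' | tr a-z A-Z)
printf '%s\n' "$upper"        # prints: HELLO

# System V style: '[' maps to '[' and ']' maps to ']', so literal
# brackets in the input pass through untouched.
upper_sysv=$(printf '[hello]\n' | tr '[a-z]' '[A-Z]')
printf '%s\n' "$upper_sysv"   # prints: [HELLO]
```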

Repeated characters
     The notation `[C*N]' in SET2 expands to N copies of character C.
     Thus, `[y*6]' is the same as `yyyyyy'.  The notation `[C*]' in
     SET2 expands to as many copies of C as are needed to make SET2
     as long as SET1.  If N begins with `0', it is interpreted in
     octal, otherwise in decimal.
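
     The three forms can be sketched as follows, assuming a POSIX
     shell and GNU `tr'; the variable names are only illustrative.

```shell
# [y*6] expands to six copies of 'y', matching the six characters in SET1.
sixes=$(printf 'abcdef\n' | tr abcdef '[y*6]')
printf '%s\n' "$sixes"     # prints: yyyyyy

# [#*] pads SET2 to the length of SET1, so every digit maps to '#'.
masked=$(printf 'pin 4711\n' | tr 0-9 '[#*]')
printf '%s\n' "$masked"    # prints: pin ####

# A count with a leading 0 is octal: [x*010] is eight copies of 'x',
# so the ninth character of SET1 ('i') lines up with the trailing 'y'.
octal=$(printf 'hi\n' | tr a-i '[x*010]y')
printf '%s\n' "$octal"     # prints: xy
```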

Character classes
     The notation `[:CLASS:]' expands to all of the characters in the
     (predefined) class CLASS.  The characters expand in no particular
     order, except for the `upper' and `lower' classes, which expand in
     ascending order.  When the `--delete' (`-d') and
     `--squeeze-repeats' (`-s') options are both given, any character
     class can be used in SET2.  Otherwise, only the character classes
     `lower' and `upper' are accepted in SET2, and then only if the
     corresponding character class (`upper' and `lower', respectively)
     is specified in the same relative position in SET1.  Doing this
     specifies case conversion.  The class names are given below; an
     error results when an invalid class name is given.

    `alnum'
          Letters and digits.

    `alpha'
          Letters.

    `blank'
          Horizontal whitespace.

    `cntrl'
          Control characters.

    `digit'
          Digits.

    `graph'
          Printable characters, not including space.

    `lower'
          Lowercase letters.

    `print'
          Printable characters, including space.

    `punct'
          Punctuation characters.

    `space'
          Horizontal or vertical whitespace.

    `upper'
          Uppercase letters.

    `xdigit'
          Hexadecimal digits.
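
     Typical uses of character classes might look like the following
     sketch, which assumes a POSIX shell and GNU `tr'; the variable
     names are only illustrative.

```shell
# Case conversion: 'lower' in SET1 paired with 'upper' in SET2.
shout=$(printf 'Hello, World\n' | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$shout"        # prints: HELLO, WORLD

# Any class may appear in SET1, here with --delete (-d).
no_digits=$(printf 'a1b2c3\n' | tr -d '[:digit:]')
printf '%s\n' "$no_digits"    # prints: abc

# With both -d and -s, SET2 may be any class: digits (SET1) are
# deleted, then repeated whitespace (SET2) is squeezed to one character.
tidy=$(printf 'a1  b22   c\n' | tr -ds '[:digit:]' '[:space:]')
printf '%s\n' "$tidy"         # prints: a b c
```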

Equivalence classes
     The syntax `[=C=]' expands to all of the characters that are
     equivalent to C, in no particular order.  Equivalence classes are
     a relatively recent invention intended to support non-English
     alphabets, but there seems to be no standard way to define them
     or determine their contents.  Therefore, they are not fully
     implemented in GNU `tr'; each character's equivalence class
     consists only of that character, which is of no particular use.
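
     Because each class contains only its own character, the
     following sketch (assuming a POSIX shell and GNU `tr', with an
     illustrative variable name) behaves exactly as if SET1 were the
     single character `e'.

```shell
# [=e=] stands for just 'e' here, so only the final letter changes.
equiv=$(printf 'cafe\n' | tr '[=e=]' 'E')
printf '%s\n' "$equiv"    # prints: cafE
```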

