Squeezing repeats and deleting
------------------------------
When given just the `--delete' (`-d') option, `tr' removes any input
characters that are in SET1.

When given just the `--squeeze-repeats' (`-s') option, `tr' replaces
each input sequence of a repeated character that is in SET1 with a
single occurrence of that character.

When given both `--delete' and `--squeeze-repeats', `tr' first
performs any deletions using SET1, then squeezes repeats from any
remaining characters using SET2.

The `--squeeze-repeats' option may also be used when translating, in
which case `tr' first performs translation, then squeezes repeats from
any remaining characters using SET2.
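
The four modes described above can be tried on short strings.  This is
only a sketch: the sample inputs are made up for illustration, and the
output noted in each comment assumes a `tr' that behaves as described
in this section.

```shell
# Delete only: every `l' (SET1) is removed.
printf 'hello world\n' | tr -d 'l'            # heo word

# Squeeze only: each run of `l' or `o' (SET1) becomes one character.
printf 'helllooo\n' | tr -s 'lo'              # helo

# Delete and squeeze: delete `l' (SET1), then squeeze runs of `o' (SET2).
printf 'hellooo woorld\n' | tr -ds 'l' 'o'    # heo word

# Translate and squeeze: map a->x and b->y, then squeeze runs of
# `x' and `y' (SET2).  The `c' run is untouched: it is not in SET2.
printf 'aabbcc\n' | tr -s 'ab' 'xy'           # xycc
```

Note that in the last two commands the squeezing applies to SET2, not
SET1, which is why the doubled `c' survives.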
Here are some examples to illustrate various combinations of options:
* Remove all zero bytes:
tr -d '\000'
* Put all words on lines by themselves. This converts all
non-alphanumeric characters to newlines, then squeezes each string
of repeated newlines into a single newline:
tr -cs 'a-zA-Z0-9' '[\n*]'
* Convert each sequence of repeated newlines to a single newline:
tr -s '\n'
* Find doubled occurrences of words in a document. For example,
people often write "the the" with the duplicated words separated
by a newline. The Bourne shell script below works first by
converting each sequence of punctuation and blank characters to a
single newline. That puts each "word" on a line by itself. Next
it maps all uppercase characters to lower case, and finally it
runs `uniq' with the `-d' option to print out only the words that
were adjacent duplicates.
#!/bin/sh
cat -- "$@" \
| tr -s '[:punct:][:blank:]' '[\n*]' \
| tr '[:upper:]' '[:lower:]' \
| uniq -d
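
To see what the examples above produce, they can be run on small
inputs supplied with `printf'.  The sample strings here are invented
for illustration, and the doubled-word pipeline is shown reading
standard input rather than the files named in "$@"; treat this as a
sketch, not as part of the script itself.

```shell
# Zero bytes are removed; the other bytes pass through.
printf 'a\000b\n' | tr -d '\000'
# prints: ab

# One word per line: runs of non-alphanumerics collapse to one newline.
printf 'Hello, world! 42 times\n' | tr -cs 'a-zA-Z0-9' '[\n*]'
# prints: Hello, world, 42 and times, each on its own line

# Repeated newlines collapse to a single newline.
printf 'a\n\n\nb\n' | tr -s '\n'
# prints: a and b on consecutive lines

# Doubled word split across a line break.
printf 'Paris in the\nthe spring.\n' \
  | tr -s '[:punct:][:blank:]' '[\n*]' \
  | tr '[:upper:]' '[:lower:]' \
  | uniq -d
# prints: the
```

Because `uniq -d' compares only adjacent lines, the pipeline reports a
word only when its two occurrences are next to each other once
punctuation and blanks have been turned into newlines.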