Changing the lexical structure of words
=======================================
The macro `changeword' and all associated functionality is
experimental. It is only available if the `--enable-changeword'
option was given to `configure' when GNU `m4' was installed.
The functionality might change or even go away in the future.
_Do not rely on it_. Please send comments about it the same way
you would report bugs.
A file being processed by `m4' is split into quoted strings, words
(potential macro names) and simple tokens (any other single character).
Initially a word is defined by the following regular expression:
[_a-zA-Z][_a-zA-Z0-9]*
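
As a rough illustration, the following Python sketch splits a string
into words and single-character tokens using the default regular
expression above. It is not how `m4' is implemented: it ignores quoted
strings and comments, and the name `split_words' is made up for this
example.

```python
import re

# m4's default word pattern, as given in the text.
WORD = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")

def split_words(text):
    """Split text into words and single-character tokens.

    Hypothetical helper: quoted strings and comments are not handled.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        m = WORD.match(text, pos)
        if m:
            # A run matching the word pattern is a potential macro name.
            tokens.append(("word", m.group(0)))
            pos = m.end()
        else:
            # Any other character is a simple token on its own.
            tokens.append(("token", text[pos]))
            pos += 1
    return tokens

print(split_words("foo(1, b_2)"))
```

Here `1' comes out as a simple token, not a word, because the default
pattern does not let a word start with a digit.
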
Using `changeword', you can change this regular expression. Relaxing
`m4''s lexical rules might be useful (for example) if you wanted to
apply translations to a file of numbers:
changeword(`[_a-zA-Z0-9]+')
define(1, 0)
1
=>0
Tightening the lexical rules is less useful, because it will
generally make some of the builtins unavailable. You could use it to
prevent accidental calls to builtins, for example:
define(`_indir', defn(`indir'))
changeword(`_[_a-zA-Z0-9]*')
esyscmd(foo)
_indir(`esyscmd', `ls')
Because `m4' constructs its words a character at a time, there is a
restriction on the regular expressions that may be passed to
`changeword': if your regular expression accepts `foo', it must also
accept `f' and `fo'.
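
The restriction can be checked for a particular word with a small
Python sketch such as the one below. The helper `accepts_all_prefixes'
is hypothetical, it tests only the sample word given and not the
regular expression in general, and Python's regular expression syntax
differs in places from the GNU syntax that `m4' uses.

```python
import re

def accepts_all_prefixes(pattern, sample):
    """Return True if pattern matches every non-empty prefix of sample.

    Hypothetical helper: it checks one sample word only.
    """
    rx = re.compile(pattern)
    return all(rx.fullmatch(sample[:n]) for n in range(1, len(sample) + 1))

# Accepts `f', `fo' and `foo', so it meets the restriction for `foo'.
print(accepts_all_prefixes(r"[_a-zA-Z0-9]+", "foo"))    # prints True

# Accepts `foo_end' but not its prefix `f', so it is unsuitable.
print(accepts_all_prefixes(r"[a-z]+_end", "foo_end"))   # prints False
```
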
`changeword' has another function. If the regular expression
supplied contains any parenthesized subexpressions (written with
`\(' and `\)'), then text outside the first of these is discarded
before symbol lookup. So:
changecom(`/*', `*/')
changeword(`#\([_a-zA-Z0-9]*\)')
#esyscmd(ls)
`m4' now requires a `#' mark at the beginning of every macro
invocation, so one can use `m4' to preprocess shell scripts without
getting `shift' commands swallowed, and plain text without losing
various common words.
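
The lookup rule can be pictured with the Python sketch below. It is an
approximation only: the pattern is a Python spelling of the regular
expression used above (Python writes groups as `(...)' where GNU
syntax writes `\(...\)'), and `lookup_name' is a name invented for
this example.

```python
import re

# Python spelling of `#\([_a-zA-Z0-9]*\)'.
WORD = re.compile(r"#([_a-zA-Z0-9]*)")

def lookup_name(word_text):
    """Return the name used for symbol lookup, or None if not a word.

    Hypothetical helper: text outside the first group (here the
    leading `#') is dropped before the lookup.
    """
    m = WORD.fullmatch(word_text)
    if m is None:
        return None
    return m.group(1)

print(lookup_name("#esyscmd"))  # prints esyscmd
print(lookup_name("shift"))     # prints None
```

Under this pattern a bare `shift' is not a word at all, so it is never
looked up as a macro.
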
`m4''s macro substitution is based on text, while TeX's is based on
tokens. `changeword' can throw this difference into relief. For
example, here is the same idea represented in TeX and `m4'. First, the
TeX version:
\def\a{\message{Hello}}
\catcode`\@=0
\catcode`\\=12
@a
@bye
Then, the `m4' version:
define(a, `errprint(`Hello')')
changeword(`@\([_a-zA-Z0-9]*\)')
@a
=>errprint(Hello)
In the TeX example, the first line defines a macro `a' to print the
message `Hello'. The second line defines <@> to be usable instead of
<\> as an escape character. The third line defines <\> to be a normal
printing character, not an escape. The fourth line invokes the macro
`a'. So, when TeX is run on this file, it displays the message `Hello'.
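When the `m4' example is passed through `m4', it outputs
`errprint(Hello)'. The reason for this is that TeX does lexical
analysis of a macro definition when the macro is _defined_. `m4' just
stores the text, postponing the lexical analysis until the macro is
_used_. By the time `a' is expanded, words must begin with `@', so
`errprint' in the stored text is no longer recognized as a macro name.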
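
A toy model of this text-based behavior is sketched below in Python.
It is an assumption-laden simplification, not `m4' itself: macros take
no arguments, quoting is ignored, rescanning is done by recursion, and
the macro `greet' stands in for `errprint'. What it shows is that a
stored body is tokenized with whatever word pattern is in force when
the macro is used.

```python
import re

def expand(text, word_rx, macros):
    """Toy expander: macro bodies are plain text, rescanned on use."""
    out = []
    pos = 0
    while pos < len(text):
        m = word_rx.match(text, pos)
        if m and m.end() > pos:
            # Look up the first group if there is one, else the whole word.
            name = m.group(1) if word_rx.groups else m.group(0)
            if name in macros:
                # The body is tokenized only now, with the current pattern.
                out.append(expand(macros[name], word_rx, macros))
            else:
                out.append(m.group(0))
            pos = m.end()
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)

# Hypothetical macro table: `a' expands to the text `greet'.
macros = {"a": "greet", "greet": "Hello"}
default_rx = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")
at_rx = re.compile(r"@([_a-zA-Z0-9]*)")

print(expand("a", default_rx, macros))  # prints Hello
print(expand("@a", at_rx, macros))      # prints greet
```

With the `@' pattern in force, the body of `a' is copied through
unexpanded, just as `errprint(Hello)' is in the example above.
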
Note that using `changeword' slows `m4' down by a factor of about
seven.