GNU Info

Info Node: (m4.info)Changeword

Changing the lexical structure of words
=======================================

     The macro `changeword' and all associated functionality are
     experimental.  They are only available if the `--enable-changeword'
     option was given to `configure' at GNU `m4' installation time.
     The functionality might change or even go away in the future.
     _Do not rely on it_.  Please send your comments about it the same
     way you would report bugs.

   A file being processed by `m4' is split into quoted strings, words
(potential macro names) and simple tokens (any other single character).
Initially a word is defined by the following regular expression:

     [_a-zA-Z][_a-zA-Z0-9]*

   Using `changeword', you can change this regular expression.  Relaxing
`m4''s lexical rules might be useful (for example) if you wanted to
apply translations to a file of numbers:

     changeword(`[_a-zA-Z0-9]+')
     define(1, 0)
     =>1
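
   As a rough illustration of which strings count as words under the
initial expression and under the relaxed one above, the following
Python sketch tests whole strings against each.  It is not how `m4' is
implemented: the helper name `is_word' is hypothetical, and Python's
`re' syntax is assumed to agree with the `m4' syntax for these two
simple patterns.

```python
import re

# The initial word definition quoted in the text.
DEFAULT_WORD = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")
# The relaxed definition passed to changeword in the example.
RELAXED_WORD = re.compile(r"[_a-zA-Z0-9]+")


def is_word(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # Show how each candidate is classified under both definitions.
    for candidate in ("define", "_x1", "1", "42abc"):
        print(candidate,
              is_word(DEFAULT_WORD, candidate),
              is_word(RELAXED_WORD, candidate))
```

   Under the initial expression `1' is not a word, so it could never
name a macro; under the relaxed expression it is.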

   Tightening the lexical rules is less useful, because it will
generally make some of the builtins unavailable.  You could use it to
prevent accidental calls to builtins, for example:

     define(`_indir', defn(`indir'))
     changeword(`_[_a-zA-Z0-9]*')
     esyscmd(foo)
     _indir(`esyscmd', `ls')

   Because `m4' constructs its words a character at a time, there is a
restriction on the regular expressions that may be passed to
`changeword': every non-empty prefix of an accepted word must itself be
accepted.  For example, if your regular expression accepts `foo', it
must also accept `f' and `fo'.
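
   The restriction can be spot-checked for a given sample word.  The
Python sketch below is only an illustration: the function name
`rejected_prefixes' is hypothetical, the patterns are written in
Python's `re' syntax rather than the syntax `m4' uses, and checking one
sample does not prove that an expression is safe for every word.

```python
import re


def rejected_prefixes(pattern, sample):
    """Return the non-empty proper prefixes of `sample` that `pattern`
    fails to match in full.  An empty list means this sample gives no
    evidence that the restriction is violated."""
    regex = re.compile(pattern)
    return [sample[:i]
            for i in range(1, len(sample))
            if regex.fullmatch(sample[:i]) is None]


if __name__ == "__main__":
    # Every prefix of `foo' is itself accepted by this expression.
    print(rejected_prefixes(r"[_a-zA-Z0-9]+", "foo"))
    # A literal pattern accepts `foo' but neither `f' nor `fo'.
    print(rejected_prefixes(r"foo", "foo"))
```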

   `changeword' has another function.  If the regular expression
supplied contains any grouped subexpressions (written `\(...\)'), then
text outside the first of these is discarded before symbol lookup.  So:

     changecom(`/*', `*/')
     changeword(`#\([_a-zA-Z0-9]*\)')
     #esyscmd(ls)

   `m4' now requires a `#' mark at the beginning of every macro
invocation, so one can use `m4' to preprocess shell scripts without
getting `shift' commands swallowed, and plain text without losing
various common words.
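
   The effect of the grouped subexpression on symbol lookup can be
sketched in Python as follows.  This is an assumption-laden
illustration, not `m4''s implementation: `lookup_name' is a hypothetical
helper, and the pattern is transcribed into Python's `re' syntax, where
grouping is written `(...)' instead of `\(...\)'.

```python
import re

# Transcription of the changeword argument `#\([_a-zA-Z0-9]*\)'.
WORD = re.compile(r"#([_a-zA-Z0-9]*)")


def lookup_name(token):
    """Return the name that would be looked up for `token`, or None if
    `token` is not a word under WORD.  Only the text inside the first
    group is kept; the leading `#' is discarded."""
    match = WORD.fullmatch(token)
    if match is None:
        return None
    return match.group(1)


if __name__ == "__main__":
    print(lookup_name("#esyscmd"))  # the name inside the group
    print(lookup_name("shift"))     # not a word: no leading `#'
```

   A bare `shift' is therefore never considered for macro lookup, which
is why such text passes through untouched.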

   `m4''s macro substitution is based on text, while TeX's is based on
tokens.  `changeword' can throw this difference into relief.  For
example, here is the same idea represented in TeX and `m4'.  First, the
TeX version:

     \def\a{\message{Hello}}
     \catcode`\@=0
     \catcode`\\=12
     =>@a
     =>@bye

Then, the `m4' version:

     define(a, `errprint(`Hello')')
     changeword(`@\([_a-zA-Z0-9]*\)')
     =>@a

   In the TeX example, the first line defines a macro `a' to print the
message `Hello'.  The second line defines <@> to be usable instead of
<\> as an escape character.  The third line defines <\> to be a normal
printing character, not an escape.  The fourth line invokes the macro
`a'.  So, when TeX is run on this file, it displays the message `Hello'.

   When the `m4' example is passed through `m4', it outputs
`errprint(Hello)'.  The reason for this is that TeX does the lexical
analysis of a macro definition when the macro is _defined_.  `m4' just
stores the text, postponing the lexical analysis until the macro is
_used_.

   Note that using `changeword' will slow `m4' down by a factor of
about seven.

