Info Node: (emacs-lisp-intro.info)Words and Symbols
What to Count?
==============
When we first start thinking about how to count the words in a
function definition, the first question is (or ought to be): what are
we going to count?  When we speak of `words' with respect to a Lisp
function definition, we are actually speaking, in large part, of
`symbols'.  For example, the following `multiply-by-seven' function
contains the five symbols `defun', `multiply-by-seven', `number', `*',
and `7' (strictly speaking, `7' is a number rather than a symbol, but
we count it as one here).  In addition, in the documentation string,
it contains the four words `Multiply', `NUMBER', `by', and `seven'.
The symbol `number' is repeated, so the definition contains a total of
ten words and symbols.
     (defun multiply-by-seven (number)
       "Multiply NUMBER by seven."
       (* 7 number))
However, if we mark the `multiply-by-seven' definition with `C-M-h'
(`mark-defun'), and then call `count-words-region' on it, we will find
that `count-words-region' claims the definition has eleven words, not
ten! Something is wrong!
The problem is twofold: `count-words-region' does not count the `*'
as a word, and it counts the single symbol, `multiply-by-seven', as
containing three words. The hyphens are treated as if they were
interword spaces rather than intraword connectors: `multiply-by-seven'
is counted as if it were written `multiply by seven'.
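The splitting of a hyphenated symbol into separate words can be seen
with a small sketch.  It is only an illustration: the helper name
`my-word-count-of' is hypothetical, and it uses `forward-word' with the
Emacs Lisp mode syntax table instead of calling `count-words-region',
on the assumption that both treat a hyphen as a non-word character.

```elisp
;; Illustrative sketch: `my-word-count-of' is a hypothetical helper,
;; not part of Emacs and not the definition of `count-words-region'.
(defun my-word-count-of (string)
  "Return the number of words `forward-word' finds in STRING.
The Emacs Lisp mode syntax table is used, so a hyphen is not a
word constituent."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert string)
    (goto-char (point-min))
    (let ((count 0))
      ;; `forward-word' returns nil once no further word is found.
      (while (forward-word 1)
        (setq count (1+ count)))
      count)))

(my-word-count-of "multiply-by-seven")   ; => 3
(my-word-count-of "multiply by seven")   ; => 3
(my-word-count-of "*")                   ; => 0
```

The first two calls return the same value, which is the behavior
described above; the third shows that a lone `*' contributes no word.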
The cause of this confusion is the regular expression search within
the `count-words-region' definition that moves point forward word by
word. In the canonical version of `count-words-region', the regexp is:
     "\\w+\\W*"
This regular expression is a pattern defining one or more word
constituent characters followed by zero or more characters that are
not word constituents.  What is meant by `word constituent
characters' brings us to the issue of syntax, which is worth a section
of its own.
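To see the regular expression at work on the example definition, here
is a minimal sketch.  The helper `my-count-regexp-matches' is a
hypothetical name introduced only for illustration; it is not the
`count-words-region' definition.  It assumes the Emacs Lisp mode syntax
table, and that the regexp can never match the empty string (otherwise
the loop would not terminate).

```elisp
;; Illustrative sketch: `my-count-regexp-matches' is a hypothetical
;; helper, not part of Emacs and not `count-words-region' itself.
(defun my-count-regexp-matches (regexp string)
  "Return how many successive matches of REGEXP occur in STRING.
Matching uses the Emacs Lisp mode syntax table.  REGEXP must not
match the empty string."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert string)
    (goto-char (point-min))
    (let ((count 0))
      ;; Each successful search moves point past one match.
      (while (re-search-forward regexp nil t)
        (setq count (1+ count)))
      count)))

(my-count-regexp-matches
 "\\w+\\W*"
 "(defun multiply-by-seven (number)
  \"Multiply NUMBER by seven.\"
  (* 7 number))")
;; => 11
```

The result agrees with the eleven words reported above:
`multiply-by-seven' contributes three matches and `*' contributes none.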