Info Node: (emacs-lisp-intro.info)Design count-words-region

Designing `count-words-region'
------------------------------

   First, we will implement the word count command with a `while' loop,
then with recursion.  The command will, of course, be interactive.

   The template for an interactive function definition is, as always:

     (defun NAME-OF-FUNCTION (ARGUMENT-LIST)
       "DOCUMENTATION..."
       (INTERACTIVE-EXPRESSION...)
       BODY...)

   What we need to do is fill in the slots.

   The name of the function should be self-explanatory and similar to
the existing `count-lines-region' name.  This makes the name easier to
remember.  `count-words-region' is a good choice.

   The function counts words within a region.  This means that the
argument list must contain symbols that are bound to the two positions,
the beginning and end of the region.  These two positions can be called
`beginning' and `end' respectively.  The first line of the
documentation should be a single sentence, since that is all that is
printed as documentation by a command such as `apropos'.  The
interactive expression will be of the form `(interactive "r")', since
that will cause Emacs to pass the beginning and end of the region to
the function's argument list.  All this is routine.
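
   To see what `(interactive "r")' hands to a function, here is a
minimal sketch of a command with the same kind of argument list.  The
name `report-region-bounds' is hypothetical, chosen only for this
illustration.  When such a command is called interactively, Emacs
passes point and mark as two numbers, smallest first, so `beginning' is
never larger than `end'.  Because `message' returns the string it
displays, the command can also be called from Lisp with explicit
positions.

     ;; Hypothetical command, for illustration only.
     (defun report-region-bounds (beginning end)
       "Display the positions of the beginning and end of the region."
       (interactive "r")
       (message "The region runs from %d to %d." beginning end))

For example, `(report-region-bounds 3 10)' returns the string
"The region runs from 3 to 10."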

   The body of the function needs to be written to do three tasks:
first, to set up conditions under which the `while' loop can count
words; second, to run the `while' loop; and third, to send a message to
the user.

   When a user calls `count-words-region', point may be at the
beginning or the end of the region.  However, the counting process must
start at the beginning of the region.  This means we will want to put
point there if it is not already there.  Executing `(goto-char
beginning)' ensures this.  Of course, we will want to return point to
its expected position when the function finishes its work.  For this
reason, the body must be enclosed in a `save-excursion' expression.
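
   The effect of `save-excursion' can be checked in a temporary buffer.
The sketch below uses a hypothetical function, `save-excursion-demo'.
It moves point to the beginning of the buffer inside `save-excursion',
records where point is there, and then records where point is once the
`save-excursion' has finished.

     ;; Hypothetical demonstration function.
     (defun save-excursion-demo ()
       "Return point inside, and then after, a `save-excursion'."
       (with-temp-buffer
         (insert "one two three")      ; point is now after the text, at 14
         (let ((inside (save-excursion
                         (goto-char (point-min))   ; move point to 1
                         (point))))
           ;; Once `save-excursion' exits, point is back at 14.
           (list inside (point)))))

Evaluating `(save-excursion-demo)' returns `(1 14)'.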

   The central part of the body of the function consists of a `while'
loop in which one expression jumps point forward word by word, and
another expression counts those jumps.  The true-or-false-test of the
`while' loop should test true so long as point should jump forward, and
false when point is at the end of the region.

   We could use `(forward-word 1)' as the expression for moving point
forward word by word, but it is easier to see what Emacs identifies as a
`word' if we use a regular expression search.

   A regular expression search that finds the pattern for which it is
searching leaves point after the last character matched.  This means
that a succession of successful word searches will move point forward
word by word.
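
   A short sketch makes this concrete.  It repeatedly searches a
temporary buffer for the simple regexp `\\w+' (one or more word
constituents) and records where point ends up after each successful
search.  The function name `search-positions' is hypothetical.  Unlike
the command developed in this section, it passes `t' as the third
(NOERROR) argument of `re-search-forward', so that a failed search
returns `nil' and ends the loop instead of signaling an error.

     ;; Hypothetical helper, for illustration only.
     (defun search-positions (regexp text)
       "Return the positions of point after each search for REGEXP in TEXT."
       (with-temp-buffer
         (insert text)
         (goto-char (point-min))
         (let ((positions '()))
           ;; With NOERROR set to t, a failed search returns nil
           ;; instead of signaling an error, which ends the loop.
           (while (re-search-forward regexp nil t)
             (setq positions (cons (point) positions)))
           (nreverse positions))))

For example, `(search-positions "\\w+" "one two three")' returns
`(4 8 14)': point stops just after `one', `two', and `three' in turn.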

   As a practical matter, we want the regular expression search to jump
over whitespace and punctuation between words as well as over the words
themselves.  A regexp that refuses to jump over interword whitespace
would never jump more than one word!  This means that the regexp should
include the whitespace and punctuation that follows a word, if any, as
well as the word itself.  (A word may end a buffer and not have any
following whitespace or punctuation, so that part of the regexp must be
optional.)

   Thus, what we want for the regexp is a pattern defining one or more
word constituent characters followed, optionally, by one or more
characters that are not word constituents.  The regular expression for
this is:

     \w+\W*

The buffer's syntax table determines which characters are and are not
word constituents.  (See "What Constitutes a Word or Symbol?" for more
about syntax.  Also, see "Syntax" in `The GNU Emacs Manual', and
"Syntax Tables" in `The GNU Emacs Lisp Reference Manual'.)
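
   One way to see what this pattern matches is to try it on a string
with `string-match'.  The sketch below is only an illustration, and the
name `word-regexp-match' is hypothetical.  It runs inside
`with-temp-buffer' so that the standard syntax table is in effect,
since `\w' and `\W' are interpreted according to the current buffer's
syntax table.

     ;; Hypothetical helper, for illustration only.
     (defun word-regexp-match (string)
       "Return the first match in STRING for the word-counting regexp, or nil."
       (with-temp-buffer            ; use the standard syntax table
         (when (string-match "\\w+\\W*" string)
           (match-string 0 string))))

For example, `(word-regexp-match "Hello, world!")' returns "Hello, ",
the word together with the comma and space that follow it;
`(word-regexp-match "world!")' returns "world!"; and
`(word-regexp-match "  !")', which contains no word constituents,
returns `nil'.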

   The search expression looks like this:

     (re-search-forward "\\w+\\W*")

(Note that paired backslashes precede the `w' and `W'.  Inside a
string, a single backslash has special meaning to the Emacs Lisp
reader: it indicates that the following character is interpreted
differently than usual.  For example, the two characters, `\n', stand
for `newline', rather than for a backslash followed by `n'.  Two
backslashes in a row stand for an ordinary, `unspecial' backslash.)
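
   The doubling can be checked by asking for the lengths of the strings
involved.  In this sketch, whose function name `backslash-demo' is
hypothetical, the string written as "\\w+\\W*" turns out to contain six
characters, so the regexp engine receives exactly the pattern \w+\W*.
The two characters `\n' inside a string become a single newline
character.

     ;; Hypothetical demonstration function.
     (defun backslash-demo ()
       "Show how the Lisp reader handles backslashes in strings."
       (list (length "\\w+\\W*")              ; 6: \ w + \ W *
             (length "\n")                    ; 1: a single newline
             (string= "\\w" (string ?\\ ?w)))) ; t: a backslash, then `w'

Evaluating `(backslash-demo)' returns `(6 1 t)'.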

   We need a counter to count how many words there are; this variable
must first be set to 0 and then incremented each time Emacs goes around
the `while' loop.  The incrementing expression is simply:

     (setq count (1+ count))
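
   Note that `1+' does not change `count' itself; it returns a new
number that is one larger.  That is why its result has to be stored
back into the variable with `setq'.  A short sketch, using a
hypothetical function named `counter-demo', shows the difference:

     ;; Hypothetical demonstration function.
     (defun counter-demo ()
       "Contrast the value returned by `1+' with the effect of `setq'."
       (let* ((count 0)
              (result (1+ count)))   ; RESULT is 1, but COUNT is still 0
         (list result
               count
               (setq count (1+ count)))))   ; now COUNT itself becomes 1

Evaluating `(counter-demo)' returns `(1 0 1)'.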

   Finally, we want to tell the user how many words there are in the
region.  The `message' function is intended for presenting this kind of
information to the user.  The message has to be phrased so that it
reads properly regardless of how many words there are in the region: we
don't want to say that "there are 1 words in the region", since pairing
a singular number with a plural noun is ungrammatical.  We can solve this
problem by using a conditional expression that evaluates different
messages depending on the number of words in the region.  There are
three possibilities: no words in the region, one word in the region,
and more than one word.  This means that the `cond' special form is
appropriate.
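
   The three-way choice can be tried on its own before it is placed in
the command.  In this sketch the name `word-count-message' is
hypothetical.  It uses `format', which builds the same string that
`message' would display, so the result can be inspected directly
instead of in the echo area.

     ;; Hypothetical helper, for illustration only.
     (defun word-count-message (count)
       "Return a grammatical sentence describing COUNT words."
       (cond ((zerop count)
              "The region does NOT have any words.")
             ((= 1 count)
              "The region has 1 word.")
             (t
              (format "The region has %d words." count))))

For example, `(word-count-message 1)' returns "The region has 1 word."
and `(word-count-message 5)' returns "The region has 5 words."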

   All this leads to the following function definition:

     ;;; First version; has bugs!
     (defun count-words-region (beginning end)
       "Print number of words in the region.
     Words are defined as at least one word-constituent
     character followed by zero or more characters that
     are not word-constituents.  The buffer's syntax
     table determines which characters these are."
       (interactive "r")
       (message "Counting words in region ... ")
     
     ;;; 1. Set up appropriate conditions.
       (save-excursion
         (goto-char beginning)
         (let ((count 0))
     
     ;;; 2. Run the while loop.
           (while (< (point) end)
             (re-search-forward "\\w+\\W*")
             (setq count (1+ count)))
     
     ;;; 3. Send a message to the user.
           (cond ((zerop count)
                  (message
                   "The region does NOT have any words."))
                 ((= 1 count)
                  (message
                   "The region has 1 word."))
                 (t
                  (message
                   "The region has %d words." count))))))

As written, the function works, but not in all circumstances.

