Info Node: (emacs-lisp-intro.info)Whitespace Bug

The Whitespace Bug in `count-words-region'
------------------------------------------

   The `count-words-region' command described in the preceding section
has two bugs, or rather, one bug with two manifestations.  First, if
you mark a region containing only whitespace in the middle of some
text, the `count-words-region' command tells you that the region
contains one word!  Second, if you mark a region containing only
whitespace at the end of the buffer or the accessible portion of a
narrowed buffer, the command displays an error message that looks like
this:

     Search failed: "\\w+\\W*"

   If you are reading this in Info in GNU Emacs, you can test for these
bugs yourself.

   First, evaluate the function in the usual manner to install it.
Here is a copy of the definition.  Place your cursor after the closing
parenthesis and type `C-x C-e' to install it.

     ;; First version; has bugs!
     (defun count-words-region (beginning end)
       "Print number of words in the region.
     Words are defined as at least one word-constituent character followed
     by zero or more characters that are not word-constituents.  The buffer's
     syntax table determines which characters these are."
       (interactive "r")
       (message "Counting words in region ... ")
     
     ;;; 1. Set up appropriate conditions.
       (save-excursion
         (goto-char beginning)
         (let ((count 0))
     
     ;;; 2. Run the while loop.
           (while (< (point) end)
             (re-search-forward "\\w+\\W*")
             (setq count (1+ count)))
     
     ;;; 3. Send a message to the user.
           (cond ((zerop count)
                  (message "The region does NOT have any words."))
                 ((= 1 count) (message "The region has 1 word."))
                 (t (message "The region has %d words." count))))))

   If you wish, you can also install this keybinding by evaluating it:

     (global-set-key "\C-c=" 'count-words-region)

   To conduct the first test, set mark and point to the beginning and
end of the following line and then type `C-c =' (or `M-x
count-words-region' if you have not bound `C-c ='):

         one   two  three

Emacs will tell you, correctly, that the region has three words.

   Repeat the test, but place mark at the beginning of the line and
place point just _before_ the word `one'.  Again type the command `C-c
=' (or `M-x count-words-region').  Emacs should tell you that the
region has no words, since it is composed only of the whitespace at the
beginning of the line.  But instead Emacs tells you that the region has
one word!

   For the third test, copy the sample line to the end of the
`*scratch*' buffer and then type several spaces at the end of the line.
Place mark right after the word `three' and point at the end of line.
(The end of the line will be the end of the buffer.)  Type `C-c =' (or
`M-x count-words-region') as you did before.  Again, Emacs should tell
you that the region has no words, since it is composed only of the
whitespace at the end of the line.  Instead, Emacs displays an error
message saying `Search failed'.
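
   The three tests can also be run without marking regions by hand.
The following is only a sketch, not part of the original definition:
`my-buggy-count' and `my-whitespace-bug-demo' are hypothetical names,
the helper returns the count instead of displaying a message, and a
temporary buffer stands in for the `*scratch*' buffer.

```elisp
;; Hypothetical helper (not in the original text): the loop of the
;; first version, returning the count instead of displaying it.
(defun my-buggy-count (beginning end)
  (save-excursion
    (goto-char beginning)
    (let ((count 0))
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")  ; unbounded, as in the first version
        (setq count (1+ count)))
      count)))

(defun my-whitespace-bug-demo ()
  "Run the three tests in a temporary buffer; return the three results."
  (with-temp-buffer
    (insert "    one   two  three   ")
    (let ((before-one (progn (goto-char (point-min))
                             (skip-chars-forward " ")
                             (point)))
          (after-three (progn (goto-char (point-max))
                              (skip-chars-backward " ")
                              (point))))
      (list
       ;; Test 1: the region spans the three words.
       (my-buggy-count (point-min) after-three)
       ;; Test 2: the region is only the leading whitespace.
       (my-buggy-count (point-min) before-one)
       ;; Test 3: the region is only the whitespace at the end of
       ;; the buffer; the search signals an error, caught here.
       (condition-case nil
           (my-buggy-count after-three (point-max))
         (search-failed 'search-failed))))))
```

Evaluating `(my-whitespace-bug-demo)' should return `(3 1 search-failed)':
the correct count, the spurious one word, and the failed search.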

   The two bugs stem from the same problem.

   Consider the first manifestation of the bug, in which the command
tells you that the whitespace at the beginning of the line contains one
word.  What happens is this: The `M-x count-words-region' command moves
point to the beginning of the region.  The `while' tests whether the
value of point is smaller than the value of `end', which it is.
Consequently, the regular expression search looks for and finds the
first word.  It leaves point after the word.  `count' is set to one.
The `while' loop repeats; but this time the value of point is larger
than the value of `end', so the loop exits and the function displays
a message saying the number of words in the region is one.  In brief,
the regular expression search looks for and finds the word even though
it is outside the marked region.

   In the second manifestation of the bug, the region is whitespace at
the end of the buffer.  Emacs says `Search failed'.  What happens is
that the true-or-false-test in the `while' loop tests true, so the
search expression is executed.  But since there are no more words in
the buffer, the search fails.

   In both manifestations of the bug, the search extends or attempts to
extend outside of the region.

   The solution is to limit the search to the region--this is a fairly
simple action, but as you may have come to expect, it is not quite as
simple as you might think.

   As we have seen, the `re-search-forward' function takes a search
pattern as its first argument.  But in addition to this first,
mandatory argument, it accepts three optional arguments.  The optional
second argument bounds the search.  The optional third argument, if
`t', causes the function to return `nil' rather than signal an error if
the search fails.  The optional fourth argument is a repeat count.  (In
Emacs, you can see a function's documentation by typing `C-h f', the
name of the function, and then <RET>.)
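
   To make the optional arguments concrete, here is a small sketch.
The function name `my-search-args-demo' and the sample text are
assumptions made for illustration; only `re-search-forward' and its
documented arguments come from Emacs itself.

```elisp
(defun my-search-args-demo ()
  "Return a list showing the effect of the optional arguments."
  (with-temp-buffer
    (insert "alpha beta gamma")
    (let (results)
      ;; Fourth argument: a repeat count of 2 moves point past two
      ;; successive matches ("alpha " and "beta ").
      (goto-char (point-min))
      (re-search-forward "\\w+\\W*" nil t 2)
      (push (point) results)
      ;; Second argument: a bound of 10 stops the search before "gamma".
      ;; Third argument `t': return nil instead of signaling an error.
      (goto-char (point-min))
      (push (re-search-forward "gamma" 10 t) results)
      (nreverse results))))
```

Evaluating `(my-search-args-demo)' should return `(12 nil)': point ends
up at position 12, just before "gamma", after the repeated search, and
the bounded search for "gamma" returns `nil'.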

   In the `count-words-region' definition, the value of the end of the
region is held by the variable `end' which is passed as an argument to
the function.  Thus, we can add `end' as an argument to the regular
expression search expression:

     (re-search-forward "\\w+\\W*" end)

   However, if you make only this change to the `count-words-region'
definition and then test the new version of the definition on a stretch
of whitespace, you will receive an error message saying `Search failed'.

   What happens is this: the search is limited to the region, and fails
as you expect because there are no word-constituent characters in the
region.  Since it fails, we receive an error message.  But we do not
want to receive an error message in this case; we want to receive the
message that "The region does NOT have any words."
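
   A short sketch shows the bounded search signaling the error.  The
name `my-bounded-search-demo' and the sample text are hypothetical;
the bound of 4 plays the role of `end' for a region that covers only
the three leading spaces.

```elisp
(defun my-bounded-search-demo ()
  "Search a whitespace-only region with a bound but no third argument."
  (with-temp-buffer
    (insert "   word")
    (goto-char (point-min))
    (condition-case err
        ;; The word lies beyond the bound, so the search fails and,
        ;; lacking a third argument, signals `search-failed'.
        (re-search-forward "\\w+\\W*" 4)
      (search-failed (car err)))))
```

Evaluating `(my-bounded-search-demo)' should return the symbol
`search-failed', the error that Emacs reports as `Search failed'.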

   The solution to this problem is to provide `re-search-forward' with
a third argument of `t', which causes the function to return `nil'
rather than signal an error if the search fails.

   However, if you make this change and try it, you will see the message
"Counting words in region ... " and ... you will keep on seeing that
message ..., until you type `C-g' (`keyboard-quit').

   Here is what happens: the search is limited to the region, as before,
and it fails because there are no word-constituent characters in the
region, as expected.  Consequently, the `re-search-forward' expression
returns `nil'.  It does nothing else.  In particular, it does not move
point, which it does as a side effect if it finds the search target.
After the `re-search-forward' expression returns `nil', the next
expression in the `while' loop is evaluated.  This expression
increments the count.  Then the loop repeats.  The true-or-false-test
tests true because the value of point is still less than the value of
end, since the `re-search-forward' expression did not move point. ...
and the cycle repeats ...
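
   The runaway loop can be observed safely with a cap on the number of
iterations.  In this sketch the function name `my-runaway-loop-demo'
and the variable `guard' are hypothetical additions; without the cap
the loop would not terminate.

```elisp
(defun my-runaway-loop-demo ()
  "Return (COUNT . POINT) after a capped run of the runaway loop."
  (with-temp-buffer
    (insert "   word")
    (goto-char (point-min))
    (let ((end 4)     ; the region is the three leading spaces
          (count 0)
          (guard 0))  ; hypothetical safety cap, not in the original
      (while (and (< (point) end) (< guard 5))
        ;; The search fails, returns nil, and leaves point alone ...
        (re-search-forward "\\w+\\W*" end t)
        ;; ... but the count is incremented regardless.
        (setq count (1+ count))
        (setq guard (1+ guard)))
      (cons count (point)))))
```

Evaluating `(my-runaway-loop-demo)' should return `(5 . 1)': the count
has climbed to the cap while point has not moved from position 1.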

   The `count-words-region' definition requires yet another
modification, to cause the true-or-false-test of the `while' loop to
test false if the search fails.  Put another way, there are two
conditions that must be satisfied in the true-or-false-test before the
word count variable is incremented: point must still be within the
region and the search expression must have found a word to count.

   Since both the first condition and the second condition must be true
together, the two expressions, the region test and the search
expression, can be joined with an `and' special form and embedded in
the `while' loop as the true-or-false-test, like this:

     (and (< (point) end) (re-search-forward "\\w+\\W*" end t))

(See the node `forward-paragraph' for information about `and'.)

   The `re-search-forward' expression returns a non-`nil' value if the
search succeeds and, as a side effect, moves point.  Consequently, as
words are found, point is moved through the region.  When the search
expression fails to find another word, or when point reaches the end of the
region, the true-or-false-test tests false, the `while' loop exits,
and the `count-words-region' function displays one or other of its
messages.

   After incorporating these final changes, the `count-words-region'
function works without bugs (or at least, without bugs that I have found!).
Here is what it looks like:

     ;;; Final version: `while'
     (defun count-words-region (beginning end)
       "Print number of words in the region."
       (interactive "r")
       (message "Counting words in region ... ")
     
     ;;; 1. Set up appropriate conditions.
       (save-excursion
         (let ((count 0))
           (goto-char beginning)
     
     ;;; 2. Run the while loop.
           (while (and (< (point) end)
                       (re-search-forward "\\w+\\W*" end t))
             (setq count (1+ count)))
     
     ;;; 3. Send a message to the user.
           (cond ((zerop count)
                  (message
                   "The region does NOT have any words."))
                 ((= 1 count)
                  (message
                   "The region has 1 word."))
                 (t
                  (message
                   "The region has %d words." count))))))
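
   As a check, the corrected loop can be run against the same three
regions used in the tests.  This is a sketch under stated assumptions:
`my-fixed-count' and `my-fixed-demo' are hypothetical names, the helper
returns the count rather than displaying a message, and a temporary
buffer stands in for the `*scratch*' buffer.

```elisp
;; Hypothetical helper (not in the original text): the loop of the
;; final version, returning the count instead of displaying it.
(defun my-fixed-count (beginning end)
  (save-excursion
    (goto-char beginning)
    (let ((count 0))
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq count (1+ count)))
      count)))

(defun my-fixed-demo ()
  "Run the three tests with the corrected loop; return the results."
  (with-temp-buffer
    (insert "    one   two  three   ")
    (let ((before-one (progn (goto-char (point-min))
                             (skip-chars-forward " ")
                             (point)))
          (after-three (progn (goto-char (point-max))
                              (skip-chars-backward " ")
                              (point))))
      (list
       (my-fixed-count (point-min) after-three)     ; the three words
       (my-fixed-count (point-min) before-one)      ; leading whitespace
       (my-fixed-count after-three (point-max)))))) ; trailing whitespace
```

Evaluating `(my-fixed-demo)' should return `(3 0 0)': three words in
the first region and none in either stretch of whitespace.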

