GNU Info

Info Node: (emacs-lisp-intro.info)recursive-count-words

(emacs-lisp-intro.info)recursive-count-words


Next: Counting Exercise Prev: count-words-region Up: Counting Words
Enter node , (file) or (file)node

Count Words Recursively
=======================

   You can write the function for counting words recursively as well as
with a `while' loop.  Let's see how this is done.

   First, we need to recognize that the `count-words-region' function
has three jobs: it sets up the appropriate conditions for counting to
occur; it counts the words in the region; and it sends a message to the
user telling how many words there are.

   If we write a single recursive function to do everything, we will
receive a message for every recursive call.  If the region contains 13
words, we will receive thirteen messages, one right after the other.
We don't want this!  Instead, we must write two functions to do the
job, one of which (the recursive function) will be used inside of the
other.  One function will set up the conditions and display the
message; the other will return the word count.

   Let us start with the function that causes the message to be
displayed.  We can continue to call this `count-words-region'.

   This is the function that the user will call.  It will be
interactive.  Indeed, it will be similar to our previous versions of
this function, except that it will call `recursive-count-words' to
determine how many words are in the region.

   We can readily construct a template for this function, based on our
previous versions:

     ;; Recursive version; uses regular expression search
     (defun count-words-region (beginning end)
       "DOCUMENTATION..."
       (INTERACTIVE-EXPRESSION...)
     
     ;;; 1. Set up appropriate conditions.
       (EXPLANATORY MESSAGE)
       (SET-UP FUNCTIONS...
     
     ;;; 2. Count the words.
         RECURSIVE CALL
     
     ;;; 3. Send a message to the user.
         MESSAGE PROVIDING WORD COUNT))

   The definition looks straightforward, except that somehow the count
returned by the recursive call must be passed to the message displaying
the word count.  A little thought suggests that this can be done by
making use of a `let' expression: we can bind a variable in the varlist
of a `let' expression to the number of words in the region, as returned
by the recursive call; and then the `cond' expression, using binding,
can display the value to the user.

   Often, one thinks of the binding within a `let' expression as
somehow secondary to the `primary' work of a function.  But in this
case, what you might consider the `primary' job of the function,
counting words, is done within the `let' expression.

   Using `let', the function definition looks like this:

     (defun count-words-region (beginning end)
       "Print number of words in the region."
       (interactive "r")
     
     ;;; 1. Set up appropriate conditions.
       (message "Counting words in region ... ")
       (save-excursion
         (goto-char beginning)
     
     ;;; 2. Count the words.
         (let ((count (recursive-count-words end)))
     
     ;;; 3. Send a message to the user.
           (cond ((zerop count)
                  (message
                   "The region does NOT have any words."))
                 ((= 1 count)
                  (message
                   "The region has 1 word."))
                 (t
                  (message
                   "The region has %d words." count))))))

   Next, we need to write the recursive counting function.

   A recursive function has at least three parts: the `do-again-test',
the `next-step-expression', and the recursive call.

   The do-again-test determines whether the function will or will not be
called again.  Since we are counting words in a region and can use a
function that moves point forward for every word, the do-again-test can
check whether point is still within the region.  The do-again-test
should find the value of point and determine whether point is before,
at, or after the value of the end of the region.  We can use the
`point' function to locate point.  Clearly, we must pass the value of
the end of the region to the recursive counting function as an argument.

   In addition, the do-again-test should also test whether the search
finds a word.  If it does not, the function should not call itself
again.

   The next-step-expression changes a value so that when the recursive
function is supposed to stop calling itself, it stops.  More precisely,
the next-step-expression changes a value so that at the right time, the
do-again-test stops the recursive function from calling itself again.
In this case, the next-step-expression can be the expression that moves
point forward, word by word.

   The third part of a recursive function is the recursive call.

   Somewhere, also, we also need a part that does the `work' of the
function, a part that does the counting.  A vital part!

   But already, we have an outline of the recursive counting function:

     (defun recursive-count-words (region-end)
       "DOCUMENTATION..."
        DO-AGAIN-TEST
        NEXT-STEP-EXPRESSION
        RECURSIVE CALL)

   Now we need to fill in the slots.  Let's start with the simplest
cases first:  if point is at or beyond the end of the region, there
cannot be any words in the region, so the function should return zero.
Likewise, if the search fails, there are no words to count, so the
function should return zero.

   On the other hand, if point is within the region and the search
succeeds, the function should call itself again.

   Thus, the do-again-test should look like this:

     (and (< (point) region-end)
          (re-search-forward "\\w+\\W*" region-end t))

   Note that the search expression is part of the do-again-test--the
function returns `t' if its search succeeds and `nil' if it fails.
(Note: The Whitespace Bug in `count-words-region', for
an explanation of how `re-search-forward' works.)

   The do-again-test is the true-or-false test of an `if' clause.
Clearly, if the do-again-test succeeds, the then-part of the `if'
clause should call the function again; but if it fails, the else-part
should return zero since either point is outside the region or the
search failed because there were no words to find.

   But before considering the recursive call, we need to consider the
next-step-expression.  What is it?  Interestingly, it is the search
part of the do-again-test.

   In addition to returning `t' or `nil' for the do-again-test,
`re-search-forward' moves point forward as a side effect of a
successful search.  This is the action that changes the value of point
so that the recursive function stops calling itself when point
completes its movement through the region.  Consequently, the
`re-search-forward' expression is the next-step-expression.

   In outline, then, the body of the `recursive-count-words' function
looks like this:

     (if DO-AGAIN-TEST-AND-NEXT-STEP-COMBINED
         ;; then
         RECURSIVE-CALL-RETURNING-COUNT
       ;; else
       RETURN-ZERO)

   How to incorporate the mechanism that counts?

   If you are not used to writing recursive functions, a question like
this can be troublesome.  But it can and should be approached
systematically.

   We know that the counting mechanism should be associated in some way
with the recursive call.  Indeed, since the next-step-expression moves
point forward by one word, and since a recursive call is made for each
word, the counting mechanism must be an expression that adds one to the
value returned by a call to `recursive-count-words'.

   Consider several cases:

   * If there are two words in the region, the function should return a
     value resulting from adding one to the value returned when it
     counts the first word, plus the number returned when it counts the
     remaining words in the region, which in this case is one.

   * If there is one word in the region, the function should return a
     value resulting from adding one to the value returned when it
     counts that word, plus the number returned when it counts the
     remaining words in the region, which in this case is zero.

   * If there are no words in the region, the function should return
     zero.

   From the sketch we can see that the else-part of the `if' returns
zero for the case of no words.  This means that the then-part of the
`if' must return a value resulting from adding one to the value
returned from a count of the remaining words.

   The expression will look like this, where `1+' is a function that
adds one to its argument.

     (1+ (recursive-count-words region-end))

   The whole `recursive-count-words' function will then look like this:

     (defun recursive-count-words (region-end)
       "DOCUMENTATION..."
     
     ;;; 1. do-again-test
       (if (and (< (point) region-end)
                (re-search-forward "\\w+\\W*" region-end t))
     
     ;;; 2. then-part: the recursive call
           (1+ (recursive-count-words region-end))
     
     ;;; 3. else-part
         0))

   Let's examine how this works:

   If there are no words in the region, the else part of the `if'
expression is evaluated and consequently the function returns zero.

   If there is one word in the region, the value of point is less than
the value of `region-end' and the search succeeds.  In this case, the
true-or-false-test of the `if' expression tests true, and the then-part
of the `if' expression is evaluated.  The counting expression is
evaluated.  This expression returns a value (which will be the value
returned by the whole function) that is the sum of one added to the
value returned by a recursive call.

   Meanwhile, the next-step-expression has caused point to jump over the
first (and in this case only) word in the region.  This means that when
`(recursive-count-words region-end)' is evaluated a second time, as a
result of the recursive call, the value of point will be equal to or
greater than the value of region end.  So this time,
`recursive-count-words' will return zero.  The zero will be added to
one, and the original evaluation of `recursive-count-words' will return
one plus zero, which is one, which is the correct amount.

   Clearly, if there are two words in the region, the first call to
`recursive-count-words' returns one added to the value returned by
calling `recursive-count-words' on a region containing the remaining
word--that is, it adds one to one, producing two, which is the correct
amount.

   Similarly, if there are three words in the region, the first call to
`recursive-count-words' returns one added to the value returned by
calling `recursive-count-words' on a region containing the remaining
two words--and so on and so on.

With full documentation the two functions look like this:

The recursive function:

     (defun recursive-count-words (region-end)
       "Number of words between point and REGION-END."
     
     ;;; 1. do-again-test
       (if (and (< (point) region-end)
                (re-search-forward "\\w+\\W*" region-end t))
     
     ;;; 2. then-part: the recursive call
           (1+ (recursive-count-words region-end))
     
     ;;; 3. else-part
         0))

The wrapper:

     ;;; Recursive version
     (defun count-words-region (beginning end)
       "Print number of words in the region.
     
     Words are defined as at least one word-constituent
     character followed by at least one character that is
     not a word-constituent.  The buffer's syntax table
     determines which characters these are."
       (interactive "r")
       (message "Counting words in region ... ")
       (save-excursion
         (goto-char beginning)
         (let ((count (recursive-count-words end)))
           (cond ((zerop count)
                  (message
                   "The region does NOT have any words."))
                 ((= 1 count)
                  (message "The region has 1 word."))
                 (t
                  (message
                   "The region has %d words." count))))))


automatically generated by info2www version 1.2.2.9