GNU Info

Info Node: (emacs-lisp-intro.info)the-the


The `the-the' Function
**********************

   Sometimes when you you write text, you duplicate words--as with "you
you" near the beginning of this sentence.  I find that most frequently,
I duplicate "the"; hence, I call the function for detecting duplicated
words, `the-the'.

   As a first step, you could use the following regular expression to
search for duplicates:

     \\(\\w+[ \t\n]+\\)\\1

This regexp matches one or more word-constituent characters followed by
one or more spaces, tabs, or newlines, and then, through the back
reference `\\1', the same text a second time.  However, it does not
detect duplicated words on different lines, since the ending of the
first word, the end of the line, is different from the ending of the
second word, a space.  (For more information about regular expressions,
see "Regular Expression Searches", as well as "Syntax of Regular
Expressions" and "Regular Expressions".)
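
   As a rough check of this behavior, the sketch below tries the regexp
on a duplicate that sits on one line and on a duplicate split across a
line break.  `string-match' returns the index where the match starts,
or `nil' when there is none.  The variable name `my-simple-dup-regexp'
is only illustrative, and the results assume the standard syntax table.

```elisp
;; Hypothetical name; holds the first regexp shown in the text.
(defvar my-simple-dup-regexp "\\(\\w+[ \t\n]+\\)\\1"
  "Regexp matching a word plus whitespace, immediately repeated.")

;; Duplicate on one line: "two " is followed by "two ".
(string-match my-simple-dup-regexp "one two two three")
;; => 4

;; Duplicate split across lines: "five\n" is followed by "five ",
;; which differs in its final character, so nothing matches.
(string-match my-simple-dup-regexp "five\nfive six")
;; => nil
```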

   You might try searching just for duplicated word-constituent
characters, but that does not work, since the pattern detects doubles
such as the two occurrences of `th' in `with the'.

   Another possible regexp searches for word-constituent characters
followed by non-word-constituent characters, reduplicated.  Here,
`\\w+' matches one or more word-constituent characters and `\\W*'
matches zero or more non-word-constituent characters.

     \\(\\(\\w+\\)\\W*\\)\\1

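Again, not useful: since `\\W*' can match nothing at all, the pattern
also matches a doubled letter inside a single word.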
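
   A minimal sketch of what goes wrong with this second regexp: the
helper below collects every match in a piece of text, and is then run
on the test text given at the end of this node.  The names
`my-nested-dup-regexp' and `my-collect-matches' are hypothetical, and
the result assumes the standard syntax table.

```elisp
;; Hypothetical name; holds the second regexp shown in the text.
(defvar my-nested-dup-regexp "\\(\\(\\w+\\)\\W*\\)\\1"
  "Regexp matching word characters plus optional non-word characters, repeated.")

(defun my-collect-matches (regexp text)
  "Return a list of every match of REGEXP in TEXT, in order."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (matches)
      ;; With t as the third argument, a failed search returns nil
      ;; instead of signaling an error, which ends the loop.
      (while (re-search-forward regexp nil t)
        (push (match-string 0) matches))
      (nreverse matches))))

(my-collect-matches my-nested-dup-regexp
                    "one two two three four five\nfive six seven\n")
;; => ("two two " "ee")
```

The `ee' inside `three' is reported as a duplicate, while the `five'
pair split across the line break is still missed, because the back
reference includes the newline that followed the first `five'.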

   Here is the pattern that I use.  It is not perfect, but good enough.
`\\b' matches the empty string, provided it is at the beginning or end
of a word; `[^@ \n\t]+' matches one or more occurrences of any
characters that are _not_ an @-sign, space, newline, or tab.

     \\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b

   One can write more complicated expressions, but I found that this
expression is good enough, so I use it.
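
   The same kind of quick check can be applied to this pattern.  In the
sketch below, the variable name `my-the-the-regexp' is only
illustrative.  Since the back reference covers the word alone and not
the whitespace after it, a duplicate split across a line break is
found as well.

```elisp
;; Hypothetical name; holds the pattern used by the author.
(defvar my-the-the-regexp "\\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b"
  "Regexp matching a word, whitespace, and the same word again.")

;; Duplicate on one line.
(string-match my-the-the-regexp "one two two three")
;; => 4

;; Duplicate split across a line break.
(string-match my-the-the-regexp "five\nfive six")
;; => 0

;; No false alarm from the repeated "th" in "with the".
(string-match my-the-the-regexp "with the")
;; => nil
```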

   Here is the `the-the' function, as I include it in my `.emacs' file,
along with a handy global key binding:

     (defun the-the ()
       "Search forward for for a duplicated word."
       (interactive)
       (message "Searching for for duplicated words ...")
       (push-mark)
       ;; This regexp is not perfect
       ;; but is fairly good over all:
       (if (re-search-forward
            "\\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b" nil 'move)
           (message "Found duplicated word.")
         (message "End of buffer")))
     
     ;; Bind `the-the' to  C-c \
     (global-set-key "\C-c\\" 'the-the)
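
   The third argument to `re-search-forward' in the definition above is
`'move'.  When that argument is neither `nil' nor `t', a failed search
returns `nil' and leaves point at the end of the searched region
instead of signaling an error.  The sketch below illustrates this in a
temporary buffer; the helper name `my-try-the-the-search' is
hypothetical.

```elisp
(defun my-try-the-the-search (text)
  "Search TEXT for a duplicated word, as `the-the' does.
Return a list (RESULT AT-END), where RESULT is the value returned by
`re-search-forward' and AT-END is non-nil if point ended up at the
end of the buffer."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((result (re-search-forward
                   "\\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b" nil 'move)))
      (list result (eobp)))))

;; No duplicate: the search fails and point moves to the end.
(my-try-the-the-search "no duplicates here")
;; => (nil t)

;; Duplicate found: the result is the position just after "the the".
(my-try-the-the-search "Paris in the the spring")
;; => (17 nil)
```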


   Here is test text:

     one two two three four five
     five six seven

   You can substitute the other regular expressions shown above in the
function definition and try each of them on this list.

