GNU Info

Info Node: (slib.info)Sequence Comparison

(slib.info)Sequence Comparison


Prev: String Search Up: Sorting and Searching
Enter node , (file) or (file)node

Sequence Comparison
-------------------

  `(require 'diff)'

This package implements the algorithm:

     S. Wu, E. Myers, U. Manber, and W. Miller,
        "An O(NP) Sequence Comparison Algorithm,"
        Information Processing Letters 35, 6 (1990), 317-323.
        <http://www.cs.arizona.edu/people/gene/vita.html>

If the items being sequenced are text lines, then the computed
edit-list is equivalent to the output of the "diff" utility program.
If the items being sequenced are words, then it is like the lesser
known "spiff" program.

The values returned by `diff:edit-length' can be used to gauge the
degree of match between two sequences.

I believe that this algorithm is currently the fastest for these tasks,
but genome sequencing applications fuel extensive research in this area.

 - Function: diff:longest-common-subsequence array1 array2 =?
 - Function: diff:longest-common-subsequence array1 array2
     ARRAY1 and ARRAY2 are one-dimensional arrays.  The procedure =? is
     used to compare sequence tokens for equality.  =? defaults to
     `eqv?'.  `diff:longest-common-subsequence' returns a
     one-dimensional array of length `(quotient (- (+ len1 len2)
     (fp:edit-length ARRAY1 ARRAY2)) 2)' holding the longest sequence
     common to both ARRAYs.

 - Function: diff:edits array1 array2 =?
 - Function: diff:edits array1 array2
     ARRAY1 and ARRAY2 are one-dimensional arrays.  The procedure =? is
     used to compare sequence tokens for equality.  =? defaults to
     `eqv?'.  `diff:edits' returns a list of length `(fp:edit-length
     ARRAY1 ARRAY2)' composed of a shortest sequence of edits
     transformaing ARRAY1 to ARRAY2.

     Each edit is a list of an integer and a symbol:
    (J insert)
          Inserts `(array-ref ARRAY1 J)' into the sequence.

    (K delete)
          Deletes `(array-ref ARRAY2 K)' from the sequence.

 - Function: diff:edit-length array1 array2 =?
 - Function: diff:edit-length array1 array2
     ARRAY1 and ARRAY2 are one-dimensional arrays.  The procedure =? is
     used to compare sequence tokens for equality.  =? defaults to
     `eqv?'.  `diff:edit-length' returns the length of the shortest
     sequence of edits transformaing ARRAY1 to ARRAY2.

     (diff:longest-common-subsequence '#(f g h i e j c k l m)
                                      '#(f g e h i j k p q r l m))
                                      => #(f g h i j k l m)
     
     (diff:edit-length '#(f g h i e j c k l m)
                       '#(f g e h i j k p q r l m))
     => 6
     
     (pretty-print (diff:edits '#(f g h i e j c k l m)
                               '#(f g e h i j k p q r l m)))
     -|
     ((3 insert)                           ; e
      (4 delete)                           ; c
      (6 delete)                           ; h
      (7 insert)                           ; p
      (8 insert)                           ; q
      (9 insert))                          ; r


automatically generated by info2www version 1.2.2.9