Info Node: (diff.info)Large Files

Handling Files that Do Not Fit in Memory
----------------------------------------

   `diff' operates by reading both files into memory.  This method
fails if the files are too large to fit in memory, and `diff' should
have a fallback for that case.

   One way to do this is to scan the files sequentially to compute a
hash code for each line, keeping only the hash codes in memory, and to
put the lines in equivalence classes based only on hash code.  Then
compare the files normally.  Because two different lines can have the
same hash code, this produces some false matches.
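
   The first pass and the comparison can be sketched in Python as
follows.  This is only an illustration under stated assumptions, not
how `diff' is implemented: the hash (CRC-32 from `zlib', truncated to
BITS bits) and the helper names `hash_file', `equivalence_classes' and
`candidate_matches' are hypothetical, and `difflib.SequenceMatcher'
stands in for the comparison algorithm that `diff' actually uses.

```python
import difflib
import zlib


def line_hash(line, bits=32):
    # Hypothetical hash: CRC-32 of the line, truncated to `bits` bits.
    # A small `bits` makes collisions, and hence false matches, likely.
    return zlib.crc32(line) & ((1 << bits) - 1)


def hash_file(path, bits=32):
    # One sequential pass; only the hash code of each line is kept.
    with open(path, "rb") as f:
        return [line_hash(line, bits) for line in f]


def equivalence_classes(h1, h2):
    # Give each distinct hash code a small class number, shared by both
    # files.  Lines with equal hash codes share a class even if their
    # text differs.
    classes = {}

    def classify(hashes):
        return [classes.setdefault(h, len(classes)) for h in hashes]

    return classify(h1), classify(h2)


def candidate_matches(c1, c2):
    # Compare the class sequences "normally".  difflib stands in for
    # diff's real comparison algorithm; the result is a list of (i, j)
    # pairs of 0-based line numbers, increasing in both i and j.
    sm = difflib.SequenceMatcher(None, c1, c2, autojunk=False)
    return [(a + k, b + k)
            for a, b, size in sm.get_matching_blocks()
            for k in range(size)]
```

   With BITS set to 0 every line hashes to the same value, so all lines
fall into one class and the comparison pairs them up regardless of
their text, which shows how false matches arise.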

   Then scan the two files sequentially again, checking each match by
comparing the actual text of the two lines.  When a match is not real,
mark both of the "matching" lines as changed.  Then build an edit script
as usual.

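   The verification pass might look like the following sketch.  It
assumes (hypothetically) that the candidate matches arrive as a list of
0-based `(i, j)' line-number pairs that increase in both I and J, as the
matches from a longest-common-subsequence comparison do, so each file
needs only one forward pass.  The names `verify_matches' and
`changed_flags' are illustrative.

```python
def verify_matches(path1, path2, matches):
    """Split hash-based matches into real and false ones.

    `matches` is a list of (i, j) pairs of 0-based line numbers,
    increasing in both i and j.  Each file is read once, front to back.
    """
    real, false = [], []
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        pos1 = pos2 = 0          # number of lines read so far from each file
        line1 = line2 = b""
        for i, j in matches:
            # Read forward until line i of file 1 and line j of file 2
            # are the most recently read lines.
            while pos1 <= i:
                line1 = f1.readline()
                pos1 += 1
            while pos2 <= j:
                line2 = f2.readline()
                pos2 += 1
            # Compare the real text, not just the hash codes.
            (real if line1 == line2 else false).append((i, j))
    return real, false


def changed_flags(n1, n2, real):
    """Mark every line not covered by a verified match as changed.

    Both lines of a false match therefore stay marked as changed.
    """
    ch1, ch2 = [True] * n1, [True] * n2
    for i, j in real:
        ch1[i] = False
        ch2[j] = False
    return ch1, ch2
```

   Runs of lines flagged as changed in either file then become the
hunks of the edit script, exactly as when the whole files are in memory.
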
   The output routines would also have to be changed to scan the files
sequentially for the text to print, since the lines themselves would no
longer be held in memory.
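
   An output routine that works this way could fetch only the lines it
needs in a single forward pass over each file, as in the sketch below.
The helper names `lines_at' and `emit_changed' are hypothetical, and
the output is a simplified listing of changed lines (`< ' for the first
file, `> ' for the second) without the hunk headers that `diff' prints.

```python
import sys


def lines_at(path, wanted):
    """Yield (number, text) for each 0-based line number in `wanted`.

    `wanted` must be strictly increasing.  The file is read once, front
    to back, and only the current line is held in memory.
    """
    targets = iter(wanted)
    target = next(targets, None)
    if target is None:
        return
    with open(path, "rb") as f:
        for n, line in enumerate(f):
            if n == target:
                yield n, line
                target = next(targets, None)
                if target is None:
                    break


def emit_changed(path1, path2, ch1, ch2, out=sys.stdout):
    """Print the lines flagged as changed, reading each file sequentially."""
    for _, text in lines_at(path1, [n for n, c in enumerate(ch1) if c]):
        out.write("< " + text.decode("utf-8", errors="replace"))
    for _, text in lines_at(path2, [n for n, c in enumerate(ch2) if c]):
        out.write("> " + text.decode("utf-8", errors="replace"))
```

   Because `lines_at' stops reading as soon as the last wanted line has
been produced, a change near the start of a large file costs only a
short read.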

