Here are some ideas for improving GNU diff and patch. The
GNU project has identified some improvements as potential programming
projects for volunteers. You can also help by reporting any bugs that
you find.
If you are a programmer and would like to contribute something to the
GNU project, please consider volunteering for one of these projects. If
you are seriously contemplating work, please write to
`gnu@prep.ai.mit.edu' to coordinate with other volunteers.
One should be able to use GNU diff to generate a patch from any
pair of directory trees, and given the patch and a copy of one such
tree, use patch to generate a faithful copy of the other.
Unfortunately, some changes to directory trees cannot be expressed using
current patch formats; also, patch does not handle some of the
existing formats. These shortcomings motivate the following suggested
projects.
diff and patch do not handle some changes to directory
structure. For example, suppose one directory tree contains a directory
named `D' with some subsidiary files, and another contains a file
with the same name `D'. `diff -r' does not output enough
information for patch to transform the directory subtree into
the file.
There should be a way to specify that a file has been deleted without
having to include its entire contents in the patch file. There should
also be a way to tell patch that a file was renamed, even if
there is no way for diff to generate such information.
These problems can be fixed by extending the diff output format
to represent changes in directory structure, and extending patch
to understand these extensions.
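
As an illustration of the information that is currently missing, the following Python sketch lists the entries whose type differs between two trees, such as the directory `D' that becomes a file `D'. It is only a sketch under stated assumptions: the names `kind' and `type_changes' are hypothetical, only the top level of each tree is examined, and nothing here reflects an existing diff or patch format.

```python
import os
import stat


def kind(path):
    """Classify a path without following symbolic links."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISLNK(mode):
        return "symlink"
    return "special"


def type_changes(old_root, new_root):
    """Return (name, old_kind, new_kind) for top-level entries whose kind differs.

    Hypothetical helper: a real extension would recurse and would emit
    the result in some agreed patch syntax.
    """
    names = set(os.listdir(old_root)) | set(os.listdir(new_root))
    changes = []
    for name in sorted(names):
        old_kind = kind(os.path.join(old_root, name))
        new_kind = kind(os.path.join(new_root, name))
        if old_kind != new_kind:
            changes.append((name, old_kind, new_kind))
    return changes
```

A record such as `("D", "directory", "regular")' is the kind of fact an extended format would have to carry so that patch could remove the subtree and create the file.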
Some files are neither directories nor regular files: they are unusual
files like symbolic links, device special files, named pipes, and
sockets. Currently, diff treats symbolic links like regular files;
it treats other special files like regular files if they are specified
at the top level, but simply reports their presence when comparing
directories. This means that patch cannot represent changes
to such files. For example, if you change which file a symbolic link
points to, diff outputs the difference between the two files,
instead of the change to the symbolic link.
diff should optionally report changes to special files specially,
and patch should be extended to understand these extensions.
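
To make the symbolic link case concrete, here is a minimal Python sketch that compares the links themselves rather than the files they point to. The function name `symlink_change' and the shape of its result are assumptions made for illustration; they do not describe any existing diff option or output.

```python
import os


def symlink_change(old_path, new_path):
    """Return (old_target, new_target) if both paths are symbolic links
    that point to different targets, otherwise None.

    Hypothetical helper: the targets are read with os.readlink, so the
    linked-to files are never opened or compared.
    """
    if not (os.path.islink(old_path) and os.path.islink(new_path)):
        return None
    old_target = os.readlink(old_path)
    new_target = os.readlink(new_path)
    if old_target == new_target:
        return None
    return (old_target, new_target)
```

Reporting the pair of targets, instead of a line-by-line difference of the two files, is the change to the link that a patch would need to record.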
When a file name contains an unusual character like a newline or
white space, `diff -r' generates a patch that patch cannot
parse. The problem is with the format of diff output, not just with
patch, because with odd enough file names one can cause
diff to generate a patch that is syntactically correct but
patches the wrong files. The format of diff output should be
extended to handle all possible file names.
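
One possible direction, shown here only as a sketch, is to quote unusual file names so that each name occupies a single unambiguous token on a header line. The quoting rules below (double quotes with backslash escapes) and the names `quote_name' and `unquote_name' are assumptions chosen for illustration, not a format that diff or patch is known to use.

```python
# Hypothetical escape table: only the characters that make a header
# line ambiguous are rewritten.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}
_UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}


def quote_name(name):
    """Return name unchanged if it is safe, else a quoted, escaped form."""
    if not any(ch in name for ch in ' \t\n"\\'):
        return name
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in name) + '"'


def unquote_name(text):
    """Invert quote_name."""
    if not text.startswith('"'):
        return text
    out = []
    chars = iter(text[1:-1])  # drop the surrounding quotes
    for ch in chars:
        if ch == "\\":
            out.append(_UNESCAPES[next(chars)])
        else:
            out.append(ch)
    return "".join(out)
```

Because the quoted form never contains a literal newline or tab, and every name containing white space is wrapped in quotes, a parser can recover the original name exactly.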
GNU diff can analyze files with arbitrarily long lines and files
that end in incomplete lines. However, patch cannot patch such
files. The patch internal limits on line lengths should be
removed, and patch should be extended to parse diff
reports of incomplete lines.
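
The distinction that patch would have to track can be sketched as follows. This Python fragment splits a file into lines of any length and records whether the last line is incomplete, that is, lacks a trailing newline. The name `split_lines' is hypothetical and the code is not taken from diff or patch.

```python
def split_lines(data: bytes):
    """Split data into lines and report whether the last one is incomplete.

    Returns (lines, incomplete). Lines are returned without their
    newline; no limit is placed on line length.
    """
    lines = data.split(b"\n")
    if lines[-1] == b"":
        # The data ended with a newline (or was empty): the final
        # empty piece is not a line.
        lines.pop()
        return lines, False
    return lines, True
```

Two files whose lines are equal but whose `incomplete' flags differ are not identical, so a patch must be able to express that difference.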
diff operates by reading both files into memory. This method
fails if the files are too large to fit in memory, and diff should have
a fallback.
One way to do this is to scan the files sequentially to compute hash
codes of the lines and put the lines in equivalence classes based only
on hash code. Then compare the files normally. Because distinct lines
can share a hash code, this does produce some false matches.
Then scan the two files sequentially again, checking each match to see
whether it is real. When a match is not real, mark both the
"matching" lines as changed. Then build an edit script as usual.
The output routines would have to be changed to scan the files
sequentially looking for the text to print.
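
The two-pass scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not diff's implementation: `difflib.SequenceMatcher' stands in for diff's own comparison algorithm, the function names are hypothetical, and the hash function is a parameter so that collisions can be provoked on purpose.

```python
import difflib
import zlib


def line_hashes(path, hash_line):
    """First pass: read the file sequentially, keeping only hash codes."""
    with open(path, "rb") as f:
        return [hash_line(line) for line in f]


def matched_pairs(hashes_a, hashes_b):
    """Compare the hash sequences and yield (i, j) line-number pairs."""
    matcher = difflib.SequenceMatcher(None, hashes_a, hashes_b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            yield block.a + k, block.b + k


def verify(path_a, path_b, pairs):
    """Second pass: rescan both files and keep only the real matches.

    The pairs increase in both coordinates, so each file is read
    strictly forward, one line at a time.
    """
    real = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        lines_a = enumerate(fa)
        lines_b = enumerate(fb)
        for i, j in pairs:
            text_a = next(t for n, t in lines_a if n == i)
            text_b = next(t for n, t in lines_b if n == j)
            if text_a == text_b:
                real.append((i, j))
    return real


def two_pass_matches(path_a, path_b, hash_line=zlib.crc32):
    """Return the verified matching line pairs of two files."""
    hashes_a = line_hashes(path_a, hash_line)
    hashes_b = line_hashes(path_b, hash_line)
    return verify(path_a, path_b, matched_pairs(hashes_a, hashes_b))
```

Lines that appear in no verified pair are the changed lines, from which an edit script can be built. Passing a deliberately weak hash such as `len' makes false matches common, which exercises the second pass.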
It would be nice to have a feature for specifying two strings, one in
from-file and one in to-file, which should be considered to
match. Thus, if the two strings are `foo' and `bar', then if
two lines differ only in that `foo' in file 1 corresponds to
`bar' in file 2, the lines are treated as identical.
It is not clear how general this feature can or should be, or
what syntax should be used for it.
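
One narrow interpretation of this feature can be sketched as a line comparison function. The name `lines_match' and its rule (every occurrence of the first string must correspond to the second) are assumptions; other interpretations are equally plausible, which is part of why the generality of the feature is unclear.

```python
def lines_match(from_line, to_line, from_str, to_str):
    """Return True if the lines are identical, or differ only in that
    from_str in from_line corresponds to to_str in to_line.

    Hypothetical rule: all occurrences of from_str must be replaced.
    """
    if from_line == to_line:
        return True
    return from_line.replace(from_str, to_str) == to_line
```

With the strings `foo' and `bar', the lines `x = foo(1)' and `x = bar(1)' would be treated as identical, while `x = foo(1)' and `x = baz(1)' would not.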
If you think you have found a bug in GNU cmp, diff,
diff3, sdiff, or patch, please report it by
electronic mail to `bug-gnu-utils@prep.ai.mit.edu'. Send as
precise a description of the problem as you can, including sample input
files that produce the bug, if applicable.
Because Larry Wall has not released a new version of patch since
mid-1988 and the GNU version of patch has been changed since
then, please send bug reports for patch by electronic mail to
both `bug-gnu-utils@prep.ai.mit.edu' and
`lwall@netlabs.com'.