Hashing
-------
`(require 'hash)'
These hashing functions are for use in quickly classifying objects.
Hash tables use these functions.
- Function: hashq obj k
- Function: hashv obj k
- Function: hash obj k
Returns an exact non-negative integer less than K. For each
non-negative integer less than K there are arguments OBJ for which
the hashing functions applied to OBJ and K returns that integer.
For `hashq', `(eq? obj1 obj2)' implies `(= (hashq obj1 k) (hashq
obj2))'.
For `hashv', `(eqv? obj1 obj2)' implies `(= (hashv obj1 k) (hashv
obj2))'.
For `hash', `(equal? obj1 obj2)' implies `(= (hash obj1 k) (hash
obj2))'.
`hash', `hashv', and `hashq' return in time bounded by a constant.
Notice that items having the same `hash' implies the items have
the same `hashv' implies the items have the same `hashq'.
`(require 'sierpinski)'
- Function: make-sierpinski-indexer max-coordinate
Returns a procedure (eg hash-function) of 2 numeric arguments which
preserves _nearness_ in its mapping from NxN to N.
MAX-COORDINATE is the maximum coordinate (a positive integer) of a
population of points. The returned procedures is a function that
takes the x and y coordinates of a point, (non-negative integers)
and returns an integer corresponding to the relative position of
that point along a Sierpinski curve. (You can think of this as
computing a (pseudo-) inverse of the Sierpinski spacefilling
curve.)
Example use: Make an indexer (hash-function) for integer points
lying in square of integer grid points [0,99]x[0,99]:
(define space-key (make-sierpinski-indexer 100))
Now let's compute the index of some points:
(space-key 24 78) => 9206
(space-key 23 80) => 9172
Note that locations (24, 78) and (23, 80) are near in index and
therefore, because the Sierpinski spacefilling curve is
continuous, we know they must also be near in the plane. Nearness
in the plane does not, however, necessarily correspond to nearness
in index, although it _tends_ to be so.
Example applications:
* Sort points by Sierpinski index to get heuristic solution to
_travelling salesman problem_. For details of performance,
see L. Platzman and J. Bartholdi, "Spacefilling curves and the
Euclidean travelling salesman problem", JACM 36(4):719-737
(October 1989) and references therein.
* Use Sierpinski index as key by which to store 2-dimensional
data in a 1-dimensional data structure (such as a table).
Then locations that are near each other in 2-d space will
tend to be near each other in 1-d data structure; and
locations that are near in 1-d data structure will be near in
2-d space. This can significantly speed retrieval from
secondary storage because contiguous regions in the plane
will tend to correspond to contiguous regions in secondary
storage. (This is a standard technique for managing CAD/CAM
or geographic data.)
`(require 'soundex)'
- Function: soundex name
Computes the _soundex_ hash of NAME. Returns a string of an
initial letter and up to three digits between 0 and 6. Soundex
supposedly has the property that names that sound similar in normal
English pronunciation tend to map to the same key.
Soundex was a classic algorithm used for manual filing of personal
records before the advent of computers. It performs adequately for
English names but has trouble with other languages.
See Knuth, Vol. 3 `Sorting and searching', pp 391-2
To manage unusual inputs, `soundex' omits all non-alphabetic
characters. Consequently, in this implementation:
(soundex <string of blanks>) => ""
(soundex "") => ""
Examples from Knuth:
(map soundex '("Euler" "Gauss" "Hilbert" "Knuth"
"Lloyd" "Lukasiewicz"))
=> ("E460" "G200" "H416" "K530" "L300" "L222")
(map soundex '("Ellery" "Ghosh" "Heilbronn" "Kant"
"Ladd" "Lissajous"))
=> ("E460" "G200" "H416" "K530" "L300" "L222")
Some cases in which the algorithm fails (Knuth):
(map soundex '("Rogers" "Rodgers")) => ("R262" "R326")
(map soundex '("Sinclair" "St. Clair")) => ("S524" "S324")
(map soundex '("Tchebysheff" "Chebyshev")) => ("T212" "C121")