Bogofilter's Spamicity Calculation Methods
Bogofilter offers three methods of computing a message's "spamicity," a
number between 0 and 1 that approaches 1 as the likelihood that the
message is spam increases. We call these:
the Graham method
the Robinson method
the Robinson-Fisher method
The Graham method computes the probability that each word in the
message indicates spam or non-spam, then selects the words whose
probabilities are most extreme (farthest from the neutral value of 0.5)
and combines them into the spamicity. This gives a range of spamicities
from 0.01 to 0.99, with values over 0.9 indicating spam.
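As a rough illustration, the following Python sketch combines per-word
probabilities in the manner described in Paul Graham's essay "A Plan for
Spam": each word probability is clamped to [0.01, 0.99], the 15 words
farthest from 0.5 are kept, and those are merged with Bayes' rule under
an assumption that words are independent. It is not bogofilter's source;
the function name and the 0.5 returned for a message with no words are
hypothetical choices made for this example.

```python
import math

def graham_spamicity(word_probs, max_words=15):
    """Sketch of Graham-style combining (hypothetical helper, not bogofilter code)."""
    # Clamp each word probability so no single word is decisive.
    clamped = [min(0.99, max(0.01, p)) for p in word_probs]
    # Keep the "most interesting" words: those farthest from neutral 0.5.
    clamped.sort(key=lambda p: abs(p - 0.5), reverse=True)
    keepers = clamped[:max_words]
    if not keepers:
        return 0.5  # hypothetical choice: no evidence either way
    # Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p)).
    prod_spam = math.prod(keepers)
    prod_ham = math.prod(1.0 - p for p in keepers)
    return prod_spam / (prod_spam + prod_ham)
```

Neutral words (probability 0.5) are pushed out of the selection by more
extreme ones, so a handful of strongly spammy words dominates the result.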
The Robinson method computes word probabilities in much the same way
as the Graham method, but deals more cleanly with words that have been
seen only a few times, or never, in the past. This method gives
spamicities strictly between 0 and 1, but the values tend to cluster
toward the center of the range. Consequently, spamicities over
approximately 0.54 indicate spam.
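The sketch below illustrates the two ideas usually associated with Gary
Robinson's approach. First, a word's raw probability p(w) is blended
toward an assumed prior x with strength s, f(w) = (s*x + n*p(w)) / (s + n),
so a word seen in n = 0 messages gets exactly x. Second, the f(w) values
are combined through geometric means. The values s = 1 and x = 0.5 are
illustrative assumptions, not bogofilter's defaults, and the function
names are hypothetical.

```python
import math

def robinson_word_prob(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Degree of belief f(w) that a word indicates spam (illustrative sketch).

    spam_hits / ham_hits: spam / non-spam training messages containing the word.
    n_spam / n_ham: total spam / non-spam training messages.
    s and x are assumed example values, not bogofilter's defaults.
    """
    b = spam_hits / n_spam if n_spam else 0.0  # b(w)
    g = ham_hits / n_ham if n_ham else 0.0     # g(w)
    p = b / (b + g) if b + g > 0 else x        # raw estimate p(w)
    n = spam_hits + ham_hits                   # messages the word appeared in
    # Shrink p toward x; with no data (n == 0) the result is exactly x.
    return (s * x + n * p) / (s + n)


def robinson_spamicity(word_probs):
    """Geometric-mean combination; expects every f strictly inside (0, 1)."""
    n = len(word_probs)
    if n == 0:
        return 0.5  # hypothetical choice: no evidence either way
    # Geometric means computed in log space to avoid underflow on long messages.
    gm_not_spam = math.exp(math.fsum(math.log(1.0 - f) for f in word_probs) / n)
    gm_spam = math.exp(math.fsum(math.log(f) for f in word_probs) / n)
    P = 1.0 - gm_not_spam  # grows as words look spammy
    Q = 1.0 - gm_spam      # grows as words look innocent
    S = (P - Q) / (P + Q)  # P + Q >= 1 by the AM-GM inequality, so never 0
    return (1.0 + S) / 2.0
```

Because geometric means average the evidence rather than multiplying it,
the combined value moves away from 0.5 more gently than Graham's product,
which is consistent with spamicities clustering near the center.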
The Robinson-Fisher method is a variant of the Robinson method that
uses a statistical chi-square calculation as its last step. This
method tends to yield results very near 0 for non-spam and very near
1 for spam. Messages that aren't clearly identifiable produce results in
the range of roughly 0.1 to 0.95. This method is called
Robinson-Fisher because the calculation is based on Fisher's method,
published in the 1950s, for combining independently derived
probabilities.
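A minimal sketch of the chi-square step, assuming the f(w) values come
from a Robinson-style calculation: under Fisher's method, -2 times the
natural log of a product of n independent probabilities follows a
chi-square distribution with 2n degrees of freedom. The sketch computes
H from the f(w) values and S from the 1 - f(w) values, then reports
(1 + H - S) / 2. The function names are hypothetical, and the simple
series used for the chi-square tail can underflow for extremely long
messages; a production version would work entirely in log space.

```python
import math

def chi2q(chi, df):
    """Upper-tail probability Q(chi | df) of the chi-square distribution.

    Closed-form series valid only for even df:
    Q = exp(-m) * sum_{i=0}^{df/2 - 1} m**i / i!, with m = chi / 2.
    """
    if df % 2 != 0:
        raise ValueError("df must be even")
    m = chi / 2.0
    term = math.exp(-m)  # i = 0 term; underflows to 0 for very large chi
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def fisher_spamicity(word_probs):
    """Robinson-Fisher combination sketch; expects each f strictly in (0, 1)."""
    n = len(word_probs)
    if n == 0:
        return 0.5  # hypothetical choice: no evidence either way
    # Sums of logs stand in for -2 * ln(product) to avoid underflow.
    h = chi2q(-2.0 * math.fsum(math.log(f) for f in word_probs), 2 * n)
    s = chi2q(-2.0 * math.fsum(math.log(1.0 - f) for f in word_probs), 2 * n)
    # h approaches 1 for spammy words, s approaches 1 for innocent ones.
    return (1.0 + h - s) / 2.0
```

When the words agree, one of H and S goes to 1 and the other to 0, which
is why this method pushes clear cases toward 0 or 1; mixed evidence leaves
both terms similar and the result near the middle.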
When building bogofilter, the user can choose at configuration time
which of the above methods (any or all) to include in the program.
The method to use when running bogofilter can then be selected on the
command line or in the configuration file.