=================
 BOGOFILTER NEWS
=================

$Id: NEWS,v 1.112 2003/07/02 20:10:46 relson Exp $

0.13.7.2 2003-07-02
    * Fixed loop in yyinput() caused by unexpected EOF.

0.13.7.1 2003-06-26
    * Update bogotune to version 0.3
    * Added '-k size' option to bogofilter and bogoutil for setting
      BerkeleyDB's cache size.

    2003-06-25
    * For bogotune change processing of '-t' switch from pass 1 to
      pass 2 so that it supersedes the config file.
    * Man pages now use '\ ' when a non-breaking space is needed,
      instead of 0xA0.

    2003-06-24
    * '-Q' processing no longer requires that spamlist.db be present.

0.13.7 2003-06-20
    * Replaced tuning/tuning.sh with tuning/bogotune (and related
      files).

0.13.6.3 2003-06-18
    * Minor code rewrites to speed up processing messages, mboxes, and
      msg-count files. In particular, tuning/tuning.sh runs are approx
      47% faster than before.
    * Fixed several errors in tuning/tuning.sh and reformatted "Top 10
      Results" output.
    * Minor changes to bogoutil to support bogotune script.
    * Added newlines to correct usage messages.

0.13.6.2 2003-06-05
    * Don't allow square brackets in tokens. Do allow dollar signs in
      tokens in msg-count files.
    * Bogolexer now discards first 'From' token to match scoring
      behavior of bogofilter.

0.13.6.1 2003-06-05
    * Updated file tuning/README and script tuning/tuning.sh.

0.13.6 2003-06-04
    * Fix check for "^From " lines to work properly during base64
      decoding.
    * End html comment processing when a message header is found.
    * Improve README for the tuning scripts and simplify them.

0.13.5 2003-06-03
    * Allow terminal exclamation points on tokens.
    * Updated contrib/mime.get.rfc822

    2003-06-02
    * Fixed bogofilter's non-use of message counts in msg-count files.
    * Diagnose invalid values of robx.
    * Modified rstats_print_histogram() so it doesn't print 'nan's.
    * Modified t.frame to find version of grep on Solaris so
      t.bulkmode can run successfully.

0.13.4.1 2003-05-31
    * Modified t.parsing test so it works with OSX's default file
      system.

0.13.4 2003-05-30
    * Changed default value of ROBS from 0.001 to 0.01
    * Fixed options '-M' (mailbox mode) and '-p' (passthrough mode) so
      they work properly together.
    * Minor cleanups in bogofilter.cf.example
    * Added db-3.2 and db-3.1 to list for AC_CHECK_DB in configure.ac

0.13.3 2003-05-28
    * Minor code tweaks to quiet gcc-3.3 warnings.

    2003-05-26
    * Added doc/programmer/README.osx to distribution.
    * Corrected FAQ's procmail recipe for training with SpamAssassin.

    2003-05-24
    * Added -V (version) option to bogolexer.
    * Tweaked long line check used to prevent scanner buffer overflow.

0.13.2.1 2003-05-24
    * In bulkmode, output filenames to stdout.
    * Further fixes for static-build system.

0.13.2 2003-05-24
    * Autoconfiguration of BerkeleyDB library has been improved.
    * Build procedure for statically linked binaries has been improved.
    * Fixed defect in replace_nonascii_characters that was superseding
      ignore_case option.
    * Portability fix for efence usage in t.frame.
    * Added static-build to solve glibc version problem.

0.13.1 2003-05-23
    * Modified "make rpm" to also build statically linked binaries.
      They're packaged in bogofilter-static-x.y.z-1.i586.rpm
    * Fixed bogofilter.spec.in to include files CHANGES-0.13 and
      RELEASE.NOTES-0.13 which had been left out.
    * tests/t.frame portability fix for non-Linux compatibility.

0.13.0 2003-05-21
    * Added file RELEASE.NOTES-0.13. Read it!
    * Changed parsing defaults to:
        -PI  ignore_case         (default is disabled)
        -Ph  header_line_markup  (default is enabled)
        -Pt  tokenize_html_tags  (default is enabled)
    * Recognize a line of whitespace as ending the message header.
    * contrib/randomtrain and contrib/scramble can now process both
      mbox and maildir formats.
    * Added perl script contrib/mime.get.rfc822 to extract forwarded
      messages from within a message.
    * Added basic support for emacs RMAIL mailboxes.
    * Removed incomplete RMAIL/Babyl-5 support.
    * Registration code modified to count unique tokens for each
      message and display the total of the counts.
    * Added 'bogo-what?' to FAQ.

0.12.3 2003-05-10
    * Modified bulk mode code to allow registering maildirs.
    * Added options to return tokens from inside HTML tags. Switch
      '-Ht' and option "tokenize_html_tags" turn it on.
    * Bogofilter's '-e' switch can now be used without '-p'.
    * Added doc/integrating-with-postfix.
    * Added bogofilter-faq-fr.html, a French translation of the FAQ.
    * Revise description of verbose output in FAQ.
    * Update man page documentation of bogofilter's switches.
    * Added basic memory accounting and debug capability.
    * Fixed memory leak in rstats.c
    * Fixed defect in handling of folded spam header lines.
    * Modified parsing so that yyredo() and yy_use_redo_text() are no
      longer needed.

0.12.2 2003-04-30
    * Corrected bulkmode problem processing messages without "^From "
      lines.
    * Corrected alignment of wordprop_t which caused bus error on risc.
    * Added directory to 'Error creating directory' message.

0.12.1 2003-04-25
    * Corrected bad BOGOFILTER_DIR value in t.bulkmode
    * Subdirectories contrib and tuning now install from rpm to
      /usr/share/bogofilter.

0.12.0 2003-04-24
    * Corrected some errors in rpm specfile.

    2003-04-23
    * Added 'tuning' directory with scripts for tuning bogofilter.
      (cf. bogofilter-tuning.HOWTO)

    2003-04-21
    * Added '-M' to allow classification of multiple messages in mbox
      formatted files.
    * New option '-Q' (query/display config) replaces '-qv'.
    * Grouped options into logical groups for help message and man
      page. Revised option descriptions.

    2003-04-19
    * Added bogofilter-tuning.HOWTO as replacement for README.Robinson

    2003-04-18
    * Added classification support for msg-count formatted files.
    * Add bogolex.sh for creating msg-count formatted email file.

    2003-04-17
    * Added bulk mode processing for Maildirs. '-b' reads filenames
      from stdin. '-B' gets filenames from the command line.

    2003-04-16
    * Miscellaneous refactoring in main.c

0.11.2 - stable version 2003-04-13
    * Added 'terse' option to bogofilter.cf for selecting format of
      X-Bogosity line.

    2003-04-11
    * Use frexp() to retain maximum precision of floating point
      results.

0.11.1.9 2003-04-10
    * Reformat histogram output (from "-vv") to fit in 80 columns.
    * Added sample configuration for maildrop.

    2003-04-09
    * Added protections against negative token counts to bogoutil.c
      and database_db.c
    * Additional portability changes made to the regression tests.
    * Enhanced '-m' option allows specifying robs value.

0.11.1.8 2003-04-07
    * Include 'strict_check' in '-qv' output.
    * Correct outdated acinclude.m4, as it causes the configure script
      to be invalid.
    * Revised UPGRADE document.
    * Added contrib/bogotrain.sh

0.11.1.7 2003-04-07
    * Change bogoutil's '-p' option to require a database.
    * Fix OS X segfault caused by using DB handle after closing
      database.

    2003-04-05
    * Improve bogoutil's reporting of a bad directory or filename.
    * Simplify configure check for BerkeleyDB.
    * Extend configure's compiler checks for AIX.

0.11.1.6 2003-04-01
    * Changed default value of 'strict_check' to 'no' (disabled).

    2003-03-30
    * Added config file option 'strict_check' for processing html
      comments. Enabled means to use "<!--" and "-->" to delimit
      comments. Disabled uses "<!" and ">".

0.11.1.5 2003-03-28
    * Bogofilter now frees _all_ memory that it allocates.
    * FAQ reorganized and info added on Asian spam, the format of
      verbose output, and using SpamAssassin to train bogofilter.
    * Fixed processing of '-o' option.

0.11.1.4 2003-03-25
    * Cleaned up help messages and added version info.
    * Expanded bogofilter-faq.html
    * Fixed precedence for directory specifications.
    * Fixed processing of folded X-Bogosity line.
    * Fixed processing of spam_subject_tag.

0.11.1.3 - stable release 2003-03-10
    * Expanded regression tests.
    * Cleaned up fprintf() arguments.
    * Cleaned up message and mime header checks.
    * Additional improvements to maintenance code.

0.11.1.2 2003-03-06
    * Fixed bogoutil's broken maintenance mode.
    * Update bogofilter documentation and FAQ.
    * Explicitly check linking against libdb early to avoid unspecific
      error messages such as "cannot determine size of unsigned short".
    * Retry locking without mmap() on systems that return the
      old-fashioned EACCES rather than EAGAIN for locking failures,
      such as AIX 4.3.3.
    * Fix potential division by zero in histogram generator, which
      caused the program to abort with an unhandled floating point
      exception on some architectures such as Alpha. The division by
      zero is now avoided.

0.11.1.1 2003-03-05
    * Fixed flaw that caused user config file to be ignored.
    * Fixed broken '-u' (update) code.
    * Updated documentation of bogolexer and bogoutil.

0.11.1 2003-03-04
    * Using standard html comment delimiters when discarding comments.
    * Fixed charset initialization flaw.

0.11.0 2003-03-03
    * The Robinson-Fisher algorithm is now the default algorithm.
    * The configuration file parser is stricter and more correct.
    * Separated message registration options from unregistration
      options. '-S' and '-N' have been changed and now just do
      unregistration. To move a message from one wordlist to the
      other, use '-S -n' or '-N -s' (as appropriate)
    * Bogofilter's -p (passthrough) mode will no longer read the
      entire mail into memory if the standard input is a seekable
      regular file.
    * Bogofilter's '-l' option was changed and no longer allows an
      argument. Use the new '-L yourtag' option to provide a tag for
      log messages.
    * Database access efficiency changes.
    * Improvements in html comment handling code.
    * Internal cleanup of storage used in parsing messages and working
      with databases.
    * Manual pages now contain the proper path to bogofilter.cf.

0.10.3.1 2003-02-17
    * Updated bogofilter and bogoutil man pages.
    * Give command line options preference over config file options.

0.10.3 2003-02-14
    * Database access efficiency changes.
    * Database properly closed at end of maintenance pass.
    * Improvements in html comment handling code.
    * Support option '-?'

0.10.2 2003-02-01
    * Stable release of 0.10.1.5

0.10.1.5 2003-02-01
0.10.1.4 2003-01-30
0.10.1.3 2003-01-29
0.10.1.2 2003-01-27
0.10.1.1 2003-01-25
0.10.1 2003-01-22
    * A variety of robustness and portability changes, code and file
      cleanups and documentation updates.
    * Multiple fixes for mime and html processing.
    * Additional support and fixes for the various spam scoring
      algorithms.
    ** See file CHANGES-0.10 for details of the above items.

0.10.0 2003-01-19
    * Added mime processing capability, with decoding of base64,
      quoted-printable and uuencoded sections. Ignores attachments
      when computing spamicity.
    * Added wordlist maintenance capability to bogoutil. Can discard
      tokens based on count, age, or length. Can replace non-ASCII
      chars with question marks.
    * Added dates to wordlist tokens. Option
      "datestamp_tokens=true|false" can be used to enable/disable
      them.
    * Moved most documentation files to doc directory.
    * Added sample procmail file, contrib/procmailrc.example
    * Spamicity score now computable from multiple word list pairs,
      i.e. all spam and ham word lists in directories named on command
      line or in config file (via "wordlist=" or "bogofilter_dir="
      lines).
    * Lexer is now case insensitive
    * Increase MAXTOKENLEN from 20 to 30, allowing more and longer
      tokens to be processed.
    * New options for setting of default charset and replacing of
      non-ASCII characters. New character set handling routines to
      provide charset specific token parsing.
    * New error handling routine will output error messages to stderr
      and, if '-l' (logging) is enabled, to syslog.
    * New message formatting capability allows formats to be put in
      config file for X-Bogosity line and logging messages. Message
      content can include status, spamicity, version, etc.
    * Long-standing locking bugs that caused corruption in the data
      base have been resolved.
    * Work around ash-0.2 and bash-1 bugs, needed for make check.
    * Cater for malloc/calloc implementations that return NULL when 0
      bytes of memory are requested (e.g. some AIX versions), on which
      bogofilter previously reported a false "out of memory"
      condition. (also available as patch for 0.9.1.2)
    * Reorder gcc __attribute__ lines for gcc-2.7 compatibility.
      (also available as patch for 0.9.1.2)

0.9.1.2 2002-12-05
    * A defect in the collect_words routines (in 0.9.1) caused
      incorrect generation of "must get only one message to calculate
      spamicity!" messages. This has been fixed.
    * A defect in the contrib/bogopass script caused the
      unbase64-edited version of the mail to be printed rather than
      the original with just the header added. This has been fixed.
    * Documentation has been revised and updated.
    * Robinson-Fisher method now produces a tristate status, i.e.
      spam/ham/unsure, if ham_cutoff is non-zero. ham_cutoff defaults
      to 0.1 and can be set via config file.
    * Script contrib/bogopass has revised error and environment
      checking.

0.9.1 2002-11-30
    * New script contrib/bogopass allows processing of base64
      attachments. This is a temporary solution until base64 code can
      be built into bogofilter.
    * New file README.Robinson describes the tunable parameters for
      the Robinson algorithm and what to do for best performance.
    * Changed the default behavior to use the Robinson algorithm.
    * Corrected incorrect sort order when printing statistics.
    * Added support for Fisher's method of combining probabilities, as
      optimized for this purpose by Rob W. W. Hooft, to the "Robinson"
      algorithm.
    * The new file METHODS describes the Graham, Robinson, and Fisher
      methods that bogofilter supports for computing spamicity.
    * New file README.dcdflib gives some info on the dcdflib free
      library of routines for cumulative distribution functions.
    * A new '-f' option tells bogofilter to use Fisher's method.
    * A new '-c' option in bogofilter allows specification of the
      configuration file to read.
    * A new '-C' option tells bogofilter not to read any config file.
    * The syslog facility in '-l' mode has changed from "daemon" to
      "mail", so your logs may now be in /var/log/maillog or
      /var/log/mail rather than /var/log/messages. Check your
      /etc/syslog.conf.
    * The testing framework now works on Solaris.

    Internal Changes:
    * Fixed several portability problems uncovered by the new
      regression tests.
    * Added three more regression tests designed to confirm that
      bogofilter's results are matching saved reference results.
    * Implemented an object oriented API for using computational
      methods.
    * Split the main module into a registration module and three
      algorithm modules - for the fisher, graham and robinson methods.
    * Registering big mbox files is much faster now, at the expense of
      some memory.

0.8.0 2002-11-10
    * The lexer code now detects read errors (and exits with code 2 if
      it finds one.)
    * Fixed passthrough mode in bogofilter: it no longer strips the
      spam-header from a mail body.
    * Fixed portability to some systems, notably, Solaris and HP-UX,
      added README. for some systems to describe build issues.
    * Fixed "rpl_malloc" link failures.
    * Fixed bogofilter 0.7.6 passthrough regression on some systems:
      The X-Bogofilter header would be added to the body and a bogus
      blank line would be added.
    * Bogofilter now supports a configuration file named
      /etc/bogofilter.cf and/or ~/.bogofilter.cf.
    * Bogofilter's use of '-v' for printing spamicity statistics has
      been organized with increasing levels of details as additional
      '-v's are added.
    * When using the Robinson algorithm, bogofilter can print a simple
      histogram showing word probability distribution.
    * Bogoutil supports a new '-w' switch for displaying tokens from
      the word list databases.
    * Bogolexer added to distribution. Provides easy access to parsing
      a file to examine the tokens.
    * Bogolexer has a new '-p' (passthrough) for printing tokens and
      bogoutil has a new '-p' (probability) for printing the
      probabilities of one or more tokens.
      They can be connected via pipe to display the probabilities of
      all words in a message.
    * DB 4.1 support has been fixed.
    * Documentation updates.

0.7.6 2002-10-27
    * Added README.hp-ux for those using HP-UX.
    * Added support for additional architectures - ia64, arm, powerpc,
      and s390.
    * Bogofilter -p mode now preserves CR and NUL characters.
    * Bogofilter -p mode now detects if the computer runs out of
      memory.
    * Bogofilter supports a new "-l" switch to write run-time log
      information to syslog.
    * Bogofilter supports a new algorithm to calculate the
      "spamicity", the "Robinson" algorithm. It is enabled with the
      new "-r" switch. The old behavior is called the "Graham"
      algorithm and can be enforced with the new "-g" switch. The
      default behavior is to use the "Graham" algorithm.
    * Bogofilter now has an "-R FILE" option (that implies -r) to
      print an R data frame to FILE.
    * Bogofilter and bogoutil now have a "-x CLASSES" option to turn
      on debugging.
    * Bogoupgrade.pl has been renamed to bogoupgrade.
    * There is now a man page for bogoupgrade.
    * BASE64 treatment has been fixed. It ignored whole lines if they
      consisted of a single token. Now a token is only considered
      base64 and ignored if it's >= 32 characters or ends in one or
      two padding "=" signs.
    * MIME boundary lines are now emitted as tokens. Some of them are
      typical of certain spam software, so they might turn out to be
      useful.
    * All control characters are now considered token delimiters.
    * Bogofilter now aborts if it cannot figure where to look for its
      data base directory.
    * The software no longer crashes on machines that do not allow for
      unaligned memory access (m68k; many RISC, e.g. SPARC).
    * DB 4.1 is now supported.
    * Documentation updates.

0.7.5 Sun Oct 20 17:34:35 PDT 2002
    * The header in bogofilter -p mode now defaults to X-Bogosity, but
      can be changed by using "./configure --enable-spam-header=name"
      at compile time.
    * The option names -h/-H are back to -n/-N like they were in
      version 0.6, and -h now means "help".
    * A utility has been added to help upgrade wordlists from older
      versions of bogofilter to the current format. See the UPGRADE
      file for more information.
    * Support has been added for the environment variable
      BOGOFILTER_DIR to control where bogofilter looks for its
      wordlists.
    * Now bogofilter no longer depends on the Judy package. We now use
      a high performance hashing algorithm for message evaluation. The
      Judy package is no longer required to compile or run bogofilter.
    * Support for the -e flag, which will cause bogofilter to exit
      with a value of 0 regardless of the spamicity of the message.
      This is useful when using -p mode.
    * Support for -u flag. This allows message evaluation and training
      to happen in the same invocation of bogofilter.
    * Extended TOKEN patterns to improve support for European
      languages.
    * Improved wordlist locking to prevent data corruption.
    * Added procmail recipes for example usage in the man page.

0.7.4 Tue Sep 17 02:29:48 EDT 2002
    * Added infrastructure to support multiple wordlists
    * Fixed classification bug
    * Fixed errors in documentation
    * Improved portability of locking code
    * Fixed 'last line occasionally emitted twice' bug
    * Cleaned up underflow checking for word counts in bogofilter.c
    * Code readability improvements
    * Split main() function in bogofilter.c into smaller pieces
    * Message processing performance improvements

0.7.3 Thu Sep 12 13:28:37 PDT 2002
    Adrian Otto:
    * Added portable file locking support for files and databases
    David Relson:
    * Bug fix for negative counts in word registration
    * Bug fix for SEGV in $HOME path code
    * Bug fix for trailing slash in -d option

0.7.2: Wed Sep 11 15:28:00 PDT 2002
    Adrian Otto:
    * Introduced GNU configure for portability code

0.7.1: Tue Sep 10 00:59:00 PDT 2002
    Adrian Otto:
    * Skip existing X-Spam-Header
    * Performance improvement for -p mode
    Paul Tomblin:
    * Bug fix in getopt argument

0.7: Sat Sep 7 14:18:33 EDT 2002
    Eric S. Raymond:
    * Check your scripts! Option names have changed.
    * Name changes: goodlist -> hamlist, badlist -> spamlist. This is
      a step towards supporting more categories.
    * Autodaemon is gone. Instead, the new implementation uses DBM.
      Optimization with mmap will be in a future release.
    * Speed-tuning of the bogofilter function.
    * We're back to not ignoring HTML comments.

0.6: Fri Aug 30 00:25:49 EDT 2002
    * Fixed a fluky bug in the socket-transmission logic
    * Fixed an edge case where a single message with a From line was
      getting counted twice.
    * Unknown-word probability bumped from 0.2 to 0.4, tracking a
      change by Paul Graham.
    * Documented -d option.

0.5: Thu Aug 29 13:38:12 EDT 2002
    * Passthrough option can be used to add an X-Spam-Status header.
    * There is now a per-message word frequency cap, so spammers can't
      do an equivalent of Google fodder.
    * HTML comments are now ignored.
    * HTML 4.0 keywords and attributes are now ignored.
    * Improved extrema calculation.
    * Mutt patch withdrawn -- have a better version of mutt macros
      instead.
    * -S and -N options from matt@lickey.com (Matt Armstrong).
    * Client-server partitioning with a persistent server, drastically
      reducing startup cost after the first run.
    * Minor bug fix by Eric Seppanen.

0.4: Sat Aug 24 09:07:45 EDT 2002
    * regenerated bogofilter mutt patch.
    * wordlist files are now automatically created in -s and -n modes.
    * Reversed the exit values, following a suggestion by Michael
      Elkins about how to make bogofilter fail gracefully.
    * -Wall cleanup and uninitialized-variable fix from Eric Seppanen.
    * fcntl(2) file locking to head off a race condition in
      write_list.
    * Added the long-sought procmail recipe.

0.3: Fri 23 Aug 03:30:49 EDT 2002
    * Specfile/Makefile improvements from Graham Reed.
    * Case blindness fix from Eric Seppanen.
    * Deallocation fix from Mike Mayfield.
    * Wordlist file format changed.

0.2: Tue Aug 20 06:49:42 EDT 2002
    * Added mutt-1.4 interface patch
    * Note: Location of the base directory has changed.

0.1: Mon 19 Aug 2002 03:07:31
    * Initial release.

vim:tw=79 com=bf\:* ts=8 sts=8 sw=8 ai:

LocalWords:  BOGOFILTER Exp procmail contrib Spamicity config spamicity
LocalWords:  bogoutil datestamp MAXTOKENLEN charset stderr syslog gcc malloc
LocalWords:  calloc Bogosity