SWISH++ Changes
===============

*******************************************************************************
5.7
*******************************************************************************

NEW FEATURES
------------

* LaTeX files can now be indexed directly. (This feature shall be known as
  feature LATEX.)

* Document titles now have multiple whitespace characters squeezed into single
  whitespace characters. (This feature shall be known as feature STWS.)

* index(1) will now use the value of the environment variable TMPDIR, if it's
  set, as the default temporary directory. However, the value is still
  superseded by one of -T, --temp-dir, or TempDirectory if given. (This
  feature shall be known as feature TMPDIR.)

BUG FIXES
---------

* HTML comment parsing was broken in that it allowed "->" in addition to "-->"
  to terminate a comment. (This bug fix shall be known as bug fix HTC.)

* Yet more bugs in the thread-pooling code. (This bug fix shall be known as
  bug fix TPB.)

CHANGES, file-by-file
---------------------

* config/config.mk
    1. Added "latex" to MOD_LIST for feature LATEX.
    2. Added I_ETC.

* config.h
    1. Changed value for SocketQueueSize_Default to 511.

* GNUmakefile
    1. Added TempDirectory.c for feature TMPDIR.
    2. Changed installation of swish++.conf to $(I_ETC).

* indexer.c
    1. In tidy_title(), added code to squeeze multiple whitespace characters
       for feature STWS.

* man/man1/index.1
    1. Added TMPDIR for feature TMPDIR.
    2. Added LaTeX section for feature LATEX.
    3. Added Leslie Lamport reference for feature LATEX.
    4. s/SCCS/CVS/ since nobody uses SCCS any more.

* man/man1/search.1
    1. Changed default value for -q to 511.
    2. Added missing "encoding" to XML example.

* man/man4/swish++.conf.4
    1. Added missing TempDirectory.

* mod/html/mod_html.c
    1. In is_html_comment(), reworked code for bug fix HTC.
    2. In entity_to_ascii(), added static reference to
       char_entity_map::instance().
    3. In parse_html_tag() and post_options(), made "elements" reference
       static.

* mod/latex/GNUmakefile
* mod/latex/commands.c
* mod/latex/commands.h
* mod/latex/latex_config.h
* mod/latex/mod_latex.c
* mod/latex/mod_latex.h
    1. New files for feature LATEX.

* README.Solaris
    1. Removed "ephemeral ports" since that wasn't right.

* search_daemon.c
    1. Removed: accept_failed().
    2. Added: handle_accept() and reset_socket().
    3. Moved thread_pool object inside of handle_accept().

* search_thread.c
    1. Factored out reset_socket().

* swish++.conf
    1. Changed value for SocketQueueSize to 511.

* TempDirectory.c
    1. New file for feature TMPDIR.

* TempDirectory.h
    1. Added default_value() for feature TMPDIR.
    2. Moved #include "config.h" to new .c file for feature TMPDIR.

* thread_pool.c
    1. s/thread_pool_thread_destroy/thread_pool_thread_cleanup/
    2. Added: thread_pool_decrement_busy().
    3. In thread_pool_thread_main(), changed code so that pool_.t_idle_ is
       always signalled when idle.
    4. In thread_pool_thread_main(), added:
       pthread_cleanup_push( thread_pool_decrement_busy, t )
       to ensure that t->pool_.t_busy_ gets decremented even if the thread is
       killed.
    5. In thread_pool_thread_main(), added DEFER_CANCEL/RESTORE_CANCEL around
       code that removes a task from the queue.
    6. In ~thread(), added DEFER_CANCEL/RESTORE_CANCEL.
    7. In thread_pool::thread_pool(), added t_busy_( 0 ) for bug fix TPB.
    8. In thread_pool_thread_main(), made signaling of idle independent of the
       size of the thread pool.
    9. Made new_task() take and return a bool argument and queue the task only
       if it will queue it.

* thread_pool.h
    1. s/thread_pool_thread_destroy/thread_pool_thread_cleanup/
    2. Added: thread_pool_decrement_busy().
    3. Made new_task() take and return a bool argument.

* version.h
    1. Upped version to "5.7".
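
Illustration for feature TMPDIR: the precedence described above (an explicit
-T, --temp-dir, or TempDirectory value first, then TMPDIR, then the built-in
default) can be sketched as below. This is only a sketch and is not taken from
TempDirectory.c; the function name pick_temp_dir() and both of its parameters
are hypothetical, and the built-in default is passed in by the caller because
this file does not state what it is.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not the actual SWISH++ code.
//
// explicit_value   models a value given via -T, --temp-dir, or TempDirectory;
//                  an empty string means "not given".
// compiled_default stands in for whatever default the build provides.
std::string pick_temp_dir( std::string const &explicit_value,
                           std::string const &compiled_default ) {
    if ( !explicit_value.empty() )
        return explicit_value;          // an explicit setting supersedes TMPDIR

    char const *const env = std::getenv( "TMPDIR" );
    if ( env && *env )
        return env;                     // TMPDIR, if set, becomes the default

    return compiled_default;            // otherwise fall back to the built-in
}
```

With TMPDIR set to /var/tmp, pick_temp_dir( "", "/tmp" ) yields "/var/tmp",
whereas pick_temp_dir( "/scratch", "/tmp" ) yields "/scratch" regardless of
the environment.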

*******************************************************************************
5.6
*******************************************************************************

NEW FEATURES
------------

* The text/enriched attachment indexer that was part of the Mail module was
  split off into its own RTF (Rich Text Format) module so stand-alone RTF
  files can be indexed. (This feature shall be known as feature RTF.)

* For search(1) running as a daemon, added code to reset the TCP connection
  for bad requests. The reason for doing this is so we don't potentially have
  a socket lingering in TIME-WAIT from a client that was too dumb to give us a
  valid request in the first place. This helps alleviate denial-of-service
  attacks (if that's what's going on). This came about due to the way Solaris
  handles TIME-WAIT. Read the new README.Solaris file for details. This
  change has no effect in Linux 2.2.x kernels since sending a reset on close
  by setting SO_LINGER wasn't implemented. (This feature shall be known as
  feature RST.)

BUG FIXES
---------

* The files Group.c and SocketAddress.c didn't compile under FreeBSD. (This
  bug fix shall be known as bug fix BSD5.)

CHANGES, file-by-file
---------------------

* config/config.mk
    1. Added "rtf" to MOD_LIST.
    2. Added explanation about module dependencies.

* Group.c
* SocketAddress.c
    1. Added #include for bug fix BSD5.

* INSTALL.unix
    1. Added mention of README.Solaris file for feature RST.
    2. s/www.objectspace.com/www.stlport.org/
    3. Moved module wording in step 2 of building to config/config.mk.

* man/man1/index.1
    1. Added description of RTF module.

* man/man4/swish++.conf.4
    1. Added mention of RTF module.
    2. Added mention that text/enriched attachments can be indexed only if the
       RTF module is compiled into index(1).
    3. Added mention that text/html attachments can be indexed only if the
       HTML module is compiled into index(1).
    4. Fixed RFC 1563 attribution.

* mod/mail/mod_mail.c
    1. Removed: #include "platform.h"
    2. Added: #include "mod/rtf/mod_rtf.h"
    3. Removed index_enriched().
    4. In index_headers(), made "text/enriched" #ifdef'd on mod_rtf.
    5. In index_words(), switched to using RTF module.

* mod/mail/mod_mail.h
    1. Fixed RFC attributions.
    2. Made Text_Enriched #ifdef'd on mod_rtf.
    3. Removed index_enriched().

* mod/rtf/GNUmakefile
* mod/rtf/mod_rtf.c
* mod/rtf/mod_rtf.h
    1. New files for feature RTF.

* README.Solaris
    1. New file for feature RST.

* search.c
    1. Made return-type of search() and service_request() bool for feature
       RST.

* search.h
    1. Made return-type of service_request() bool for feature RST.

* searchc.in
    1. Added call to shutdown() after sending query.

* search_thread.c
    1. In search_thread::main(), added: out << flush;
    2. In search_thread::main(), removed EINTR guard (not needed).

* swish++.conf
    1. Added: IncludeFile RTF *.rtf

* thread_pool.c
    1. s/thread_main/thread_pool_thread_main/
    2. s/thread_destroy/thread_pool_thread_destroy/
    3. Replaced q_lock class by simple mutex again.
    4. Changed state_ back to destructing_.
    5. Added DEFER_CANCEL, RESTORE_CANCEL, MUTEX_LOCK, MUTEX_UNLOCK macros.
    6. In thread_pool_thread_destroy(), removed unlocking of q_lock.
    7. In thread_pool_thread_main(), moved unlock of run_lock_ here.
    8. In thread_pool_thread_main(), reworked mutex locking.
    9. In ~thread(), removed mutex_lock of t_lock_.
    A. In thread_pool(), removed ERRORCHECK attribute.
    B. In ~thread_pool(), added DEFER_CANCEL().
    C. In new_task(), reworked mutex locking.
    D. In new_task(), added DEFER_CANCEL().

* thread_pool.h
    1. s/thread_main/thread_pool_thread_main/
    2. s/thread_destroy/thread_pool_thread_destroy/
    3. Replaced q_lock class by simple mutex again.
    4. Added private default constructor to argument_type.
    5. Changed state_ back to destructing_.

* version.h
    1. Upped version to "5.6".
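
Illustration for feature RST: the usual way to make close() send a reset
instead of performing a normal shutdown is to enable SO_LINGER with a zero
timeout, which is the mechanism the feature description refers to. The sketch
below shows that general technique only; it is not the SWISH++ code, and the
name reset_and_close() is hypothetical.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper, not the actual SWISH++ code.
//
// Closes fd so that, for a connected TCP socket, the connection is aborted
// with a reset rather than closed normally (which is what would otherwise
// leave the socket in TIME-WAIT).  Returns true only if both setsockopt()
// and close() succeed.
bool reset_and_close( int fd ) {
    struct linger li;
    li.l_onoff  = 1;                    // enable lingering ...
    li.l_linger = 0;                    // ... with a zero timeout
    bool const ok =
        ::setsockopt( fd, SOL_SOCKET, SO_LINGER, &li, sizeof li ) == 0;
    return ::close( fd ) == 0 && ok;    // always close, even if setsockopt failed
}
```

As noted above, whether a reset is actually sent depends on the kernel: the
option can be set successfully on a system that then ignores it at close time.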

*******************************************************************************
5.5.3
*******************************************************************************

NEW FEATURES
------------

* A sample Procmail recipe has been included that can be used to split
  incoming mail messages into individual files for indexing. (This feature
  shall be known as feature SIM.)

* The indexing word-determination rules have been relaxed somewhat; the
  following rules have been eliminated:

    1. Starts with a capital letter, is of mixed case, and contains more than
       a third capital letters. This enables words like FedEx to be indexed.

    2. Contains a capital letter other than the first. This enables words
       like iMac to be indexed.

  (This feature shall be known as feature RWD.)

BUG FIXES
---------

* When running as a server, search(1) had a memory leak. (This bug fix shall
  be known as bug fix SML.)

* When running as a server, search(1) didn't make the sockets reusable. (This
  bug fix shall be known as bug fix RSA.)

CHANGES, file-by-file
---------------------

* GNUmakefile
    1. For INITD_DIR and LEVEL_DIR, redirected error output to /dev/null.

* INSTALL.unix
    1. Added mention of Procmail for feature SIM.

* man/man1/index.1
    1. Removed mention of removed word-determination rules for feature RWD.

* procmailrc
    1. New file for feature SIM.

* searchd.in
    1. Added: KILL=`which kill`
    2. Added "|| exit 1" in a few places.
    3. Added "sleep 3" in restart case.

* search.c
    1. In search(), added "delete format" for bug fix SML.

* search_daemon.c
    1. Added BIND_SOCKET() for bug fix RSA.

* searchmonitor.in
    1. Added: KILL=`which kill`

* version.h
    1. Upped version to 5.5.3.

* word_util.c
    1. In is_ok_word(), removed rules for feature RWD.

* www_example/sample.html
    1. Converted to XHTML.
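
Illustration for feature RWD: to show which words the two eliminated rules
used to reject, here is a literal reading of their wording as predicates.
This is a sketch only and is not the code removed from is_ok_word(): both
function names are hypothetical, "more than a third capital letters" is
assumed to mean a third of all characters in the word, and the real rules may
have had further conditions that the description above doesn't mention.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Former rule 1, read literally (hypothetical name): the word starts with a
// capital letter, is of mixed case, and more than a third of its characters
// are capital letters.
bool was_rejected_by_rule_1( std::string const &word ) {
    if ( word.empty() ||
         !std::isupper( static_cast<unsigned char>( word[0] ) ) )
        return false;
    std::size_t uppers = 0, lowers = 0;
    for ( std::string::size_type i = 0; i < word.size(); ++i ) {
        unsigned char const c = static_cast<unsigned char>( word[i] );
        if ( std::isupper( c ) )
            ++uppers;
        else if ( std::islower( c ) )
            ++lowers;
    }
    // Mixed case means at least one lower-case letter as well.
    return lowers > 0 && uppers * 3 > word.size();
}

// Former rule 2, read literally (hypothetical name): the word contains a
// capital letter in any position other than the first.
bool was_rejected_by_rule_2( std::string const &word ) {
    for ( std::string::size_type i = 1; i < word.size(); ++i )
        if ( std::isupper( static_cast<unsigned char>( word[i] ) ) )
            return true;
    return false;
}
```

Under this reading, "FedEx" trips rule 1 (2 capitals out of 5 characters) and
"iMac" trips rule 2, which matches the two examples given above; an ordinary
capitalized word such as "Hello" trips neither.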

*******************************************************************************
5.5.2
*******************************************************************************

BUG FIXES
---------

* Indexing attachments has been broken since version 5.2. Major d'oh. (This
  bug fix shall be known as bug fix IAB.)

CHANGES, file-by-file
---------------------

* mod/mail/mod_mail.c
    1. In index_headers(), put a missing "else" back for bug fix IAB.

* version.h
    1. Upped version to 5.5.2.

*******************************************************************************
5.5.1
*******************************************************************************

BUG FIXES
---------

* Automatic thread-pool size reduction had a race condition where too many
  threads could be destroyed. (This bug fix shall be known as bug fix TCD.)

CHANGES, file-by-file
---------------------

* thread_pool.cpp
    1. Changed thread::destructing_ to thread::state_ for bug fix TCD.
    2. In thread_main(), set thread state to expired before calling delete on
       it for bug fix TCD.

* thread_pool.h
    1. Changed thread::destructing_ to thread::state_ for bug fix TCD.

* version.h
    1. Upped version to 5.5.1.

*******************************************************************************
5.5
*******************************************************************************

NEW FEATURES
------------

* search(1) can now be run as a daemon without it automatically putting itself
  into the background. This is useful in order to wrap a start script around
  it and automatically restart it if it dies for any reason. Correspondingly,
  there are 2 new utility scripts: searchmonitor (a process monitor for
  search) and searchd (a start/stop script for SysV-like systems). (This
  feature shall be known as feature NOB.)

* search(1), when run as a daemon, can give away its root privileges if it
  started with them. There are now new command-line options of -U, --user,
  -G, and --group as well as new configuration variables User and Group.
  (This feature shall be known as feature GAR.)

BUG FIXES
---------

* When search(1) was running as a daemon, it ignored -F and --format options
  specified via the socket. (This bug fix shall be known as bug fix SDF.)

* For very large document sets when many partial indices were generated, if
  the number of partial indices exceeded the maximum number of file
  descriptors a process could have open, merging would fail. (This bug fix
  shall be known as bug fix MFD.)

CHANGES, file-by-file
---------------------

* conf_enum.c
* conf_enum.h
    1. Added the is_legal() function for bug fix SDF.

* config.h
    1. Added Group_Default and UserDefault for feature GAR.

* conf_var.c
    1. In map_ref(), added "user" and "group" for feature GAR.
    2. In map_ref(), added "searchbackground" for feature NOB.

* GNUmakefile
    1. Added Group.c and User.c to S_SOURCES for feature GAR.
    2. Removed WIN32 PERL_TARGET conditional since WIN32 isn't set at that
       point.
    3. s/PERL_TARGET/OTHER_TARGET/
    4. Added searchmonitor to OTHER_TARGET for feature NOB.
    5. Added INITD_TARGET for feature NOB.
    6. Added BIN_TARGET since other targets get installed places other than in
       a bin directory.
    7. Added INITD_DIR and LEVEL_DIR to figure out a SysV system's run level
       directories for feature NOB.
    8. Added installation of /etc/swish++.conf for feature NOB.
    9. Added install_sysv target for feature NOB.
    A. Added uninstallation of start/stop scripts to uninstall target for
       feature NOB.

* Group.c
* Group.h
    1. New files for feature GAR.

* exit_codes.h
    1. Added Exit_No_User and Exit_No_Group for feature GAR.
    2. Changed Exit_Internal_Error from 255 to 127.

* index.c
    1. In main(), added maxing out of number of file descriptors to enable
       more partial indices to be generated for bug fix MFD.

* INSTALL.unix
    1. Added step 5 regarding installing the searchd start/stop script for
       feature NOB.

* man/man1/index.1
    1. Added missing error codes 40 and 127.

* man/man1/search.1
    1. Added description of -B and --no-background options and the
       SearchBackground variable for feature NOB.
    2. Added description of -U, --user, -G, and --group options and the User
       and Group variables for feature GAR.
    3. Added subsections to Daemon section.
    4. Added mention of giving away root privileges for feature GAR.
    5. Added mention of searchmonitor(8) for feature NOB.

* man/man4/swish++.conf.4
    1. Added mention of SearchBackground variable for feature NOB.
    2. Added mention of Group and User variables for feature GAR.

* man/man8/GNUmakefile
* man/man8/searchd.8
* man/man8/searchmonitor.8
* SearchBackground.h
    1. New files for feature NOB.

* search.c
    1. Added #include "SearchBackground.h" for feature NOB.
    2. Added global search_background variable for feature NOB.
    3. In main(), added check of search_background_opt for feature NOB.
    4. In search_options::search_options(), added initialization of
       search_background_opt for feature NOB.
    5. In search_options::search_options(), added case for 'B' for feature
       NOB.
    6. In usage(), added usage for -B and --no-background for feature NOB.
    7. In search(), added results_format parameter for bug fix SDF.
    8. In search_options::search_options(), added code to check legality of
       argument to -F option for bug fix SDF.
    9. In service_request(), added opt.results_format_arg to call to search()
       for bug fix SDF.
    A. Added #include "User.h" and "Group.h" for feature GAR.
    B. Added global user and group variables for feature GAR.
    C. In main(), added check of group_arg and user_arg for feature GAR.
    D. In search(), added static_cast to get rid of float->int conversion
       warning.
    E. In search_options::search_options(), added initialization of user_arg
       and group_arg for feature GAR.
    F. In search_options::search_options(), added cases for 'G' and 'U' for
       feature GAR.
    G. In usage(), added description of -G and -U for feature GAR.

* search.h
    1. Added search_background_opt for feature NOB.
    2. Added user_arg and group_arg for feature GAR.

* searchd.in
* searchmonitor.in
    1. New files for feature NOB.

* search_daemon.c
    1. Added #include "SearchBackground.h" for feature NOB.
    2. In become_daemon(), added tests of search_background for feature NOB.
    3. Added #include "User.h" and "Group.h" for feature GAR.
    4. In become_daemon(), added code to change UID/GID for feature GAR.

* search_options.c
    1. Added no-background option for feature NOB.
    2. Added user and group options for feature GAR.

* swish++.conf
    1. Added SearchBackground for feature NOB.
    2. s!/tmp/search.pid!/var/run/search.pid!

* User.c
* User.h
    1. New files for feature GAR.

* util.h
    1. In max_out_limit(), set limit to infinity if running as root for bug
       fix MFD.

* version.h
    1. Updated version to "5.5".

*******************************************************************************
5.4.6
*******************************************************************************

BUG FIXES
---------

* On systems (such as Solaris) where /bin/sh is still really Bourne shell (as
  opposed to bash in disguise), -e tests don't work. (This bug fix shall be
  known as bug fix DEF.)

CHANGES, file-by-file
---------------------

* GNUmakefile
* init_mod_vars-sh
    1. s/-e/-f/ for bug fix DEF.

* version.h
    1. Updated version to "5.4.6".

*******************************************************************************
5.4.5
*******************************************************************************

BUG FIXES
---------

* If AssociateMeta, IncludeFile, IncludeMeta, ExcludeFile, or ExcludeMeta were
  not given in a configuration file, values given via the command line were
  discarded. (This bug fix shall be known as bug fix CRA.)

* On some systems, the auto-building of dependencies got into an infinite loop
  since the "dep" directory's timestamp was updated for every dependency file
  and thus everything that depended on it was always out of date. Why this
  doesn't happen on all systems isn't clear. (This bug fix shall be known as
  bug fix DTS.)

CHANGES, file-by-file
---------------------

* config/config.mk
    1. Removed "dep" for bug fix DTS.

* config/GNUmakefile
    1. s/dep/.*.d/ for bug fix DTS.

* conf_var.c
    1. In parse_file(), removed call to reset_all() for bug fix CRA.

* conf_var.h
    1. Made reset_all() public.

* GNUmakefile
* mod.mk
    1. Changed "dep/%.d" (back) to ".%.d" for bug fix DTS.
    2. In distclean rule, s/dep/.*.d/ for bug fix DTS.

* INSTALL.win32
* INSTALL.unix
    1. s/dep/.*.d/ for bug fix DTS.

* version.h
    1. Updated version to "5.4.5".

*******************************************************************************
5.4.4
*******************************************************************************

BUG FIXES
---------

* In index(1), the config-file option wasn't recognized because it was spelled
  as just "config" in the source code. D'oh! (This bug fix shall be known as
  bug fix LCO.)

* Configuration file variables in modules were somehow being corrupted so some
  weren't being recognized any longer. I really don't know what was going on.
  But, module-specific variables weren't recognized at all in search(1).
  Oops. (This bug fix shall be known as bug fix XCV.)

CHANGES, file-by-file
---------------------

* conf_var.c
    1. In map_ref(), added call to init_mod_vars() for bug fix XCV.

* conf_var.h
    1. Added init_mod_vars() for bug fix XCV.

* GNUmakefile
    1. Added init_mod_vars.c to I_SOURCES, S_SOURCES, and E_SOURCES for bug
       fix XCV.
    2. Added rule to make init_mod_vars.c for bug fix XCV.

* index.c
    1. In main(), s/config/config-file/ for bug fix LCO.

* init_mod_vars-sh
    1. New file for bug fix XCV.

* mod/html/mod_html.c
* mod/html/mod_html.h
* mod/mail/mod_mail.c
* mod/mail/mod_mail.h
    1. Moved constructor to .h file and removed register_var() for bug fix
       XCV.

* mod/html/vars
* mod/mail/vars
    1. New files for bug fix XCV.

* version.h
    1. Updated version to "5.4.4".

*******************************************************************************
5.4.3
*******************************************************************************

BUG FIXES
---------

* When compiling without the search daemon, search(1) wouldn't link because it
  needs conf_enum.o and it wasn't compiled. (This bug fix shall be known as
  bug fix CEO.)

* The file thread_pool.c didn't compile under FreeBSD. (This bug fix shall be
  known as bug fix BSD4.)

CHANGES, file-by-file
---------------------

* config/config.mk
    1. In "OS selection" section, added comment for Mac OS X.

* GNUmakefile
    1. Moved conf_enum.c so that it's always compiled for bug fix CEO.

* INSTALL.unix
    1. Added fact that g++ 2.95.2 works.
    2. Added note about g++ 2.96.

* thread_pool.c
    1. Added "#ifndef FreeBSD" around use of PTHREAD_MUTEX_ERRORCHECK for bug
       fix BSD4.

* version.h
    1. Updated version to "5.4.3".

*******************************************************************************
5.4.2
*******************************************************************************

BUG FIXES
---------

* The "classic" results formatting was broken in that the result separator
  wasn't output in all the places it should be. How I didn't catch this isn't
  clear. (This bug fix shall be known as bug fix CFS.)

CHANGES, file-by-file
---------------------

* classic_formatter.c
    1. In result(), added missing "results_separator" for bug fix CFS.

* version.h
    1. Updated version to "5.4.2".

*******************************************************************************
5.4.1
*******************************************************************************

BUG FIXES
---------

* The command-line option spec. building introduced in version 5.4 was broken.
  (This bug fix shall be known as bug fix COS.)

CHANGES, file-by-file
---------------------

* indexer.c
    1. In indexer::all_mods_options(), s/++option_count/*c++ = *s/ for bug fix
       COS.

* version.h
    1. Updated version to "5.4.1".

*******************************************************************************
5.4
*******************************************************************************

NEW FEATURES
------------

* Search results can now optionally be output in XML. (This feature shall be
  known as feature XML.)

* The modular indexing rearchitecture is now complete.

CHANGES, file-by-file
---------------------

* classic_formatter.c
* classic_formatter.h
    1. New files for feature XML.

* conf_var.c
    1. In map_ref(), removed ExcludeClass and FilterAttachment.
    2. Added: register_var()
    3. In map_ref(), added ResultsFormat for feature XML.

* conf_var.h
    1. Added: register_var()

* file_info.c
    1. Reordered mem-initializers to match new order in declaration for
       feature XML.
    2. Added file_info( unsigned char const* ) for feature XML.

* file_info.h
    1. Added file_info( unsigned char const* ) for feature XML.
    2. Reordered data members to facilitate new constructor for feature XML.

* GNUmakefile
    1. Added file_info.c, classic_formatter.c, ResultsFormat.c,
       results_formatter.c, and xml_formatter.c to S_SOURCES for feature XML.

* index.c
    1. Removed #include of mod_html .h files.
    2. Removed mod_html command-line options.
    3. In main(), added code to gather all module options.
    4. In main(), moved code to dump HTML elements into mod_html.
    5. In usage(), removed mod_html usage.
    6. In usage(), added call to: indexer::all_mods_usage().

* indexer.c
* indexer.h
    1. Added any_mod_claims_option(), all_mods_options(),
       all_mods_post_options(), all_mods_usage(), claims_option(),
       option_spec(), post_options(), and usage().

* INSTALL.unix
    1. Updated Unix prerequisites.

* man/man1/search.1
    1. Added XML results description for feature XML.
    2. Added -F, --format, and ResultsFormat for feature XML.
    3. Corrected wording regarding titles.
    4. For -P and --pid-file, added mention of default being none.
    5. For -u and --socket-file, added mention of default being
       /tmp/search.socket.
    6. In meta data query examples, removed mention of "HTML or XHTML" since
       other document types can have meta information.
    7. Added XML output caveat for feature XML.
    8. Added reference to XML specification for feature XML.

* man/man4/swish++.conf.4
    1. Added ResultsFormat for feature XML.

* mod/html/mod_html.c
    1. Moved constructor definition here.
    2. In constructor, added call to register_var( "excludeclass" );
    3. Moved global dump_html_elements_opt definition here.
    4. Added claims_option(), option_spec(), post_options(), and usage().

* mod/html/mod_html.h
    1. Moved constructor definition to mod_html.c.
    2. Added claims_option(), option_spec(), post_options(), and usage().

* mod/mail/mod_mail.c
    1. Moved constructor definition here.
    2. Added call to register_var( "filterattachment" ).

* mod/mail/mod_mail.h
    1. Moved constructor definition to mod_mail.c.

* README
    1. Added "XML search results" for feature XML.

* ResultsFormat.c
* ResultsFormat.h
* results_formatter.c
* results_formatter.h
    1. New files for feature XML.

* search.c
    1. Added #include of classic_formatter.h, file_info.h, ResultsFormat.h,
       results_formatter.h, ResultsMax.h, and xml_formatter.h for feature XML.
    2. Added global results_format for feature XML.
    3. In main(), added test of opt.results_format_arg for feature XML.
    4. In search(), replaced result output with new result formatter classes
       for feature XML.
    5. In search_options::search_options(), added initialization of
       results_format_arg for feature XML.
    6. In search_options::search_options(), added case 'F' for feature XML.
    7. Rewrote write_file_info() using a file_info.
    8. In usage(), added usage message for -F option for feature XML.

* search.h
    1. Added results_format_arg for feature XML.

* search_options.c
    1. Added "format" for feature XML.

* SearchResults.dtd
    1. New file for feature XML.

* swish++.conf
    1. Added ResultsFormat for feature XML.

* version.h
    1. Upped version to "5.4".

* xml_formatter.c
* xml_formatter.h
    1. New files for feature XML.
******************************************************************************* 5.3.6 ******************************************************************************* NEW FEATURES ------------ * When compiling using g++, added additional compiler options to reduce code size and slightly improve performance. (This feature shall be known as feature GPPO.) BUG FIXES --------- * Indexing files via standard input where the order of the directories wasn't "monotonically increasing," didn't work: files ended up in the wrong directory. As a beneficial consequence, the -D and -G options and the DirectoriesGrow and DirectoriesReserve variables are no longer needed. (This bug fix shall be known as big fix ISI2.) * Destroying a thread_pool's threads didn't work properly in that the clean-up function for all threads didn't get called. (This didn't matter for SWISH++ since search(1) never destroys its thread_pool.) (This bug fix shall be known as big fix TPD.) CHANGES, file-by-file --------------------- * config/config.mk 1. Added -D_XOPEN_SOURCE=500 for compiling search daemon under Linux for bug fix TPD. 2. Added DEBUG variable. 3. Added -fno-rtti to CCFLAGS for feature GPPO. 4. Added -fomit-frame-pointer to OPTIM for feature GPPO. * config/mod.mk 1. s/DEBUG/DEBUGFLAGS/ * config.h 1. Removed DirectoriesGrow_Default and DirectoriesReserve_Default as part of bug fix ISI2. * conf_var.c 1. In parse_file(), added reset_all(). 2. In reset_all(), added check for null pointer. 3. Removed DirectoriesGrow and DirectoriesReserve as part of bug fix ISI2. * DirectoriesGrow.h * DirectoriesReserve.h 1. Removed these files as part of bug fix ISI2. * directory.c 1. Changed do_file() to take a second dir_index argument for bug fix ISI2. 2. Changed return type of check_add_directory() to return a directory index for bug fix ISI2. 3. In check_add_directory(), changed from using a set to a map where the value is the directory index for bug fix ISI2. 4. 
In do_check_add_file() and do_directory(), made it get and pass dir_index to do_file() for bug fix ISI2. 5. Removed directories_reserve as part of bug fix ISI2. * directory.h 1. s/dir_list/dir_set/ * do_file.c 1. When compiled for index(1), made do_file() take a second argument of dir_index for bug fix ISI2. 2. Added dir_index to call to file_info() constructor for bug fix ISI2. * exit_codes.h 1. s/Exit_No_Init_Condition/Exit_No_Init_Thread_Condition/ 2. s/Exit_No_Init_Mutex/Exit_No_Init_Thread_Mutex/ * file_info.c 1. Made first constructor take dir_index argument for bug fix ISI2. 2. Removed second constructor for bug fix ISI2. 3. Integrated construct() into lone constructor. * file_info.h 1. Removed constructor not taking dir_index for bug fix ISI2. 2. Removed construct(). * GNUmakefile 1. Added DEBUGFLAGS. 2. For $(MOD_LIBS) rule, s/DEBUG/DEBUGFLAGS/ * index.c 1. In load_old_index(), removed new_strdup()'s since file_info is now doing them for bug fix ISI2. 2. Removed DirectoriesGrow and DirectoriesReserve as part of bug fix ISI2. 3. Removed -D and -G options as part of bug fix ISI2. 4. In write_dir_index(), added code to order the directories for bug fix ISI2. 5. Moved definition of exlude_class_names to mod_html.c. * index_header.c 1. s/dir_list/dir_set/ for bug fix ISI2. * man/man1/httpindex.1 1. Added -e's to example. * man/man1/index.1 1. Removed -D, --dirs-reserve, -G, --dirs-grow, DirectoriesGrow, and DirectoriesReserve as part of bug fix ISI2. * man/man4/swish++.conf.4 1. Removed DirectoriesGrow and DirectoriesReserve as part of bug fix ISI2. * mod/html/mod_html.c 1. Moved definition of exlude_class_names here. * search_daemon.c 1. In set_signal_handlers(), removed SA_RESTART. * swish++.conf 1. Removed DirectoriesGrow and DirectoriesReserve as part of bug fix ISI2. * thread_pool.c 1. In thread_destroy(), added code to unlock q_lock for bug fix TPD. 2. In thread_destroy(), added code to decrement q_lock's reference count for bug fix TPD. 3. 
In thread_destroy(), added code to deallocate thread object storage for bug fix TPD. 4. In thread_main(), added code to increment q_lock's reference count for bug fix TPD. 5. s/Exit_No_Init_Condition/Exit_No_Init_Thread_Condition/ 6. s/Exit_No_Init_Mutex/Exit_No_Init_Thread_Mutex/ 7. In ~thread(), removed destructing_lock_ since I don't think it's needed. 8. In ~thread(), added optimization for a thread committing suicide. 9. Added q_lock::dec_ref() and q_lock::inc_ref() functions for bug fix TPD. A. In thread_pool::thread_pool(), created q_lock with the PTHREAD_MUTEX_ERRORCHECK attribute for bug fix TPD. B. In ~thread_pool(), removed destructing_lock_ since I don't think it's needed. C. In ~thread_pool(), added code to decrement q_lock's reference count for bug fix TPD. * thread_pool.h 1. Overrode thread::operator delete() to do nothing for bug fix TPD. 2. Changed q_lock_ to a reference-counted object for bug fix TPD. 3. Made max/min threads and thread timeout settable after thread_pool creation. * version.h 1. Upped version to "5.3.6." ******************************************************************************* 5.3.5 ******************************************************************************* NEW FEATURES ------------ * The code for modules has been reorganized into subdirectories that build libraries with the goal of having a completely modular indexing architecture similar to the way Apache has modules. This is a work-in-progress. BUG FIXES --------- * Version 5.1 broke indexing file names via standard input. D'oh! (This bug fix shall be known as bug fix FSI.) * Version 5.1 also added unnecessary work for extract(1). (This bug fix shall be known as bug fix ESI.) * The thread::~thread() destructor mistakenly killed the calling thread rather than itself. Oops. This didn't actually matter for SWISH++ since it's never called. (This bug fix shall be known as bug fix TPTD.) CHANGES, file-by-file --------------------- * config.h 1. 
Moved MOD_HTML parameters to mod/html/html_config.h. * config/config.mk 1. Changed format of MOD_LIST. 2. Added RANLIB. 3. Moved auto-dependency generation here. * config/GNUmakefile 1. Moved TARGET definition before include. 2. Added removal of accidental dep subdirectory. * config/mod.mk 1. New file for modularization. * conf_var.c 1. s/MOD_HTML/mod_html/ 2. s/MOD_MAIL/mod_mail/ * directory.c 1. Made this file #include'd by index.c and extract.c for bug fix FSI. 2. Added: #include "fake_ansi.h" 3. Added "#ifdef INDEX" in various places for bug fix ESI. 4. Added do_check_add_file() for bug fix FSI. * directory.h 1. Removed local #include's and follow_symbolic_links and function declarations for bug fix FSI. * elements.c * elements.h * entities.c * entities.h * ExcludeClass.h * mod_html.c * mod_html.h 1. Moved to mod/html subdirectory. * encoded_char.c * encoded_char.h 1. s/MOD_MAIL/mod_mail/ * extract.c 1. Moved #include of platform.h first so PJL_NO_SYMBOLIC_LINKS would be defined at the right time. 2. Added: #include "FollowLinks.h" for bug fix ESI. 3. Added: #include "directory.c" for bug fix ESI. 4. s/::strdup()/new_strdup()/ * ExcludeFile.c * ExtractFile.c 1. s/::strdup()/new_strdup()/ * file_info.c 1. s/::strdup()/new_strdup()/ * FollowLinks.h 1. s/follow_links/follow_symbolic_links/ * FilterAttachment.h * mod_mail.c * mod_mail.h 1. Moved to mod/mail subdirectory. * GNUmakefile 1. Moved target definition before include. 2. s/C_TARGET/CPP_TARGET/ 3. Added MOD_LIBS, MOD_LIB_PATHS, MOD_LINK. 4. s/I_SRCS/I_SOURCES/, s/I_OBJS/I_OBJECTS/, s/S_SRCS/S_SOURCES/, s/S_OBJS/S_OBJECTS/, s/E_SRCS/E_SOURCES/, s/E_OBJS/E_OBJECTS/ 5. Removed module-specific .c files. 6. Removed $(CCLINK) -- not used. 7. Made use of $(MOD_LINK) 8. Removed entities.c from E_SOURCES -- not used. 9. Added $(MOD_LIBS) to index dependencies. A. Added rule to build init_modules.c B. Added ruleto build module libraries. C. Moved auto-dependency generation to config/config.mk. D. 
Added MAKE_SUBDIRS function and made use of it in clean, distclean. E. Removed directory.c from I_SOURCES and E_SOURCES for bug fix ESI. * IncludeFile.c * IncludeMeta.c 1. s/::strdup()/new_strdup()/ * index.c 1. Changed includes to use mod/html form. 2. s/MOD_HTML/mod_html/ 3. Moved #include of platform.h first so PJL_NO_SYMBOLIC_LINKS would be defined at the right time. 4. Added: #include "FollowLinks.h" for bug fix FSI. 5. Added: #include "directory.c" for bug fix FSI. 6. In main(), s/do_file()/do_check_add_file()/ 7. s/::strdup()/new_strdup()/ * indexer.c 1. s/::strdup()/new_strdup()/ * init_modules.c 1. Removed since it's now automatically generated. * init_modules-sh 1. Added to generate init_modules.c automatically. * mod_man.c * mod_man.h 1. Moved to mod/man subdirectory. * mod/html/html_config.h 1. Moved MOD_HTML-specific configuration parameters here. * mod/html/ExcludeClass.h * mod/html/GNUmakefile * mod/html/elements.c * mod/html/elements.h * mod/html/entities.c * mod/html/entities.h * mod/html/mod_html.c * mod/html/mod_html.h * mod/mail/FilterAttachment.h * mod/mail/GNUmakefile * mod/mail/encoded_char.c * mod/mail/mod_mail.c * mod/mail/mod_mail.h * mod/man/GNUmakefile * mod/man/mod_man.c * mod/man/mod_man.h 1. Moved from top-level directory. * stem_word.c * stop_words.c 1. s/::strdup()/new_strdup()/ * thread_pool.c 1. Fixed thread::~thread() for bug fix TPTD. * util.c * util.h 1. Removed unneeded #include's. * version.h 1. Changed version to "5.3.5". ******************************************************************************* 5.3.4 ******************************************************************************* BUG FIXES --------- * File titles turned to garbage when indexing files incrementally. (This bug fix shall be known as bug fix IIT.) * option_stream's test main() was incorrectly defined inside the PJL namespace. (This bug fix shall be known as bug fix OSM.)
* option_stream didn't report an error for an option that required an argument when no argument was given when said option was the last thing on the command line. (This bug fix shall be known as bug fix OSN.) CHANGES, file-by-file --------------------- * index.c 1. Merged parse_file_info() function into load_old_index(). 2. In (what is now) load_old_index(), added a strdup() for the file's title for bug fix IIT. 3. In write_dir_index(), switched to using FOR_EACH(). * option_stream.c 1. Got rid of option_stream::option::copy(). 2. Moved the test main() outside of the PJL namespace for bug fix OSM. 3. s/c_/short_name_/, s/index_/argi_/. 4. Added was_short_option_ variable for bug fix OSN. 5. Replaced some duplicated option argument code with a goto. 6. Reworked argument processing for bug fix OSN. * option_stream.h 1. Got rid of option_stream::option::copy() and destructor. 2. Made copy constructor and assignment operator private. 3. Renamed the following: s/c/short_name/, s/index_/argi_/ * version.h 1. Changed version to "5.3.4". ******************************************************************************* 5.3.3 ******************************************************************************* BUG FIXES --------- * SIGPIPE wasn't handled at all so a search client that disconnected unexpectedly could crash the server. Now that this has been fixed, the server also needs to check the state of the outgoing stream during writes for an error: if an error occurs, assume the client disconnected from the socket and stop sending output. (This bug fix shall be known as bug fix PIPE.) * On Linux systems, multiple reads from the search daemon timed out sooner than requested because Linux modifies the timeval struct passed to select() to reflect the amount of time not slept. (This bug fix shall be known as bug fix LMR.) CHANGES, file-by-file --------------------- * search.c 1. In main(), s/search_options opt/search_options const opt/ 2.
In dump_single_word(), dump_word_window(), several places in search() and service_request, added a check for the state of the "out" stream for bug fix PIPE. * search_daemon.c 1. Added set_signal_handlers() for bug fix PIPE. * search_thread.c 1. In search_thread::main(): s/search_options opt/search_options const opt/ 2. In timed_read_line(), reworked the timeout such that the timeval struct is always initialized properly for every loop iteration for bug fix LMR. * version.h 1. Changed version to "5.3.3". ******************************************************************************* 5.3.2 ******************************************************************************* BUG FIXES --------- * On some platforms, index(1) would index "0 words" for every file. (This bug fix shall be known as bug fix IZW.) * There was a race condition in the search daemon thread pool code whereby the prototype thread could begin executing before its owning thread_pool was fully constructed. (This bug fix shall be known as bug fix TPR.) CHANGES, file-by-file --------------------- * encoded_char.h 1. Made encoded_char_range::const_iterator's ch_ and decode() compile in only when MOD_MAIL is defined for bug fix IZW. 2. In encoded_char_range::const_iterator::operator*(), made it return *pos_ when MOD_MAIL wasn't defined for bug fix IZW. * exit_codes.h 1. Added Exit_No_Init_Condition and Exit_No_Init_Mutex. * man/man1/search.1 1. Added exit codes 66 and 67. * thread_pool.c 1. In thread_main(), added "::pthread_mutex_lock( &t->run_lock_ );" for bug fix TPR. 2. In thread_pool::thread::thread(), added initialization and locking of run_lock_ for bug fix TPR. 3. In thread_pool::thread::~thread(), added destruction of run_lock_ for bug fix TPR. 4. Changed thread_pool::thread_pool() to take a pointer to non-const thread for bug fix TPR. 5. In thread_pool::thread_pool(), made use of Exit_No_Init_Condition and Exit_No_Init_Mutex. 6.
In thread_pool::thread_pool(), added prototype thread to pool and now creating thread_min - 1 additional threads. 7. In thread_pool::thread_pool(), s/create/create_and_run()/ for bug fix TPR. 8. In thread_pool::new_task(), s/create/create_and_run()/ for bug fix TPR. * thread_pool.h 1. Added thread_pool::thread::run_lock_ for bug fix TPR. 2. Added thread_pool::thread::run() and create_and_run() for bug fix TPR. 3. Changed thread_pool::thread_pool() to take a pointer to non-const thread for bug fix TPR. * version.h 1. Changed version to "5.3.2". ******************************************************************************* 5.3.1 ******************************************************************************* BUG FIXES --------- * Searching with more than two "and" terms caused a core dump. This bug was a result of the "enhancement" to doing multiple "and" terms in 5.3. (This bug fix shall be known as bug fix MAT.) * Compiling with all but the text module produced a syntax error. (This bug fix shall be known as bug fix NMS.) CHANGES, file-by-file --------------------- * conf_bool.h * conf_int.h 1. Removed extraneous backslash. * init_modules.c 1. Added #include "indexer" for bug fix NMS. * query.c 1. In perform_and(), added needed "break" for bug fix MAT. * version.h 1. Changed version to "5.3.1". ******************************************************************************* 5.3 ******************************************************************************* BUG FIXES --------- * The weighting of multiple "and" terms has been fixed. Previously, the query: mouse and computer and keyboard was parsed and treated as: (mouse and computer) and keyboard 25% 25% 50% The problem was that the last term always got 50% of the weighting and the rest got 50% divided by the number of terms minus 1. In order to weight all the terms equally, the "and" results for each term are now saved in a list and then and'ed together at the end. (This bug fix shall be known as bug fix MAW.)
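To illustrate the arithmetic behind bug fix MAW, here is a minimal sketch in C++. It assumes, purely for illustration, that "and" combines two rank values by averaging them; the actual ranking arithmetic in query.c is not shown in this file. The names pairwise_and() and deferred_and() are hypothetical. Under that assumption, and'ing left to right gives the last of three terms half of the weight, whereas saving the per-term values and combining them at the end weights every term equally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ASSUMPTION (not taken from query.c): "and" combines two rank values by
// averaging them.  Both function names are hypothetical.

// Old behavior: and'ing left to right, two operands at a time.  For three
// terms this yields ((a + b) / 2 + c) / 2 = 25% a + 25% b + 50% c.
double pairwise_and( std::vector<double> const &ranks ) {
    assert( !ranks.empty() );
    double result = ranks[0];
    for ( std::size_t i = 1; i < ranks.size(); ++i )
        result = (result + ranks[i]) / 2;
    return result;
}

// New behavior: save every term's value and combine them once at the end,
// so each of the n terms contributes 1/n.
double deferred_and( std::vector<double> const &ranks ) {
    assert( !ranks.empty() );
    double sum = 0;
    for ( std::size_t i = 0; i < ranks.size(); ++i )
        sum += ranks[i];
    return sum / ranks.size();
}
```

With per-term values of 0, 0, and 100 for mouse, computer, and keyboard, pairwise_and() returns 50 (keyboard alone supplies half of the maximum), while the values 100, 0, 0 return only 25. deferred_and() returns the same result regardless of which term carries the value.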
* The index(1) manual page didn't explicitly state that words are converted to lower case prior to indexing. CHANGES, file-by-file --------------------- * conf_var.c 1. Changed from abort() to internal_error. * exit_codes.h 1. Added: Exit_Internal_Error * filter.c * indexer.c * option_stream.c * thread_pool.c 1. Changed from abort() to internal_error. * man/man1/index.1 1. Added paragraph at the end of the "Word Determination" subsection addressing conversion to lower case prior to indexing. * query.c 1. Changed from abort() to internal_error. 2. Moved declarations for get_meta_id(), parse_meta(), parse_primary(), and parse_optional_relop() here from query.h. 3. In parse_meta() and parse_primary(), got rid of unused default value. 4. Changed what was parse_query() to parse_query2(). 5. Added a new parse_query(). 6. Added and_results_type argument to parsing functions for bug fix MAW. 7. In parse_query2(), deferred and'ing of results for bug fix MAW. 8. Added perform_and() function for bug fix MAW. * query.h 1. Moved declarations for get_meta_id(), parse_meta(), parse_primary(), and parse_optional_relop() to query.c. 2. Added: stop_word_set 3. s/set< string >/stop_word_set/ 4. For parse_query(), got rid of unneeded bool& and int arguments. 5. s/search_results_type/search_results/ 6. s/find_results_type/find_results/ * search.c 1. s/set< string >/stop_word_set/ 2. In search(), got rid of unused "ignore" variable. 3. s/search_results_type/search_results/ 4. s/find_results_type/find_results/ * util.h 1. Added internal_error and report_error(). * version.h 1. Upped version. ******************************************************************************* 5.2 ******************************************************************************* NEW FEATURES ------------ * E-mail attachments can now be filtered by external programs. (This feature shall be known as feature AFP.) CHANGES, file-by-file --------------------- * conf_filter.c * conf_filter.h 1.
Replaced FilterFile.c and made generic for feature AFP. * conf_var.c 1. Added filterattachment for feature AFP. * do_file.c * extract.c 1. s/filters/file_filters/ for feature AFP. * filter.h 1. Added: substitute( std::string const &file_name ); * FilterAttachment.h 1. Added this file for feature AFP. * FilterFile.c 1. Replaced by conf_filter.c * FilterFile.h 1. Made FilterFile derived from conf_filter for feature AFP. 2. s/filters/file_filters/ for feature AFP. * GNUmakefile 1. Added conf_filter.c for feature AFP. 2. Removed FilterFile.c for feature AFP. * index.c 1. s/filters/file_filters/ for feature AFP. * man/man1/extract.1 1. Added FilterAttachment for feature AFP. * man/man1/index.1 1. Added mention of FilterAttachment for feature AFP. 2. s/-D/-G/ * man/man4/swish++.conf.4 1. Added "Filter variables" section. 2. Added information on filtering attachments for feature AFP. 3. Added more references. * mod_mail.c 1. Added #include's for , , "FilterAttachment.h", and "Verbosity.h" for feature AFP. 2. Added "attachment_filters" declaration for feature AFP. 3. Added index_via_filter() for feature AFP. 4. In index_headers(), added code for filters for feature AFP. 5. In index_words(), added case for External_Filter for feature AFP. * mod_mail.h 1. Added "External_Filter" for feature AFP. 2. Changed message_type from a pair<> to a struct for feature AFP. * README 1. Added mention of filtering attachments for feature AFP. * swish++.conf 1. Added FilterAttachment section for feature AFP. 2. Added: FilterFile *.ps pstotext %f > @%F.txt 3. Added: FilterFile *.bz2 bunzip2 -c %f > @%F * version.h 1. Upped version. ******************************************************************************* 5.1 ******************************************************************************* NEW FEATURES ------------ * Reduced index storage size by recording directory names once. Note that the old -G option for index(1) has changed to -g and that there is a new -G option.
(This feature shall be known as feature DIR1.) BUG FIXES --------- * The swish++.conf(4) manual page was missing FilesReserve and ResultsMax. (This bug fix shall be known as bug fix MFR.) CHANGES, file-by-file --------------------- * bcd.h 1. Added: #include "fake_ansi.h" * config.h 1. Added DirectoriesGrow_Default and DirectoriesReserve_Default for feature DIR1. * config/config.mk 1. Added g++ 3.0-specific warnings to CCFLAGS for development purposes. * conf_bool.h * conf_enum.h * conf_int.h * conf_set.h * conf_string.h 1. Added: #include "fake_ansi.h" * conf_percent.c * conf_percent.h * DirectoriesGrow.h * DirectoriesReserve.h 1. Added these files for feature DIR1. * conf_var.c 1. Added DirectoriesGrow and DirectoriesReserve configuration variables for feature DIR1. * directory.c 1. Added dir_list for feature DIR1. 2. Added directories_reserve. 3. Added check_add_directory() for feature DIR1. 4. s/queue< string >/queue< char const* >/ 5. Switched from using std::string to create the current path to using a simpler char buffer. 6. Made sure the directory that is passed to do_directory() recursively has been strdup()'d. * directory.h 1. Added dir_list for feature DIR1. 2. Added check_add_directory() for feature DIR1. 3. Added: #include "fake_ansi.h" * elements.h * entities.h 1. Added: #include "fake_ansi.h" * ExcludeFile.h 1. Added extern declaration. * extract.c 1. s/do_directory( file_name )/do_directory( ::strdup( file_name ) )/ * ExtractFile.h * ExtractFilter.h 1. Added extern declaration. * ExtractExtension.h 1. Added extern declaration. 2. Added: #include "fake_ansi.h" * fake_ansi.h 1. Removed __STL_NO_NAMESPACES and __STL_USE_NAMESPACES. This stopped working and I can't figure out why. * FilesGrow.c 1. This functionality was replaced by conf_percent.c for feature DIR1. * FilesGrow.h 1. Changed to be derived from conf_percent. 2. Added extern declaration. * file_info.c 1. Added result_separator. 2. Redid the constructor mem-initializers. 3. 
Moved common constructor code to construct(). 4. Added a second constructor used for reconstituting instances during incremental indexing. 5. Moved code for parse() to index.c. * file_info.h 1. Removed: #include 2. Added: #include "fake_ansi.h" 3. Added a second constructor used for reconstituting instances during incremental indexing. 4. Added dir_index() and dir_index_ for feature DIR1. 5. Added: construct() 6. Made all data members private and added accessor functions. 7. s/struct/class/ 8. Added: const_iterator, begin(), end(), ith_info(), and num_files(). * FilterFile.h 1. Added extern declaration. 2. Removed: #include * fnmatch.h 1. Removed unused #ifndef's. 2. Added #undef's. 3. Removed FNM_ERROR since it's not used. * FollowLinks.h 1. Added extern declaration. * GNUmakefile 1. Added conf_percent.c for feature DIR1. 2. Removed FilesGrow.c for feature DIR1. 3. Removed file_info.c from S_SRCS since file_info::out() has been moved to write_file_info() in search.c for feature DIR1. 4. Added query.c to S_SRCS. * IncludeFile.h 1. Added extern declaration. * IncludeMeta.h 1. Added: #include "fake_ansi.h" * Incremental.h 1. Added extern declaration. * index.c 1. Added DirectoriesGrow and DirectoriesReserve for feature DIR1. 2. Added my_write() since ostream::write() now apparently requires a char* rather than a void* and I'm lazy about having to cast the pointers. 3. Added dirs-reserve and dirs-grow command-line options for feature DIR1. 4. Added #ifdef PJL_GCC_295. 5. In load_old_index(), added loading of directory index for feature DIR1. 6. Moved index-file header-writing code to index_header.c. 7. Added write_dir_index() for feature DIR1. 8. Added new options to usage message for feature DIR1. 9. In usage(), s//title/. A. In main(), added: check_add_directory( "." ); B. s/file_info::parse/parse_file_info/ C. Added parse_file_info(). D. s/do_directory( file_name )/do_directory( ::strdup( file_name ) )/ * IndexFile.h 1. Added: #include "fake_ansi.h" * indexer.h 1.
Added: #include "fake_ansi.h" * index_header.c 1. Added this file to have index-file header-writing code only once. * index_segment.h 1. Added dir_index for feature DIR1. * man/man1/index.1 1. Added -D, --dirs-reserve options for feature DIR1. 2. Changed old -G option to -g for feature DIR1. 3. Added new -G, --dirs-grow options for feature DIR1. 4. Added missing FilesGrow variable for bug fix MFR. 5. Added DirectoriesGrow and DirectoriesReserve variables for feature DIR1. * man/man4/swish++.conf.4 1. Added missing FilesReserve variable for bug fix MFR. 2. Added DirectoriesGrow and DirectoriesReserve variables for feature DIR1. 3. Added "Percentage variables" section. * man/man4/swish++.index.4 1. Added directory index description. 2. Added other module cases describing a file's title. 3. Made separate BCD subsection. * meta_map.h * mmap_file.h * mod_html.h * mod_mail.h 1. Added: #include "fake_ansi.h" * mod_man.c 1. In index_words(), s/register char const* c/char const* c/ since its address is taken. * my_set.h 1. Moved declaration of #include "fake_ansi.h". * omanip.h 1. Added: #define PJL /* nothing */ 2. Added: #include "fake_ansi.h" * option_stream.h * pattern_map.h * PidFile.h 1. Added: #include "fake_ansi.h" * query.c * query.h 1. Split out the query-parsing code from search.c to here. * ResultsMax.h 1. Added extern declaration. * ResultSeparator.h 1. Added extern declaration. 2. Added: #include "fake_ansi.h" * search.c 1. Added "directories" index_segment global variable for feature DIR1. 2. Moved file_info::out() to write_file_info() for feature DIR1. 3. Moved result_separator definition here for feature DIR1. 4. Moved query-parsing code to query.c. * search.h * SocketAddress.h * SocketFile.h 1. Added: #include "fake_ansi.h" * StemWords.h 1. Added extern declaration. * StopWordFile.h 1. Added: #include "fake_ansi.h" * swish++.conf 1. Added DirectoriesGrow and DirectoriesReserve for feature DIR1. * TempDirectory.h 1.
Added: #include "fake_ansi.h" * thread_pool.c 1. Added start_function_type to thread() constructor. * thread_pool.h 1. Added start_function_type to thread() constructor. 2. Added: #define PJL /* nothing */ 3. Added: #include "fake_ansi.h" * util.h 1. Added: #include "fake_ansi.h" * version.h 1. Updated version to "5.1". * WordFilesMax.h * WordPercentMax.h 1. Added extern declaration. ******************************************************************************* 5.0.1 ******************************************************************************* BUG FIXES --------- * This release fixes a lot of compile issues (mostly namespaces) with g++ 3.0. (This bug fix shall be known as bug fix GCC3.) * The changes to fix the above have apparently caused bugs in (at least) g++ 2.95.3 to manifest themselves: 1. In some cases, the compiler "forgets" that operator<<( ostream&, string const& ) has been defined. The hack workaround is to use operator<<( ostream&, char const* ) and use string::c_str(). 2. The compiler "forgets" that stream manipulators have been defined. The workaround is not to use them. :-( (This fix shall be known as fix OOS.) CHANGES, file-by-file --------------------- * bcd.h 1. Switched to using local omanip since depending on the underlying C++ implementation is not portable. This was done for GCC3. * config/config-sh 1. Added PJL_GCC_295 since it's used in multiple places. This was done for OOS. * config/config.mk 1. Made OPTIM = -O2 for g++ also since the optimizer under 3.0 takes ridiculously long and uses most of the CPU and memory. 2. s/($(CC),g++)/($(findstring g++,$(CC)),g++)/ * conf_var.h 1. s/cerr/std::cerr/ for OOS. * do_file.c 1. s/basename/pjl_basename/ due to name collision. * fake_ansi.h 1. Replaced __GNUC__, et al, with PJL_GCC_295 for OOS. * fdbuf.c * fdbuf.h 1. Added these files since the ability to attach an fstream to a Unix file descriptor has been removed from ANSI C++. This was done for OOS. * filter.c 1.
s/basename/pjl_basename/ due to name collision. * filter.h 1. s/std::unlink/::unlink/ for OOS. * GNUmakefile 1. Added fdbuf.c to S_SRCS for OOS. * index.c 1. Added my_write() since ostream::write() now apparently requires a char* rather than a void* and I'm lazy about having to cast the pointers. This was done for OOS. 2. s/o.write( /my_write( o, / for OOS. 3. Added #ifdef PJL_GCC_295 for fix OOS. * index_segment.h 1. s/random_access_iterator_tag/std::random_access_iterator_tag/ for OOS. * less.h 1. Added needed "namespace std { ... }" for OOS. * mmap_file.c 1. s/ios::open_mode/ios::openmode/ for OOS. * mmap_file.h 1. Added missing #include <fstream> for OOS. 2. s/ios::open_mode/std::ios::openmode/ for OOS. 3. s/reverse_bidirectional_iterator/std::reverse_bidirectional_iterator/ for OOS. * omanip.h 1. Added this file to roll own ostream manipulator since depending on the underlying C++ implementation is not portable. This was done for OOS. * option_stream.h 1. s/cerr/std::cerr/ for OOS. * pattern_map.h 1. Removed PJL_LOCAL_FNMATCH since it's not needed. 2. s/unary_function/std::unary_function/ for OOS. 3. s/std::fnmatch/::fnmatch/ for OOS. * search.c 1. s/#include <iomanip>/#include "omanip"/ for OOS. 2. Added #ifdef PJL_GCC_295 for fix OOS. * search.h 1. s/cerr/std::cerr/ for OOS. 2. s/cout/std::cout/ for OOS. * search_thread.c 1. Removed #include <fstream> for OOS. 2. Added #include "fdbuf.h" for OOS. 3. Switched to using fdbuf since the ability to attach an fstream to a Unix file descriptor has been removed from ANSI C++. * stem_word.h 1. s/less/std::less/ * util.h 1. Added missing #include <iostream> 2. s/basename/pjl_basename/ due to name collision. 3. s/std::stat/::stat/ for OOS. 4. s/std::lstat/::lstat/ for OOS. 5. s/cerr/std::cerr/ for OOS. 6. s/endl/std::endl/ for OOS. * version.h 1. Updated version to "5.0.1". * word_info.c 1. Added missing "using namespace std;" for OOS.
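As an illustration of the my_write() helper mentioned under index.c for 5.0.1, a minimal sketch in C++ might look like the following. The signature is an assumption (the actual one in index.c is not shown in this file), and raw_bytes_of() is a hypothetical helper added only to show a call site. The idea is simply to do the cast from void const* to char const* in one place so that callers of ostream::write() don't have to.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// ASSUMED signature: accept any object's address as void const* and do the
// char const* cast that std::ostream::write() requires in one place.
inline std::ostream& my_write( std::ostream &o, void const *buf,
                               std::size_t len ) {
    assert( buf != nullptr || len == 0 );
    return o.write( static_cast<char const*>( buf ),
                    static_cast<std::streamsize>( len ) );
}

// Hypothetical call site: dump the raw bytes of an int into a string.
inline std::string raw_bytes_of( int value ) {
    std::ostringstream out;
    my_write( out, &value, sizeof value );    // no cast needed by the caller
    return out.str();
}
```

The substitution s/o.write( /my_write( o, / listed for index.c then becomes a purely mechanical change at each call site.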
******************************************************************************* 5.0 ******************************************************************************* NEW FEATURES ------------ * The indexing code has been rearchitected to be modular allowing for new file formats to be indexed directly (without filters). Consequently, the indexing of HTML files has been turned into a module. The -e option and IncludeFile variable are now INCOMPATIBLE with previous releases. Read the updated documentation. (This feature shall be known as feature MOD.) * A filter module for mail (and news) files has been added. (This feature shall be known as feature MAIL.) * A filter module for manual page files has been added. (This feature shall be known as feature MAN.) * For index, a new -A or --no-assoc-meta option and AssociateMeta configuration variable have been added. (This feature shall be known as feature AMN.) * There is a new %E (second-to-last filename extension) substitution. (This feature shall be known as feature 22L.) * FilterFile configuration lines are now different and INCOMPATIBLE with previous releases. The @ character no longer does substitutions but merely marks the target filename. This was done to enable filtering to files having a fixed name to be able to handle filenames with spaces better. (This feature shall be known as feature SM2.) * The search daemon can now answer queries via TCP sockets in addition to Unix domain sockets. (This feature shall be known as feature TCP.) * You can now specify the separator character in search results. (This feature shall be known as feature SRS.) * Added parsing of XHTML 1.1 ruby elements. (This feature shall be known as feature RUBY.) BUG FIXES --------- * The index(1) -T option was ignored. (This bug fix shall be known as bug fix ITO.) * A configuration file that did not end in a newline would cause a segfault. (I think: I never tried it, but it looked like a bug to me.) (This bug fix shall be known as bug fix CNL.)
* Configuration error messages output "(null)" (or seg-faulted) for the variable name. I don't see how the compiler didn't catch this since the name_ data member is const and therefore must be initialized in the constructor. (This bug fix shall be known as bug fix NVR.) * Setting the SearchDaemon config. variable to Y didn't allow no command-line arguments to be given. (This bug fix shall be known as bug fix DCL.) * Filter substitution incorrectly rescanned substituted text. (This bug fix shall be known as bug fix FSR.) * Several tweaks were made to make SWISH++ compile under FreeBSD. (This bug fix shall be known as bug fix BSD3.) * Added -lnsl for compiling the search daemon under Solaris. (This bug fix shall be known as bug fix SOL2.) * Removed some more buffer-overflow bugs. (This bug fix shall be known as bug fix BOB.) * Filename patterns didn't match if the wildcard wasn't first, e.g., foo* (This bug fix shall be known as bug fix WWF.) CHANGES, file-by-file --------------------- * AssociateMeta.h 1. Added this file for feature AMN. * auto_vec.h 1. Removed #ifdef SEARCH_DAEMON since it's now used by code not in the search daemon for bug fix BOB. 2. Added "explicit" to constructor. 3. Added: auto_vec<T>& operator=( T *p ) 4. s/T *const p_/T *p_/ 5. Added PJL namespace. * bcd.c 1. s/fake_ansi.h/platform.h/ 2. s/STATIC_CAST(...)/static_cast<...>/ * config/config.mk 1. Added MOD_* definitions for feature MOD. 2. Added MOD_LIST to CCFLAGS for feature MOD. 3. Removed definition of MAKE. 4. Added separate "OS selection" section since there's now FreeBSD and Solaris also. 5. Added "PTHREAD_LIB= -pthread" for bug fix BSD. 6. Added "SOCKET_LIB+= -lnsl" for bug fix SOL2. 7. Wrapped thread and socket stuff inside "ifdef SEARCH_DAEMON". 8. Added OS variable. 9. Added OPTIM variable since -O3 in the cygwin environment causes a segfault due to an optimizer bug, presumably. A. Added: -DMOD_MAN for feature MAN. B. If g++, added: -fno-exceptions to reduce code size.
* config/src/mutable.c 1. Removed this file since all C++ compilers should now support "mutable". * config/src/new_casts.c 1. Removed this file since compilers should be implementing new casts by now. * config/src/socklen_1_socklen_t.c * config/src/socklen_2_int.c * config/src/socklen_2_unsigned.c 1. Added "#include <sys/types.h>" for bug fix BSD3. * config.h 1. Added "#ifdef MOD_HTML" around HTML and XHTML options for feature MOD. 2. Moved Title_Max_Size and TitleLines_Default down to Miscellaneous section for feature MAIL. 3. Added SocketPort_Default for feature TCP. * conf_bool.h * conf_int.c * conf_int.h * conf_set.c * conf_set.h * conf_string.c * conf_string.h * ExcludeFile.h * FilesGrow.c * FilesGrow.h * FilterFile.c * FilterFile.h 1. Removed "var_name" parameter from parse_value(). * conf_bool.c 1. Removed "var_name" parameter from parse_value(). 2. Added "using namespace std;" since it should have been there all along. 3. Switched to using auto_vec<char> and to_lower_r() for bug fix BOB. 4. Added PJL namespace. * conf_enum.c * conf_enum.h 1. Added for feature TCP. * conf_int.c 1. Switched to using auto_vec<char> and to_lower_r() for bug fix BOB. 2. s/cerr << error/error()/ 3. Added PJL namespace. * conf_set.h 1. Added PJL namespace. * conf_string.c 1. Added (missing) include of platform.h and namespace stuff. 2. Added code to strip leading/trailing quotes for feature SRS. 3. s/cerr << error/error()/ * conf_string.h 1. Added == and != operators. * conf_var.c 1. Removed HTMLFile for feature MOD. 2. Added "#ifdef MOD_HTML" around ExcludeClass for feature MOD. 3. Removed "var_name" parameter from parse_value(). 4. Replaced alias_name() by constructor. 5. Added ExtractFile for feature MOD. 6. In parse_file(), redid the finding of a newline for bug fix CNL. 7. In parse_file(), made use of find_newline(). 8. In map_ref(), added "SocketAddress" for feature TCP. 9. In conf_var::conf_var(), added initialization of name_ for bug fix NVR. A. 
In conf_var::conf_var(): s/map_ref()[ name_ ]/map_ref()[ to_lower( name_ ) ]/ so the case for variable names is irrelevant. B. In conf_var::map_ref(), made all variable names lower case. C. In conf_var::parse_line(), added "to_lower( line )" so the case for variable names is irrelevant. D. In conf_var::parse_line(): s/ in config. file// E. In conf_var::map_ref(), changed to doing initialization via a table. (This had the side-effect of making "search" work under FreeBSD.) F. Added ResultSeparator variable for feature SRS. G. s/cerr << warning/warning()/ H. Added PJL namespace. I. s/isspace/is_space/ J. Added "associatemeta" for feature AMN. * conf_var.h 1. Added default argument of "cerr" to error() and warning(). * directory.c 1. Moved configuration variable extern declarations to .h files. * do_file.c 1. Added "#ifdef INDEX" around declaration of orig_file_size and orig_file_name. (It should have been there all along.) 2. Reworked calling of the indexer for feature MOD. 3. Added ExtractFile::const_iterator for feature MOD. 4. s/name_set_.contains()/seen_file()/ 5. s/file_info::current_file().num_words_/fi->num_words()/ 6. Removed "filter_list.reserve( 5 )" so as not to waste time and thereby penalize the performance for files that are not filtered. 7. Recalculated basename for bug fix WWF. * elements.c 1. Added "#ifdef MOD_HTML" for feature MOD. 2. s/REINTERPRET_CAST(...)/reinterpret_cast<...>/ 3. Added ruby elements for feature RUBY. * elements.h 1. Added "#ifdef MOD_HTML" for feature MOD. 2. Added PJL namespace. * entities.h * entities.c 1. Added "#ifdef MOD_HTML" for feature MOD. * encoded_char.c * encoded_char.h 1. Added for feature MAIL. * ExcludeClass.h 1. Added "extern ExcludeClass exclude_class_names;". * ExcludeFile.c 1. Added "using namespace std;". 2. Removed "var_name" parameter from parse_value(). * ExcludeMeta.h 1. Added "extern ExcludeMeta exclude_meta_names;". * exit_codes.h 1. Created TCP and Unix versions of the search daemon exit codes. 
* extract.c 1. s/IncludeFile/ExtractFile/ for feature MOD since extraction doesn't use modules. 2. Reworked -e and -E options to allow multiple, comma-separated patterns just like for index(1). 3. Added PJL namespace. 4. In extract_words(), removed "buf" and now using "word" exclusively. * ExtractFile.c * ExtractFile.h 1. Added for feature MOD. * fake_ansi.h 1. Got rid of faking "mutable" since all C++ compilers should now support this. 2. Removed new casts section since compilers should be implementing them by now. 3. Added hack to fix g++/STL/iterator bug. * file_info.c 1. s/fake_ansi.h/platform.h/ 2. s/REINTERPRET_CAST(...)/reinterpret_cast<...>/ 3. Added: #ifndef PJL_NO_NAMESPACES (it should have been there all along). 4. Added definition of result_separator variable for feature SRS. 5. s/' '/result_separator for feature SRS. * file_info.h 1. s/ostream/std::ostream/ 2. Made all but list_ data members private. 3. Added public accessor functions for now-private data members. 4. Added inc_words() and seen_file(). 5. Added PJL namespace. * file_list.c 1. Removed: #include "fake_ansi.h" 2. Removed PJL_NO_MUTABLE section. 3. s/THIS->// * file_list.h 1. Removed: #include "fake_ansi.h". 2. s/REINTERPRET_CAST(...)/reinterpret_cast<...>/ 3. Removed pointer and reference type. 4. Made const_iterator derived from std::iterator. * file_vector.c * file_vector.h 1. Replaced by mmap_file.[ch] * FilesReserve.h 1. Added "extern FilesReserve files_reserve;". * filter.c 1. Removed all WIN32 special cases. 2. Added (missing) #include "platform.h" 3. s/find()/rfind()/ 4. Added code to increment pos past substituted text for bug fix FSR. 5. Added code for %E for feature 22L. 6. Changed handling of @ for feature SM2. 7. Made use of basename() added to util.h. * filter.h 1. s/::unlink/std::unlink/ * FilterFile.c 1. Changed handling of @ for feature SM2. 2. Consequently, now require only 1 substitution. 3. Added %E as a valid substitution for feature 22L. * fnmatch.c 1. 
Added: #include "platform.h" 2. Added: #ifndef PJL_NO_NAMESPACES * GNUmakefile 1. Reorganized HTML sources for feature MOD. 2. Added MOD_MAIL sources for feature MAIL. 3. Added conf_enum.c, SearchDaemon.c, and SocketAddress.c for feature TCP. 4. Added IncludeMeta.c for feature MAIL. 5. Added splitmail target for feature MAIL. 6. s/ifndef WIN32/ifdef SEARCH_DAEMON/ 7. Added fnmatch.c conditionally for WIN32 to E_SRCS. 8. Removed WIN32 special case for platform.h. 9. s/=/:=/ A. Reworded C++ compiler section. B. Added MOD_MAN sources for feature MAN. * html.c * html.h 1. Replaced by mod_html.c and mod_html.h, respectively, for feature MOD. * IncludeFile.h 1. Removed "var_name" parameter from parse_value(). 2. Removed the alias for HTML_File for feature MOD. 3. s/pattern_map< bool >/pattern_map< indexer* >/ for feature MOD. * IncludeFile.c 1. Removed "var_name" parameter from parse_value(). 2. Changed form of line to include indexer for feature MOD. 3. Added "using namespace std;". * IncludeMeta.c 1. Added this file for feature MAIL. 2. Added PJL namespace. * IncludeMeta.h 1. Added "extern IncludeMeta include_meta_names;". 2. Changed base class from conf_set to conf_var and map for feature MAIL. * index.c 1. Added "#ifdef MOD_HTML" for feature MOD. 2. Performed following substitutions for feature MOD. s/html.h/mod_html.h/ s/index.h/indexer.h/ 3. Changed the syntax for -e for feature MOD. 4. Removed the -h option for feature MOD. 5. Moved index_word() to indexer.c for feature MOD. 6. Allowed multiple patterns to be specified via -E option. 7. In main(), performed following substitution: s/TempDirectory_Default/0/ for bug fix ITO. 8. Updated the usage message for feature MOD. 9. Moved configuration variable extern declarations to .h files. A. In main() for case 'm', performed following substitution: s/include_meta_names.insert( to_lower( opt.arg() ) ) /include_meta_names.parse_value( opt.arg() )/ B. Added "#include <sys/time.h>" for bug fix BSD3. C. 
s/REINTERPRET_CAST(...)/reinterpret_cast<...>/ D. s/remove_temp_files()/remove_temp_files( void )/ for picky HP-UX compiler. E. In rank(), s/num_words_/num_words()/ F. In write_file_index(), made use of new file_info member functions. G. Removed all WIN32 special cases. H. Added PJL namespace. I. Added associate_meta global variable for feature AMN. J. Added "no-assoc-meta" and 'A' command-line options for feature AMN. * index.h 1. Replaced by indexer.h for feature MOD. * indexer.c * indexer.h 1. Added for feature MOD. 2. Added PJL namespace. * index_segment.c 1. Removed: #include "fake_ansi.h" 2. s/REINTERPRET_CAST(...)/reinterpret_cast<...>/ 3. Added PJL namespace. * index_segment.h 1. Made index_segment::const_iterator derived from std::iterator. 2. Added PJL namespace. * init_modules.c 1. Added for features MOD and MAIL. * INSTALL.win32 1. Changed from mingw to cygwin. 2. Removed note about extract(1). 3. Changed build instructions to match Unix version. * itoa.c 1. s/fake_ansi.h/platform.h/ 2. Added PJL namespace. * itoa.h 1. Added PJL namespace. * less.h 1. s/binary_function/std::binary_function/ * man/man1/extract.1 1. Added description for multiple patterns for -e, --pattern, -E, and --no-pattern options. 2. s/pjl@best.com/pauljlucas@mac.com/ * man/man1/httpindex.1 * man/man4/swish++.index.4 1. s/pjl@best.com/pauljlucas@mac.com/ * man/man1/index.1 1. s/pjl@best.com/pauljlucas@mac.com/ 2. Added description of modules and mod_mail for feature MAIL. 3. Removed -h, --html-pattern, and HTMLFile. 4. Reworked description of -m and --meta. 5. Added references for feature MAIL. 6. Added references for feature MAN. 7. Added -A, --no-assoc-meta, and AssociateMeta for feature AMN. 8. Added mention of and reference for Ruby elements for feature RUBY. 9. Made -T option no longer refer to <TITLE> element. * man/man1/search.1 1. Added -R, --separator, ResultSeparator for feature SRS. 2. Made "select" in daemon example more concise. * man/man1/splitmail.1 1. 
Added for feature MAIL. * man/man3/WWW.3 1. s/pjl@best.com/pauljlucas@mac.com/ 2. Redid formatting of references. 3. Removed trim_whitespace(), url_decode(), and url_encode() since they are no longer used now that the search.cgi example uses CGI.pm. * man/man4/swish++.conf.4 1. Added section for enumeration variables and SearchDaemon for feature TCP. 2. Changed IncludeFile from a set variable to an other variable for feature MOD. 3. Added SocketAddress for feature TCP. 4. Added section for IncludeMeta for feature MAIL. 5. s/pjl@best.com/pauljlucas@mac.com/ 6. Added: "For variables_names, case is irrelevant." 7. Added note about preserving whitespace in string values. 8. Added ResultSeparator for feature SRS. 9. Added more IP address detail for SocketAddress. A. Added "# WRONG!" comment to filter example. B. Added "AssociateMeta" for feature AMN. C. Added missing FollowLinks. D. Added "Man" module for feature MAN. * mmap_file.c 1. Added "#include <sys/time.h>" for bug fix BSD3. 2. s/REINTERPRET_CAST( caddr_t )( -1 )/MAP_FAILED/ 3. Removed all WIN32 code. 4. s/fake_ansi.h/platform.h/ 5. Added PJL namespace. * mmap_file.h 1. Removed all WIN32 code. 2. Added PJL namespace. * mod_html.c 1. Replaced html.c for feature MOD. 2. Reworked everything to use encoded_char_ranges. 3. Moved configuration variable extern declarations to .h files. 4. In parse_html_tag(), s/tag/name/ 5. Added PJL namespace. 6. Removed "buf" and now using "word" by itself. 7. s/isxdigit/is_xdigit/ 8. s/isdigit/is_digit/ 9. s/isalpha/is_alpha/ A. s/isspace/is_space/ B. Reworked how meta names are handled for feature AMN. C. In tag_cmp(), "fixed" increment and end-of-string test. * mod_html.h 1. Replaced html.h for feature MOD. 2. Made find_title(), index_words(), and new_file() public so they could be accessed from mod_mail.c. 3. Made index_words() and parse_html_tag() take an encoded_char_range or encoded_char_range::const_iterator argument so they could parse HTML that is encoded. 4.
Moved configuration variable extern declarations to .h files. 5. Added PJL namespace. * mod_mail.c * mod_mail.h 1. Added for feature MAIL. * mod_man.c * mod_man.h 1. Added for feature MAN. * my_set.h 1. Added PJL namespace. * option_stream.c * option_stream.h 1. Added PJL namespace. * pattern_map.h 1. s/::find_if/std::find_if/ 2. Added: #ifdef PJL_LOCAL_FNMATCH 3. Added "typename" to declaration of map_type. 4. s/value_type const&/argument_type/ * PidFile.h 1. Added "extern PidFile pid_file_name;". * platform.h.win32 1. Removed since no longer needed under cygwin. * postscript.h 1. Added PJL namespace. * README 1. Added new feature descriptions. 2. s!www.best.com/~pjl!homepage.mac.com/pauljlucas! 3. Added mention of Christoph Conrad. * RecurseSubdirs.h 1. Added "extern RecurseSubdirs recurse_subdirs;". * ResultSeparator.h 1. Added this file for feature SRS. * search.c 1. s/html.h/indexer.h/ for feature MOD. 2. Added #include "SocketAddress.h" for feature TCP. 3. s/am_daemon/daemon_type/ and s/daemon_opt/type_type_arg/ for feature TCP. 4. Made the daemon configuration variables global and become_daemon() take no arguments because the argument list was getting way too long. 5. In search_options::search_options(), added socket_address_arg for feature TCP. 6. In search_options::search_options(), added -a option for feature TCP. 7. In usage(), updated message for feature TCP. 8. Added "#include <sys/time.h>" for bug fix BSD3. 9. In main(), moved check of number of command-line arguments after conf_var::parse_file() and command-line override code for bug fix DCL. A. Switched to using auto_vec<char> and to_lower_r() all the time for bug fix BOB. B. Removed all WIN32 special cases. C. Added: #include "ResultSeparator.h" for feature SRS. D. In main(), added code for result_separator for feature SRS. E. In dump_single_word(), search(), and service_request(): s/' '/result_separator/ for feature SRS. F. In search_options::search_options(), added 'R' case for feature SRS. G.
In usage(), added line for -R for feature SRS. H. Added PJL namespace. * search.h 1. s/bool daemon_opt/char const* daemon_opt_arg/ for feature TCP. 2. Added socket_address_arg for feature TCP. 3. s/ostream/std::ostream/ (it should have been that way all along). 4. Added result_separator_arg for feature SRS. 5. Added PJL namespace. * searchc.in 1. Added stuff to connect via a TCP socket to the search daemon for feature TCP. 2. Updated Perl book references for 3rd ed. * search_daemon.c 1. Moved configuration variable extern declarations to .h files. 2. Added "#include <sys/time.h>" for bug fix BSD3. 3. In accept_failed(), added "#ifdef EPROTO" for bug fix BSD3. 4. Partitioned the code into smaller functions. 5. Added PJL namespace. * SearchDaemon.c 1. Added a bunch of #include's and extern declarations since become_daemon() now uses globals rather than parameters. This was done for feature TCP. 2. Added accept_failed() function for feature TCP. 3. In become_daemon(), added code for TCP sockets for feature TCP. * SearchDaemon.h 1. Changed to be derived from conf_enum for feature TCP. * SearchDaemon.c 1. Added this file for feature TCP. * search_options.c 1. Added "separator", 'R' option for feature SRS. * search_thread.c 1. s/fake_ansi.h/platform.h/ 2. Added PJL namespace. 3. s/isspace/is_space/ * search_thread.h 1. Added PJL namespace. * SocketAddress.h * SocketAddress.c 1. Added these files for feature TCP. * SocketFile.h 1. Added "extern SocketFile socket_file_name;". * socket_options.c 1. Added "socket-address" for feature TCP. 2. s/daemon/daemon-type/ and made it take an argument for feature TCP. * SocketQueueSize.h 1. Added "extern SocketQueueSize socket_queue_size;". * SocketTimeout.h 1. Added "extern SocketTimeout socket_timeout;". * splitmail.in 1. Added this utility for feature MAIL. * stem_word.c 1.
Performed following substitution: s/replace_suffix( char *word, rule_list* ) /replace_suffix( char *word, rule_list const* )/ It should have been that way all along. * stop_words.c 1. Added "shall", "you'll", "you're". 2. Added PJL namespace. 3. s/word_buf/word/ 4. s/word_len/len/ * stop_words.c 1. Added PJL namespace. * swish++.conf 1. Added ExtractFile for feature MOD. 2. Removed HTMLFile for feature MOD. 3. Added module name to IncludeFile for feature MOD. 4. Changed SearchDaemon for feature TCP. 5. Added SocketAddress for feature TCP. 6. Added missing variables and sorted alphabetically properly. 7. Added IncludeMeta values for mail/news. 8. Added ResultSeparator variable for feature SRS. 9. Changed @ in FilterFile lines for feature SM2. A. Added "AssociateMeta" for feature AMN. * swish++.conf.4 1. Added note about preserving whitespace in string values. 2. Added ResultSeparator for feature SRS. 3. Added more IP address detail for SocketAddress. 4. Added "# WRONG!" comment to filter example. 5. Added "For variables_names, case is irrelevant." 6. Sorted Other variables. 7. Added section for IncludeMeta. 8. Removed HTMLFile. 9. Added ExtractFile. A. Added section for enumeration variables and SearchDaemon. B. Changed IncludeFile from a set variable to an other variable. C. Added SocketAddress. D. s/pjl@best.com/pauljlucas@mac.com/ E. Added %E substitution for feature 22L. F. Added @ changes for feature SM2. G. Added code to increment pos for bug fix FSR. * thread_pool.c 1. s/fake_ansi.h/platform.h/ 2. Added: #ifndef PJL_NO_NAMESPACES (it should have been there all along). 3. s/STATIC_CAST(...)/static_cast<...>/ 4. Added PJL namespace. * thread_pool.h 1. Added PJL namespace. 2. s/queue/std::queue/ 3. s/set/std::set/ 4. Made ~thread() public. * ThreadsMax.h 1. Added "extern ThreadsMax max_threads;". * ThreadsMin.h 1. Added "extern ThreadsMin min_threads;". * ThreadTimeout.h 1. Added "extern ThreadTimeout thread_timeout;". * TitleLines.h 1.
Changed comments to reflect that TitleLines isn't used exclusively for HTML or XHTML files any more for feature MAIL. 2. Added "extern TitleLines num_title_lines;" * token.c 1. s/(cfc)/static_cast<char (*)(char)>/ * util.c 1. Moved "char_buffer_pool<128,5> buf" to file scope. 2. s/fake_ansi.h/platform.h/ 3. Added: to_lower_r(char const*, char const*) for bug fix BOB. * util.h 1. Added find_newline() and skip_newline(). 2. Added "#include <sys/time.h>" for bug fix BSD3. 3. Removed #ifdef SEARCH_DAEMON around to_lower_r(char const*) since it's now used by code not in the search daemon for bug fix BOB. 4. Added: to_lower_r(char const*, char const*) for bug fix BOB. 5. s/file_vector::const_iterator/char const */ 6. Added: is_alnum(), is_alpha(), is_digit(), is_punct(), is_space(), is_upper(), and is_xdigit(). * Verbosity.h 1. Added "extern Verbosity verbosity;". * version.h 1. Updated version to "5.0". * word_info.h 1. s/html.h/indexer.h/ for feature MOD. 2. Added PJL namespace. * man/man3/WWW.3 1. Removed trim_whitespace(), url_decode(), and url_encode() since they are no longer used now that the search.cgi example uses CGI.pm * option_stream.h 1. s/ostream/std::ostream/ * util.h 1. Removed all WIN32 special cases. 2. Added: basename(). * Win32-Makefile-index.v * Win32-Makefile-search.v 1. Removed these since they are not needed under cygwin. * word_info.h 1. s/ostream/std::ostream/ * word_util.c 1. s/isdigit/is_digit/ 2. s/ispunct/is_punct/ 3. s/isupper/is_upper/ * word_util.h 1. s/STATIC_CAST(...)/static_cast<...>/ 2. s/isalnum/is_alnum/ * www_example/search.cgi 1. Updated Perl book references for 3rd ed. 2. s/the.index/swish++.index/ 3. Added $SOCKET_ADDRESS for feature TCP. 4. Rewrote to use standard CGI.pm module. 5. Added code to do TCP sockets for feature TCP. 6. Fixed printing of file size in results. 
******************************************************************************* 4.8 ******************************************************************************* NEW FEATURES ------------ * The filename pattern matching (FNP) feature introduced in 4.5 has finally been ported to Windows. (This feature shall be known as feature WFNP.) BUG FIXES --------- * The GNUmakefile didn't build dependencies properly for files that are conditionally #include'd. (This bug fix shall be known as bug fix IDB.) * The directory separator character ('/' for Unix) is apparently transformed into '\' for Windows by the intermediate Windows port of POSIX functions. However, in the case where '/' is inserted into a string and that string is printed, the mere printing won't do the transformation. (This bug fix shall be known as bug fix WDSC.) CHANGES, file-by-file --------------------- * auto_vec.h 1. Renamed and cleaned-up from managed_ptr.h. * config.h 1. Added: TempDirectory_Default[] = "/temp"; when compiling for Windows. * conf_int.c 1. s/managed_vec/auto_vec/ * copying.dj * fnmatch.c * fnmatch.h 1. Added these files for feature WFNP. * directory.c 1. Added Dir_Sep_Char for bug fix WDSC. * GNUmakefile 1. Added: I_SRCS+= fnmatch.c for feature WFNP. 2. Performed the following substitution: s/CPPFLAGS/CFLAGS/ for bug fix IDB. 3. Added some comments for the clean, distclean, and dist targets. * index_segment.c 1. Performed the following substitution: s/long/size_type/ It should have been that way all along. * index_segment.h 1. Performed the following substitution: s/long size_type/unsigned long size_type/ ...no need for it to be signed. 2. Performed the following substitution: s/char* value_type/char const* value_type/ It should have been that way all along. 3. Eliminated "pointer" and "reference" types since they weren't used. 4. Performed the following substitution: s/long/size_type/ It should have been that way all along. * managed_ptr.h 1. Renamed to auto_vec.h.
* man/man1/index.1 1. Added mention of "/temp" for Windows. 2. Added section on differences for the Windows command line. * pattern_map.h 1. Added: #include "fnmatch.h" for feature WFNP. * search.c 1. Performed the following substitution: s/managed_vec/auto_vec/ * version.h 1. Updated version to "4.8". ******************************************************************************* 4.7 ******************************************************************************* NEW FEATURES ------------ * Added 'b' and 'B' substitutions for filters that are the base name and base name minus the extension of a file name, respectively. This is useful when you need the temporary files created in a location other than where the originals are, for example when the originals are on a filesystem that you don't have write access to. Note that, for consistency, the 'E' substitution has been renamed to 'F'. This is therefore an incompatible change with previous versions of SWISH++. (This feature shall be known as feature BBS.) CHANGES, file-by-file --------------------- * filter.c 1. Added code to determine the base name of a file for feature BBS. 2. Added 'b' and 'B' cases for feature BBS. 3. Renamed 'E' case to 'F' case for consistency. * FilterFile.c 1. Added 'b' and 'B' substitutions as legal for feature BBS. 2. Changed 'E' substitution to 'F' for consistency. 3. Edited corresponding error message for feature BBS. * GNUmakefile 1. Undid the split of the dist and distclean targets done in version 4.5 since that change started to bug me too much. * man/man4/swish++.conf.4 1. Changed description of filter substitutions to match feature BBS. * swish++.conf 1. Changed all '@E' to '@F' corresponding to filter.c item #3. * version.h 1. Updated version to "4.7".
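To make the two values behind the 'b' and 'B' substitutions of feature BBS (4.7) concrete, here is a minimal sketch of how a base name and a base name minus its extension might be computed. The function names base_name() and base_name_no_ext() are hypothetical and are not taken from filter.c; treating only the final ".ext" as the extension, and leaving a leading dot alone, are assumptions of this sketch.

```cpp
#include <string>

// Hypothetical helper (not from filter.c): the part of a path after the
// last '/', or the whole string if there is no '/'.
std::string base_name( std::string const &path ) {
    std::string::size_type const slash = path.rfind( '/' );
    return slash == std::string::npos ? path : path.substr( slash + 1 );
}

// Hypothetical helper: the base name with its final extension removed.
// Assumption: a leading dot (as in ".profile") is part of the name rather
// than the start of an extension, so such names are returned unchanged.
std::string base_name_no_ext( std::string const &path ) {
    std::string const base = base_name( path );
    std::string::size_type const dot = base.rfind( '.' );
    if ( dot == std::string::npos || dot == 0 )
        return base;
    return base.substr( 0, dot );
}
```

With these assumptions, a file such as /home/www/doc.pdf would give "doc.pdf" for the first value and "doc" for the second, which is what lets a filter write its temporary output to a different, writable directory.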
******************************************************************************* 4.6.6 ******************************************************************************* BUG FIXES --------- * Fixed segmentation fault when parsing HTML files that contain tags longer than Tag_Name_Max_Size characters. (This bug fix shall be known as bug fix HTL.) CHANGES, file-by-file --------------------- * html.c 1. In parse_html_tag(), added tag buffer overflow check for bug fix HTL. * man/man4/swish++.conf.4 1. Added (missing) mention of "HTMLFile". * man/man4/swish++.index.4 1. Fixed typo: s/numm/null/ * version.h 1. Updated version to "4.6.6". ******************************************************************************* 4.6.5 ******************************************************************************* BUG FIXES --------- * Adding files incrementally to an index that has meta names caused a SEGFAULT. (This bug fix shall be known as bug fix IIM.) CHANGES, file-by-file --------------------- * index.c 1. In load_old_index(), performed the following substitution for bug fix IIM: s/meta_names[ *meta_name ] = parse_bcd( p );/ meta_names[ ::strdup( *meta_name ) ] = parse_bcd( p );/ * version.h 1. Updated version to "4.6.5". ******************************************************************************* 4.6.4 ******************************************************************************* BUG FIXES --------- * Files having path names longer than 255 characters weren't indexed. (This bug fix shall be known as bug fix PATH.) CHANGES, file-by-file --------------------- * do_file.c * extract.c * index.c 1. Performed the following substitution for bug fix PATH: s/NAME_MAX/PATH_MAX/ * version.h 1. Updated version to "4.6.4".
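The one-line substitution for bug fix IIM (4.6.5) suggests that the map had been keyed on a pointer it did not own. The following is only a sketch of that general pattern, assuming a std::map keyed on char const*; meta_map, cstr_less, and add_meta_name() are hypothetical names and are not taken from index.c.

```cpp
#include <string.h>   // strcmp(), and strdup() on POSIX systems
#include <map>

// Orders C strings by their content rather than by pointer value.
struct cstr_less {
    bool operator()( char const *a, char const *b ) const {
        return ::strcmp( a, b ) < 0;
    }
};

// Hypothetical stand-in for a meta-name table keyed on C strings.
typedef std::map< char const*, int, cstr_less > meta_map;

// Inserts a private copy of the key.  Storing the caller's pointer directly
// would leave the map with a dangling key once the caller's buffer is
// reused or released; the copy made by strdup() lives as long as the map.
void add_meta_name( meta_map &m, char const *name, int id ) {
    if ( m.find( name ) == m.end() )
        m[ ::strdup( name ) ] = id;   // the copy is deliberately never freed here
}
```

The cost of this approach is that the duplicated keys must be released separately (or simply kept for the life of the process); a map keyed on std::string would avoid that bookkeeping.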
******************************************************************************* 4.6.3 ******************************************************************************* BUG FIXES --------- * DD elements weren't implicitly terminated by a new <DT> tag. (This bug fix shall be known as bug fix DDDT.) CHANGES, file-by-file --------------------- * elements.c 1. Added: "dt", "/dt", to the "dd" line for bug fix DDDT. * version.h 1. Updated version to "4.6.3". ******************************************************************************* 4.6.2 ******************************************************************************* BUG FIXES --------- * When using filters, the post-filtered filename and size were stored in the index rather than the original filename and size. (This bug fix will be known as bug fix FFS.) CHANGES, file-by-file --------------------- * do_file.c 1. Added orig_file_size for bug fix FFS. 2. Moved test for encountering file during incremental indexing to before the filter filename substitutions for bug fix FFS. 3. Added orig_file_name for bug fix FFS. 4. Changed "new file_info()" call to use orig_file_name and orig_file_size for bug fix FFS. * filter.c * filter.h 1. Performed following substitution: s/target_file_/target_file_name_/ * util.h 1. Added file_size() for bug fix FFS. * version.h 1. Updated version to "4.6.2". ******************************************************************************* 4.6.1 ******************************************************************************* BUG FIXES --------- * extract(1) incorrectly required arguments for -l and -r. (This bug fix will be known as bug fix LRO.) CHANGES, file-by-file --------------------- * extract.c 1. In main(), fixed opt_spec for bug fix LRO. * version.h 1. Updated version to "4.6.1". 
******************************************************************************* 4.6 ******************************************************************************* NEW FEATURES ------------ * Added the ability to specify the extension appended to files for extract(1). (This feature will be known as feature SEE.) * Added the ability to run extract(1) as a filter. (This feature will be known as feature EF.) BUG FIXES --------- * extract(1) didn't print the file name of files that didn't exist in its error message. (This bug fix will be known as bug fix DNE.) CHANGES, file-by-file --------------------- * config.h 1. Added ExtractExtension_Default for feature SEE. * conf_var.c 1. In map_ref(), added ExtractExtension for feature SEE. 2. In map_ref(), added ExtractFilter for feature EF. * do_file.c 1. Changed to use user-specified extension for feature SEE. 2. Changed to have the ability to write to standard output for feature EF. * exit_codes.h 1. Added Exit_No_Such_File for feature EF. * extract.c 1. Added global extract_extension variable for feature SEE. 2. In main(), added -x and --extension options for feature SEE. 3. In usage(), added description of -x and --extension options for feature SEE. 4. In extract_*() functions, changed ofstream argument to ostream for feature EF. 5. Added extract_as_filter global variable for feature EF. 6. In main() and usage(), added -f and --filter option for feature EF. 7. In main(), added code for the filter case for feature EF. 8. In main(), fixed bug DNE. * ExtractExtension.h 1. Added this file for feature SEE. * ExtractFilter.h 1. Added this file for feature EF. * man/man1/extract.1 1. Added description of -x and --extension options for feature SEE. 2. Added description of -f and --filter options for feature EF. * man/man4/swish++.conf.4 * swish++.conf 1. Added ExtractExtension variable for feature SEE. 2. Added ExtractFilter variable for feature EF. * version.h 1. Updated version to "4.6".
******************************************************************************* 4.5 ******************************************************************************* NEW FEATURES ------------ * Added the ability to index, not index, and filter files based on filename pattern rather than merely extension. (This feature will be known as feature FNP.) BUG FIXES --------- * If an HTML file doing selective non-indexing via CLASS attributes wasn't well-formed such that an HTML element having the CLASS attribute didn't end properly, then all words in all subsequent files indexed would be discarded. (This bug fix shall be known as bug fix ECC.) * The --verbosity option in index(1) wasn't recognized. (This bug fix shall be known as bug fix VLO.) CHANGES, file-by-file --------------------- * config/config.mk 1. Added -pedantic to CCFLAGS to make code cleaner. * config/config-sh 1. Changed &- to /dev/null because of a weird interaction with g++ -pedantic. * conf_var.c 1. Performed the following substitution for configuration variable names for feature FNP: s/(.+)Extension/$1File/ * do_file.c 1. Changed extension-based processing to pattern-based for feature FNP. 2. Added "true" argument to call of index_words() for bug fix ECC. * entities.c 1. Added "apos", "Scaron", "scaron", and "Yuml" for support of XHTML. * ExcludeExtension.h * IncludeExtension.c * IncludeExtension.h * FilterExtension.c * FilterExtension.h * ExcludeFile.h * IncludeFile.c * IncludeFile.h * FilterFile.c * FilterFile.h 1. Replaced *Extension files with *File equivalents for feature FNP. * exit_codes.h 1. Added Exit_End_Enum_Marker to make compile with the -pedantic option of g++. * extract.c 1. Same as conf_var.c item #1. 2. Performed following substitution for feature FNP: s/extension/pattern/ 3. Made usage() take an ostream& argument just so it parallels the way it's done in index.c. * file_info.c 1. In file_info::out(), eliminated unused num_words variable. * file_list.c 1. 
Removed #include <sys/types.h> since it's apparently not needed. * file_vector.h 1. Replaced off_t with size_t since mmap(2) uses size_t. * GNUmakefile 1. Added/removed source files for feature FNP. 2. Added ".PHONY: all" so a "make -t" doesn't "touch all". 3. Changed from using .%.d to dep/%.d dependency files since Windows doesn't like filenames beginning with a dot. 4. Split dist and distclean targets since only distclean should remove the dependencies. * html.h * index_segment.h * option_stream.h * token.h 1. Removed comma at end of enum list to make it compile with the -pedantic option of g++. * html.c 1. Added mention of XHTML in comments. * index.c 1. Same as conf_var.c item #1. 2. Same as extract.c item #2. 3. In main() for the 'C', 'm', and 'M' command-line options cases, removed unnecessary strdup() since the variables are derived from conf_set that is-a string_set that uses std::string so the strings are copied anyway. 4. In load_old_index(), removed unnecessary strdup() for similar reason to item 3. 5. In index_word(), performed following substitution: s/if ( exclude_class_count )/if ( exclude_class_count > 0 )/ just because it seemed "more correct." 6. Corresponding change as index.h item #1. 7. In index_words(), removed "static bool new_file" variable for bug fix ECC. 8. In index_words(), added: if ( is_new_file ) exclude_class_count = 0; for bug fix ECC. 9. In index_words(), removed last "new_file = true;" for bug fix ECC. A. In main(), performed following substitution: s/verbose/verbosity/ for bug fix VLO. * index.h 1. Added "is_new_file" argument for bug fix ECC. * man/man1/index.1 * man/man1/extract.1 * man/man4/swish++.conf.4 1. Modified description for feature FNP. * my_set.h 1. Performed following substitution: s/key_type/T const &/ to make it compile with the -pedantic option of g++. 2. Added specialization of my_set< char const* > (see the comment for why). * pattern_map.h 1. Added this file for feature FNP. * README 1.
Changed "extension" to "patterns" for feature FNP. * stop_words.c 1. Added "mustn't" to the list of stop-words. * swish++.conf 1. Same as conf_var.c item #1. * version.h 1. Updated version to "4.5". * word_util.c 1. Added #include <iostream> if DEBUG_is_ok_word is defined. ******************************************************************************* 4.4 ******************************************************************************* NEW FEATURES ------------ * The FilterExtension variable now allows you to specify literal % and @ characters by simply doubling the character. (This feature will be known as feature FLC.) BUG FIXES --------- * Added various #include lines and replacement for ENODATA to make it compile on FreeBSD systems. (This bug fix will be known as bug fix BSD2.) * Fixed a bug in config.pl whereby a \$ would cause all $ after it not to expand because the while loop exited prematurely. (This didn't currently matter, but it might in the future.) (This bug fix will be known as bug fix CEV.) * Fixed a bug in install-sh whereby source files in subdirectories would not be chown'd and chmod'd properly. (This didn't matter for SWISH++ the way it's distributed, but it might in the future.) (This bug fix will be known as bug fix ISD.) CHANGES, file-by-file --------------------- * config.pl 1. * config/man.mk 1. Added empty "all" rule so text versions of manual pages are not automatically built. * directory.c 1. Added #include <sys/types.h> for bug fix BSD2. * file_vector.c 1. Added #include <sys/types.h> for bug fix BSD2. 2. Added test for ENODATA and, if not available, use something else. (For bug fix BSD2.) * filter.c 1. Added code for feature FLC. 2. Added special-case code for sleep(3) for Windows. * FilterExtension.c 1. Added code for feature FLC. * GNUmakefile 1. Performed substitutions of the form: s/@cd $(DIR) && $(MAKE) $@/@$(MAKE) -C $(DIR) $@/ since GNU make has a -C option. 2. 
Added conditionals so as not to regenerate .d files when making clean, distclean, or dist. 3. Added "txt" as a target. * install-sh 1. Added code to strip directories via basename and xargs for bug fix ISD. * man/Makefile 1. Renamed to GNUmakefile. * man/GNUmakefile 1. Renamed from Makefile. 2. Replaced all targets with simpler %. * man/man1/GNUmakefile * man/man1/Makefile * man/man3/GNUmakefile * man/man3/Makefile * man/man4/GNUmakefile * man/man4/Makefile 1. Renamed Makefile to GNUmakefile. * man/man4/swish++.conf.4 1. Added description of feature FLC. * search.c * search_daemon.c * util.h 1. Added #include <ctime> for bug fix BSD2. * version.h 1. Updated version to "4.4". ******************************************************************************* 4.3.1 ******************************************************************************* BUG FIXES --------- * Indexing via standard input did NOT index all files: it still required you to specify extensions via -e. This is totally wrong. It seems it's been wrong since version 3.1. I'm surprised nobody noticed. (This bug fix will be known as bug fix SII.) * Fixed build problem on Debian Linux systems having to do with the type of the 3rd argument to accept(3). (This bug fix will be known as bug fix A3A.) CHANGES, file-by-file --------------------- * config/config-sh 1. Completely reworked to support multiple tests for the same thing for bug fix A3A. * config/src/explicit.c * config/src/mutable.c * config/src/namespaces.c * config/src/new_casts.c 1. Performed following substitution: s/DEFINE/FAIL/ for bug fix A3A. * config/socklen.c 1. Replaced by other socklen*.c files. * config/socklen_1_socklen_t.c * config/socklen_2_int.c * config/socklen_2_unsigned.c 1. Replaced socklen.c with individual tests for bug fix A3A. * do_file.c 1. Performed following substitution: s/exclude_extensions.empty()/!include_extensions.empty()/ for bug fix SII. * GNUmakefile 1.
Added platform.h as a dependency for all .*.d files so it will get built first since it is #include'd by most other files. * httpindex.in 1. Added more options that are passed to index(1). * man/man1/httpindex.1 1. Corresponding changes for httpindex.in. * search_daemon.c 1. Replaced use of PJL_SOCKLEN_NOT_INT with PJL_SOCKLEN_TYPE for bug fix A3A. * version.h 1. Updated version to "4.3.1". ******************************************************************************* 4.3 ******************************************************************************* NEW FEATURES ------------ * 'search' has a new -P (or --pid-file) option and PidFile configuration variable to specify a file to write its process ID when running as a daemon. (This feature will be known as feature PID.) BUG FIXES --------- * The FilesGrow feature was broken because I forgot a needed set of parentheses to get the precedence right. :-( (This bug will be known as bug fix FGP.) * For 'search', the description of the -p and --word-percent option in the usage message wasn't printed unless it was compiled as a daemon (and these options have nothing to do with it being a daemon). (This bug will be known as bug fix WPO.) CHANGES, file-by-file --------------------- * config.h 1. Performed the following substitution: s/swish++.socket/search.socket/ * config/config.pl 1. Rewrote it so that it parses Makefiles directly rather than needing to be passed arguments. * config/Makefile 1. Condensed a few things. * conf_var.c 1. Added PidFile variable for feature PID. * exit_codes.h 1. Added Exit_No_Write_PID for feature PID. * FilesGrow.h 1. Added parentheses around ?: operator for bug fix FGP. * GNUmakefile 1. Added in place of Makefiles. * html.c 1. Made is_html_comment() automatically skip it. * index.c 1. In index_words(), performed following substitution: s/char/file_vector::value_type/ It should have been that way all along. * Makefile * Makefile.win32 1. Replaced by GNUmakefile. * man/man1/search.1 1. 
Added description of feature PID. 2. Added -a, --socket-address for feature TCP. 3. Made changes to daemon (daemon-type) for feature TCP. 4. Added stuff for TCP sockets for feature TCP. 5. Updated exit status codes for feature TCP. 6. s/pjl@best.com/pauljlucas@mac.com/ 7. Updated Perl book references for 3rd ed. * man/man4/swish++.conf.4 1. Added mention of PidFile. * PidFile.h 1. Added this file for feature PID. * search.c 1. Added #include "PidFile.h" for feature PID. 2. Added "char const*" argument to become_daemon() function declaration for feature PID. 3. In main(), added: PidFile pid_file_name; and code to override with a command-line option for feature PID. 5. In search_options::search_options(), added pid_file_name_arg and case for it for feature PID. 6. In usage(), added description of -P and --pid-file options for feature PID. 7. In usage(), fixed bug WPO. * searchc.in 1. Same as config.h item #1. * search_daemon.c 1. Added "#include <fstream>" for feature PID. 2. Removed "#ifndef WIN32" since the search daemon feature isn't supported for Windows anyway. 3. Added "pid_file_name" argument to become_daemon() for feature PID. 4. Added code to write the process ID for feature PID. * search.h 1. Added pid_file_name_arg to struct for feature PID. * search_options.c 1. Added "pid-file" option for feature PID. * SocketFile.c 1. Removed this file to remove absolute path name requirement. * StopWordFile.h 1. Performed following substitution: s/string/std::string/ It should have been that way all along. * swish++.conf 1. Added PidFile for feature PID. 2. Same as config.h item #1. * version.h 1. Updated version to "4.3". * www_example/search.cgi 1. Same as config.h item #1. ******************************************************************************* 4.2 ******************************************************************************* NEW FEATURES ------------ * You can now index incrementally. (This feature will be known as feature II.) 
BUG FIXES --------- * You could get a segmentation fault if you indexed an HTML file that has no title and has fewer than title_lines lines. (This bug fix will be known as bug fix FTL.) * There was a small memory leak with stop-words when search(1) ran as a daemon. (This bug fix will be known as bug fix SWL.) * When running as a daemon, the thread timeout value was set to a garbage value. (This bug fix will be known as bug fix TTG.) * In index(1), the --files-reserve option was incorrectly named --file-reserve. (This bug fix will be known as bug fix FRO.) CHANGES, file-by-file --------------------- * config/config.mk 1. Added GCC_WARNINGS for development purposes. * config.h 1. Added FilesGrow_Default for feature II. * conf_bool.h 1. Make parse_value() protected rather than private for feature II. 2. Performed the following substitution: s/string/std::string/ * conf_int.c 1. Made parse_value() thread-safe. 2. Same as conf_bool.h item #1. 3. Same as conf_bool.h item #2. * conf_set.h 1. Same as conf_bool.h item #1. 2. Added CONF_SET_ASSIGN_OPS macro. * conf_string.h 1. Added operator+=() for feature II. 2. Performed the following substitution: s/string/std::string/ * conf_var.c 1. Added definition of conf_var::~conf_var(). 2. In map_ref(), added "Incremental" and "FilesGrow" for feature II. * conf_var.h 1. Added #include <string>. 2. Added virtual ~conf_var() because the class has virtual functions. 3. Performed the following substitution: s/string/std::string/ * do_file.c 1. In do_file(), added code to check for encountering the same file when incrementally indexing for feature II. 2. In do_file(), moved the code that checks the title for null to file_info.c. * ExcludeClass.h * ExcludeExtension.h * ExcludeMeta.h * IncludeMeta.h 1. Added CONF_SET_ASSIGN_OPS. * exit_codes.h 1. Adjusted exit codes for feature II. * file_index.h * file_index.c 1. These files were deleted; their functionality was consolidated into index_segment.[ch]. * file_info.c 1. 
A num_words parameter was added to the constructor for feature II. 2. operator<< was eliminated for feature II. 3. Added file_info::out() and file_info::parse() for feature II. 4. Eliminated the class-specific operator new and moved the functionality inside the constructor. * file_info.h Everything in file_info.c plus: 1. The private data members were made public and const for feature II. 2. Added name_set_ for feature II. * file_list.c 1. In calc_size(), code was added to skip occurrences for feature II. 2. In operator++(), code was added to parse occurrences for feature II. * file_list.h 1. Performed the following substitution: s/word_index/index_segment/ for feature II. * FilesReserve.h 1. Added extern declaration. * file_vector.h 1. In end(), performed following substitution: s/size()/size_/ so g++ can inline the functions. * filter.c 1. Changed types of pos and target_pos to string::size_type and changed -1 to string::npos. (It should have been this way all along.) * html.c 1. In grep_title(), removed register storage class for 'c' since its address is taken. 2. In grep_title(), added "return 0" at the end for bug fix FTL. * Incremental.c * Incremental.h 1. Added these files for feature II. * index.c 1. Added global variables "files_grow", "incremental", and "partial_index_file_names" for feature II. 2. Added load_old_index() function for feature II. 3. In main() and usage(), changed "file-reserve" option to "files-reserve" for bug fix FRO. 4. In main(), added code to new -I and -G options for feature II. 5. In main(), added code to call load_old_index() for feature II. 6. In index_words(), removed "register" from declaration of argument 'c' since its address is taken. 7. In merge_indicies(), changed from using num_temp_files to partial_index_file_names.size() for feature II. 8. In write_file_index(), added code to write the format directly since file_info::operator<<() was eliminated. 9. 
In write_partial_index(), added code to add the partial index file name to the global variable "partial_index_file_names" for feature II. A. In write_word_index(), added code to write the occurrence data for feature II. B. In usage(), added description of new -I, --incremental, -G, and --files-grow options for feature II. * index_segment.c * index_segment.h 1. Added these files. * man/man1/index.1 * man/man1/search.1 1. Added description of incremental indexing and new supporting options and variables for feature II. * man/man4/swish++.conf.4 1. Added description of "Incremental" and "FilesGrow" variables for feature II. * man/man4/swish++.index.4 1. Updated the description of the index file format for feature II. * managed_ptr.h 1. Added various member functions. * option_stream.c 1. Removed the constructor without the ostream argument and just made the other constructor have a default value. * search.c 1. Performed the following substitutions: s/result_type/search_result_type/ s/results_type/search_results_type/ that fixed the problem with deriving sort_by_rank from binary_function. 2. Changed the types of "files", "meta_names", "stop_words", and "words" to "index_segment" for feature II. 3. In main(), added word_file_max_arg and word_percent_max_arg for feature II. 4. Moved the become_daemon() into the new search_daemon.c file. 5. In dump_single_word(), added occurrence data to output. 6. Added is_too_frequent() function for feature II. 7. In parse_query(), added a default case to the switch statement. 8. In parse_primary(), eliminated the call to strdup() for bug fix SWL. 9. In parse_primary(), added code to check to see if a word is too frequent for feature II. A. In search_options::search_options(), added code for word_file_max_arg and word_percent_max_arg for feature II. B. In service_request(), added occurrence data to dump output for the entire index. C. In usage(), added description for -f and -p options for feature II. * search.h 1.
Added word_file_max_arg and word_percent_max_arg for feature II. * search_daemon.c 1. Moved the daemon code into this file. * search_options.c 1. Added "word-files" and "percent-files" arguments for feature II. * stop_words.c * stop_words.h 1. Added new constructor for feature II. * thread_pool.c 1. In thread_pool::thread_pool(), performed following substitution: s/timeout_( timeout_ )/timeout_( timeout )/ for bug fix TTG. * util.c * util.h 1. Moved get_index_info() to index_segment.c for feature II. * version.h 1. Updated version to "4.2". * word_info.h 1. Removed "union" in word_info::file for feature II. 2. Added initialization of rank_ data member for feature II. ******************************************************************************* 4.1 ******************************************************************************* NEW FEATURES ------------ * Generated index files are now approximately 24% smaller. (This feature will be known as feature SIF.) BUG FIXES --------- * When index(1) generated partial indices, it never removed words that occurred in more files than the allowable percentage. (This bug fix will be known as bug fix PIR.) * The -T/--temp-dir options were left out of the option specification for index(1) so you could never specify these options on the command line. (This bug fix will be known as bug fix TDO.) * Added a call to setsid() for becoming a search daemon. It should have been there all along. (This bug fix will be known as bug fix SID.) CHANGES, file-by-file --------------------- * bcd.c * bcd.h 1. Added these files for feature SIF. * config.h 1. Added #error directives to force people to read config.h and set important values for their system. * exit_codes.h 1. Added Exit_No_Unlink. * file_list.c 1. Rewrote file_list::calc_size() for bug fix PIR. I was counting spaces. Why? I don't know. There were never any spaces in the file list data. As far as I can tell, this never worked. 2.
Rewrote file_list::const_iterator::operator++() to parse index file word data in new format for feature SIF. * file_list.h 1. Performed following substitution: s/ptr/ptr_/ 2. Performed following substitution: s/char const/unsigned char const/ for feature SIF since we now have to deal with byte values greater than 0x7F. * index.c 1. In merge_indicies() and write_word_index(), changed the format in which the index file is written from ASCII to BCD for feature SIF. 2. It seems as though the merge_indicies() code was slightly broken. While it appeared to merge properly, I don't think it ever threw out words that exceeded any thresholds. I rewrote chunks of it for bug fix PIR. 3. Factored out some code into a new is_too_frequent() function. 4. In write_meta_name_index(), changed the ASCII numerical output to BCD for feature SIF. 5. In write_full_index(), changed the comment describing the format of the index file for feature SIF. 6. In main(), added -T/--temp-dir to option spec. for bug fix TDO. * Makefile * Makefile.win32 1. Added bcd.o target and dependencies for feature SIF. 2. Added word_info.o target and dependencies. * man/man1/index.1 1. Added Exit_No_Unlink to exit codes section. * man/man4/swish++.index.4 1. Changed the description of the index file format for feature SIF. * search.c 1. In get_meta_id(), changed the parsing of the META ID from ASCII to BCD for feature SIF. 2. In become_daemon(), added call to setsid() for bug fix SID. * version.h 1. Updated version to "4.1". * word_info.c 1. Added this file to factor out the code for writing META IDs. * word_info.h 1. Added write_meta_ids() function declaration. ******************************************************************************* 4.0 ******************************************************************************* NEW FEATURES ------------ * 'search' now has the ability to run in the background as a multi-threaded daemon process functioning as a search server.
(This feature will be known as feature MSD.) CHANGES, file-by-file --------------------- * config/config.mk 1. Added SEARCH_DAEMON, PTHREAD_LIB, and SOCKET_LIB variables for feature MSD. * config/Makefile 1. Added src/errno$(CCCEXT) dependency to $(TARGET) for feature MSD. 2. Added src/socklen$(CCCEXT) dependency to $(TARGET) for feature MSD. * config/config-sh 1. Performed following substitution: s/trap "x=$?; rm -f *$CCOEXT $TARGET; exit $x" 0 1 2 15/ trap "rm -f *$CCOEXT $TARGET; exit 1" 0 1 2 15 since it wasn't saving the exit code and always exiting with 0 (I don't know why.) This was done for feature MSD. 2. Added code to handle the new ERROR case for feature MSD. * INSTALL.win32 1. Added reference to Makefile.win32. * Makefile 1. Added new objects and dependencies for feature MSD. 2. Added new DEBUG_threads flag for feature MSD. * Makefile.win32 1. Added this not including the objects and dependencies for feature MSD until somebody helps me port MSD to Windows. * man/man1/search.1 1. Added description of feature MSD. * man/man4/swish++.conf.4 1. Added new configuration file variables for feature MSD. * README 1. Added synopsis of feature MSD. * search.c 1. Added #include of headers for socket-related stuff for feature MSD. 2. Added global "am_daemon" variable for feature MSD. 3. Added a "set<string>&" parameter to parse_meta(), parse_primary(), and parse_query() functions to collect the stop words found for feature MSD. 4. In main(), moved command-line option processing code into new search_options object for feature MSD. 5. In main(), moved search-request code into new search() and service_request() functions for feature MSD. 6. Added new become_daemon() function for feature MSD. 7. Added "out" parameters to dump_single_word() and dump_word_window() functions for feature MSD. 8. In both dump_single_word() and dump_word_window(), made lower_word a managed_vec<char> if compiled as a search daemon. 9. 
In usage() added description for new options for feature MSD. * managed_ptr.h * SearchDaemon.h * search_thread.c * search_thread.h * SocketFile.h * SocketQueueSize.h * src/socklen.c * thread_pool.c * thread_pool.h * ThreadsMax.h * ThreadsMin.h 1. New files for feature MSD. * stem_word.c 1. In stem_word(), added mutex around access to cache for feature MSD. * util.c * util.h 1. Added to_lower_r() function for feature MSD. * version.h 1. Updated version to "4.0". ******************************************************************************* 3.1 ******************************************************************************* NEW FEATURES ------------ * All executables now accept alternate long ("GNU-style") command-line options. (This feature will be known as feature LOPT.) * Added -? option to print usage ("help") message. (This feature will be known as feature HELP.) * Added new -h option and HTMLExtension configuration variable to allow filename extensions that are to be treated as HTML to be specified. (This feature will be known as feature HEXT.) * In 'search', allow max-results to be specified as 0. This allows the -R option to be eliminated. (This feature will be known as feature MR0.) * Added code to max-out various resource limits since SWISH++ is resource-intensive. This may alleviate out-of-memory conditions on some platforms. (This feature will be known as feature MAXR.) * Added code to map ISO 8859-1 (Latin 1) characters to their closest ASCII equivalent so that they are treated exactly like character entity references. This should also improve use of SWISH++ in other languages. (This feature will be known as feature ISOMAP.) BUG FIXES --------- * Feature CLASS has had a subtle bug in it since its inception in version 2.0. It was over-zealous in closing HTML elements.
For example, given: <TABLE> <TR> <!-- 2 --> <TD> <!-- 1 --> <TABLE> <!-- 0 --> <TR> <TD> Hello </TD> </TABLE> <!-- trouble --> </TD> </TABLE> The "trouble" tag should have stopped closing elements at tag "0" but it kept closing elements "1" and "2" because </TD> and </TR> have </TABLE> in their set of close tags. The fix is to make a tag check to see whether the element it's closing is its own start tag: if so, stop. (This bug fix will be known as bug fix CLASS1.) * Really long start tags, those with lots of attributes, exceeding 128 characters, would cause a buffer overflow and, occasionally, a core dump. The fix is to change the way getting the name of tags is done. (This bug fix will be known as bug fix RLST.) CHANGES, file-by-file --------------------- * config/config.mk 1. Added suffix and rule for ".in" files. * conf_bool.h * conf_int.h * conf_set.h 1. Added reset() member function. 2. Added var_name argument to parse_value() for feature HEXT. * conf_string.h 1. Added reset() member function. 2. Changed use of "char const*" to "string" so as not to have to worry about properly managing the value. 3. Added var_name argument to parse_value() for feature HEXT. * conf_var.c 1. Added alias_name() function for feature HEXT. 2. In parse_const_value() and parse_line(), added var_name argument to parse_value() call for feature HEXT. 3. In msg(), performed following substitution: s/cerr/o/ It should have been that way all along. 4. Made parse_config_file() a member function. 5. Added reset_all() member function. 6. In map_ref(), added "HTMLExtension" for feature HEXT. * conf_var.h 1. Added alias_name() function for feature HEXT. 2. Made error() use the 'o' reference instead of cerr. 3. Added argument to warning(). 4. Added reset_all() member function. * config.h 1. Added WordPercentMax_Default. It should have been there all along. 2. Added Tag_Name_Max_Size for bug fix RLST. * do_file.c 1. Removed is_html_ext() for feature HEXT. 2. Changed MAXNAMLEN to NAME_MAX. 3.
In do_file(), changed access to include_extensions for feature HEXT. * entities.c * entities.h 1. Moved num_entities[] to word_util.[ch] for feature ISOMAP. * exit_codes.h 1. Flipped errors 50 and 51. * extract.c 1. Removed MAXNAMLEN in favor of NAME_MAX in util.h. 2. Replaced getopt() code with opt_stream code for feature LOPT. 3. In main(), added code to max-out RLIMIT_CPU for feature MAXR. 4. In main(), added code for new -? option for feature HELP. 5. In usage(), added long option descriptions for feature LOPT. 6. In usage(), added description for -? (and --help) options for features HELP and LOPT, respectively. 7. Corresponding change to file_vector.h item #1. 8. Performed following substitution: s/ERROR/error()/ 9. In extract_words(), added call to iso8859_to_ascii() for feature ISOMAP. * file_index.h 1. Corresponding changes for file_vector.h item #1. * file_info.c 1. Performed following substitution: s/num_files_reserve/files_reserve/ * file_vector.h 1. Detemplatized file_vector class to be only of char. I realized that file_vectors of other types may not have their elements suitably aligned and could cause alignment faults. * file_vector.c 1. Corresponding change to file_vector.h item #1. 2. In file_vector::init(), added code to max-out RLIMIT_VMEM for feature MAXR. 3. For Unix, error() now returns the value of the standard errno variable. 4. In open(), made it so that it won't attempt to mmap() the file if it has zero size; instead set error to ENODATA. * FilterExtension.c 1. Added missing #include <cstdlib> * FilterExtension.h 1. Added reset() member function. * html.c 1. Added #include "word_util.h" 2. Corresponding change to file_vector.h item #1. 3. Made find_attribute() insist that the "attribute" argument be passed as lower case. This way, it doesn't have to convert it to lower case. 4. Made tag_cmp() insist that the "tag" argument be passed as lower case. This way, it doesn't have to convert it to lower case. 5.
In parse_html_tag(), corresponding changes for items #3 and #4. 6. In parse_html_tag(), added code for bug fix CLASS1. 7. In parse_html_tag(), changed first call to to_lower( begin, end ) to be inline code for bug fix RLST. 8. In convert_entity(), performed following substitution: s/num_entities/iso8859_map/ for feature ISOMAP. * html.h 1. Corresponding change to file_vector.h item #1. * index.c 1. Same as extract.c item #1. 2. Same as extract.c item #2. 3. Same as extract.c item #3. 4. Same as extract.c item #4. 5. Same as extract.c item #5. 6. Same as extract.c item #6. 7. Same as extract.c item #7. 8. Same as extract.c item #8. 9. Same as extract.c item #9. A. In main(), added code to max-out RLIMIT_AS and RLIMIT_DATA for feature MAXR. B. In usage(), added missing description for -T option. C. Same as file_info.c #1 plus: s/num_files_reserve_arg/files_reserve_arg/ * index.h 1. Corresponding change to file_vector.h item #1. * IndexFile.h 1. Corresponding change to conf_string item #2. * itoa.c * itoa.h 1. Moved ltoa() into its own file. * Makefile 1. Added dependencies for option_stream.[cho] for feature LOPT. 2. Added dependencies for itoa.[ch]. 3. Made TARGET names more explicit. 4. Performed following substitution: s/the.index/swish++.index/ This should have been done in release 3.0. * man/man1/extract.1 * man/man1/index.1 * man/man1/search.1 1. Added descriptions for feature LOPT. * my_set.h 1. Eliminated my_set::key_type since it's already defined in set. 2. Made base_type private. 3. Changed what was "string_set" to "char_ptr_set" and added a new "string_set" that really is a set of strings. * option_stream.c * option_stream.h 1. Added for feature LOPT. * ResultsMax.h 1. Changed allowable lower bound for ResultsMax to 0 for feature MR0. * search.c 1. Same as extract.c item #2. 2. In main(), added code to max-out RLIMIT_AS for feature MAXR. 3. Changed use of istrstream to token_stream. 4. In parse_meta(), made tokens const. 5.
In parse_meta() and parse_optional_relop(), changed lines of the form: t.put_back(); to: query.put_back( t ); 6. In parse_primary(), used improved less_stem to eliminate conditional calls of binary_search() and equal_range(). 7. Same as extract.c item #5. 8. In main(), added call to setlocale(3) for feature LOCALE. * stem_word.h 1. Enhanced less_stem class to accept a Boolean argument whether to stem or not. 2. Made stem_word() a member function. * stem_word.c 1. Corresponding change to stem_word.h item #2. 2. Added #include "word_util.h" * stop_words.c 1. Added #include "word_util.h" 2. Performed following substitution: s/ERROR/error()/ 3. Corresponding change to file_vector.h item #1. 4. Added call to iso8859_to_ascii() for feature ISOMAP. * StopWordFile.h * TempDirectory.h 1. Corresponding change to conf_string item #2. * token.c 1. Removed token::hold() in favor of new token_stream class. 2. In operator>>(), replaced call to to_lower() with ::transform() thus making operator>>() thread-safe (since to_lower() isn't thread-safe). 3. Added #include "word_util.h" 4. Added call to iso8859_to_ascii() for feature ISOMAP. * token.h 1. Added token_stream class to hold "put back" tokens. This makes it thread-safe. * util.c 1. Moved ltoa() to itoa.c so it's in its own .o file since only index(1) uses it so there's no reason for it to be linked into extract(1) or search(1). 2. Added max_out_limit() for feature MAXR. 3. Moved is_ok_word() to word_util.h thereby making it easier for others to customize or replace it with a custom one (perhaps with different heuristics for other languages). 4. Moved parse_config_file() to be a member of conf_var. 5. Corresponding change to file_vector.h item #1. * util.h 1. Added definition of NAME_MAX to replace MAXNAMLEN to be more POSIXly correct. 2. Corresponding change as do_file.c item #1. 3. Added declaration of max_out_limit() for feature MAXR. 4. Moved declarations of ltoa() and itoa() to itoa.h. 5.
The is_ok_word(), is_vowel(), is_word_begin_char(), is_word_char(), and is_word_end_char() have been moved to word_util.h thereby making it easier for others to customize or replace them with custom ones (perhaps with different heuristics for other languages). 6. Replaced ERROR macro with error() function. 7. Added error_string() function. 8. Corresponding change to file_vector.h item #1. * version.h 1. Updated version to "3.1". * word_index.c * word_index.h 1. Corresponding change to file_vector.h item #1. * WordPercentMax.h 1. Added WordPercentMax_Default to constructor. (It should have been here since version 3.0.) * word_util.c 1. Corresponding changes to util.c item #3 and util.h item #5. 2. Moved num_entities[] definition from entities.c to word_util.c and renamed it iso8859_map[] for feature ISOMAP. * word_util.h 1. Corresponding changes to util.c item #3 and util.h item #5. 2. Moved num_entities[] declaration from entities.h to word_util.h and renamed it iso8859_map[] for feature ISOMAP. 3. Added iso8859_to_ascii() function for feature ISOMAP. * WWW.pm 1. Made regular expression for e_mail more accurate. * www_example/search.cgi 1. Added code to pass along -s option to do stemming. * www_example/search.html 1. Added checkbox for stemming. ******************************************************************************* 3.0.3 ******************************************************************************* NEW FEATURES ------------ * A -H option has been added to 'index' to dump the built-in set of recognized HTML elements to standard output (so you can check to see if a certain tag is recognized or not). (This feature will be known as feature OPTH.) * Boolean configuration file variables now accept "on" and "off" values. (This feature will be known as feature ON_OFF.) BUG FIXES --------- * There was a small memory leak when indexing META names. (This bug fix will be known as bug fix ML1.) 
* Reporting errors in a configuration file says what line number the error is on. However, the same error-reporting code is also used to print errors when command-line arguments are invalid. The line number variable wasn't cleared so it would print an erroneous line number for an invalid command-line option. (This bug fix will be known as bug fix CLN.) * Parsing of Boolean values in configuration files was completely broken. (This bug fix will be known as bug fix PBV.) * WWW::extract_description() did it wrong for ALT attributes with an empty value, i.e., ALT="". (This bug fix will be known as bug fix ADE.) CHANGES, file-by-file --------------------- * conf_bool.c 1. In parse_value(), added code to accept "on" and "off" for feature ON_OFF. 2. In parse_value(), added '!' characters before ::strcmp() calls for bug fix PBV. * conf_bool.h * conf_int.h * conf_string.h 1. Made assignment operators protected since (1) they're not inherited and (2) it's an abstract class. * conf_int.c * conf_string.c 1. Performed following substitution: s/cerr/error()/ * conf_var.c 1. Corresponding change to conf_var.h #1. 2. In parse_line(), added: current_config_file_line_no_ = 0; for bug fix CLN. * conf_var.h 1. Made msg() accept an ostream& to write to. 2. Performed following substitution: s/string/std::string/ * do_file.c 1. Corresponding change to my_set.h #1. * elements.c 1. Added element_map::instance() for feature OPTH. 2. Added explicit case for element::forbidden. * elements.h 1. Corresponding change for elements.c item #1. 2. Made element_map::element_map() private for feature OPTH. 3. Added operator<<( ostream&, element_map::value_type const& ) for feature OPTH. * extract.c 1. Corresponding change to my_set.h #1. 2. In usage(), performed following substitution: s/Dump default stop-words/Dump stop-words/ since it dumps whatever stop-words are being used, not just the built-in default set. * filter.h 1. Performed following substitution: s/string/std::string/ * FilterExtension.c 1.
Performed following substitution: s/cerr/error()/ * html.c 1. In convert_entity(), changed access to the char_entity_map for feature OPTH. 2. In parse_html_tag(), corresponding change to my_set.h #1. 3. In parse_html_tag(), corresponding change to elements.c #1. 4. In parse_html_tag(), changed the way META names are looked up for bug fix ML1. Specifically, we no longer unconditionally do a strdup(): this was the source of the memory leak. * index.c 1. Added #include "elements.h" for feature OPTH. 2. In main() and usage(), added code for feature OPTH. 3. Corresponding change to my_set.h #1. 4. Corresponding change to extract.c #2. * less.h 1. Started using binary_function's first_argument_type, second_argument_type, and result_type typedefs. * Makefile 1. Added dependency for index.c on elements.h for feature OPTH. * man/man1/index.1 1. Added description for new -H option for feature OPTH. 2. Mentioned which verbosity level is the default. 3. Added a reference to the "Index of Elements" in the HTML 4.0 specification. * man/man4/swish++.conf.4 1. Added "on" and "off" for feature ON_OFF. * my_set.h 1. Performed following substitution: s/find/contains/ to distinguish it from STL find() functions that return iterators. * search.c 1. Corresponding change to my_set.h #1. 2. In usage(), removed "standard out" verbiage. * stem_word.c 1. In stem_word(), removed use of char_buffer_pool. * stem_word.h 1. Corresponding change to less.h item #1. * util.c 1. Performed following substitution: s/string/std::string/ * util.h 1. Used S_ISxxx() macros for file tests rather than S_IFxxx. * version.h 1. Updated version to "3.0.3". * WWW.pm 1. Changed lines 103 and 104 from: $s =~ s/<[^>]+?ALT\s*=\s*(['"])([^>]+)\1[^>]*?>/$2/gi; $s =~ s/<[^>]+?ALT\s*=\s*(['"])([^'"]+)\1?\s*$/$2/i; to: $s =~ s/<[^>]+?ALT\s*=\s*(['"])([^>]*?)\1[^>]*?>/$2/gi; $s =~ s/<[^>]+?ALT\s*=\s*(['"])([^'"]*)\1?\s*$/$2/i; for bug fix ADE.
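Note on bug fix ADE: the old and new WWW.pm substitutions differ only in the quantifier on the captured attribute value ("+" became "*?" or "*"), which is what lets an empty value such as ALT="" match. The sketch below illustrates the first substitution only, transliterated to C++ std::regex instead of Perl. The function names strip_alt_old() and strip_alt_new() are made up for this illustration and are not part of WWW.pm or SWISH++.

```cpp
#include <regex>
#include <string>

// Sketch only: a std::regex (ECMAScript grammar) transliteration of the first
// WWW.pm substitution. Both function names are hypothetical.

// Before the fix: "([^>]+)" requires at least one character between the
// quotes, so a tag containing ALT="" is not matched and is left in place.
std::string strip_alt_old( std::string const &s ) {
    static std::regex const re(
        R"re(<[^>]+?ALT\s*=\s*(['"])([^>]+)\1[^>]*?>)re",
        std::regex::ECMAScript | std::regex::icase
    );
    return std::regex_replace( s, re, "$2" );
}

// After the fix: "([^>]*?)" also accepts an empty value, so the whole tag is
// replaced by the (possibly empty) ALT text.
std::string strip_alt_new( std::string const &s ) {
    static std::regex const re(
        R"re(<[^>]+?ALT\s*=\s*(['"])([^>]*?)\1[^>]*?>)re",
        std::regex::ECMAScript | std::regex::icase
    );
    return std::regex_replace( s, re, "$2" );
}
```

With an input such as <IMG SRC="a.gif" ALT="">, the old pattern should leave the tag untouched while the new one removes it; for a non-empty ALT value both should yield the ALT text.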
******************************************************************************* 3.0.2 ******************************************************************************* BUG FIXES --------- * The -r option for index and extract was broken by release 3.0; it's fixed now. (This bug fix will be known as bug fix DASHR.) CHANGES, file-by-file --------------------- * directory.c 1. On line 104, reversed the order of the conditions to now be: if ( is_directory( path ) && recurse_subdirectories ) for bug fix DASHR. For directories, a stat(2) wasn't being performed so the is_plain_file() call in do_file() didn't work. * extract.c * index.c 1. In main(), performed following substitutions for command line argument variables: s/char*/char const*/ * search.c 1. In main(), performed following substitutions for command line argument variables: s/char*/char const*/ 2. Performed following substitutions: s/dump_match/dump_match_arg/ s/dump_window_size/dump_window_size_arg/ s/skip_results/skip_results_arg/ * version.h 1. Updated version to "3.0.2". ******************************************************************************* 3.0.1 ******************************************************************************* BUG FIXES --------- * The code failed to compile under g++ 2.95 because it caught errors that previous versions of g++ allowed to compile. (This bug fix will be known as GCC2.95.) * There were a few mistakes in the section 1 manual pages to cover all the changes to version 3.0. (This bug fix will be known as MAN3.) CHANGES, file-by-file --------------------- * elements.c 1. On line 276, added an intermediate cast to int to get rid of an error trying to convert directly from a char* to an enum for bug fix GCC2.95. * index.c 1. In rank_full_index(), added another local scope for bug fix GCC2.95. * man/man1/extract.1 * man/man1/index.1 * man/man1/search.1 1. Performed following substitution: s/the.index/swish++.index/ for bug fix MAN3. 2. Fixed some formatting errors. 
* man/man4/swish++.conf.4 1. Fixed some formatting errors. * search.c 1. Performed following substitution: s/result_type/results_type/ s/sorted_result_type/sorted_results_type/ and added new result_type type for bug fix GCC2.95. 2. In main(), performed following substitution: s/typedef vector< result_type::value_type > sorted_result_type; /typedef vector< result_type > sorted_results_type;/ for bug fix GCC2.95. * util.h 1. Rewrote is_directory() and is_plain_file() in terms of file_exists(). * version.h 1. Updated version to "3.0.1". * word_index.h 1. Added definitions for: word_index::const_iterator::operator+=() word_index::const_iterator::operator-=() for bug fix GCC2.95. ******************************************************************************* 3.0 ******************************************************************************* NEW FEATURES ------------ * SWISH++ now allows flexible file filtering for extraction and indexing. (This feature will be known as feature FFF.) * SWISH++ now allows configuration files since they were necessary for feature FFF. If I had to add them, I might as well do it right. (This feature will be known as feature CONF.) * SWISH++ now compiles and runs under Windows (95/98/NT). (This feature will be known as feature WIN32.) * 'index' now accepts a -T option that allows the directory to use for temporary files to be specified. (This feature will be known as feature TEMP.) * 'index' and 'extract' now report the number of files examined in addition to the number indexed or extracted, respectively. (This feature will be known as feature EXAM.) BUG FIXES --------- * In the admittedly rare case of a malformed HTML file ending in a '<' character (without a newline, i.e., '<' is the *VERY* last character in the file), 'index' would core-dump. (This bug fix will be known as bug fix EGT.)
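Note on bug fix EGT: the crash class described above comes from reading the character after '<' without first checking for end of input. The sketch below shows that kind of guard. It assumes a hypothetical helper, tag_name_after_lt(), which is not SWISH++ code and only illustrates the check.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper for illustration only (not part of SWISH++).
// 'c' points just past a '<'; 'end' is one past the last byte of the file.
// Returns the lower-cased tag name, or an empty string if there is none.
std::string tag_name_after_lt( char const *c, char const *end ) {
    assert( c <= end );                 // caller must pass a valid range
    std::string name;
    if ( c == end )                     // '<' was the very last character:
        return name;                    // stop before dereferencing 'c'
    if ( *c == '/' )                    // skip the slash of an end tag
        ++c;
    while ( c != end && std::isalnum( static_cast<unsigned char>( *c ) ) ) {
        name += static_cast<char>(
            std::tolower( static_cast<unsigned char>( *c ) )
        );
        ++c;
    }
    return name;
}
```

Every dereference of c is preceded by a comparison against end, so a file whose last byte is '<' yields an empty name rather than a read past the buffer.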
CHANGES, file-by-file --------------------- * conf_bool.c * conf_bool.h * conf_int.c * conf_int.h * conf_set.c * conf_set.h * conf_string.c * conf_string.h * conf_var.c * conf_var.h * ExcludeClass.h * ExcludeExtension.h * ExcludeMeta.h * FilesReserve.h * filter.c * filter.h * FilterExtension.c * FilterExtension.h * FollowLinks.h * IncludeExtension.h * IncludeMeta.h * IndexFile.h * man/man4/swish++.conf.4 * RecurseSubdirs.h * ResultsMax.h * StemWords.h * StopWordFile.h * TitleLines.h * Verbosity.h * WordFilesMax.h * WordPercentMax.h 1. New files for feature CONF. * config.h 1. Added Config_Filename_Default for feature CONF. 2. Performed following substitution: s/the.index/swish++.index/ * config/config.mk 1. Added -DWIN32 to CCFLAGS for feature WIN32. 2. Added more comments to CCFLAGS. 3. Added CCLINK for feature WIN32. 4. Added a "You shouldn't have to change anything below this line" line. 5. Added more comments for the "Manual pages" section and the DISTILL variable. 6. Added .SUFFIXES at bottom. * config/config-sh 1. Renamed from config.sh so some versions of make don't get confused with the .sh suffix and try to build it. 2. Define PJL_NO_SYMBOLIC_LINKS if WIN32 is defined for feature WIN32. * config/Makefile 1. Removed test for bool type: bool is now a requirement of the C++ compiler. This was necessary for feature CONF since it specializes a template on bool. 2. Performed following substitution: s/config.sh/config-sh/ corresponding to config/config-sh item #1. * config/src/bool.c 1. This file was removed corresponding to config/Makefile item #1. * directory.c 1. Performed following substitutions: s/bool recurse_subdirectories/RecurseSubdirs recurse_subdirectories/ s/int verbosity/Verbosity verbosity/ for feature CONF. 2. Added PJL_NO_SYMBOLIC_LINKS for WIN32. 3. Moved definition of stat_buf to util.c. * directory.h 1. Include platform.h for new PJL_NO_SYMBOLIC_LINKS symbol. 2. Moved stat_buf and file test functions to util.h. * do_file.c 1. 
The common code between 'index' and 'extract' was moved here. 2. The increment of "num_examined_files" was added for feature EXAM. * exit_codes.h 1. New header file. * extract.c 1. Added explicit definition of MAXNAMLEN under Windows for feature WIN32. 2. Performed following substitutions: s/string_set exclude_extensions/ExcludeExtension exclude_extensions/ s/string_set include_extensions/IncludeExtension include_extensions/ for feature CONF. 3. Corresponding change to directory.c item #1. 4. Added extract_words() function to parallel index.c's index_words() function. 5. In main(), redid the way in which command line options are processed such that they take precedence over configuration file variables for feature CONF. 6. In main(), made -l option conditional on whether we're compiling under Windows or not for feature WIN32. 7. In main(), added -c option for feature CONF. 8. In main(), added code to test whether a file or directory actually exists before calling do_directory() or do_file(). 9. Moved code for do_file() to do_file.c to factor out code common between extract and index. A. In usage(), added description of -c option for feature CONF. B. In usage(), made description of -l option conditional on Windows for feature WIN32. C. Changed all calls to exit(3) to use new exit code enums. D. Added "num_examined_files" global variable for feature EXAM. E. In main(), added code to print "num_examined_files" for feature EXAM. * fake_ansi.h 1. Removed __cplusplus test. 2. Removed section for bool type: bool is now a requirement of the C++ compiler. This was necessary for feature CONF since it specializes a template on bool. * file_index.h * file_index.c 1. Removed #include "fake_ansi.h" since bool is now required. * file_list.c 1. Added #include "fake_ansi.h". 2. Removed erroneous #include "html.h". * file_vector.h 1. Added #include's for Windows for feature WIN32. 2. Added conditional compilation for file_vector_base's size_type and fd_ for Windows for feature WIN32.
* file_vector.c 1. Removed #include "fake_ansi.h" since bool is now required. 2. Added conditional compilation for Windows for feature WIN32. * html.c 1. Performed following substitutions: s/no_index_class_count/exclude_class_count/ s/no_index_class_names/exclude_class_names/ 2. In parse_html_tag(), added: if ( c == end ) return; for bug fix EGT. * html.h 1. Performed following substitutions: s/no_meta_id/No_Meta_ID/ s/meta_id_not_found/Meta_ID_Not_Found/ to make all enum's have capital letters. * index.c 1. Corresponding change to extract.c item #1. 2. Corresponding change to extract.c item #2. 3. Corresponding change to extract.c item #5. 4. Corresponding change to extract.c item #6. 5. Corresponding change to extract.c item #7. 6. Corresponding change to extract.c item #8. 7. Corresponding change to extract.c item #9. 8. Corresponding change to extract.c item #A. 9. Corresponding change to extract.c item #B. A. Corresponding change to extract.c item #C. B. Performed following substitutions: s/no_index_class_count/exclude_class_count/ s/no_index_class_names/exclude_class_names/ C. Performed following substitutions: s/int num_files_reserve/FilesReserve num_files_reserve/ s/int num_title_lines/TitleLines num_title_lines/ s/int word_file_file_max/WordFilesMax word_file_max/ s/int word_file_percent_max/WordPercentMax word_percent_max/ for feature CONF. D. In main() and write_partial_index(), added "ios::binary" to "out" ofstream for feature WIN32. E. In main(), added code for -T option for feature TEMP. F. Corresponding change to extract.c item #D. G. Corresponding change to extract.c item #E. * index.h 1. Corresponding change as html.h #1. * INSTALL.unix 1. Renamed from INSTALL due to introduction of INSTALL.win32. * INSTALL.win32 1. New file for feature WIN32. * Makefile 1. Added more comments for DEBUG options. 2. Added new targets for feature CONF. 3. Redid a lot of dependencies as a result. * man/man1/index.1 1.
Added descriptions of configuration file variable for feature CONF. 2. Added Filters subsection to DESCRIPTION for feature FFF. 3. Added description of -c option for feature CONF. 4. Added caveat that the -l option is not available under Windows for feature WIN32. 5. Added description of -T option for feature TEMP. 6. Added CONFIGURATION FILE section for feature CONF. 7. Added Filters subsection to EXAMPLES for feature FFF. 8. Expanded EXIT STATUS section to list specific exit codes. 9. Added compress(1), gunzip(1), gzip(1), uncompress(1), and swish++.conf(4) to SEE ALSO section. * man/man1/extract.1 1. Added descriptions of configuration file variable for feature CONF. 5. Added caveat that the -l option is not available under Windows for feature WIN32. 6. Expanded EXIT STATUS section to list specific exit codes. 7. Added swish++.conf to FILES section for feature CONF. 8. Performed following substitution: s/the.index/swish++.index/ 9. Added swish++.conf(4) to SEE ALSO section for feature CONF. * man/man1/search.1 1. Added description of -c option for feature CONF. 2. Added CONFIGURATION FILE section for feature CONF. 3. Expanded EXIT STATUS section to list specific exit codes. 4. Added swish++.conf to FILES section for feature CONF. 5. Performed following substitution: s/the.index/swish++.index/ 6. Added swish++.conf(4) to SEE ALSO section for feature CONF. * man/man4/Makefile 1. Corresponding change to swish++.index.4 item #1. 2. Added swish++.conf.4 for feature CONF. * man/man4/swish++.index.4 1. This file was renamed from swish++.4. * search.c 1. Performed following substitution: s/bool stem_words/StemWords stem_words/ for feature CONF. 2. Corresponding change as html.h #1. 3. Corresponding change to extract.c item #C. 4. Corresponding change to extract.c item #5. 5. Corresponding change to extract.c item #A. * stop_words.c 1. Added local static variable to constructor. 2. Corresponding change to extract.c item #C. * stop_words.h 1.
Removed private static data member. * swish++.conf 1. Added template configuration for feature FFF. * token.c 1. Performed following substitution: s/fake_ansi.h/platform.h/ since bool is now required. * util.c 1. Moved stat_buf here from directory.h. 2. Added parse_config_file() for feature CONF. * util.h 1. Corresponding change to util.c item #1. 2. Moved file test functions here from directory.h. 3. Corresponding change to util.c item #2. * version.h 1. Updated version to "3.0". * word_index.c 1. Removed #include "fake_ansi.h" since bool is now required. * word_index.h 1. Removed #include "fake_ansi.h" since bool is now required. * word_info.h 1. Corresponding change as html.h #1. ******************************************************************************* 2.0.1 ******************************************************************************* BUG FIXES --------- * The code parsed HTML attributes inside HTML comments. This is (obviously) the wrong thing to do. HTML comment declarations are now really, really ignored. Honest. (This bug fix will be known as bug fix ACP.) * The code parsed HTML attributes inside <!DOCTYPE ...> declarations. This is also (obviously) the wrong thing to do. <!DOCTYPE...> declarations are now also ignored. (This bug fix will be known as bug fix EXP.) * The set of HTML end tags that close some HTML elements was incomplete. (This bug fix will be known as bug fix HC1.) CHANGES, file-by-file --------------------- * elements.c 1. For the <colgroup> element, added <colgroup> for bug fix HC1. 2. For the <td> element, added <tbody>, </tbody>, </td>, <tfoot>, </tfoot>, <tr>, and </tr> for bug fix HC1. 3. For the <tfoot> element, added <tbody> and <thead> for bug fix HC1. 4. For the <th> element, added <tbody>, </tbody>, <tfoot>, </tfoot>, </th>, <tr>, and </tr> for bug fix HC1. 5. For the <thead> element, added <tbody> and <tfoot> for bug fix HC1. 6. For the <tr> element, added <tbody>, </tbody>, <tfoot>, </tfoot>, and </thead> for bug fix HC1.
* html.c 1. In parse_html_tag(), added "if ( ... ) return;" around call to skip_html_tag() for bug fix ACP. 2. In parse_html_tag(), added check to see if first character of an HTML tag is '!' for bug fix EXP. 3. In skip_html_tag(), changed return type to "bool" and added "return" statements for bug fix ACP. * version.h 1. Updated version to "2.0.1". ******************************************************************************* 2.0 ******************************************************************************* NEW FEATURES ------------ * SWISH++ can now selectively not index text in HTML files within HTML elements that are members of specified classes. (This feature will be known as feature CLASS.) * The 'search' command now offers optional stemming. Indexing is unaffected. (This feature will be known as feature STEM.) * In all earlier versions, the number of total words reported was actually the total number of words indexed; now, it is the total number of words parsed and the former "total words" is now reported as the number of words indexed. (This feature will be known as feature NTW.) * The 'search' command now outputs an additional comment "results" followed by the total number of search results. Additionally, there is a new -R command- line option to print this alone. (This feature will be known as feature PRC.) CHANGES, file-by-file --------------------- * elements.c * elements.h 1. Added these files for feature CLASS. * html.c 1. Added #include "elements.h" for feature CLASS. 2. Added extern references to no_index_class_names and no_index_class_count corresponding to index.c #1. 3. Performed the following substitution: s/to_upper/to_lower/ to eliminate the to_upper() function entirely. 4. In grep_title(), performed the following substitution: s/TITLE/title/ so we can eliminate the to_upper() function entirely. 5. In parse_html_tag(), corresponding change for html.h #1. 6. In parse_html_tag(), added code for feature CLASS. * html.h 1. 
For parse_html_tag() function, added: bool is_new_file = false for feature CLASS. * index.c 1. Added global variables: string_set no_index_class_names; int no_index_class_count; for feature CLASS. 2. Added global variable: long num_indexed_words; for feature NTW. 3. In main(), added -C option for feature CLASS. 4. In main(), added code to print num_indexed_words for feature NTW. 5. In index_word(), performed following substitution: s/num_total_words/num_indexed_words/ for feature NTW. 6. In index_word(), added new: ++num_total_words; for feature NTW. 7. In index_word(), added code to test no_index_class_count for feature CLASS. 8. In index_words(), added: static bool new_file; variable for feature CLASS. 9. In usage(), added description for -C option for feature CLASS. A. In merge_indicies(), changed write-header code to neither allocate nor write the offsets for stop words or meta names if there are zero of them. B. In rank_full_index(), added check to see if there are no indexed words: if not, return. C. In write_full_index(), added check to see if there are no indexed words: if not, return. D. In write_full_index(), changed write-header code to neither allocate nor write the offsets for stop words or meta names if there are zero of them. * Makefile 1. Added -DDEBUG_parse_class for feature CLASS. 2. Added elements.o object for feature CLASS. 3. Added -DDEBUG_stem_word for feature STEM. 4. Added target for stem_word.o for feature STEM. * man/man1/index.1 1. Added description for -C option and examples for feature CLASS. * man/man1/search.1 1. Added description of new stemming option for feature STEM. 2. Added description of new -R option for feature PRC. * search.c 1. Added global variable: bool stem_words; for feature STEM. 2. In main(), performed following substitution: s/dDi:m:Ms:SVw:/dDi:m:Mr:RsSVw:/ for features PRC and STEM. 3. In main(), changed what was option 's' to option 'r' and added a new option 's' for feature STEM. 4. 
In main(), added a new -R option for feature PRC. 5. In parse_primary(), added "less_stem" object to word_token case as well as having two exclusive calls to binary_search() and equal_range() depending upon stem_words for feature STEM. 6. In usage(), corresponding changes to items #3 and #4. * stem_word.c * stem_word.h 1. Added these files for feature STEM. * postscript.h 1. Added more comments. * util.c 1. Moved is_vowel() function to util.h and made it so that it does not call tolower(). 2. In is_ok_word(), performed following substitution: s/is_vowel( *c )/is_vowel( tolower( *c ) )/ corresponding to item #1. 3. In ltoa() and to_lower(), made use of new char_buffer_pool class. * util.h 1. Added char_buffer_pool class since its functionality is being used 3 times now. 2. Moved is_vowel() function here from util.c. 3. Added lots more comments. * version.h 1. Updated version to "2.0". ******************************************************************************* 1.7 ******************************************************************************* NEW FEATURES ------------ * Since version 1.4, SWISH++ indexed the text in the ALT attributes of AREA and IMG elements. SWISH++ now adds a few attributes. The complete set is: Attribute Element --------- ------- TITLE any ALT AREA, IMG, INPUT STANDBY OBJECT SUMMARY TABLE (This feature will be known as feature IEA.) * Added Word_Min_Vowels to config.h so vowel checks can be disabled (or made more stringent). (This feature will be known as feature WMV.) BUG FIXES --------- * When a given word appeared through many files, its ranks came out rather "flat" in the search results. This has been fixed. (This bug fix will be known as bug fix 10K.) CHANGES, file-by-file --------------------- * config.h 1. Added Word_Min_Vowels definition for feature WMV. * extract.c 1. Split out function extract_word() from do_file() to parallel changes in index.c. 2. Moved 'in_postscript' variable to be at file scope due to #1. 3.
In do_file(), added missing 'const' to declaration: static ext_proc_map const ext_procs; It should have been there all along. * html.c 1. Added declarations for find_attribute(), skip_html_comment() and skip_html_tag() to the top of the file. They should have been there all along. 2. In convert_entity(), added missing 'const' to declaration: static char_entity_map const char_entities; It should have been there all along. 3. Modified find_attribute() so that the 'begin' and 'end' iterators are touched only if the attribute is found. 4. Split out function skip_html_tag() from parse_html_tag() because it's cleaner that way. 5. In parse_html_tag(), was able to eliminate the 'parse_elements' parameter due to #4. 6. In parse_html_tag(), added code for feature IEA. * html.h 1. Corresponding change for html.c #5. * index.c 1. Split out function index_word() from index_words() because it's cleaner that way. 2. Performed following substitution: s/1000.0/10000.0/ for bug fix 10K. 3. In usage(), performed following substitution for the -M option: s/in index/to index/ * man/man1/index.1 1. Additions for feature IEA. * util.c 1. In is_ok_word(), added Word_Min_Vowels for feature WMV. 2. In is_ok_word(), deleted 'consonants' variable since it wasn't being used. 3. Redid to_lower() function to use multiple buffers. 4. Overloaded to_lower() function to take a pair of iterators. * util.h 1. Corresponding change for util.c #4. * version.h 1. Updated version to "1.7". ******************************************************************************* 1.6 ******************************************************************************* NEW FEATURES ------------ * The value of the CONTENT attribute for META elements can now selectively be indexed based on the value of the NAME attribute, either by explicit inclusion or exclusion. (This feature will be known as feature MIE.)
* The WWW Perl library has a new function, extract_meta(), that can extract the value of the CONTENT attribute from a META element having a given NAME attribute from an HTML file. This can be used to display meta information in search results, e.g., for a given search result, also display its author, publication date, etc. (This feature will be known as feature EMC.) BUG FIXES --------- * If parentheses were used in conjunction with 'not' in a query involving meta names, it didn't work, e.g.: search author = not hawking worked as expected, but: search author = not ( hawking ) didn't even though it is (supposed to be) equivalent. (This bug fix will be known as bug fix MNP.) CHANGES, file-by-file --------------------- * html.c 1. Added #include "my_set.h" for feature MIE. 2. At global scope, added declarations: extern string_set exclude_meta_name, include_meta_names; for feature MIE. 3. In function parse_html_tag(), added code for feature MIE. * index.c 1. Added declarations: string_set exclude_meta_name; string_set include_meta_names; for feature MIE. 2. In main(), added "m:M:" to opts[] and cases for 'm' and 'M' command-line options for feature MIE. 3. In usage(), added explanation of -m and -M options for feature MIE. * Makefile 1. Added dependency of my_set.h to html.o for feature MIE. * man/man1/index.1 1. Added description of new -m and -M command-line options for feature MIE. * man/man3/WWW.3 1. Added description for extract_meta() function for feature EMC. * search.c 1. Added "int = no_meta_id" to declarations and definitions of parse_meta() and parse_query() functions for bug fix MNP. 2. In parse_primary()'s lparen_token case, added "meta_id" to recursive call of parse_query() for bug fix MNP. * version.h 1. Updated version to "1.6". * WWW.pm 1. Added extract_meta() function for feature EMC. 2. Rewrote extract_description() in terms of extract_meta(). 
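As an illustration of feature MIE in version 1.6, here is a minimal sketch of how an include set and an exclude set of META names might be consulted. It is not the code in html.c. The function should_index_meta() is hypothetical, and the policy it implements is an assumption: a non-empty include set acts as a whitelist, and otherwise every name not in the exclude set is indexed.

```cpp
#include <set>
#include <string>

typedef std::set<std::string> name_set;

// Hypothetical helper: decide whether the CONTENT of a META element with
// the given NAME should be indexed.
//
// Assumed policy (not confirmed by the change log):
//   1. If any names were explicitly included, index only those names.
//   2. Otherwise, index every name that was not explicitly excluded.
inline bool should_index_meta( std::string const &name,
                               name_set const &include_names,
                               name_set const &exclude_names ) {
    if ( !include_names.empty() )
        return include_names.count( name ) > 0;
    return exclude_names.count( name ) == 0;
}
```

With both sets empty, every META name is indexed, which matches the behavior of a release without this feature.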
******************************************************************************* 1.5.1 ******************************************************************************* NEW FEATURES ------------ * Both 'index' and 'extract' now have a new verbosity level 4 that prints filenames that are not indexed or extracted, respectively, and why. (This feature was added to help fix bug fix HTH.) (This feature will be known as feature IEV4.) * The 'httpindex' script's -v option now works exactly like that of 'index'. (This feature will be known as feature HTV.) BUG FIXES --------- * META attribute name parsing had a bug where the find_attribute() function could occasionally run past the 'end' of where it was supposed to look. (This bug fix will be known as bug fix FAE.) * The 'httpindex' script would hang if it told 'index' to index a file and, for whatever reason, 'index' couldn't since 'index' would silently skip the file. (This bug fix will be known as bug fix HTH.) * The WWW::extract_description() function returned the first $description::chars characters of a file untouched if the file did not end with one of the filename extensions matched by the pattern /\.(?:[a-z]?html?|txt)$/i. What it should do is return a null description. (This bug fix will be known as bug fix EDN.) * The 'httpindex' script didn't test the extracted description to see if it is null: if it is, it should not attempt to overwrite the original file with the description and instead just delete the file. (This bug fix will be known as bug fix HTND.) CHANGES, file-by-file --------------------- * extract.c 1. In main(), changed upper-bound for verbosity to 4 for feature IEV4. 2. In do_file(), added additional print statements for feature IEV4. 3. In usage(), changed message to show verbosity range as 0-4 for feature IEV4. * html.c 1. In find_attribute(), made it correctly skip attribute names that don't match for bug fix FAE. 2. 
In find_attribute(), made it so that 'c' is never incremented past 'end' (as it sometimes incorrectly was) for bug fix FAE. * httpindex.in 1. Performed following substitution: s/-v3/-v4/ for bug fix HTH. 2. Added code to test the extracted description to see if it is null for bug fix HTND. 3. If a file cannot be overwritten with its description (using the -d option), a warning is now merely issued rather than dying as in version 1.5. 4. Added code for feature HTV. * index.c 1. Same as extract.c #1. 2. Same as extract.c #2. 3. Same as extract.c #3. * man/man1/extract.1 * man/man1/index.1 1. Updated description for feature IEV4. * version.h 1. Updated version to "1.5.1". * WWW.pm 1. Added a "default case" to WWW::extract_description() for bug fix EDN. ******************************************************************************* 1.5 ******************************************************************************* NEW FEATURES ------------ * A new command, httpindex, has been added to assist in indexing files on remote servers. (This feature will be known as feature HTTP.) BUG FIXES --------- * The regular expressions in extract_description() in WWW.pm had some bugs. (This bug fix will be known as bug fix WRE.) * The ignore stop words feature (feature ISW) added in version 1.2 that was broken, fixed, and fixed again is being fixed yet again so that ignored words are reported even if there are no other results. (This bug fix will be known as bug fix ISW4.) CHANGES, file-by-file --------------------- * config/config.mk 1. Added PERL variable for feature HTTP. * config/config.pl 1. This Perl configuration script was added for feature HTTP. * extract.c 1. Moved #include <dirent.h> after <sys/types.h> for BSD systems. * file_vector.c 1. Modified the behavior of file_vector<T> not to return an error if the file being mapped is of zero length for feature HTTP. * httpindex.in 1. This Perl 5 script was added for feature HTTP. * index.c 1.
In do_file(), added a check to skip an empty file since file_vector<T> now opens empty files for feature HTTP. 2. Moved #include <dirent.h> after <sys/types.h> for BSD systems. * INSTALLATION 1. A third prerequisite of Perl 5 was added for feature HTTP. 2. A fourth prerequisite of wget was added for feature HTTP. * Makefile 1. A target was added for httpindex for feature HTTP. 2. The Makefile now also installs WWW.pm since it is required by httpindex. * man/man1/httpindex.1 1. Added this manual page for feature HTTP. * man/man1/Makefile 1. Added targets for new httpindex.1 manual page for feature HTTP. * search.c 1. In main(), moved: if ( skip_results >= results.size() ) return 0; past the code that prints the stop words for bug fix ISW4. * version.h 1. Updated version to "1.5". * WWW.pm 1. This file has been moved up from the subdirectory www_example. 2. In extract_description(), the regular expressions to extract the META NAME=description descriptions had missing '?'s added for bug fix WRE. 3. In extract_description(), the regular expression to remove a trailing ALT attribute was fixed for bug fix WRE. ******************************************************************************* 1.4.1 ******************************************************************************* BUG FIXES --------- * The association of words with META names was completely wrong. It worked in a small number of test cases (my original test cases -- figures), but not in the general case. (This bug fix will be known as bug fix MID.) * In 1.4, a given word could be associated with at most 1 meta name per file. This limitation was an oversight. It has been corrected. (This bug fix will be known as bug fix MMM.) CHANGES, file-by-file --------------------- * extract.c 1. Performed following substitution: s/string_set.h/my_set.h/ * file_list.c 1. Performed following substitution: s/meta_index/meta_id/ for bug fix MID. 2. Added code to read multiple meta-IDs for bug fix MMM. * file_list.h 1.
Changed declaration of file_list::value_type to be simply word_info::file since the structures are the same. * file_list.c 1. Same as file_list.c #1. * html.h 1. Same as file_list.c #1. * index.c 1. Same as file_list.c #1. 2. Same as extract.c #1. 3. Added remove_tmp_files() function and set it to be called via atexit() so that temporary files are removed even if the program terminates prematurely. 4. In index_words(), added code to add multiple meta-IDs for bug fix MMM. 5. In merge_indices(), added code to write multiple meta-IDs for bug fix MMM. 6. In write_meta_name_index(), added code to write out the numeric ID for META name for bug fix MID. 7. In write_word_index(), same as #4. * index.h 1. Same as file_list.c #1. * less.h 1. Added explicit default constructor since g++ 2.8.0 complains if it isn't there and you try to define a "const less" object. * my_set.h 1. This file was renamed from string_set.h. 2. Made string_set class generic for any type T since we now use a set< short > in word_info::file. 3. Changed declaration of string_set to be simply: typedef my_set< char const* > string_set; * postscript.h 1. Same as extract.c #1. * search.c 1. Same as extract.c #1. 2. Same as file_list.c #1. 3. In dump_single_word(), performed following substitution: s/less< char const* >/less< char const* > const/ since it can be const (and everything that can be const should be). 4. In dump_word_window(), same as #3. 5. Added get_meta_id() function for bug fix MID. 6. In parse_meta(), performed following substitution: s/::distance( meta_names.begin(), found.first ) /get_meta_id( found.first )/ for bug fix MID. 7. In parse_meta(), same as #3. 8. In parse_primary(), same as #3. 9. In parse_primary(), added code in while loop at end of function to check all meta-IDs associated with a word for bug fix MMM. * stop_words.h 1. Same as extract.c #1. * string_set.h 1. This file was renamed to my_set.h. (See it for additional changes.) * version.h 1. Updated version to "1.4.1".
* word_info.c 1. Same as file_list.c #1. * word_info.h 1. Changed word_info::file struct to include a set of meta-IDs for bug fix MMM. 2. Changed word_info::file struct to use shorts for occurrences and rank to conserve memory (since additional memory is now being taken up by the set of meta-IDs). 3. Gave the word_info::file struct 3 separate constructors so the minimal amount of code is executed depending on how an object is constructed. ******************************************************************************* 1.4 ******************************************************************************* NEW FEATURES ------------ * SWISH++ now indexes and can search META data. (This feature will be known as feature META.) * SWISH++ now indexes the words in ALT attributes within AREA and IMG elements. (This feature will be known as feature ALT.) * SWISH++ can now index files and directories specified via standard input instead of via the command line. When doing this, extensions of files to index need not explicitly be specified via the -e option, i.e., 'index' assumes you know what you're doing when specifying filenames. (This feature will be known as feature ISI.) * For both 'index' and 'extract', a new -r command line option was added to suppress recursively indexing files in subdirectories. This option is most useful in conjunction with the new ISI feature. (This feature will be known as feature CLR.) * Added an optimization option for determining whether a character is a "word character" by eliminating the call to strchr() in is_word_char(). This yields about a 10% performance improvement during indexing. (This feature will be known as feature WCO.) * The code for 'index' was profiled and a couple of performance tweaks were made yielding about a 7% performance improvement. (This feature will be known as feature PPT.)
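One common way to eliminate a strchr() call of the kind mentioned for feature WCO is a precomputed 256-entry lookup table. The following is a minimal sketch of that general technique, not the actual util.h code: the set of extra word characters and the names is_word_char_strchr(), word_char_table, and is_word_char_table() are all assumptions made for illustration.

```cpp
#include <cstring>

// Assumed set of non-alphanumeric word characters, for illustration only.
static char const extra_word_chars[] = "&'-_";

// Baseline: range checks for alphanumerics, then a strchr() scan.
inline bool is_word_char_strchr( char c ) {
    unsigned char const u = static_cast<unsigned char>( c );
    if ( (u >= '0' && u <= '9') || (u >= 'A' && u <= 'Z') ||
         (u >= 'a' && u <= 'z') )
        return true;
    // strchr() also matches the terminating null, so exclude it explicitly.
    return c != '\0' && std::strchr( extra_word_chars, c ) != nullptr;
}

// Table built once from the baseline; each later query is one array index.
struct word_char_table {
    bool is_word[ 256 ];
    word_char_table() {
        for ( int i = 0; i < 256; ++i )
            is_word[ i ] = is_word_char_strchr( static_cast<char>( i ) );
    }
};

inline bool is_word_char_table( char c ) {
    static word_char_table const table;     // initialized on first use
    return table.is_word[ static_cast<unsigned char>( c ) ];
}
```

The table costs 256 bytes and trades the per-character scan for a single indexed load, which is where a speedup in a tight indexing loop would be expected to come from.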
BUG FIXES --------- * A small bug whereby the last word of a file was not indexed if the last line didn't end in a newline (or a whitespace character in general) was fixed. (This bug fix will be known as bug fix ILW.) CHANGES, file-by-file --------------------- * config.h 1. Added OPTIMIZE_WORD_CHARS, OPTIMIZE_WORD_BEGIN_CHARS, and OPTIMIZE_WORD_END_CHARS for feature WCO. * directory.c 1. Added a comment regarding do_file(). 2. Added check of new global "recurse_subdirectories" variable for feature CLR. * entities.h 1. Added a comment regarding the use of "less< key_type >" with the map. * ext_proc.h 1. Performed following substitution: s/map_type::const_iterator i/map_type::const_iterator const i/ It should have been that way all along. * extract.c 1. In main(), added code for feature ISI. 2. In main(), added handling of new -r option for feature CLR. 3. In do_file(), redid main 'while' loop and added 'if's for bug fix ILW. 4. In do_file(), replaced calls to strchr() with new is_word_begin_char() and is_word_end_char() functions for feature WCO. 5. In usage(), added missing description for -E option. 6. In usage(), added description for new -r option for feature CLR. * file_info.h 1. Added current_file() function for feature META. * file_list.c 1. In operator++(), added meta-index parsing code for feature META. * file_list.h 1. Added value_type::meta_index data member for feature META. * html.c 1. Added #include "index.h" and #include "meta_map.h" for features ALT and META. 2. Throughout the entire file, improved the SEE ALSO references, added URLs. 3. Added find_attribute() for features ALT and META. 4. Performed following substitution: s/skip_html_tag/parse_html_tag/ for features ALT and META. 5. Added parse_elements parameter to parse_html_tag() and code to parse ALT attributes and META elements for features ALT and META. * html.h 1. Added definitions of no_meta_index and meta_index_not_found for feature META. 2. 
Performed following substitution: s/skip_html_tag/parse_html_tag/ for feature META. * index.c 1. Added #include "index.h" and #include "meta_map.h" for feature META. 2. Added definition of meta_names for feature META. 3. In main(), added code for feature ISI. 4. In main(), added handling of new -r option for feature CLR. 5. Refactored do_file() by splitting out the actual word indexing part into a new function index_words() for feature META. The index_words() function is now also called by parse_html_tag() to index the words in the CONTENT attribute of META elements. 6. In do_file(), replaced 3 function calls to strcmp() to see if a file is an HTML file with a call to a new, inlined is_html_ext() function for feature PPT. 7. In index_words(), redid main 'while' loop and added 'if's for bug fix ILW. 8. In index_words(), added 'if' so as not to parse '<' as the start of an HTML tag if meta_index >= 0 for feature META. 9. In index_words(), replaced calls to strchr() with new is_word_begin_char() and is_word_end_char() functions for feature WCO. A. In merge_indicies(), added code to write meta index for feature META. B. In merge_indices(), redid code for writing the word index to use low ASCII characters as separators for feature META. C. Replaced a lot of 'for' loops iterating over an entire sequence with a new FOR_EACH or TRANSFORM_EACH macro. (I got tired of typing.) D. In rank_full_index(), moved the code to compute the ranks AFTER the tests to see whether a word occurs too frequently. It was originally placed before since file_count needed to be calculated, but I realized this is known ahead of time as simply info.files_.size(). E. Added write_meta_name_index() for feature META. F. In write_full_index(), added call to write_meta_name_index() for feature META. G. In write_word_index(), redid code for writing the word index to use low ASCII characters as separators for feature META. H. In usage(), added description for new -r option for feature CLR. * index.h 1.
Added this new file for feature META. * Makefile 1. Added new dependencies for feature META. * man/Makefile 1. Added missing "pdf" target. * man/man1/extract.1 1. Added description of new -r option for feature CLR. 2. Added description of feature ISI. * man/man1/index.1 1. Added description of META element indexing for feature META. 2. Improved references, added URL. 3. Added description of new -r option for feature CLR. 4. Added description of feature ISI. * man/man1/search.1 1. Added description and examples of META element searching for feature META. * man/man4/swish++ 1. Modified description of index file format for feature META. * meta_map.h 1. Added this file for feature META. * search.c 1. Added #include "html.h" for feature META. 2. Added definition of meta_names for feature META. 3. In main(), added dump_meta_names and new -M command line option for feature META. 4. In main(), used new enum for calls to word_index::set_index_file(). 5. Replaced a lot of 'for' loops iterating over an entire sequence with a new FOR_EACH or TRANSFORM_EACH macro. (I got tired of typing.) 6. In dump_word_window(), added missing description for 'match' parameter. 7. In parse_query(), performed following substitution: s/parse_primary/parse_meta/ for feature META. 8. Added parse_meta() function for feature META. 9. In parse_primary(), added meta_index parameter for feature META. A. In parse_primary(), added code to add words to result only if the meta-name matches for feature META. B. In usage(), added description of new -M option for feature META. * stop_words.c 1. In stop_word_set::stop_word_set(), redid main 'while' loop and added 'if's for bug fix ILW. * token.c 1. Reworked token::hold() to accommodate more than one put_back() in a row for feature META since parse_meta() requires two look-ahead tokens. 2. Added new case for the '=' token for feature META. * token.h 1. Added equal_token for feature META. 2. Corresponding change to token.c item #1. * util.c 1.
In is_ok_word(), performed following substitution: s/int const len = ::strlen( word )/int const len = c - word/ for feature PPT. 2. In to_lower(), replaced call to transform() with simple while loop for feature PPT. * util.h 1. Added new FOR_EACH and TRANSFORM_EACH macros since I got tired of typing. 2. Added new is_html_ext() function for feature PPT. 3. Redid is_word_char() function for feature WCO. 4. Added is_word_begin_char() and is_word_end_char() functions for feature WCO. * version.h 1. Updated version to "1.4". * word_index.h 1. Added enum for word indices to word_index class. * word_info.h 1. Added #include "html.h" for feature META. 2. Added word_info::file::meta_index_ data member and modified constructor accordingly for feature META. ******************************************************************************* 1.3.2 ******************************************************************************* BUG FIXES --------- * The ignore stop words feature (feature ISW) added in version 1.2 was slightly broken in 1.2.1; it was "fixed" in 1.2.2 (bug fix ISW2), but not quite in that if the left-hand side of a query was ignored, the whole thing was. (This bug fix will be known as bug fix ISW3.) * In 'index', the check for whether filename extensions were supplied was too early in the code so the -S option didn't work. (This bug fix will be known as bug fix CFE.) CHANGES, file-by-file --------------------- * config/man.mk 1. Made "make dist" make the manual pages in PDF format in addition to text format. * index.c 1. Relocated code to check whether filename extensions were supplied for bug fix CFE. 2. In main(), used an ostream_iterator() to dump stop words. 3. In do_file(), split tests for stop-words into two separate 'if' statements so to_lower() isn't called unless absolutely necessary. * search.c 1. In parse_query(), redid ignore-handling code for bug fix ISW3. 2. In main(), used an ostream_iterator() to dump stop words. * stop_words.c 1. 
Added stop-words: billions, eighteen, fifteen, fourteen, millions, nineteen, second, seconds, seventeen, sixteen, tens, third, thirteen, trillions. * util.c 1. In is_ok_word() on line 192, changed floating point calculation to integer by multiplying LHS by 100 to increase performance. 2. On the same line, performed the following substitution: s/>=/>/ so the code matches the documentation that says, "... contains more than a third capital letters ..." * version.h 1. Updated version to "1.3.2". * www_example/search.cgi 1. Removed extraneous 'o' (optimize) options from regular expressions. * www_example/WWW.pm 1. Added GNU Public License notice at top. 2. In trim_whitespace(), used map() rather than a for loop. 3. Removed extraneous 'o' (optimize) options from regular expressions. ******************************************************************************* 1.3.1 ******************************************************************************* BUG FIXES --------- * Unbeknownst to me, I introduced a bug in 1.2.2 that broke wildcard searches. (Doh!) This has been fixed. (This bug fix will be known as bug fix WCF.) CHANGES, file-by-file --------------------- * man/man1/search.1 1. Make it explicitly clear that wildcards are not permitted for the -d and -w options. * token.c 1. Moved the line: ::strcpy( t.lower_buf_, to_lower( t.buf_ ) ); before: if ( t.type_ ) return in; for bug fix WCF. * version.h 1. Updated version to "1.3.1". ******************************************************************************* 1.3 ******************************************************************************* NEW FEATURES ------------ * In "search," a "window" of words can be dumped around the query words. (This feature will be known as feature DWW.) * In "search," the -d option to dump the index for a word now dumps all the query words instead of a single word. Additionally, a stop-word used to print "stop-word"; now it prints "# ignored: " followed by the word. 
(This feature will be known as feature DQW.) * In "search," the -d option to dump the index for a word now prints the comment: # not found: word if 'word' is not found in the index. (This feature will be known as feature NFW.) CHANGES, file-by-file --------------------- * directory.c 1. Changed order of #include's putting direct.h last so that it compiles OK under FreeBSD 2.2.7. * man/man1/search.1 1. Corresponding changes for features DWW, DQW, and NFW. * search.c 1. In main(), performed the following substitutions: s/char const *dump_word/bool dump_word_index/ s/dump_word/dump_word_index/ s/d:Di:m:s:SV/dDi:m:s:SV/ for feature DQW. 2. In main(), performed the following substitution: s/dDi:m:s:SV/dDi:m:s:SVw:/ 3. In main(), performed the following substitution: s/dump_entire_index || dump_stop_words || dump_word /dump_entire_index || dump_stop_words/ for feature DQW since the -d option no longer takes an argument. 4. In main(), added code to handle new -w option for feature DWW. 5. In main(), added 'while' loop to code to dump multiple words for feature DQW. 6. In dump_single_word(), performed following substitution: s/"stop-word"/"# ignored: " << word/ for feature DQW. 7. In dump_single_word(), added printing of new "not found" comment key for feature NFW. 8. Added function dump_word_window() for feature DWW. 9. In usage(), performed following substitution: s/-d word/-d/ for feature DQW. A. In usage(), added text for new -w option for feature DWW. * version.h 1. Updated version to "1.3". ******************************************************************************* 1.2.2 ******************************************************************************* NEW FEATURES ------------ * A heuristic was added not to index a word if it contains more than a threshold number of consecutive punctuation characters. (This feature will be known as feature MCP.) * Files can now be indexed by exclusion of filename extensions rather than by inclusion via a new -E command-line option. 
(This feature will be known as feature EFE.) BUG FIXES --------- * The ignore stop words feature (feature ISW) added in version 1.2 was slightly broken in 1.2.1 in that the list of ignored words was no longer reported. (This bug fix will be known as bug fix ISW2.) CHANGES, file-by-file --------------------- * config.h 1. Performed following substitution: s/Word_Hex_Min_Size/Word_Hex_Max_Size/ The original name was inconsistent with the other parameters. 2. Added "Word_Max_Consec_Puncts" for feature MCP. * config.mk 1. Performed following substitution: s/install.sh/install-sh/ so some versions of make don't get confused with the .sh suffix and try to build it. * extproc.c 1. Added definitions for WEXITSTATUS and WIFEXITED if not defined on a particular system. * extract.c 1. Corresponding change for config.h item 1. 2. Performed following variable substitution: s/extensions/include_extensions/ for feature EFE. 3. Added variable exclude_extensions for feature EFE. 4. In main(), added code to handle new -E option for feature EFE. 5. In do_file(), added check against new exclude_extensions variable for feature EFE. 6. In usage(), added text for new -E option for feature EFE. * index.c 1. Performed following variable substitution: s/extensions/include_extensions/ for feature EFE. 2. Added variable exclude_extensions for feature EFE. 3. In main(), added code to handle new -E option for feature EFE. 4. In do_file(), added check against new exclude_extensions variable for feature EFE. 5. In usage(), added text for new -E option for feature EFE. * man/man1/extract.c 1. Changed description for feature EFE. * man/man1/index.c 1. Changed description for feature MCP. 2. Changed description for feature EFE. * search.c 1. Deleted is_stop_word() function for bug fix ISW2. 2. In dump_single_word(), added code formerly in is_stop_word() here for bug fix ISW2. 3. In parse_primary(), added code formerly in is_stop_word() here for bug fix ISW2. * token.c 1. 
Changed token so that it is not converted to all lower-case for bug fix ISW2. Previously, acronyms were not recognized in lower case and keywords ("and," "or," and "not") were not recognized in upper case. 2. Added code to make a copy of the token string in all lower case. This is still needed for stop-word determination. * token.h 1. Added second buffer to hold all-lower-case version of token text for bug fix ISW2. * util.c 1. In is_ok_word(), added code for feature MCP. * version.h 1. Updated version to "1.2.2". ******************************************************************************* 1.2.1 ******************************************************************************* NEW FEATURES ------------ * In "search," the original -d option that used to dump the entire index now dumps the index entry for a single word. Correspondingly, a new -D option now does what -d used to do. (This feature will be known as feature DSW.) * In "search," the dump of the index entries now includes the rank. (This feature will be known as feature DIR.) BUG FIXES --------- * Numeric entity references were not converted to their ASCII equivalents. (I don't know how I missed this.) (This bug fix will be known as bug fix NER.) * A search query that contained only stop-words returned all files (up to the specified limit or default maximum). (This bug fix will be known as bug fix RSW.) CHANGES, file-by-file --------------------- * config/config.mk 1. Added comment at top to remind people that they must do a "make distclean" before recompiling if they change any definitions. 2. Performed the following substitution: s!/usr/ucb/install!$(ROOT)/install.sh! * entities.c 1. Added num_entities[] for bug fix NER. 2. Performed following substitutions: s/entity_map/char_entity_map/ s/entity/char_entity/ s/entity_name/name/ so as to distinguish them from the newly-added num_entities[]. 3. Added: "ETH", 'D', "eth", 'd', to char_entity_table[]. * entities.h 1. 
Added: extern char const num_entities[ 256 ]; for bug fix NER. 2. Corresponding changes for entities.c item 2. * html.c 1. Corresponding changes for entities.c item 2. 2. Made use of new num_entities[] for bug fix NER. * index.c 1. On line 400, performed the following substitution: s/*lower_word/*const lower_word/ It should have been that originally. * install.sh 1. Created this shell script to use for installs instead of having to rely on the OS having a "Berkeley-esque" install command. * man/man1/search.1 1. Changed description for feature DSW. * search.c 1. Added new find_result_type typedef. 2. Added a new dump_single_word() function for feature DSW. 3. Added "bool &ignore" parameter to parse_query() and parse_primary() functions for bug fix RSW. 4. Factored out code that determines whether a word was indexed or not into a new function is_stop_word(). 5. In main(), added code for feature DSW. 6. In main(), added code for feature DIR. 7. In parse_query(), added code to ignore stop-words properly for bug fix RSW. 8. On line 423, performed following substitution: s/iterator/const_iterator/ It should have been that originally for "const correctness." 9. In parse_primary(), made use of new is_stop_word() function corresponding to item 4. A. In parse_primary(), added code under "not" case to check to see whether the primary should be ignored for bug fix RSW. B. In usage(), added text to usage message for feature DSW. * util.c 1. On line 65, performed following substitution: s/STATIC_CAST/REINTERPRET_CAST/ It should have been that originally. * version.h 1. Updated version to "1.2.1". * www_example/search.cgi 1. Added "&'" characters to those that are not stripped from the query. 2. Added: next if /^#/; so as to ignore comments we know nothing about that future releases of SWISH++ may emit. 
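NOTE: The "next if /^#/;" change to search.cgi above can also be sketched in C++ for any client that reads the output of "search". This is only an illustration, not code from SWISH++: the function name result_lines() is hypothetical, and the result lines in the usage note are made up since the exact result format is not described here. The only assumption taken from this file is that comment lines begin with '#'.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of SWISH++): given everything "search"
// printed, return only the lines that are not comments.  A comment is
// assumed to be any line whose first character is '#', which is what the
// Perl statement "next if /^#/;" skips.
std::vector<std::string> result_lines( std::string const &output ) {
    std::vector<std::string> results;
    std::istringstream in( output );
    std::string line;
    while ( std::getline( in, line ) ) {
        if ( !line.empty() && line[0] == '#' )
            continue;                   // unknown or known comment: ignore
        results.push_back( line );
    }
    return results;
}
```

For example, given the made-up output "# ignored: the", "3 a.html", "# not found: x", "1 b.html" (one per line), result_lines() keeps only "3 a.html" and "1 b.html". Skipping every '#' line, rather than only the comments known today, is what lets an older client keep working when a later release emits new kinds of comments.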
******************************************************************************* 1.2 ******************************************************************************* NEW FEATURES ------------ * SWISH++ now stores the list of stop-words in the generated index file so they can be ignored on searches later. Previously, using a stop-word in a query would always yield 0 results since the stop-word isn't in the index. After thinking about it, this is just plain stupid. (This feature will be referred to as feature ISW.) * You can now specify the number of files to reserve space for on the command line for "index" overriding the default. (This feature will be referred to as feature ICF.) * You can now specify the number of lines to look into a file for HTML <TITLE> tags on the command line for "index" overriding the default. (This feature will be referred to as feature ICt.) * Added default values to usage messages. (This feature will be referred to as feature UDV.) BUG FIXES --------- * The detection of malformed queries was completely broken. I don't see how this went undetected for this long. (This bug fix will be referred to as bug fix DMQ.) * In the example WWW.pm Perl library, not all the "Unix-unfriendly" characters were stripped from filenames upon upload. (This bug fix will be referred to as bug fix UUC.) CHANGES, file-by-file --------------------- * config.h 1. Performed following substitutions: s/Title_Lines/Title_Lines_Default/ for feature ICt. 2. Performed following substitutions: s/Files_Default/Files_Reserve_Default/ for feature ICF. 3. Added: Index_Filename_Default * extract.c 1. Added code to usage() for feature UDV. 2. In do_file(), added code to check whether a word is a stop-word explicitly for feature ISW. This corresponds to the change for util.c item 2. * file_index.c 1. Moved index file header parsing code into a new function get_index_info() for feature ISW. 2. Added: #include "util.h" * file_info.c 1. 
Added: extern int num_files_reserve; for feature ICF. 2. Performed following substitution: s/Files_Default/num_files_reserve/ for feature ICF. * html.c 1. Added: extern int num_title_lines; for feature ICt. 2. Performed following substitution: s/Title_Lines/num_title_lines/ * index.c 1. Added write_stop_word_index() function for feature ISW. 2. Added: int num_files_reserve = Files_Reserve_Default; for feature ICF. 3. Added: int num_title_lines = Title_Lines_Default; for feature ICt. 4. Performed following substitutions: s/total_words/num_total_words/ s/unique_words/num_unique_words/ 5. Performed following substitution: s/"the.index"/Index_Filename_Default/ 6. Added 'F' option to command line parsing code and usage message for feature ICF. 7. Added 't' option to command line parsing code and usage message for feature ICt. 8. In do_file(), added code to check whether a word is a stop-word explicitly for feature ISW. This corresponds to the change for util.c item 2. 9. In merge_indices(), removed extra_stop_words and am now using stop_words since they all have to be written to the index file together. This was done for feature ISW. A. In merge_indices(), added code to write additional header information for the stop-words. B. In rank_full_index(), now add computed stop-words to global set so they can all be written to the index file together. This was done for feature ISW. C. In write_full_index(), added code to write additional header information for the stop-words. D. Added code to usage() for feature UDV. * Makefile 1. Added new dependencies for feature ISW. * man/man1/index.1 1. Added description of new option for feature ICF. 2. Added description of new option for feature ICt. * man/man1/search.1 1. Added description of comments "search" outputs for feature ISW. 2. Added description of new -S option for feature ISW. * man/man4/swish++.4 1. Added description of new index file format for feature ISW. * search.c 1. 
Added definitions: word_index stop_words; string_set stop_words_found; for feature ISW. 2. Performed following substitution: s/"the.index"/Index_Filename_Default/ 3. In main(), added code for new -S option to dump the stop-words from an index file. This was done for feature ISW. 4. In main(), added test of EOF for the query_stream to ensure the entire query is parsed successfully for bug fix DMQ. 5. In main(), added code to output stop-words ignored in the query for feature ISW. 6. In parse_query(), changed: if ( !parse_primary( query, temp1 ) ) break; to: if ( !parse_primary( query, temp1 ) ) return false; for bug fix DMQ. 7. In parse_optional_relop(), changed code in default case by adding a check for a ')' token for bug fix DMQ. 8. In parse_primary(), added code to search stop-words for a word in a query and ignore it for feature ISW. 9. In parse_primary() for the lparen_token case, performed following substitution: s/lparen_token/rparen_token/ for bug fix DMQ. A. Added code to usage() for feature UDV. * stop_words.c 1. Added "let's". 2. In constructor, changed use of "new" to "strdup". 3. Change corresponding to util.c item 3. * util.c 1. Added function get_index_info() to extract number of offset information of an index file for feature ISW. 2. In is_ok_word(), removed check for stop-words for feature ISW. The calling code must now check for stop-words itself. This was necessary because "search" checks for stop-words differently than either "index" or "extract" does. 3. Added function: char const *to_lower( char const *s ) for feature ISW. 4. Added "missing": #include <cstring> * util.h 1. Change corresponding to util.c item 1. 2. Change corresponding to util.c item 2. * version.h 1. Updated version to "1.2". * word_index.c 1. Added int parameter since a word_index is now used for both the regular word index (0) and the new stop-word index (1). 2. Moved index file header parsing code into a new function get_index_info() in util.c for feature ISW. 
* word_index.h 1. Added int parameter to both constructor and set_file_index() since a word_index is now used for both the regular word index (0) and the new stop-word index (1). * www_example/WWW.pm 1. In parse_multipart(), added $'()*/\ characters to those stripped from filenames for bug fix UUC. * www_example/search.cgi 1. Added code to handle ignored words returned by "search" for feature ISW. ******************************************************************************* 1.1 ******************************************************************************* NEW FEATURES ------------ * SWISH++ is now out of beta test. (Nobody has submitted a bug report in a while.) * From "index," you can now dump the built-in default set of stop-words to a file to edit and then use to index. (This feature will be referred to as feature ESW.) * Some example Perl 5 code for interfacing SWISH++ to a web-based search form has been provided. (This feature will be referred to as feature W3E.) BUG FIXES --------- * The definition of the THIS macro in fake_ansi.h was just wrong and there is no way to fix it; so it and all references to it have been deleted. (This bug fix will be referred to as bug fix XTHIS.) CHANGES, file-by-file --------------------- * extract.c 1. In main(), added code to process the new command-line options of -s for feature ESW. 2. In usage(), augmented message for feature ESW. * fake_ansi.h 1. Deleted definition of THIS macro for bug fix XTHIS. * file_list.c 1. Deleted references to THIS macro formerly defined in fake_ansi.h and defined a local version instead for bug fix XTHIS. * index.c 1. In main(), added code to process the two new command-line options of -s and -S for feature ESW. 2. In usage(), augmented message for feature ESW. * Makefile 1. Added specific build rules for stop_words.c for feature ESW. 2. Added dependency on stop_words.h to index.c for feature ESW. 3. Cleaned up rules for "clean," "dist," and "distclean." * man/Makefile 1. 
Added provision to build man3 subdirectory for feature W3E. * man/man1/index.1 1. Added descriptions of new command-line options for feature ESW. 2. Added missing description of additional processing done for HTML files. * man/man3/Makefile * man/man3/www.3 1. New files for W3E. * stop_words.c 1. Added global pointer to stop-word set for feature ESW. 2. Added constructor for stop_word_set to initialize the set of stop-words either from the built-in default set or from a file. * stop_words.h 1. New file for ESW. * string_set.h 1. Changed definition of string_set to be derived from rather than contain a std::set for feature ESW. * util.c 1. Moved stop_word_set definitions to stop_words.c for feature ESW. * version.h 1. Updated version to "1.1". * www_example/WWW.pm 1. Added form data parsing library in Perl 5 for feature W3E. * www_example/search.cgi * www_example/search.html 1. Added example code for feature W3E. ******************************************************************************* 1.1b3 ******************************************************************************* BUG FIXES --------- * Fixed a bug where unbalanced quotes inside comments would cause a core dump. After rereading the HTML 4.0 specification regarding comments, quotes are not to be balanced or otherwise treated specially inside comments. (This bug fix will be referred to as bug fix CQU.) CHANGES, file-by-file --------------------- * ext_proc.c 1. In process_file(), made pid_error static as it should have been all along. * html.c 1. Added inclusion of util.h to access to_upper() function for bug fix CQU. 2. Added following functions for bug fix CQU: is_html_comment() skip_html_comment() tag_cmp() 3. In grep_title(), changed for loop to while loop to have more precise control over when the iterator is advanced for bug fix CQU. 4. In grep_title(), now check to see if an HTML tag is a comment. 5. In grep_title(), replaced code to check title tag by a call to the new tag_cmp() function. 6. 
In skip_html_tag(), added calls to is_html_comment() and skip_html_comment() since comments must be skipped differently. (For bug fix CQU.) * Makefile 1. Added util.h to html.o dependencies for bug fix CQU. 2. Added "the.index" to the $(RM) line for the clean target. 3. Deleted the second erroneous dist target. * itoa.c 1. Deleted this extraneous file. * util.c 1. In ltoa(), made Buf_Size and Num_Buffers static as they should have been all along. * util.h 1. Added to_upper() inline function for bug fix CQU. * version.h 1. Updated version to "1.1b3". ******************************************************************************* 1.1b2 ******************************************************************************* NEW FEATURES ------------ * For HTML files having titles longer than Title_Max_Size in length, the last three characters are replaced by an ellipsis ("..."). (This feature will be referred to as feature ELL.) BUG FIXES --------- * Fixed a core dump in grep_title() for HTML files having titles that exceed Title_Max_Size in length. (This bug fix will be referred to as bug fix GT1.) CHANGES, file-by-file --------------------- * file_vector.c 1. Performed following substitution: s/sysent.h/unistd.h/ for portability. * html.c 1. Added code for feature ELL. 2. Fixed grep_title() for bug fix GT1. * version.h 1. Updated version to "1.1b2". ******************************************************************************* 1.1b1 ******************************************************************************* NEW FEATURES ------------ * The search command has a new -s option to specify the number of initial results to skip. Used in conjunction with -m, results can be returned in "pages." (This feature will be referred to as feature SSR.) CHANGES, file-by-file --------------------- * search.c 1. Added comment for sort_by_rank struct. This was an omission. 2. Added -s option in main() for feature SSR. 3. Added skip_results variable in main() for feature SSR. 4. 
Added -s option in usage() for feature SSR. 5. Removed extra semicolon in usage() that caused only part of the usage message to print. * version.h 1. Updated version to "1.1b1". * man/man1/search.1 1. Added description of -s option for feature SSR.
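
NOTE: The "pages" idea behind feature SSR comes down to simple arithmetic, sketched below in C++. This is only an illustration, not code from SWISH++: the function name paging_options() and its parameters are hypothetical, and it assumes that -m caps the number of results returned while -s gives the number of initial results to skip, as described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of SWISH++): build the "search" options
// that select one page of results.  Pages are numbered from zero, so
// page 0 skips nothing, page 1 skips one page's worth, and so on.
std::string paging_options( unsigned page, unsigned page_size ) {
    assert( page_size > 0 );            // a page must hold at least one result
    unsigned const skip = page * page_size;
    return "-s " + std::to_string( skip ) +
           " -m " + std::to_string( page_size );
}
```

For example, with 10 results per page, paging_options( 0, 10 ) yields "-s 0 -m 10" and paging_options( 2, 10 ) yields "-s 20 -m 10", i.e., the third page starts after the first 20 results.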