* spamprobe.cc (process_stream): Added -o tokenized option
to allow people to use an external tokenizer with spamprobe.
* SpamFilter.cc (scoreToken): Reduced sorting overhead by
pre-computing an integer sort value with the sorting priorities
encoded in it. This eliminates several calculations inside the
sort routine.
* SpamFilter.cc (computeRatio): Capped ratios used in calculations
to the range MIN_PROB..MAX_PROB and widened that range. This avoids
division-by-zero problems and makes terms easier to sort.
* spamprobe.cc (dump_words): dump command can now optionally
accept a regular expression as an argument and will only dump
terms matching the regular expression.
(purge_terms): Added purge-terms command to purge from the
database all terms matching a regular expression.
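The scoreToken entry describes folding several sort priorities into one precomputed integer. The sketch below illustrates that idea under assumptions: the `Token` struct, the chosen priorities (distance of the probability from 0.5 first, then occurrence count), and the bit layout are hypothetical and are not taken from SpamFilter.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical token record; the real SpamFilter types are not shown here.
struct Token {
    std::string text;
    double probability;  // spam probability in [0, 1]
    int count;           // occurrences in the message
    uint64_t sortKey;    // filled in by sortTokens()
};

// Pack the priorities into one integer, most significant first:
//   high bits: |p - 0.5| scaled to 0..1,000,000 (more "interesting" terms first)
//   low 16 bits: occurrence count, clamped to 0..0xFFFF (tie-breaker)
uint64_t computeSortKey(const Token& t) {
    assert(t.probability >= 0.0 && t.probability <= 1.0);
    uint64_t distance = static_cast<uint64_t>(
        std::llround(std::fabs(t.probability - 0.5) * 2000000.0));
    uint64_t count = static_cast<uint64_t>(std::max(0, std::min(t.count, 0xFFFF)));
    return (distance << 16) | count;
}

// Compute every key once, so the comparator is a single integer compare.
void sortTokens(std::vector<Token>& tokens) {
    for (Token& t : tokens) t.sortKey = computeSortKey(t);
    std::sort(tokens.begin(), tokens.end(),
              [](const Token& a, const Token& b) { return a.sortKey > b.sortKey; });
}
```

The design trade-off is that the key is computed once per token (O(n)) instead of re-deriving distances and tie-breakers in each of the O(n log n) comparisons.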
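The computeRatio entry says capping values to MIN_PROB..MAX_PROB avoids division-by-zero problems. One way this can arise is shown below, assuming a Graham-style combination P = prod(p) / (prod(p) + prod(1 - p)) used purely for illustration: if one term has p = 1.0 and another p = 0.0, both products become 0 and the result is 0/0. The MIN_PROB and MAX_PROB values here are hypothetical, and the combining formula is not claimed to be SpamProbe's own.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical bounds for illustration; SpamProbe's actual values are not stated here.
constexpr double MIN_PROB = 0.0001;
constexpr double MAX_PROB = 0.9999;

// Clamp a per-term probability into [MIN_PROB, MAX_PROB].
// NaN is rejected because std::min/std::max do not order it meaningfully.
double capProb(double p) {
    assert(!std::isnan(p));
    return std::max(MIN_PROB, std::min(p, MAX_PROB));
}

// Graham-style combination with capped inputs: neither product can reach 0,
// so the denominator stays positive.
double combine(const std::vector<double>& probs) {
    double spam = 1.0, good = 1.0;
    for (double p : probs) {
        double c = capProb(p);
        spam *= c;
        good *= 1.0 - c;
    }
    return spam / (spam + good);
}
```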
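The dump_words and purge_terms entries both select terms with a regular expression. The sketch below shows that filtering over a hypothetical in-memory `std::map` standing in for the real PBL-backed database. It assumes substring matching (`std::regex_search`) rather than whole-term matching; the entries do not say which one SpamProbe uses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical term -> count table; not SpamProbe's database interface.
using TermCounts = std::map<std::string, int>;

// dump-style: return terms matching the pattern, or all terms if no pattern is given.
std::vector<std::string> dumpTerms(const TermCounts& db, const std::string& pattern) {
    std::vector<std::string> out;
    if (pattern.empty()) {
        for (const auto& entry : db) out.push_back(entry.first);
        return out;
    }
    std::regex re(pattern);
    for (const auto& entry : db) {
        if (std::regex_search(entry.first, re)) out.push_back(entry.first);
    }
    return out;
}

// purge-style: erase every term matching the pattern; returns how many were removed.
// An empty pattern is refused so a purge can never silently wipe every term.
std::size_t purgeTerms(TermCounts& db, const std::string& pattern) {
    assert(!pattern.empty());
    std::regex re(pattern);
    std::size_t removed = 0;
    for (auto it = db.begin(); it != db.end();) {
        if (std::regex_search(it->first, re)) {
            it = db.erase(it);  // erase returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```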
analysis of terms contained in emails. Works with procmail, maildrop or a
similar tool to produce a complete server or client side spam filtering
system.
This version uses Peter Graf's PBL as a database.