Bogofilter is a mail filter that classifies mail as spam or ham
(non-spam) by a statistical analysis of the message's header and
content (body). It is able to learn from the user's classifications
and corrections.

The statistical technique is known as the Bayesian technique, and
its use for spam was first described by Paul Graham in his article
"A Plan For Spam". Gary Robinson, in his weblog Rants, suggests some
refinements for improved discrimination between spam and ham.
Bogofilter's primary algorithm uses the f(w) parameter and the
Fisher inverse chi-square technique that Robinson describes.

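As a rough illustration of the two ingredients named above, the
following Python sketch computes Robinson's f(w) for a single word and
combines several word scores with an inverse chi-square function. It
is not bogofilter's own code: the function names, the default values
of s and x, and the final (1 + S - H) / 2 indicator are assumptions
taken from the commonly published form of Robinson's method.

```python
import math

def robinson_f(p, n, s=1.0, x=0.5):
    """Robinson's f(w): pull the raw spam probability p of a word
    toward a prior x when the word has been seen in only n messages.
    s is the weight given to the prior (illustrative defaults)."""
    return (s * x + n * p) / (s + n)

def chi2q(x2, v):
    """Probability that a chi-square variable with v degrees of
    freedom exceeds x2. Valid for even v only."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(word_probs):
    """Combine per-word f(w) values into one score in [0, 1].
    Values near 1 suggest spam, values near 0 suggest ham."""
    # Clamp to avoid log(0) for extreme probabilities.
    probs = [min(max(p, 1e-9), 1.0 - 1e-9) for p in word_probs]
    n = len(probs)
    # Sum logarithms instead of multiplying, to avoid underflow.
    ln_spam = sum(math.log(1.0 - p) for p in probs)
    ln_ham = sum(math.log(p) for p in probs)
    spamminess = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)
    return (1.0 + spamminess - hamminess) / 2.0
```

A word never seen before (n = 0) gets the prior x, and a message whose
words all score 0.5 ends up at 0.5, which is the natural "unsure"
midpoint of this indicator.
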
Bogofilter is run by an MDA script to classify an incoming message
as spam or ham (using wordlists stored by BerkeleyDB). Bogofilter
provides processing for plain text and HTML. It supports multipart
MIME messages with decoding of base64, quoted-printable, and
uuencoded text, and ignores attachments such as images.
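
To make the MIME handling described above concrete, here is a minimal
Python sketch that uses the standard library's email package, not
bogofilter's own parser. It walks a multipart message, decodes the
transfer encoding of each text part, and skips non-text attachments.
The helper names extract_text_parts and sample_message are
hypothetical and exist only for this illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List

def extract_text_parts(raw: bytes) -> List[str]:
    """Return the decoded text/plain and text/html parts of a message.
    Non-text parts (for example images) are ignored."""
    msg = message_from_bytes(raw, policy=policy.default)
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container only, the leaves hold the content
        if part.get_content_maintype() != "text":
            continue  # skip attachments such as images
        # get_content() undoes base64 / quoted-printable and the charset.
        texts.append(part.get_content())
    return texts

def sample_message() -> bytes:
    """Build a small test message: base64 plain text, quoted-printable
    HTML, and a fake image attachment."""
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg.set_content("Buy cheap pills now\n", cte="base64")
    msg.add_alternative("<p>Buy cheap pills, caf\u00e9 style</p>\n",
                        subtype="html", cte="quoted-printable")
    msg.add_attachment(b"\x89PNG fake image bytes",
                       maintype="image", subtype="png")
    return msg.as_bytes()
```

Calling extract_text_parts(sample_message()) yields two strings, the
plain-text body and the HTML body, while the image part is dropped.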