Bayesian statistical filtering is a relatively new technique for identifying spam. It is
based on Bayes' theorem, a result in probability theory by Thomas Bayes that was published
in the 18th century.
Each message is analyzed and broken down into tokens: message artifacts (such as words
found in the message body) and header items (such as IP addresses and email addresses).
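
As a rough illustration of the tokenizing step, the sketch below splits a raw message into body words, IP addresses, and email addresses. It is not Praetor's actual tokenizer: the regular expressions, the `tokenize` function name, and the `word:`/`ip:`/`addr:` prefixes are all assumptions made for this example.

```python
import re
from email import message_from_string

# Hypothetical patterns chosen for illustration; Praetor's real rules may differ.
WORD_RE = re.compile(r"[A-Za-z0-9$'-]{3,}")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
ADDR_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def tokenize(raw_message):
    """Return the set of distinct tokens found in a raw email message."""
    msg = message_from_string(raw_message)
    tokens = set()

    # Header items: IP addresses and email addresses in any header value.
    for _name, value in msg.items():
        for ip in IP_RE.findall(value):
            tokens.add("ip:" + ip)
        for addr in ADDR_RE.findall(value):
            tokens.add("addr:" + addr.lower())

    # Message artifacts: words in a plain (non-multipart) body.
    body = msg.get_payload()
    if isinstance(body, str):
        for word in WORD_RE.findall(body):
            tokens.add("word:" + word.lower())

    return tokens
```

Prefixing each token with its type keeps a word in the body from being confused with the same text in a header, which is one possible design choice rather than a documented Praetor behavior.
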
For each token, a spam probability is computed from a database table. This table can be
the preconfigured default Bayesian token database created automatically during Praetor
installation or a table customized specifically for your environment using the Praetor
Bayesian training process. For tips and suggestions on training Praetor's Bayesian filter,
read the Bayesian Training Tips page.
Each token's probability is calculated from how often the token was observed in spam and
in ham (legitimate mail) in the table. All individual token probabilities are then combined
into an overall spam probability index called "spamicity".
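
The exact formulas Praetor uses are not given here. The sketch below shows one common approach, assuming a token's probability is its relative frequency in spam versus ham and that token probabilities are combined with the naive Bayes product rule. The function names, the count parameters, and the 0.01/0.99 clamping limits are hypothetical.

```python
import math


def token_probability(spam_count, ham_count, total_spam, total_ham):
    """Estimate how spam-like a token is from its observed frequencies.

    The counts stand in for values read from the token table (assumed layout).
    """
    spam_freq = spam_count / max(total_spam, 1)
    ham_freq = ham_count / max(total_ham, 1)
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat as neutral
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that a single token can never force a result of exactly 0 or 1.
    return min(0.99, max(0.01, p))


def spamicity(probabilities):
    """Combine per-token probabilities into one overall score in [0, 1].

    Computes prod(p) / (prod(p) + prod(1 - p)) in log space to avoid
    floating-point underflow when a message has many tokens.
    """
    log_spam = sum(math.log(p) for p in probabilities)
    log_ham = sum(math.log(1.0 - p) for p in probabilities)
    diff = log_ham - log_spam
    if diff > 700:  # math.exp would overflow; the score is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))
```

For example, two tokens that each score 0.9 combine to roughly 0.988, while one token at 0.9 and one at 0.1 cancel out to about 0.5.
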
Analyzing even a single message requires many SQL database accesses and floating-point
computations.