13.4  SpamAssassin

To combat spam, Kerio MailServer uses SpamAssassin, a well-known antispam filter. SpamAssassin combines several testing methods, described below.

Note: To make troubleshooting of SpamAssassin easier, enable the SpamAssassin Processing option in the Debug log settings. For details on the Debug log, see chapter 25.9 Debug log.

Content evaluation

Content evaluation is based on statistical filtering of the message's contents (keywords, number of capital letters, message format, etc.). Each incoming message is assigned a numeric score according to the number of characteristics typical of spam that it contains. A higher score indicates a higher probability that the message is spam.
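The scoring idea described above can be illustrated with a minimal sketch. The rule names, keywords, thresholds and score values below are hypothetical and chosen only for illustration; they are not SpamAssassin's actual rules or weights.

```python
import re

def caps_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are upper case."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)

# Hypothetical rules: (name, test function, score). All values are illustrative.
RULES = [
    ("MOSTLY_CAPITALS", lambda t: caps_ratio(t) > 0.5, 1.5),
    ("SPAM_KEYWORD",
     lambda t: re.search(r"\b(viagra|lottery)\b", t, re.I) is not None, 2.0),
    ("HTML_ONLY", lambda t: t.lstrip().lower().startswith("<html"), 0.5),
]

def spam_score(text: str) -> float:
    """Sum the scores of all rules that match the message text."""
    return sum(score for _name, test, score in RULES if test(text))
```

A message that triggers no rule gets a score of zero, and every additional matching rule raises the score, which corresponds to the "higher score, higher probability of spam" principle.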

Bayesian filter

Another module involved is the Bayesian filter. It is a special antispam filter which is able to “learn” to recognize spam messages. The filter compares the spam characteristics it has learned with the actual messages. It works in two modes that run concurrently:

  • “Autolearn”: the filter learns by itself.

  • “Learn”: users are involved in the learning process. Users have to reassign incorrectly evaluated messages to the correct type (spam / non-spam) so that the filter learns to recognize them in the future.

At least 200 unique spam messages and 200 unique ham (legitimate) messages must be collected before the filter starts working. The messages must differ from each other: each message is counted only once, and further occurrences of an identical message are ignored.

The Bayesian filter adds up the spam and ham messages learned by both the “Learn” and “Autolearn” methods. The SpamAssassin tab contains statistics showing how many messages have been marked as spam or ham, and whether the filter is already active or has not yet learned enough spam and ham messages. Once the filter is activated, the learning process keeps adding new items to the database.
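The counting and activation logic described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and identical messages are detected here by hashing the raw message, which is not necessarily how the filter itself recognizes duplicates.

```python
import hashlib

MIN_UNIQUE = 200  # threshold stated in the text, applied to spam and ham alike

class TrainingStats:
    """Hypothetical counter of unique learned messages per type."""

    def __init__(self) -> None:
        self._seen = {"spam": set(), "ham": set()}

    def learn(self, message: bytes, kind: str) -> bool:
        """Record a message; return False if an identical one was already learned."""
        digest = hashlib.sha256(message).hexdigest()
        if digest in self._seen[kind]:
            return False  # duplicates are ignored
        self._seen[kind].add(digest)
        return True

    def count(self, kind: str) -> int:
        """Number of unique messages learned for the given type."""
        return len(self._seen[kind])

    def is_active(self) -> bool:
        """The filter is active once both types reach the threshold."""
        return all(len(s) >= MIN_UNIQUE for s in self._seen.values())
```

In this sketch it does not matter whether a message arrives through the “Learn” or the “Autolearn” mode; both feed the same counters.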

Note: SpamAssassin checks only messages that do not exceed 128 KB. Spam messages are rarely that large, and checking large messages might overload the server or slow it down.
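The size limit amounts to a simple check before a message is passed to the filter. The sketch below assumes that 1 KB equals 1024 bytes and that the limit applies to the whole raw message; the function and constant names are hypothetical.

```python
MAX_SCAN_SIZE = 128 * 1024  # 128 KB limit from the text, assuming 1 KB = 1024 bytes

def should_scan(message: bytes) -> bool:
    """Return True if the message is small enough to be passed to the filter."""
    return len(message) <= MAX_SCAN_SIZE
```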

Because users have to review messages themselves in the “Learn” mode, the spam evaluation tools must be built into mail clients. By default, only MS Outlook with the Kerio Outlook Connector and the Kerio WebMail interface include these tools. Users can click special buttons in the toolbar to mark an incorrectly evaluated message as non-spam.

For email clients with IMAP accounts, as well as for MS Entourage (with IMAP and Exchange accounts), there is another way to teach the Bayesian filter. These users can mark incorrectly classified messages by moving them to the appropriate folder. To mark a message as spam, move it to Junk E-mail. To mark a message as not spam, move it to Inbox.
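The folder-based method can be thought of as a mapping from a move operation to a learning action. The following sketch is an illustration only: the function name is hypothetical, and it assumes that only the target folder decides the classification and that a move within the same folder teaches nothing.

```python
from typing import Optional

JUNK = "Junk E-mail"
INBOX = "Inbox"

def learning_action(source_folder: str, target_folder: str) -> Optional[str]:
    """Map a message move to a learning action: "spam", "ham" or None."""
    if source_folder == target_folder:
        return None  # nothing was reclassified
    if target_folder == JUNK:
        return "spam"  # user says the message is spam
    if target_folder == INBOX:
        return "ham"  # user says the message is legitimate
    return None  # moves to other folders carry no classification
```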

TIP

To use this method as efficiently as possible, set a spam rule for users (either when creating user accounts in Kerio MailServer or by defining a corresponding sieve rule for incoming mail). Any message marked as spam by Kerio MailServer is then moved to the Junk E-mail folder automatically. Messages incorrectly marked as spam can be moved to Inbox by hand, and spam messages that were let through by mistake can be moved to the Junk E-mail folder manually. This ensures that the Bayesian filter learns properly and keeps improving.

Online SURBL database

This part of the filter tests the contents of messages (links to websites that may be included in message bodies) against special online databases.

SpamAssassin can use multiple online databases. In Kerio MailServer, however, it uses only the SURBL database, since the other databases are already used for other tests.
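SURBL lists are generally queried over DNS: the domain found in a link is prepended to the list's zone name, and a listed domain resolves to an address in the 127.0.0.0/8 range. The sketch below shows this principle only. It is not how Kerio MailServer or SpamAssassin implements the test; the function names are hypothetical, the zone name is assumed to be the public multi.surbl.org list, and the code queries the full host name, whereas a real client would first reduce it to the registered domain.

```python
import re
import socket
from typing import Callable, List

SURBL_ZONE = "multi.surbl.org"  # assumed zone name of the public combined list

URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.I)

def extract_hosts(body: str) -> List[str]:
    """Return distinct host names of http(s) links, in order of first appearance."""
    seen: List[str] = []
    for host in URL_HOST.findall(body):
        host = host.lower().rstrip(".")
        if host and host not in seen:
            seen.append(host)
    return seen

def is_listed(host: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """A host counts as listed if <host>.<zone> resolves to a 127.x.x.x address."""
    try:
        return resolve(f"{host}.{SURBL_ZONE}").startswith("127.")
    except OSError:  # covers socket.gaierror raised when the name does not exist
        return False
```

The resolver is passed in as a parameter so that the lookup can be tried out with a stub function instead of live DNS queries.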