A guide to Spamhalter in Pegasus Mail and Mercury/32.

This document was last updated 17-July-2006 and its content is stolen from the Pegasus Mail Help file.


Why does this page only refer to Pegasus Mail?

Mercury/32 version 4.1 and Pegasus Mail version 4.41 share the same code for Spamhalter. The pages you are reading right now were created from sources that refer to Pegasus Mail only.

You can read "Mercury/32" everywhere you find "Pegasus Mail".

Just for the record: Mercury/NLM (for Novell) does not have this feature.

Getting started with Spamhalter

It is important to understand that any Bayesian spam filter requires a two-part initial corpus - that is, one collection of messages that are spam, and another collection of messages that are not spam. You should not enable Spamhalter until you have a total of around 100 messages of each type that you can use as your initial corpus (these messages can be in any folder - they need not all be in the same folder). Once you are ready to begin using Spamhalter, first enable it in the Spamhalter... menu on the Spam and content control submenu of the Tools menu. Next, select the spam messages in your corpus as a group, then right-click one of the messages and choose Train messages as spam from the popup menu. Spamhalter will learn from the messages, which may take some time. Once the training process has completed, select the non-spam messages from your corpus as a group, then right-click one and choose Train messages as not-spam from the popup menu. Spamhalter will now learn from the non-spam messages.

IMPORTANT NOTE: The messages that make up your corpus NEED NOT be in the same folder; what's more, there is nothing to prevent you from taking groups of messages from many folders in a piecemeal manner, selecting them, then right clicking and training as spam or not-spam, then moving to the next folder and repeating the process.
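To make the idea of a two-part corpus more concrete, here is a minimal sketch in Python of what "training" can mean for a Bayesian filter. It is not Spamhalter's actual code: the Corpus class, the tokenize function and the rule of counting each word once per message are all assumptions made for illustration. It only shows that spam and not-spam messages feed two separate sets of word counts, and that the messages can arrive in several batches.

```python
import re
from collections import Counter

def tokenize(text):
    """Split a message into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9$']+", text.lower())

class Corpus:
    """Hypothetical word statistics for a two-part corpus (spam and not-spam)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct word once per message, under the given label.
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

corpus = Corpus()

# Spam can be trained piecemeal, one folder (batch) at a time.
spam_batches = [["Cheap pills, buy now"], ["Buy cheap watches now"]]
for batch in spam_batches:
    for message in batch:
        corpus.train(message, "spam")

# Then the not-spam half of the corpus.
for message in ["Meeting moved to Monday", "Can you buy milk on the way home"]:
    corpus.train(message, "ham")

print(corpus.messages)
print(corpus.counts["spam"]["buy"], corpus.counts["ham"]["buy"])
```

With only a handful of messages the counts say very little, which is why a corpus of around 100 messages of each type is suggested before enabling the filter.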

Once Spamhalter has been trained on your basic corpus, it will automatically apply what it has learned to every new message that appears in your new mail folder. Any new message it regards as spam will be moved into whatever folder you select in the Spamhalter configuration dialog as the Spamhalter spam folder.
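The following Python sketch shows one common way a Bayesian filter can turn what it has learned into a decision for a new message. It is an illustration only: the per-word probabilities, the 0.9 threshold and the folder names "Spam" and "New mail" are assumptions, and Spamhalter's real scoring method may differ.

```python
import math

def spam_score(words, token_probs):
    """Combine per-word spam probabilities into one score between 0 and 1.

    token_probs maps a word to its estimated spam probability, which must lie
    strictly between 0 and 1. Unknown words are treated as neutral (0.5).
    """
    log_odds = 0.0
    for word in set(words):
        p = token_probs.get(word, 0.5)
        log_odds += math.log(p) - math.log(1 - p)
    return 1 / (1 + math.exp(-log_odds))

def target_folder(words, token_probs, threshold=0.9):
    """Pick a folder for a new message (folder names are hypothetical)."""
    return "Spam" if spam_score(words, token_probs) >= threshold else "New mail"

# Hypothetical probabilities learned from an earlier training run.
learned = {"viagra": 0.98, "free": 0.74, "meeting": 0.05}

print(target_folder(["free", "viagra"], learned))
print(target_folder(["meeting", "tomorrow"], learned))
```

Adding log-odds rather than multiplying raw probabilities avoids numeric underflow on long messages.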

In the early stages of using Spamhalter, it is important that you check your spam folder regularly, because until it has built up a fairly large statistical database, Spamhalter is likely to produce a number of false positives - messages incorrectly classified as spam. Moving a false positive message out of the Spamhalter spam folder into any other folder will automatically force Spamhalter to re-classify that message and amend its statistical tables, reducing the likelihood of the misclassification in future. Similarly, if you receive spam that Spamhalter does not detect, simply move it into your Spamhalter spam folder and Spamhalter will automatically be trained on that message, increasing the likelihood that it will correctly detect similar messages in future.
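"Amending the statistical tables" can be pictured as taking a message's words out of one set of counts and adding them to the other. The Python sketch below shows that idea under stated assumptions: the Stats class and its train, untrain and reclassify methods are hypothetical names, and nothing here is taken from Spamhalter's source.

```python
from collections import Counter

class Stats:
    """Hypothetical statistical tables: word counts per class."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, words, label):
        self.tokens[label].update(set(words))
        self.messages[label] += 1

    def untrain(self, words, label):
        self.tokens[label].subtract(set(words))
        self.tokens[label] += Counter()  # drops entries that fell to zero or below
        self.messages[label] -= 1

    def reclassify(self, words, old_label, new_label):
        """Undo the earlier classification, then learn the corrected one."""
        self.untrain(words, old_label)
        self.train(words, new_label)

stats = Stats()
invoice = ["invoice", "attached", "payment"]

stats.train(invoice, "spam")               # false positive: a real invoice filed as spam
stats.reclassify(invoice, "spam", "ham")   # moved out of the spam folder

print(stats.messages)
print(stats.tokens["ham"]["invoice"])
```

The same reclassify call with the labels swapped models the opposite case, where undetected spam is moved into the spam folder.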

Basic operation

As noted above, moving messages in and out of your Spamhalter spam folder automatically trains Spamhalter, so using it is very easy. If, for some reason, you want to classify a message as either spam or not-spam without moving it, you can right-click the message and choose either Train as spam or Train as not-spam. These options are equivalent to moving the message in or out of your spam folder.

If you want to see why a particular message has been classified as either spam or not-spam, right-click that message and choose Explain classification: Spamhalter will open a small dialog showing you the words or phrases it has used to establish its classification and the weight those words had in the process.
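To give a feel for what such an explanation contains, here is a small Python sketch that ranks the words of a message by how strongly they pull it towards spam or not-spam. The formula (a smoothed ratio of spam and not-spam frequencies) and all the counts are assumptions for illustration; the weights Spamhalter actually shows may be calculated differently.

```python
def token_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how spam-like a word is, with add-one smoothing (an assumption)."""
    s = (spam_counts.get(word, 0) + 1) / (n_spam + 2)
    h = (ham_counts.get(word, 0) + 1) / (n_ham + 2)
    return s / (s + h)

def explain(words, spam_counts, ham_counts, n_spam, n_ham, top=3):
    """Return the words with the most weight, i.e. furthest from neutral 0.5."""
    scored = {
        w: token_spam_probability(w, spam_counts, ham_counts, n_spam, n_ham)
        for w in set(words)
    }
    return sorted(scored.items(), key=lambda item: abs(item[1] - 0.5), reverse=True)[:top]

# Hypothetical counts from 50 spam and 50 not-spam training messages.
spam_counts = {"viagra": 40, "free": 30, "meeting": 1}
ham_counts = {"meeting": 35, "free": 10}

report = explain(["free", "viagra", "meeting", "tomorrow"],
                 spam_counts, ham_counts, n_spam=50, n_ham=50)
for word, probability in report:
    print(f"{word:10s} {probability:.3f}")
```

Values near 1 indicate words seen mostly in spam, values near 0 indicate words seen mostly in good mail, and a word never seen before sits at 0.5 and carries no weight.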

Whitelisting

There may be certain addresses that you want to prevent from ever being classified by Spamhalter - addresses that are known to be good, or from which it is important that there be no risk at all of false-positive detection (that's where Spamhalter incorrectly classifies a message as spam when it is not). To achieve this, simply enter the sender's address in your Global Whitelist. When an address appears in your global whitelist, then any message appearing to originate from that address will be exempted from classification by both Spamhalter and Content Control.
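The whitelist check can be thought of as a gate that runs before any classification. The Python sketch below illustrates this with assumed names (is_whitelisted, classify) and assumes exact, case-insensitive address matching; the matching rules Pegasus Mail really applies to its Global Whitelist are not described here. Note that the check looks only at the address the message appears to come from, which a sender can forge.

```python
from email.utils import parseaddr

def is_whitelisted(from_header, whitelist):
    """Check the apparent sender address against a set of lowercase addresses."""
    _, address = parseaddr(from_header)
    return address.lower() in whitelist

def classify(from_header, body, whitelist, spam_filter):
    """Skip the spam filter entirely for whitelisted senders."""
    if is_whitelisted(from_header, whitelist):
        return "exempt"
    return spam_filter(body)

whitelist = {"alice@example.com"}

def always_spam(body):
    # Stand-in for a real filter, so the effect of the whitelist is visible.
    return "spam"

print(classify('"Alice" <Alice@Example.com>', "Buy now!", whitelist, always_spam))
print(classify("stranger@example.net", "Buy now!", whitelist, always_spam))
```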

Status indication

When Spamhalter is enabled, Pegasus Mail displays an extra status indicator when you are reading a message - a small icon of a traffic light. If the traffic light is green, then the message has been classified as not-spam by Spamhalter. If the traffic light is red, then the message has been classified as spam by Spamhalter. If the traffic light is grey, then Spamhalter either has not yet classified the message, or could not determine absolutely whether or not the message was spam.
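The three colours can be read as a mapping from a spam score to a status. The short Python sketch below shows one such mapping; the score range of 0 to 1 and the thresholds of 0.9 and 0.1 are assumptions chosen for illustration, not values taken from Spamhalter.

```python
def traffic_light(score, spam_threshold=0.9, ham_threshold=0.1):
    """Map a spam score (0 to 1, or None if not yet classified) to a colour."""
    if score is None:
        return "grey"    # not yet classified
    if score >= spam_threshold:
        return "red"     # classified as spam
    if score <= ham_threshold:
        return "green"   # classified as not-spam
    return "grey"        # classified, but not conclusively either way

for score in (None, 0.02, 0.5, 0.97):
    print(score, traffic_light(score))
```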

Sorting and grouped views

You can sort messages in any folder by their "spamminess" by choosing "Sort by spamminess" from the Messages menu (in preview mode) or the Folder menu (if you have the folder open in its own window). Similarly, you can group messages by spamminess using the Group by spamminess grouped view on the Grouped views submenu of the Messages/Folder menu.
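As an illustration of the difference between sorting and grouping, the Python sketch below orders a few messages by a spam score and then puts them into bands. The scores, the band names and the 0.9 and 0.1 cut-offs are all invented for the example; how Pegasus Mail defines its groups is not documented on this page.

```python
from itertools import groupby

# Hypothetical (subject, spam score) pairs.
messages = [
    ("Weekly report", 0.02),
    ("WIN A PRIZE", 0.99),
    ("Newsletter", 0.55),
    ("Lunch?", 0.05),
]

# Sorting: most spam-like first.
by_spamminess = sorted(messages, key=lambda m: m[1], reverse=True)

def band(message):
    """Assign a message to a group based on its score (cut-offs are assumptions)."""
    score = message[1]
    if score >= 0.9:
        return "spam"
    if score <= 0.1:
        return "not-spam"
    return "unsure"

# Grouping: groupby needs its input sorted by the same key.
grouped = {
    name: [subject for subject, _ in items]
    for name, items in groupby(sorted(messages, key=band), key=band)
}

print([subject for subject, _ in by_spamminess])
print(grouped)
```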

Copyright information

Spamhalter was developed by Lukas Gebauer of Ararat s.r.o (http://www.ararat.cz/) in the Czech Republic, and is incorporated in Pegasus Mail with permission. The core Spamhalter code is Copyright (c) 2000-2006, Lukas Gebauer. Pegasus Mail's author offers his thanks and appreciation to Lukas for his efforts.

Ready to use spamfilter database

Although setting up the first Spamhalter database is easy, as described above, I have zipped my own database (now 3.860.207 bytes) and made it available using this link. Just unzip the file into your home mailbox location while Pegasus Mail is not running and you will have a well-trained database.


If you have more information that should be placed on those Content Control pages, please feel free to contact me by e-mail and I will make that information available.