SpamAssassin

From ArchWiki

SpamAssassin is a mail filter to identify spam.

Installation

Install the spamassassin package.

Create the sa-update-keys directories, then set their owner, group and permissions:

# mkdir -p /etc/mail/spamassassin/sa-update-keys /etc/mail/sa-update-keys
# chown -R spamd:spamd /etc/mail/spamassassin /etc/mail/sa-update-keys
# chmod 755 /etc/mail/spamassassin
# chmod 700 /etc/mail/spamassassin/sa-update-keys

Next, start/enable spamassassin.service.

Usage

Go over /etc/mail/spamassassin/local.cf and configure it to your needs.

Updating rules

Update the SpamAssassin matching patterns and compile them:

[spamd]$ /usr/bin/vendor_perl/sa-update && /usr/bin/vendor_perl/sa-compile
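The && above runs sa-compile only when sa-update exits with status 0. As far as sa-update(1) documents it, status 1 means no fresh updates were available rather than a failure, which is why the service unit further down sets SuccessExitStatus=1. A minimal sketch of telling these cases apart; the helper name next_action is hypothetical, not part of SpamAssassin:

```shell
#!/bin/sh
# Hypothetical helper: map an sa-update exit status to the next step.
#   0    -> new rules were installed, so recompiling is worthwhile
#   1    -> no fresh updates were available, nothing to do
#   else -> treat as an error
next_action() {
    case "$1" in
        0) echo compile ;;
        1) echo skip ;;
        *) echo error ;;
    esac
}

# Example wiring, commented out so the block has no side effects:
# /usr/bin/vendor_perl/sa-update; status=$?
# [ "$(next_action "$status")" = compile ] && /usr/bin/vendor_perl/sa-compile
```

Treating every non-zero status other than 1 as an error is a simplification; check sa-update(1) for the exact meaning of the higher codes.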

Run this periodically; a systemd timer is a convenient way to do so.

Create the following service, which will run these commands:

/etc/systemd/system/spamassassin-update.service
[Unit]
Description=spamassassin housekeeping stuff
After=network.target

[Service]
User=spamd
Group=spamd
Type=oneshot

ExecStart=/usr/bin/vendor_perl/sa-update
SuccessExitStatus=1
ExecStart=/usr/bin/vendor_perl/sa-compile
ExecStart=!/usr/bin/systemctl -q --no-block try-restart spamassassin.service

# uncomment the following ExecStart line to train SA's bayes filter
# and specify the path to the mailbox that contains spam email(s)
#ExecStart=/usr/bin/vendor_perl/sa-learn --spam <path_to_your_spam_mailbox>

Then create the timer, which will execute the previous service daily:

/etc/systemd/system/spamassassin-update.timer
[Unit]
Description=spamassassin house keeping

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

Now you can start and enable spamassassin-update.timer.

Set maximum size for scanning

The default maximum size for scanning is 500 KB (see spamc(1p)). To change it, create the spamc configuration file, for example:

/etc/mail/spamassassin/spamc.conf
# spamc global configuration file

# max message size for scanning = 1 MB
-s 1000000
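Messages larger than the -s limit are returned by spamc unprocessed, so it can be useful to check a sample message against the limit before choosing a value. A minimal sketch; within_scan_limit is a hypothetical helper, and the limit is given in bytes as in the -s option:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the message file is no larger than
# the given limit in bytes (the value passed to spamc with -s).
within_scan_limit() {
    file=$1
    limit=$2
    # Arithmetic expansion strips the padding some wc versions add.
    size=$(($(wc -c < "$file")))
    [ "$size" -le "$limit" ]
}

# Example, commented out; message.eml is a placeholder file name:
# within_scan_limit message.eml 1000000 || echo "message would not be scanned"
```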

Using a SQL database

SpamAssassin can load user preferences, Bayesian filter data and auto-whitelist from a SQL database. This is especially helpful for a virtual user mail setup, where users do not have a $HOME/.spamassassin directory with their SpamAssassin data.

Note: Since the TxRep plugin is a newer, enhanced replacement for Auto-Welcomelist and Auto-Whitelist, it is the implementation covered in this article.

MySQL

Install perl-dbd-mysql. Then, create the database:

$ mysql -u root -p
CREATE DATABASE <db_name>;
GRANT ALL ON <db_name>.* TO '<db_user>'@'localhost' IDENTIFIED BY '<password>';

Clone SpamAssassin's source with Git. The sql/ directory contains the files required to create the database tables. Recent MySQL versions replaced the TYPE table option with ENGINE, so adjust the .sql files you use if needed.
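If a schema file still uses the old option, the substitution can be done with sed. A minimal sketch, assuming the option appears on the closing line of each CREATE TABLE statement as ") TYPE=<engine>;"; the function name fix_table_engine is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: rewrite the legacy "TYPE=<engine>" table option
# to "ENGINE=<engine>". Reads SQL on stdin, writes the result to stdout.
# Assumes the option sits on the line that closes the CREATE TABLE body.
fix_table_engine() {
    sed 's/^) *TYPE=/) ENGINE=/'
}

# Example, commented out so the block has no side effects:
# fix_table_engine < bayes_mysql.sql > bayes_mysql.fixed.sql
```

Review the converted file before importing it, since files that already use ENGINE pass through unchanged and other layouts are not covered by this pattern.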

Create the tables for user preferences, Bayesian filter data and TxRep, respectively:

$ mysql -u root -p <db_name> < userpref_mysql.sql
$ mysql -u root -p <db_name> < bayes_mysql.sql
$ mysql -u root -p <db_name> < txrep_mysql.sql

TxRep is optional, so skip it if you are not using it. If you want to use it but have not configured it yet, refer to Mail::SpamAssassin::Plugin::TxRep(3).

Make sure to have the following in your configuration file:

/etc/mail/spamassassin/local.cf
## MySQL database setup
# User scores
user_scores_dsn             DBI:mysql:<db_name>:localhost
user_scores_sql_username    <db_user>
user_scores_sql_password    <password>

# Bayesian filter
bayes_store_module          Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn               DBI:mysql:<db_name>:localhost
bayes_sql_username          <db_user>
bayes_sql_password          <password>

# TxRep plugin
txrep_factory               Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                DBI:mysql:<db_name>:localhost
user_awl_sql_username       <db_user>
user_awl_sql_password       <password>

Finally, restart spamassassin.service.

Plugins

ClamAV

Install and setup clamd as described in ClamAV.

Make sure your mail system is already set up to call SpamAssassin.

Install the perl-cpanplus-dist-arch package. Then install the ClamAV perl library as follows:

# /usr/bin/vendor_perl/cpanp -i File::Scan::ClamAV

Add the two files from https://wiki.apache.org/spamassassin/ClamAVPlugin into /etc/mail/spamassassin/. Edit /etc/mail/spamassassin/clamav.pm and update $CLAMD_SOCK to point to your clamd socket location (default is /run/clamav/clamd.ctl).
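Before restarting, it may help to confirm that the path set in $CLAMD_SOCK really is a socket. A minimal sketch; is_unix_socket is a hypothetical helper built on the standard -S test:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path exists and is a
# Unix domain socket.
is_unix_socket() {
    [ -S "$1" ]
}

# Example, commented out; uses the default path mentioned above:
# is_unix_socket /run/clamav/clamd.ctl || echo "clamd socket not found" >&2
```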

Finally, restart spamassassin.service.

Razor

Note: The last version was released in 2008.[1]

Vipul's Razor is a distributed, collaborative, spam detection and filtering network.

Make sure you have installed SpamAssassin first, then:

Install the razor package.

Register with Razor.

# mkdir /etc/mail/spamassassin/razor
# chown spamd:spamd /etc/mail/spamassassin/razor
[spamd]$ cd /etc/mail/spamassassin/razor
[spamd]$ /usr/bin/vendor_perl/razor-admin -home=/etc/mail/spamassassin/razor -register
[spamd]$ /usr/bin/vendor_perl/razor-admin -home=/etc/mail/spamassassin/razor -create
[spamd]$ /usr/bin/vendor_perl/razor-admin -home=/etc/mail/spamassassin/razor -discover

To tell SpamAssassin about Razor, add the following line to /etc/mail/spamassassin/local.cf:

razor_config /etc/mail/spamassassin/razor/razor-agent.conf

To tell Razor about itself, add the following line to /etc/mail/spamassassin/razor/razor-agent.conf:

razorhome = /etc/mail/spamassassin/razor/

Finally, restart spamassassin.service.

Tips and tricks

Maintaining TxRep SQL table

It is recommended to keep the TxRep SQL table clear of stale data, for performance and storage reasons. Here is a sample query that can be run on a regular schedule:

DELETE FROM txrep WHERE last_hit <= (now() - INTERVAL 120 day);
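To run the query from a script with an adjustable retention period, the statement can be generated and piped to the mysql client. A minimal sketch; txrep_purge_sql is a hypothetical helper, and the 120-day default simply mirrors the sample query:

```shell
#!/bin/sh
# Hypothetical helper: print the cleanup statement for a retention
# period given in whole days (default 120, as in the sample query).
txrep_purge_sql() {
    days=${1:-120}
    case "$days" in
        *[!0-9]*)
            echo "retention must be a whole number of days" >&2
            return 1
            ;;
    esac
    printf 'DELETE FROM txrep WHERE last_hit <= (now() - INTERVAL %s day);\n' "$days"
}

# Example, commented out; <db_user> and <db_name> are the placeholders
# used earlier in this article:
# txrep_purge_sql 90 | mysql -u <db_user> -p <db_name>
```

Rejecting non-numeric input keeps arbitrary text out of the generated statement.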