Monit
Monit, not to be confused with M/Monit, is an AGPL-3.0-licensed system and process monitoring tool. Monit can automatically restart crashed services and display temperatures from standard hardware (for example, sensor readings through lm_sensors and hard drive temperatures through smartmontools). Service alerts can be sent based on a wide range of criteria, including a single occurrence or repeated occurrences over a period of time. It can be accessed directly through the command line or run as a web app using its integrated HTTP(S) server. This allows a quick, streamlined snapshot of a given system's status.
Installation
Install the monit package and any software for optional testing, such as lm_sensors or smartmontools. Once you have completed the configuration, be sure to enable and start monit.service.
Configuration
Monit keeps its main configuration file at /etc/monitrc. You can edit this file directly, but if you wish to run scripts (such as to get hard drive temperatures or health status), you should uncomment the last directive, include /etc/monit.d/*, save /etc/monitrc and create /etc/monit.d/.
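The directory setup described above can be sketched as a small shell helper. This is only an illustration: the function name prepare_monit_dirs and its optional root argument are hypothetical and not part of Monit. It also tightens file permissions to 0700, since Monit refuses to start when its control file is too permissive.

```shell
#!/bin/sh
# Hypothetical helper (not part of Monit): create the include directory and
# restrict permissions under a given configuration root (default: /etc).
prepare_monit_dirs() {
    root="${1:-/etc}"

    # Create the directory referenced by the include directive.
    mkdir -p "$root/monit.d" || return 1

    # Restrict the main control file, if it exists.
    if [ -f "$root/monitrc" ]; then
        chmod 700 "$root/monitrc" || return 1
    fi

    # Restrict any regular files already placed in the include directory.
    for f in "$root"/monit.d/*; do
        [ -f "$f" ] || continue
        chmod 700 "$f" || return 1
    done
}
```

On a real system this would be run as root, for example with prepare_monit_dirs /etc. Passing a scratch directory instead lets you try it without touching /etc.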
Note: The /etc/monitrc file (and potentially files stored in /etc/monit.d) must have permissions no more permissive than 0700. Failure to comply will result in Monit failing to start.
Configuration syntax
Monit uses a configuration syntax that is easy to read: essentially a check WHAT declaration followed by statements in the form if THING condition THEN action. Any occurrence of if, and, with(in), has, us(ing|e), on(ly), then, for, of in the configuration file is for human readability only and is completely ignored by Monit.
Checks are usually performed in cycles. The cycle length, in seconds, is defined at the beginning of the configuration file; for example, a 30-second poll is defined with:
set daemon 30
A span of 4 cycles would therefore correspond to 2 minutes.
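The arithmetic behind cycles is simply the poll interval multiplied by the number of cycles. A minimal shell sketch, where cycle_seconds is a hypothetical helper name and not a Monit command:

```shell
#!/bin/sh
# cycle_seconds DAEMON_INTERVAL CYCLES
# Print how many seconds the given number of cycles spans, where
# DAEMON_INTERVAL is the value passed to "set daemon".
cycle_seconds() {
    echo $(( $1 * $2 ))
}

# With "set daemon 30", 4 cycles span 120 seconds (2 minutes).
cycle_seconds 30 4
```

The same calculation helps when choosing values such as every 120 cycles, which with a 30-second poll corresponds to one hour.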
Configuration examples
Mailserver declaration
set mailserver smtp.myserver.com port 587 username "MyUser" password "MyPassW0rd" using tlsv12
Email notification format
set mail-format {
    from: Monit@MyServer
    subject: $SERVICE $EVENT at $DATE
    message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}
Note: Items such as $SERVICE are not generic placeholders but specific variable names, which Monit replaces with what the alert is, on what system, etc.
CPU, memory and swap utilization
check system $HOST
    if loadavg (15min) > 15 for 5 times within 15 cycles then alert
    if memory usage > 80% for 4 cycles then alert
    if swap usage > 20% for 4 cycles then alert
Filesystem(s) usage
check filesystem rootfs with path /
    if space usage > 90% then alert
check filesystem NFS with path /mnt/nfs_share
    if space usage > 90% then alert
Process monitoring
check process sshd with pidfile /var/run/sshd.pid
    start program "/usr/bin/systemctl start sshd"
    stop program "/usr/bin/systemctl stop sshd"
    if failed port 22 protocol ssh then restart
check process smbd with pidfile /run/samba/smbd.pid
    group samba
    start program = "/etc/init.d/samba start"
    stop program = "/etc/init.d/samba stop"
    if failed host 192.168.1.250 port 139 type TCP then restart
    depends on smbd_bin
check file smbd_bin with path /usr/bin/smbd
    group samba
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
Note: Observe the depends on smbd_bin statement: it makes monitoring of the Samba process depend on the smbd_bin check of the actual smbd binary.
Hard drive health and temperature using scripts
Temperature
Create the file /etc/monit.d/scripts/hdtemp.sh, as well as the /etc/monit.d/scripts folder if necessary.
/etc/monit.d/scripts/hdtemp.sh
#!/usr/bin/sh
HDDTP=`/usr/bin/smartctl -A /dev/sd${1} | grep Temp.*Cels | awk -F " " '{printf "%d",$10}'`
#echo $HDDTP # for debug only
exit $HDDTP
monitrc or /etc/monit.d/*.monit file
check program SSD-A-Temp with path "/etc/monit.d/scripts/hdtemp.sh a"
    every 5 cycles
    if status > 40 then alert
    group health
check program HDD-B-Temp with path "/etc/monit.d/scripts/hdtemp.sh b"
    every 5 cycles
    if status > 40 then alert
    group health
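In this example, the /etc/monit.d/scripts/hdtemp.sh script assumes your drive path is /dev/sdX, where X is filled in by the letter passed as the script argument at the end of the path in the check declaration. The script returns the temperature as its exit status, which Monit compares against the threshold in if status > 40. A similar method is used for the SMART health status in the next example.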
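To see how the temperature is extracted without needing a drive, the parsing step can be tried on its own. The sketch below is an assumption-laden illustration: parse_temp is a hypothetical helper name, and the sample line is a hand-written example in the attribute table layout that smartctl -A prints, not output from a real drive.

```shell
#!/bin/sh
# parse_temp: read "smartctl -A"-style text on stdin and print the 10th
# whitespace-separated column (the raw value) of the temperature line,
# using the same grep/awk pipeline as hdtemp.sh.
parse_temp() {
    grep 'Temp.*Cels' | awk '{printf "%d", $10}'
}

# Illustrative sample line (hypothetical values, not from a real drive).
sample='194 Temperature_Celsius     0x0022   036   053   000    Old_age   Always       -       36'

printf '%s\n' "$sample" | parse_temp
```

Because the value is handed to Monit as an exit status, only values from 0 to 255 can be represented, which is sufficient for drive temperatures in degrees Celsius.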
SMART health status
/etc/monit.d/scripts/hdhealth.sh
#!/usr/bin/sh
STATUS=`/usr/bin/smartctl -H /dev/sd${1} | grep overall-health | awk 'match($0,"result:"){print substr($0,RSTART+8,6)}'`
if [ "$STATUS" = "PASSED" ]
then
    # 1 implies PASSED
    TP=1
else
    # 2 implies FAILED
    TP=2
fi
#echo $TP # for debug only
exit $TP
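The mapping from the smartctl health line to the exit codes 1 and 2 can be checked in isolation. In this sketch, health_code is a hypothetical helper name and the sample text is a hand-written line in the format smartctl -H prints, so treat it as an illustration rather than captured output.

```shell
#!/bin/sh
# health_code: read "smartctl -H"-style text on stdin and print 1 if the
# overall-health result is PASSED, otherwise 2 (same mapping as hdhealth.sh).
health_code() {
    # "result:" is 7 characters long, so the status word starts 8 characters
    # after the match position; PASSED is 6 characters.
    status=$(grep overall-health | awk 'match($0,"result:"){print substr($0,RSTART+8,6)}')
    if [ "$status" = "PASSED" ]; then
        echo 1
    else
        echo 2
    fi
}

# Illustrative sample line (hand-written, not from a real drive).
sample='SMART overall-health self-assessment test result: PASSED'

printf '%s\n' "$sample" | health_code
```

Anything other than PASSED, including missing output when smartctl cannot read the drive, maps to 2 and would therefore trigger the alert.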
monitrc or /etc/monit.d/*.monit file
check program SSD-A-Health with path "/etc/monit.d/scripts/hdhealth.sh a"
    every 120 cycles
    if status != 1 then alert
    group health
check program HDD-B-Health with path "/etc/monit.d/scripts/hdhealth.sh b"
    every 120 cycles
    if status != 1 then alert
    group health
Note: The group declaration causes Monit to display all checks assigned the same group name (in this case health) together.
Alert recipients: global or subsystem based
Alerts can be set globally, where a given user or email address is alerted for any alert condition, or you can set an alert recipient for each type of check (e.g. network alerts go to recipient A; process alerts go to recipient B). You can set as many global or subsystem recipients as you like by making multiple declarations.
Global alerts
Global alerts are set outside of any subsystem checks; for ease of reading they should be set in the same location as the mailserver declaration.
SET ALERT email@domain
Subsystem alerts
Subsystem alerts are set very similarly to global alerts, except that they lack the SET keyword and are placed inside the relevant check.
ALERT email@domain