Prometheus
Prometheus is an open-source metrics collection and processing tool. It consists primarily of a time series database and a query language to access and process the metrics it stores. Metrics are exposed by separate services (exporters), from which the Prometheus server pulls them. It provides a very minimal web UI out of the box. To get a functional dashboard system, third-party tools like Grafana can be used.
Installation
Install the prometheus package. After that, you can enable and start prometheus.service and access the application via HTTP on port 9090 by default.
The default configuration monitors the prometheus process itself, but not much beyond that. To perform system monitoring, you can install prometheus-node-exporter, which exposes metrics from the local system. Start and enable prometheus-node-exporter.service; it opens port 9100 by default. Once the service is running, you need to configure prometheus to scrape the exporter periodically in order to actually collect the data. Do this by following the steps in Adding metrics below.
prometheus listens on *:9090 and prometheus-node-exporter listens on *:9100 (that is, on all network interfaces), so make sure to change the configuration or enable the relevant firewall rules. See also the Prometheus security model.
Configuration
The Prometheus configuration is done through YAML files, the main one being located at /etc/prometheus/prometheus.yml.
Adding metrics
You can add new scrape targets by adding them to the scrape_configs array. To add the local node exporter as a source alongside the prometheus process itself, the configuration would look like this:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
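To illustrate how the scrape period is controlled, the following sketch sets a global scrape interval, overrides it for one job and adds a second, remote node exporter target. The interval values and the host name server2.example.com are assumptions made for illustration only; they do not come from the packaged configuration.

```yaml
# /etc/prometheus/prometheus.yml (sketch; all values are illustrative)
global:
  scrape_interval: 30s        # how often targets are scraped unless a job overrides it

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    scrape_interval: 15s      # per-job override of the global interval
    static_configs:
      - targets:
          - 'localhost:9100'
          - 'server2.example.com:9100'   # hypothetical remote host running the node exporter
```

Prometheus reads its configuration at startup or on reload, so restart prometheus.service after editing the file. If promtool is available on your system, promtool check config /etc/prometheus/prometheus.yml can be used to validate the file first.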
Exporters
The Arch Linux repository contains a subset of the available exporters:
- prometheus-node-exporter - system metrics
- prometheus-blackbox-exporter - blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP and ICMP
- prometheus-memcached-exporter - memcached metrics
- prometheus-mysqld-exporter - MySQL server metrics
The exporters are implemented as services. For example, to run the node exporter, enable and start prometheus-node-exporter.service.
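Most exporters are scraped like the node exporter, but the blackbox exporter is slightly different: the endpoint to probe is passed to the exporter as a URL parameter, so the scrape job usually needs relabeling. The following is a minimal sketch of that commonly used pattern. It assumes the exporter listens on its usual port 9115 on the same host and that a module named http_2xx is defined in the exporter's own configuration; https://example.com is a placeholder target.

```yaml
# /etc/prometheus/prometheus.yml (fragment; sketch under the assumptions above)
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe            # the blackbox exporter serves probe results here
    params:
      module: [http_2xx]            # assumed module name from the exporter's configuration
    static_configs:
      - targets:
          - https://example.com     # placeholder endpoint to probe
    relabel_configs:
      # Pass the listed target to the exporter as the "target" URL parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed endpoint as the instance label on the resulting series.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape request to the exporter itself.
      - target_label: __address__
        replacement: localhost:9115
```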
Using the UI
Prometheus comes with a very limited web UI to verify configuration, query and graph metrics. You can reach it at http://localhost:9090 by default. You can find an in-depth explanation of Prometheus' query language in the Prometheus documentation.
Alerting
Alertmanager can send out custom alerts when certain conditions are met. The conditions are configured in /etc/prometheus/alert.rules.yml, and how the resulting alerts are delivered is configured in /etc/alertmanager/alertmanager.yml. Alertmanager supports various ways to notify users, such as email, Slack, and more. To configure email alerts, add the following snippet to /etc/alertmanager/alertmanager.yml:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:25'
  smtp_from: 'alertmanager@example.com'
route:
  group_by: ['instance', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-1
receivers:
  - name: 'team-1'
    email_configs:
      - to: 'admin@example.com'
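The route block is a tree, so alerts can be sent to different receivers depending on their labels. The following sketch extends the routing part of the example with a child route for critical alerts. The receiver name team-1-urgent and the address oncall@example.com are hypothetical, and the matchers syntax assumes a reasonably recent Alertmanager release (older versions use match instead).

```yaml
# /etc/alertmanager/alertmanager.yml (fragment; sketch, receiver names are hypothetical)
route:
  group_by: ['instance', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-1                  # default receiver when no child route matches
  routes:
    - matchers:
        - severity="critical"       # matches the severity label set in the alert rule
      receiver: team-1-urgent
      repeat_interval: 1h           # remind more often for critical alerts

receivers:
  - name: 'team-1'
    email_configs:
      - to: 'admin@example.com'
  - name: 'team-1-urgent'
    email_configs:
      - to: 'oncall@example.com'    # hypothetical on-call address
```

Settings that a child route does not set itself, such as group_by, are inherited from the parent route.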
For prometheus to send alerts to alertmanager, include the following snippet in /etc/prometheus/prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
To configure an alert for when a systemd unit fails, add the following rule group to the groups list in /etc/prometheus/alert.rules.yml. For more rules, read the alerting rules documentation.
- name: systemd_unit
  interval: 15s
  rules:
    - alert: systemd_unit_failed
      expr: |
        node_systemd_unit_state{state="failed"} > 0
      for: 3m
      labels:
        severity: critical
      annotations:
        description: 'Instance : Service failed'
        summary: 'Systemd unit failed'
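Alert rules are only evaluated if Prometheus is told to load the file that contains them. If the main configuration does not already reference the rules file, a fragment along these lines should do it; the path is the one used in this article, so adjust it if your rules live elsewhere.

```yaml
# /etc/prometheus/prometheus.yml (fragment)
rule_files:
  - /etc/prometheus/alert.rules.yml   # rule files to load; glob patterns are also accepted
```

Note that the node_systemd_unit_state metric used in the rule is only exported when the node exporter's systemd collector is enabled, which is not the case by default.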
Tips and tricks
Telegraf instead of exporters
Telegraf can be used instead of multiple exporters when combined with its Prometheus output plugin. This consolidates metrics collection into a single binary and offers more flexible configuration than the standard Prometheus exporters.
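On the Prometheus side, Telegraf is scraped like any other exporter. The sketch below assumes the output plugin listens on port 9273 on the local host; check the listen setting of the plugin in your Telegraf configuration and adjust the target accordingly.

```yaml
# /etc/prometheus/prometheus.yml (fragment; the port is an assumption)
scrape_configs:
  - job_name: 'telegraf'
    static_configs:
      - targets: ['localhost:9273']   # assumed listen address of Telegraf's Prometheus output plugin
```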