Alertmanager - Alert Routing
Table of Contents
- Overview
- What is Alertmanager?
- Why Use Alertmanager?
- Configuration in AI-Homelab
- Official Resources
- Docker Configuration
Overview
Category: Alert Management
Docker Image: prom/alertmanager
Default Stack: monitoring.yml
Web UI: http://SERVER_IP:9093
Purpose: Handle Prometheus alerts
Ports: 9093
What is Alertmanager?
Alertmanager handles alerts sent by Prometheus. It deduplicates, groups, and routes them to notification channels (email, Slack, PagerDuty, etc.), and it manages silencing and inhibition of alerts. It is the alerting component of the Prometheus ecosystem.
Key Features
- Alert Routing: Send alerts to the right channels
- Grouping: Combine similar alerts
- Deduplication: No duplicate alerts
- Silencing: Mute alerts temporarily
- Inhibition: Suppress dependent alerts
- Notifications: Email, Slack, webhooks, etc.
- Web UI: Manage alerts visually
- Free & Open Source: Prometheus project
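Grouping is driven by a handful of timers on each route. The annotated route below is a sketch that uses Alertmanager's documented defaults; the labels in group_by are only an example, and the AI-Homelab config further down uses shorter timers.

```yaml
route:
  receiver: 'discord'
  # Alerts that share the same values for these labels are batched into one notification.
  group_by: ['alertname', 'instance']
  # Wait this long after the first alert of a new group before sending,
  # so related alerts that arrive shortly afterwards are included (default 30s).
  group_wait: 30s
  # Wait this long before notifying about new alerts added to a group
  # that has already been notified (default 5m).
  group_interval: 5m
  # Wait this long before re-sending a notification for alerts that are still firing (default 4h).
  repeat_interval: 4h
```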
Why Use Alertmanager?
- Prometheus Native: Designed for Prometheus
- Smart Routing: Alerts go where needed
- Deduplication: No spam
- Grouping: Related alerts together
- Silencing: Maintenance mode
- Multi-Channel: Email, Slack, etc.
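Silences are normally created ad hoc from the web UI. For a recurring maintenance window, Alertmanager 0.24 and newer can also mute routes on a schedule. The sketch below assumes a hypothetical window named sunday-maintenance (Sundays 02:00 to 04:00) and only mutes warning alerts; the name, times, and matcher are placeholders to adapt.

```yaml
# Hypothetical recurring maintenance window. Times are UTC unless a `location` is set.
time_intervals:
  - name: 'sunday-maintenance'
    time_intervals:
      - weekdays: ['sunday']
        times:
          - start_time: '02:00'
            end_time: '04:00'

route:
  receiver: 'discord'
  routes:
    # Mute intervals must be attached to a child route, not the root route.
    - matchers:
        - severity="warning"
      receiver: 'discord'
      mute_time_intervals:
        - 'sunday-maintenance'
```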
Configuration in AI-Homelab
/opt/stacks/monitoring/alertmanager/
  alertmanager.yml   # Configuration
  data/              # Alert state
alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'discord'
receivers:
  - name: 'discord'
    # Native Discord integration (Alertmanager 0.25+). A generic webhook_configs entry
    # would post Alertmanager's own JSON payload, which Discord webhooks reject.
    discord_configs:
      - webhook_url: 'YOUR_DISCORD_WEBHOOK_URL'
        send_resolved: true
  - name: 'email'
    email_configs:
      - to: 'alerts@yourdomain.com'
        from: 'alertmanager@yourdomain.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'your@gmail.com'
        auth_password: 'app_password'
# While a critical alert fires, suppress warning alerts that share its alertname and instance.
inhibit_rules:
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['alertname', 'instance']
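The config above defines an email receiver but never routes anything to it. A minimal sketch of a child route that sends critical alerts to email while everything else stays on Discord (the matchers syntax needs Alertmanager 0.22 or newer):

```yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'discord'   # fallback for alerts that no child route matches
  routes:
    # Alerts labelled severity="critical" go to the email receiver instead of Discord.
    - matchers:
        - severity="critical"
      receiver: 'email'
```

Child routes inherit group_by and the timers from the parent route unless they override them.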
Official Resources
- Website: https://prometheus.io/docs/alerting/latest/alertmanager
- Configuration: https://prometheus.io/docs/alerting/latest/configuration
Docker Configuration
alertmanager:
  image: prom/alertmanager:latest
  container_name: alertmanager
  restart: unless-stopped
  networks:
    - traefik-network
  ports:
    - "9093:9093"
  command:
    - '--config.file=/etc/alertmanager/alertmanager.yml'
    - '--storage.path=/alertmanager'
  volumes:
    - /opt/stacks/monitoring/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    # The image runs as user nobody, so the host data directory must be writable by that user.
    - /opt/stacks/monitoring/alertmanager/data:/alertmanager
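Optionally, a Docker health check can mark the container unhealthy when Alertmanager stops responding. This sketch uses Alertmanager's /-/healthy endpoint and assumes BusyBox wget is available in the prom/alertmanager image (it is built on a BusyBox base); the interval values are arbitrary.

```yaml
alertmanager:
  # ... existing settings from the service definition above ...
  healthcheck:
    # Assumes BusyBox wget exists in the image; /-/healthy returns 200 while Alertmanager is up.
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:9093/-/healthy"]
    interval: 30s
    timeout: 5s
    retries: 3
```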
Setup
- Configure Prometheus: add the following top-level keys to prometheus.yml (this assumes Prometheus and Alertmanager share a Docker network, so the hostname alertmanager resolves):
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/rules/*.yml'
- Create alert rules in /opt/stacks/monitoring/prometheus/rules/alerts.yml (this directory must be mounted at /etc/prometheus/rules inside the Prometheus container to match rule_files above):
    groups:
      - name: example
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} has been down for more than 5 minutes."
          - alert: HighCPU
            expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High CPU usage on {{ $labels.instance }}"
              description: "CPU usage is above 80% for more than 5 minutes."
          - alert: HighMemory
            expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High memory usage on {{ $labels.instance }}"
              description: "Memory usage is above 90%."
          - alert: DiskSpaceLow
            expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Low disk space on {{ $labels.instance }}"
              description: "Disk space is below 10%."
- Restart Prometheus:
    docker restart prometheus
- Access the Alertmanager UI:
    http://SERVER_IP:9093
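Alertmanager also exposes its own Prometheus metrics on port 9093, which is one way to act on the "monitor Alertmanager itself" advice. A sketch of an extra scrape job for prometheus.yml, assuming the same alertmanager hostname as in the alerting section:

```yaml
# prometheus.yml (excerpt): scrape Alertmanager's own /metrics endpoint.
scrape_configs:
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager:9093']
```

Metrics such as alertmanager_notifications_failed_total then become queryable in Prometheus. Keep in mind that an alert about Alertmanager being down cannot be delivered by that same Alertmanager, so check this job in the Prometheus UI or Grafana too.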
Summary
Alertmanager routes alerts from Prometheus, offering:
- Alert deduplication
- Grouping and routing
- Multiple notification channels
- Silencing and inhibition
- Web UI management
- Free and open-source
Perfect for:
- Prometheus alert handling
- Multi-channel notifications
- Alert management
- Maintenance silencing
- Alert grouping
Key Points:
- Receives alerts from Prometheus
- Routes to notification channels
- Deduplicates and groups
- Supports silencing
- Web UI for management
- Configure in alertmanager.yml
- Define rules in Prometheus
Remember:
- Configure receivers (Discord, Email, etc.)
- Create alert rules in Prometheus
- Test that alerts are delivered end to end
- Use silencing for maintenance
- Group related alerts
- Set appropriate thresholds
- Monitor Alertmanager itself
Alertmanager manages your alerts intelligently!