Prometheus and Grafana configuration outside a Kubernetes cluster
Alert Manager Installation
Let's install Alertmanager on the same server where Prometheus is running.
Create a system user named alertmanager:
sudo useradd --no-create-home --shell /bin/false alertmanager
Create the directories that will hold Alertmanager's configuration and data:
sudo mkdir /etc/alertmanager
sudo mkdir -p /data/alertmanager
Now download and extract Alertmanager:
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
tar xzf alertmanager-0.23.0.linux-amd64.tar.gz
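Before using the extracted binaries it may be worth checking the download against its published checksum. The sketch below assumes GNU coreutils' sha256sum is installed and that you copy the expected hash by hand from the sha256sums.txt file attached to the release; the function name verify_sha256 is only illustrative.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds (exit 0) only when FILE's SHA-256 digest equals EXPECTED_HEX.
# Hypothetical helper, not part of Alertmanager.
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<hash>  <filename>"; keep only the hash column.
  actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

Usage, with the hash left as a placeholder to fill in from the release page:
verify_sha256 alertmanager-0.23.0.linux-amd64.tar.gz "<hash from sha256sums.txt>" && echo "checksum OK"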
Change into the extracted directory and copy the alertmanager and amtool binaries to /usr/local/bin:
cd alertmanager-0.23.0.linux-amd64
sudo cp amtool /usr/local/bin/
sudo cp alertmanager /usr/local/bin/
Copy the default alertmanager.yml to /etc/alertmanager/:
sudo cp alertmanager.yml /etc/alertmanager
Give the alertmanager user ownership of the binaries, data, and configuration:
sudo chown alertmanager:alertmanager /usr/local/bin/{amtool,alertmanager}
sudo chown -R alertmanager:alertmanager /data/alertmanager /etc/alertmanager/*
Create a systemd unit file at /etc/systemd/system/alertmanager.service:
[Unit]
Description=AlertManager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file /etc/alertmanager/alertmanager.yml \
    --storage.path /data/alertmanager
[Install]
WantedBy=multi-user.target
Now enable and start Alertmanager, then check its status:
sudo systemctl enable alertmanager.service
sudo systemctl start alertmanager.service
sudo systemctl status alertmanager.service
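Besides systemctl status, Alertmanager exposes a /-/healthy HTTP endpoint that can confirm the process is actually serving requests. The helper below is a minimal sketch; it assumes curl is installed and that Alertmanager listens on its default port 9093 as in this guide. The function name is hypothetical.

```shell
# alertmanager_healthy [BASE_URL]
# Returns 0 when Alertmanager answers on /-/healthy, non-zero otherwise.
# BASE_URL defaults to http://localhost:9093 (assumption: default port unchanged).
alertmanager_healthy() {
  base_url="${1:-http://localhost:9093}"
  # -f: treat HTTP errors as failures, -sS: quiet but still print errors,
  # --max-time: give up instead of hanging if nothing is listening.
  curl -fsS --max-time 5 "${base_url}/-/healthy" >/dev/null
}
```

Usage:
alertmanager_healthy && echo "Alertmanager is up"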
We need to point Prometheus at Alertmanager. Edit /etc/prometheus/prometheus.yml to add the alerting section shown below, then restart Prometheus:
global:
  scrape_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
sudo systemctl restart prometheus
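A malformed prometheus.yml will keep Prometheus from starting, so it can be worth validating the file before running the restart above. The sketch below wraps promtool check config, which ships in the Prometheus release tarball; it assumes promtool has been copied somewhere on PATH, and the wrapper name is hypothetical.

```shell
# check_prometheus_config [CONFIG_FILE]
# Validates a Prometheus configuration file with promtool.
# CONFIG_FILE defaults to the path used in this guide.
check_prometheus_config() {
  config="${1:-/etc/prometheus/prometheus.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  # Exits non-zero on syntax errors or when referenced rule files are invalid.
  promtool check config "$config"
}
```

Usage:
check_prometheus_config && sudo systemctl restart prometheus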
Now edit /etc/alertmanager/alertmanager.yml to add the email receiver:
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'email'
    email_configs:
      - to: 'devops@qt.com'
        from: 'alerts@qt.com'
        smarthost: smtp.mailtrap.io:587
        auth_username: 'jdsflksd'
        auth_password: 'test'
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
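After editing alertmanager.yml, amtool (installed earlier) can validate the file and report which receiver an alert with a given label set would be routed to, without sending anything. The wrapper below is a sketch under the assumption that amtool is on PATH; the function name is hypothetical.

```shell
# check_alert_routing CONFIG_FILE LABEL=VALUE...
# 1. Validates the Alertmanager configuration file.
# 2. Prints the receiver(s) that an alert with the given labels would reach.
check_alert_routing() {
  config="$1"
  shift
  if ! command -v amtool >/dev/null 2>&1; then
    echo "amtool not found in PATH" >&2
    return 127
  fi
  # Syntax and semantic check of the whole file.
  amtool check-config "$config" || return 1
  # Offline routing test: evaluates the routing tree in the file itself.
  amtool config routes test --config.file="$config" "$@"
}
```

Usage:
check_alert_routing /etc/alertmanager/alertmanager.yml alertname=KubernetesNodeReady severity=critical

The printed name should match whichever receiver the route selects, which makes it easy to see whether alerts would reach 'email' or 'web.hook'.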
Recording Rules
Recording rules let Prometheus evaluate PromQL expressions at regular intervals and ingest the results as new time series.
This speeds up dashboards and provides aggregated results for use elsewhere.
Recording rules go in files separate from prometheus.yml; those files are listed under the top-level rule_files field in prometheus.yml:
global:
  scrape_interval: 15s
rule_files:
  - rules.yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
The rules.yaml file itself contains the rule groups:
groups:
  - name: example
    rules:
      - record: job:process_cpu_seconds:rate5m
        expr: sum without(instance)(rate(process_cpu_seconds_total[5m]))
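Once Prometheus has reloaded and evaluated the group at least once, the recorded series can be queried by name like any other metric. The sketch below uses the Prometheus HTTP API endpoint /api/v1/query; it assumes curl is installed and Prometheus listens on localhost:9090 as configured above. The function name is hypothetical.

```shell
# query_prometheus EXPR [BASE_URL]
# Runs an instant PromQL query and prints the raw JSON response.
# BASE_URL defaults to http://localhost:9090 (assumption: default port).
query_prometheus() {
  expr="$1"
  base_url="${2:-http://localhost:9090}"
  # -G sends the URL-encoded data as query-string parameters of a GET request.
  curl -fsS --max-time 5 -G "${base_url}/api/v1/query" \
    --data-urlencode "query=${expr}"
}
```

Usage:
query_prometheus 'job:process_cpu_seconds:rate5m'

An empty result list usually means the rule has not been evaluated yet or the rule file was not loaded.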
Alerting
A community-maintained set of alert rules is already available (Refer Here).
Let's create an alert rule file at /etc/prometheus/alerts/k8s.yml:
groups:
  - name: LearningK8s
    rules:
      - alert: KubernetesNodeReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node ready (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has been unready for a long time\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
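Rule files are easy to break with an indentation slip, so it may help to validate them before Prometheus loads them. The sketch below wraps promtool check rules; it assumes promtool is on PATH, and the wrapper name is hypothetical.

```shell
# check_rule_files FILE...
# Validates one or more Prometheus rule files (recording or alerting rules).
check_rule_files() {
  if [ "$#" -eq 0 ]; then
    echo "usage: check_rule_files FILE..." >&2
    return 2
  fi
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  # Reports YAML errors and invalid PromQL expressions; non-zero on failure.
  promtool check rules "$@"
}
```

Usage, passing the path of the rule file created above:
check_rule_files /etc/prometheus/alerts/k8s.yml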
Then reference the rule file from prometheus.yml (the path is relative to the directory containing prometheus.yml):
global:
  scrape_interval: 15s
rule_files:
  - 'alerts/k8s.yml'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
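At this point the alert fires in Prometheus and is forwarded to Alertmanager, but no email is received. One likely cause is that the route in alertmanager.yml still names 'web.hook' as its receiver, so the 'email' receiver is never selected.
Note: I will look into this issue, and then we will create some more alerts.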
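To tell delivery problems apart from rule problems, a test alert can be pushed straight into Alertmanager, bypassing Prometheus. The sketch below uses amtool alert add; it assumes amtool is on PATH and Alertmanager listens on localhost:9093. The function name, the alert name ManualEmailTest, and the annotation text are all made up for illustration.

```shell
# send_test_alert [ALERTMANAGER_URL]
# Pushes a single hand-made alert into Alertmanager so that notification
# delivery can be checked without waiting for a real rule to fire.
send_test_alert() {
  am_url="${1:-http://localhost:9093}"
  if ! command -v amtool >/dev/null 2>&1; then
    echo "amtool not found in PATH" >&2
    return 127
  fi
  # Labels are given as name=value pairs; the alert name is illustrative.
  amtool alert add alertname=ManualEmailTest severity=critical \
    --annotation=summary="Manual test alert" \
    --alertmanager.url="$am_url"
}
```

Usage:
send_test_alert
sudo journalctl -u alertmanager.service -f

If the alert shows up in the Alertmanager UI but no mail arrives, the service log is the next place to look for SMTP or routing errors.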