Prometheus and Grafana configuration outside a Kubernetes cluster
Alertmanager Installation
- Let's install Alertmanager on the same server where Prometheus is running
- Create a user called alertmanager
sudo useradd --no-create-home --shell /bin/false alertmanager
- Let's create directories to hold the Alertmanager configuration and data
sudo mkdir /etc/alertmanager
sudo mkdir -p /data/alertmanager
- Now download and untar Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
tar xzf alertmanager-0.23.0.linux-amd64.tar.gz
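The download step can be parameterized so that the version is written only once. This is a minimal sketch and not part of the original steps: the function names alertmanager_url and fetch_alertmanager are hypothetical, and it assumes wget and tar are installed.

```shell
# Hypothetical helper: build the release URL for a given version and platform.
alertmanager_url() {
  local version="$1" platform="${2:-linux-amd64}"
  echo "https://github.com/prometheus/alertmanager/releases/download/v${version}/alertmanager-${version}.${platform}.tar.gz"
}

# Hypothetical helper: download the tarball and unpack it in the current directory.
fetch_alertmanager() {
  local version="$1" platform="${2:-linux-amd64}"
  local url
  url="$(alertmanager_url "$version" "$platform")"
  wget -q "$url" && tar xzf "alertmanager-${version}.${platform}.tar.gz"
}

# Usage (not run automatically):
#   fetch_alertmanager 0.23.0
```

Keeping the URL construction in its own function makes it easy to print and inspect the URL before anything is downloaded.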
- Let's copy the alertmanager and amtool binaries from the extracted directory to /usr/local/bin
cd alertmanager-0.23.0.linux-amd64
sudo cp amtool /usr/local/bin/
sudo cp alertmanager /usr/local/bin/
- Copy the Alertmanager YAML file to /etc/alertmanager/
sudo cp alertmanager.yml /etc/alertmanager
- Let's give ownership of these files and directories to the alertmanager user
sudo chown alertmanager:alertmanager /usr/local/bin/{amtool,alertmanager}
sudo chown -R alertmanager:alertmanager /data/alertmanager /etc/alertmanager/*
- Let's create a systemd unit file at /etc/systemd/system/alertmanager.service
[Unit]
Description=AlertManager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
--config.file /etc/alertmanager/alertmanager.yml \
--storage.path /data/alertmanager
[Install]
WantedBy=multi-user.target
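If you prefer to script this step, the unit file can be written with a heredoc. This is a sketch under assumptions: write_alertmanager_unit is a hypothetical helper, and it writes to whatever path you pass so the file can be reviewed before it is installed.

```shell
# Hypothetical helper: write the unit file shown above to the given path.
write_alertmanager_unit() {
  local dest="$1"
  # The quoted 'EOF' keeps the backslashes and flags exactly as typed.
  cat > "$dest" <<'EOF'
[Unit]
Description=AlertManager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file /etc/alertmanager/alertmanager.yml \
  --storage.path /data/alertmanager

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (not run automatically):
#   write_alertmanager_unit /tmp/alertmanager.service
#   sudo install -m 644 /tmp/alertmanager.service /etc/systemd/system/alertmanager.service
#   sudo systemctl daemon-reload
```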
- Now enable and start Alertmanager
sudo systemctl enable alertmanager.service
sudo systemctl start alertmanager.service
sudo systemctl status alertmanager.service
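Besides systemctl status, Alertmanager serves a health endpoint at /-/healthy on its web port (9093 by default), which is a quick way to confirm that the process is answering requests. A minimal sketch, assuming curl is installed; the helper names are hypothetical.

```shell
# Hypothetical helper: print the health-check URL for a host:port pair.
alertmanager_health_url() {
  echo "http://${1:-localhost:9093}/-/healthy"
}

# Hypothetical helper: succeeds only if the endpoint answers with a 2xx status.
check_alertmanager() {
  curl -fsS --max-time 5 "$(alertmanager_health_url "${1:-}")"
}

# Usage (not run automatically):
#   check_alertmanager
#   journalctl -u alertmanager.service -n 50 --no-pager   # recent service logs
```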

- We need to point Prometheus at Alertmanager. Edit /etc/prometheus/prometheus.yml and add the alerting section shown below
global:
  scrape_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
- Restart Prometheus to apply the change
sudo systemctl restart prometheus
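A typo in prometheus.yml will stop Prometheus from starting, so it can help to validate the file before restarting. The sketch below assumes promtool (shipped in the Prometheus release tarball) is on the PATH; safe_restart_prometheus is a hypothetical helper name.

```shell
# Hypothetical helper: validate the config first, restart only if it passes.
safe_restart_prometheus() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH; not restarting" >&2
    return 2
  fi
  promtool check config "$config" && sudo systemctl restart prometheus
}

# Usage (not run automatically):
#   safe_restart_prometheus
#   curl -s http://localhost:9090/api/v1/alertmanagers   # Alertmanagers Prometheus knows about
```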
- Now edit /etc/alertmanager/alertmanager.yml to add the email receiver
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'email'
    email_configs:
      - to: 'devops@qt.com'
        from: 'alerts@qt.com'
        smarthost: smtp.mailtrap.io:587
        auth_username: 'jdsflksd'
        auth_password: 'test'
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
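After editing the file, it is worth validating it and pushing a test alert by hand rather than waiting for a real one. The sketch below posts a single alert to Alertmanager's v2 API (POST /api/v2/alerts). The helper names, the alert name and the summary text are made up for illustration, and curl is assumed to be installed.

```shell
# Hypothetical helper: build a one-alert JSON payload for the v2 API.
build_test_alert() {
  local name="${1:-TestAlert}" severity="${2:-warning}"
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"manual test alert"}}]\n' \
    "$name" "$severity"
}

# Hypothetical helper: send the payload to an Alertmanager instance.
send_test_alert() {
  local host="${1:-localhost:9093}"
  build_test_alert "${2:-}" "${3:-}" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data @- \
      "http://${host}/api/v2/alerts"
}

# Usage (not run automatically):
#   amtool check-config /etc/alertmanager/alertmanager.yml
#   sudo systemctl restart alertmanager.service
#   send_test_alert localhost:9093 Demo critical
```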
Recording Rules
- We can use recording rules to have Prometheus evaluate PromQL expressions regularly and ingest their results.
- This is useful for speeding up dashboards and for providing aggregated results for use elsewhere
- Recording rules go in files separate from prometheus.yml; those files are listed under the top-level rule_files field in prometheus.yml
global:
  scrape_interval: 15s
rule_files:
  - rules.yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
- The rules.yaml file (resolved relative to the directory containing prometheus.yml) holds the rule group itself
groups:
  - name: example
    rules:
      - record: job:process_cpu_seconds:rate5m
        expr: sum without(instance)(rate(process_cpu_seconds_total[5m]))
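Once Prometheus has loaded the rule file, the recorded series can be queried by its name like any other metric. A minimal sketch using the Prometheus HTTP API (/api/v1/query), assuming curl is installed and Prometheus listens on localhost:9090; the helper names are hypothetical.

```shell
# Hypothetical helper: print the instant-query endpoint for a host:port pair.
prometheus_query_url() {
  echo "http://${1:-localhost:9090}/api/v1/query"
}

# Hypothetical helper: run an instant query; curl URL-encodes the expression.
query_prometheus() {
  local expr="$1" host="${2:-localhost:9090}"
  curl -fsS --max-time 5 -G --data-urlencode "query=${expr}" \
    "$(prometheus_query_url "$host")"
}

# Usage (not run automatically):
#   promtool check rules /etc/prometheus/rules.yaml
#   query_prometheus 'job:process_cpu_seconds:rate5m'
```

The name job:process_cpu_seconds:rate5m follows the level:metric:operations naming convention suggested in the Prometheus documentation, which makes recorded series easy to tell apart from raw metrics.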
Alerting
- There is a set of community-created alert rules hosted online (Refer Here)
- Let's create an alert rules file at /etc/prometheus/alerts/k8s.yaml
groups:
  - name: LearningK8s
    rules:
      - alert: KubernetesNodeReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Node ready (instance {{ $labels.instance }})
          description: "Node {{ $labels.node }} has been unready for a long time\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- Now reference this file under rule_files in prometheus.yml
global:
  scrape_interval: 15s
rule_files:
  - 'alerts/k8s.yaml'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
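- As of now, the alert fires in Prometheus and is forwarded to Alertmanager, but we are not receiving the email. A likely cause is that the route in alertmanager.yml still names 'web.hook' as its receiver, so the 'email' receiver is never selected
- Note: I will look into this issue, and then we will create some more alerts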
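One way to narrow down a missing notification is to ask amtool which receiver a given set of labels would be routed to. The sketch below wraps amtool config routes test; which_receiver is a hypothetical helper name, and the labels in the usage line are only an example based on the alert defined earlier.

```shell
# Hypothetical helper: print the receiver(s) that the given labels route to.
which_receiver() {
  local config="${1:-/etc/alertmanager/alertmanager.yml}"
  if [ "$#" -gt 0 ]; then shift; fi
  if ! command -v amtool >/dev/null 2>&1; then
    echo "amtool not found on PATH" >&2
    return 2
  fi
  # Remaining arguments are label=value pairs describing the alert.
  amtool config routes test --config.file="$config" "$@"
}

# Usage (not run automatically):
#   which_receiver /etc/alertmanager/alertmanager.yml alertname=KubernetesNodeReady severity=critical
```

If the printed receiver is not the one that sends email, the routing tree is the place to look before debugging SMTP settings.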