Kubernetes Classroom Series – 09/Sept/2021 – Direct DevOps from Quality Thought

Prometheus and Grafana configuration outside kubernetes cluster

Refer Here for the article

Alert Manager Installation

Let's install Alertmanager on the same server where Prometheus is running.
Create a system user called alertmanager:

sudo useradd --no-create-home --shell /bin/false alertmanager

Let's create directories to hold Alertmanager's configuration and data

sudo mkdir /etc/alertmanager
sudo mkdir -p /data/alertmanager

Now download and extract Alertmanager, then change into the extracted directory

wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
tar xzf alertmanager-0.23.0.linux-amd64.tar.gz
cd alertmanager-0.23.0.linux-amd64

Let's copy the alertmanager and amtool binaries to /usr/local/bin

sudo cp amtool /usr/local/bin/
sudo cp alertmanager /usr/local/bin/

Copy the Alertmanager YAML configuration file to /etc/alertmanager/

sudo cp alertmanager.yml /etc/alertmanager

Let's give the alertmanager user ownership of these files

sudo chown alertmanager:alertmanager /usr/local/bin/{amtool,alertmanager}
sudo chown -R alertmanager:alertmanager /data/alertmanager /etc/alertmanager/*
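
Before wiring up the service, it can help to confirm that the binaries are on the PATH and that the configuration file parses. The following is a minimal sketch, assuming the paths used in this article; the helper name check_binary and the CONFIG_FILE variable are made up for illustration.

```shell
# Sanity-check sketch: confirm the binaries are on PATH and the config parses.
# CONFIG_FILE defaults to the path used in this article; adjust if yours differs.
CONFIG_FILE="${CONFIG_FILE:-/etc/alertmanager/alertmanager.yml}"

check_binary() {
  # Prints "<name>: ok" if the command is on PATH, otherwise "<name>: missing".
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

check_binary alertmanager
check_binary amtool

# amtool check-config validates the Alertmanager configuration file.
# It is only run when amtool is installed and the file is readable.
if command -v amtool >/dev/null 2>&1 && [ -r "$CONFIG_FILE" ]; then
  amtool check-config "$CONFIG_FILE"
fi
```

If either binary shows as missing, re-check the copy step above before continuing.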

Let's create a systemd unit file at /etc/systemd/system/alertmanager.service

[Unit]
Description=AlertManager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file /etc/alertmanager/alertmanager.yml \
    --storage.path /data/alertmanager

[Install]
WantedBy=multi-user.target

Now reload systemd so it picks up the new unit file, then enable and start Alertmanager

sudo systemctl daemon-reload
sudo systemctl enable alertmanager.service
sudo systemctl start alertmanager.service
sudo systemctl status alertmanager.service
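
Beyond systemctl status, Alertmanager exposes a /-/healthy endpoint that can be probed over HTTP. This is a small sketch, assuming the default listen address localhost:9093; AM_URL and describe_status are illustrative names, not part of Alertmanager.

```shell
# Sketch: probe Alertmanager's health endpoint after starting the service.
# Assumes the default listen address; override AM_URL if yours differs.
AM_URL="${AM_URL:-http://localhost:9093}"

describe_status() {
  # Maps an HTTP status code to a short verdict.
  case "$1" in
    200) echo "healthy" ;;
    *)   echo "unhealthy" ;;
  esac
}

# -w '%{http_code}' prints only the status code; curl reports 000 when no response arrives.
code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$AM_URL/-/healthy")" || true
echo "alertmanager: $(describe_status "$code") (HTTP ${code:-none})"
```

If the probe reports unhealthy, the service logs (sudo journalctl -u alertmanager.service) are usually the next place to look.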

We need to point Prometheus at Alertmanager. Edit /etc/prometheus/prometheus.yml and add the alerting section shown below

global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']

Now restart prometheus

sudo systemctl restart prometheus
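
If the restart fails, or before future restarts, the configuration can be validated with promtool, which ships in the Prometheus release tarball. The sketch below also asks Prometheus which Alertmanager instances it knows about. PROM_CONFIG, PROM_URL, and validate_config are illustrative names, and the defaults assume the paths and ports used in this article.

```shell
# Sketch: validate prometheus.yml and confirm Prometheus has picked up Alertmanager.
PROM_CONFIG="${PROM_CONFIG:-/etc/prometheus/prometheus.yml}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

validate_config() {
  # Runs promtool's config check; skipped when promtool is not installed.
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found; skipping validation"
    return 0
  fi
  promtool check config "$1"
}

validate_config "$PROM_CONFIG"

# The HTTP API lists the Alertmanager instances Prometheus is sending alerts to.
curl -s --max-time 5 "$PROM_URL/api/v1/alertmanagers" \
  || echo "prometheus not reachable at $PROM_URL"
```

The localhost:9093 target from the alerting section should appear among the active Alertmanagers in the API response.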

Now edit /etc/alertmanager/alertmanager.yml to add the email receiver

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'email'
  email_configs:
    - to: 'devops@qt.com'
      from: 'alerts@qt.com'
      smarthost: smtp.mailtrap.io:587
      auth_username: 'jdsflksd'
      auth_password: 'test'

- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
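
To exercise the routing and receivers without waiting for a real alert, a synthetic alert can be posted to Alertmanager's v2 API. This is a sketch under a few assumptions: Alertmanager has been restarted to load the edited file, it listens on localhost:9093, and the alert name TestAlert and the helper build_alert are made up for illustration.

```shell
# Sketch: push a synthetic alert to Alertmanager to exercise routing and receivers.
AM_URL="${AM_URL:-http://localhost:9093}"

build_alert() {
  # Emits a one-element JSON array in the shape the v2 alerts endpoint accepts.
  # $1 = alertname label, $2 = severity label
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"manual test alert"}}]' "$1" "$2"
}

payload="$(build_alert TestAlert warning)"
echo "$payload"

# POST the alert; a fallback message is printed if Alertmanager cannot be reached.
curl -s --max-time 5 -X POST -H 'Content-Type: application/json' \
  -d "$payload" "$AM_URL/api/v2/alerts" \
  || echo "alertmanager not reachable at $AM_URL"
```

The alert should then show up in the Alertmanager web UI and be delivered to whichever receiver the route selects.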

Recording Rules

We can use recording rules to have Prometheus evaluate PromQL expressions at regular intervals and ingest their results as new time series.
This is useful for speeding up dashboards and for providing aggregated results for use elsewhere.
Recording rules go in files separate from prometheus.yml; these files are listed under the top-level rule_files field in prometheus.yml

global:
  scrape_interval: 15s
rule_files:
  - rules.yaml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']

Sample rules.yaml

groups:
  - name: example
    rules:
      - record: job:process_cpu_seconds:rate5m
        expr: sum without(instance)(rate(process_cpu_seconds_total[5m]))
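
A rule file can be checked with promtool before Prometheus loads it, and the recorded series can then be queried by its name. The sketch below writes the sample rule to a scratch directory purely for illustration; workdir and PROM_URL are assumptions, and in practice the file would live next to prometheus.yml.

```shell
# Sketch: write the sample rule to a scratch directory, validate it, then query the result.
PROM_URL="${PROM_URL:-http://localhost:9090}"
workdir="$(mktemp -d)"

cat > "$workdir/rules.yaml" <<'EOF'
groups:
  - name: example
    rules:
      - record: job:process_cpu_seconds:rate5m
        expr: sum without(instance)(rate(process_cpu_seconds_total[5m]))
EOF

# promtool check rules validates the rule file's syntax and its PromQL expressions.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$workdir/rules.yaml"
else
  echo "promtool not found; skipping validation"
fi

# Once Prometheus has loaded the rule, the recorded series can be queried by name.
curl -s -G --max-time 5 --data-urlencode 'query=job:process_cpu_seconds:rate5m' \
  "$PROM_URL/api/v1/query" || echo "prometheus not reachable at $PROM_URL"
```

The record name follows the common level:metric:operations convention, which makes it easy to tell a recorded series from a raw one in dashboards.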

Alerting

A set of community-maintained alert rules is available; Refer Here
Let's create an alert rules file at /etc/prometheus/alerts/k8s.yaml

groups:

- name: LearningK8s
  rules:
  - alert: KubernetesNodeReady
    expr: kube_node_status_condition{condition="Ready",status="true"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Kubernetes Node ready (instance {{ $labels.instance }})
      description: "Node {{ $labels.node }} has been unready for a long time\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
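
The alert file can be validated the same way as any other rule file, and once Prometheus has loaded it, the alert's state is visible through the synthetic ALERTS series. This is a sketch assuming the file path above and a local Prometheus on port 9090; RULE_FILE, PROM_URL, and alerts_query are illustrative names.

```shell
# Sketch: validate the alert rule file and inspect the alert's state in Prometheus.
RULE_FILE="${RULE_FILE:-/etc/prometheus/alerts/k8s.yaml}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

alerts_query() {
  # Builds a PromQL selector over the synthetic ALERTS series for one alert name.
  printf 'ALERTS{alertname="%s"}' "$1"
}

# Only run the check when promtool is installed and the rule file is readable.
if command -v promtool >/dev/null 2>&1 && [ -r "$RULE_FILE" ]; then
  promtool check rules "$RULE_FILE"
fi

# The alertstate label is "pending" while the "for: 1m" window runs, then "firing".
curl -s -G --max-time 5 \
  --data-urlencode "query=$(alerts_query KubernetesNodeReady)" \
  "$PROM_URL/api/v1/query" || echo "prometheus not reachable at $PROM_URL"
```

An empty result simply means the alert condition is not currently met.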

prometheus.yml

global:
  scrape_interval: 15s

rule_files:
  - 'alerts/k8s.yaml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']

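As of now, the alert fires in Prometheus and is forwarded to Alertmanager, but we are not receiving the email. One thing worth checking is that the route in alertmanager.yml still names 'web.hook' as its receiver, so alerts are not routed to the 'email' receiver.
Note: I will look into this issue, and we will create some more alerts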
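
One way to narrow down a missing notification is to ask amtool which receiver the routing tree selects for a given label set, and then to read the recent Alertmanager logs. The following is a sketch, assuming the config path used in this article; AM_CONFIG and route_for are illustrative names.

```shell
# Sketch: check which receiver an alert would be routed to, then look at recent logs.
AM_CONFIG="${AM_CONFIG:-/etc/alertmanager/alertmanager.yml}"

route_for() {
  # Prints the receiver(s) the routing tree selects for the given label=value pairs.
  if ! command -v amtool >/dev/null 2>&1; then
    echo "amtool not found"
    return 0
  fi
  amtool config routes test --config.file="$AM_CONFIG" "$@"
}

route_for alertname=KubernetesNodeReady severity=critical

# Recent service logs often show notification or SMTP errors, if there are any.
if command -v journalctl >/dev/null 2>&1; then
  journalctl -u alertmanager.service -n 20 --no-pager || true
fi
```

If the printed receiver is not the email one, the route section is the place to fix; if it is, the logs should indicate whether the SMTP handshake or authentication is failing.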
