Kubernetes Classroom Series – 09/Sept/2021

Prometheus and Grafana configuration outside kubernetes cluster

Alert Manager Installation

  • Let's install Alertmanager on the same server where Prometheus is running.
  • Create a user called alertmanager:
sudo useradd --no-create-home --shell /bin/false alertmanager
  • Let's create directories to hold the Alertmanager configuration and data:
sudo mkdir /etc/alertmanager
sudo mkdir -p /data/alertmanager
  • Now download and extract Alertmanager:
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
tar xzf alertmanager-0.23.0.linux-amd64.tar.gz
  • Let's copy the alertmanager and amtool binaries from the extracted directory to /usr/local/bin:
cd alertmanager-0.23.0.linux-amd64
sudo cp amtool /usr/local/bin/
sudo cp alertmanager /usr/local/bin/
  • Copy the Alertmanager YAML file to /etc/alertmanager/:
sudo cp alertmanager.yml /etc/alertmanager
  • Let's give the alertmanager user ownership of the binaries and directories:
sudo chown alertmanager:alertmanager /usr/local/bin/{amtool,alertmanager}
sudo chown -R alertmanager:alertmanager /data/alertmanager /etc/alertmanager/*
  • Let's create a systemd unit file at /etc/systemd/system/alertmanager.service:
[Unit]
Description=AlertManager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file /etc/alertmanager/alertmanager.yml \
    --storage.path /data/alertmanager

[Install]
WantedBy=multi-user.target
  • Now enable and start Alertmanager, then check its status:
sudo systemctl enable alertmanager.service
sudo systemctl start alertmanager.service
sudo systemctl status alertmanager.service
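Besides systemctl status, the running service can be probed over HTTP. The following is a minimal sketch, assuming Alertmanager listens on its default address localhost:9093 (the unit file above does not override it) and that curl is installed; AM_URL and am_healthy are illustrative names, not part of Alertmanager.

```shell
# Assumed default listen address; change it if --web.listen-address is set.
AM_URL="${AM_URL:-http://localhost:9093}"

# Returns 0 when the Alertmanager health endpoint answers with a 2xx status.
am_healthy() {
  curl --silent --fail --max-time 5 "$AM_URL/-/healthy" > /dev/null
}

if am_healthy; then
  echo "alertmanager is healthy at $AM_URL"
else
  echo "alertmanager is not reachable at $AM_URL"
fi
```

If the check fails, sudo journalctl -u alertmanager.service usually shows why the process did not start.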
  • We need to connect Prometheus to Alertmanager. Edit /etc/prometheus/prometheus.yml and add the alerting section as shown here:
global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
  • Now restart prometheus
sudo systemctl restart prometheus
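It can help to validate the file before restarting and to confirm afterwards that Prometheus has discovered the Alertmanager. A rough sketch, assuming promtool was installed alongside Prometheus and that Prometheus listens on localhost:9090 as in the scrape config above; PROM_URL and PROM_CONFIG are illustrative variable names.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"
PROM_CONFIG="${PROM_CONFIG:-/etc/prometheus/prometheus.yml}"

# Check the configuration syntax (promtool ships in the Prometheus tarball).
if command -v promtool > /dev/null; then
  promtool check config "$PROM_CONFIG"
else
  echo "promtool not found; skipping config check"
fi

# Ask Prometheus which Alertmanager instances it currently knows about.
curl --silent --max-time 5 "$PROM_URL/api/v1/alertmanagers" \
  || echo "prometheus not reachable at $PROM_URL"
```

The JSON response lists active Alertmanagers; localhost:9093 should appear there once the restart has completed.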
  • Now edit /etc/alertmanager/alertmanager.yml to add the email receiver:
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'email'
  email_configs:
    - to: 'devops@qt.com'
      from: 'alerts@qt.com'
      smarthost: smtp.mailtrap.io:587
      auth_username: 'jdsflksd'
      auth_password: 'test'

- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
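After editing the file, amtool can check it and show how a given alert would be routed. This is a sketch under the assumption that amtool is on the PATH (it was copied to /usr/local/bin earlier); AM_CONFIG is an illustrative variable name and the labels are sample values.

```shell
AM_CONFIG="${AM_CONFIG:-/etc/alertmanager/alertmanager.yml}"

if command -v amtool > /dev/null && [ -r "$AM_CONFIG" ]; then
  # Parse the file and report any configuration errors.
  amtool check-config "$AM_CONFIG"
  # Print the receiver that an alert with these labels would be routed to.
  amtool config routes test --config.file="$AM_CONFIG" \
    alertname=KubernetesNodeReady severity=critical
else
  echo "amtool or $AM_CONFIG not available; skipping"
fi
```

The routing test prints the receiver selected by the route tree, which is a quick way to see whether alerts would reach 'email' or 'web.hook'. Restart the service with sudo systemctl restart alertmanager.service for the changes to take effect.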

Recording Rules

  • We can use recording rules to have Prometheus evaluate PromQL expressions regularly and ingest their results as new time series.
  • This is useful for speeding up dashboards and for providing aggregated results for use elsewhere.
  • Recording rules go in files separate from prometheus.yml; these files are listed under the top-level rule_files field in prometheus.yml:
global:
  scrape_interval: 15s
rule_files:
  - rules.yaml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
  • Sample rules.yaml
groups:
  - name: example
    rules:
      - record: job:process_cpu_seconds:rate5m
        expr: sum without(instance)(rate(process_cpu_seconds_total[5m]))
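The rule file can be checked with promtool, and the recorded series can be queried like any other metric once Prometheus has reloaded. A minimal sketch, assuming rules.yaml sits next to prometheus.yml in /etc/prometheus (relative rule_files paths are resolved from the config file's directory); RULES_FILE and PROM_URL are illustrative names.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"
RULES_FILE="${RULES_FILE:-/etc/prometheus/rules.yaml}"

# Validate the rule file syntax and the PromQL expressions in it.
if command -v promtool > /dev/null && [ -r "$RULES_FILE" ]; then
  promtool check rules "$RULES_FILE"
else
  echo "promtool or $RULES_FILE not available; skipping rule check"
fi

# Query the recorded series through the HTTP API.
curl --silent --max-time 5 --get \
  --data-urlencode 'query=job:process_cpu_seconds:rate5m' \
  "$PROM_URL/api/v1/query" \
  || echo "prometheus not reachable at $PROM_URL"
```

The recorded series only has data after the rule group has been evaluated at least once following the restart.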

Alerting

  • A set of community-created alert rules is hosted online: Refer Here
  • Let's create an alert rules file at /etc/prometheus/alerts/k8s.yaml:
groups:

- name: LearningK8s
  rules:
  - alert: KubernetesNodeReady
    expr: kube_node_status_condition{condition="Ready",status="true"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Kubernetes Node ready (instance {{ $labels.instance }})
      description: "Node {{ $labels.node }} has been unready for a long time\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  • Reference the alert rules file in prometheus.yml:
global:
  scrape_interval: 15s

rule_files:
  - 'alerts/k8s.yaml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes'
    static_configs:
      - targets: ['10.128.0.31:30000']
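  • At this point the alert fires in Prometheus and is forwarded to Alertmanager, but we are not receiving the email. Note that the route in alertmanager.yml above still names 'web.hook' as its receiver, so the 'email' receiver is never selected; this is a likely cause.
  • Note: I will look into this issue, and we will create some alerts.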
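To narrow down where an alert gets stuck, each hop can be inspected separately. The following is a sketch only, assuming the default ports used throughout this post and the file path /etc/prometheus/alerts/k8s.yaml; ALERT_RULES, PROM_URL, and AM_URL are illustrative names.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"
AM_URL="${AM_URL:-http://localhost:9093}"
ALERT_RULES="${ALERT_RULES:-/etc/prometheus/alerts/k8s.yaml}"

# 1. Is the alert rule file valid?
if command -v promtool > /dev/null && [ -r "$ALERT_RULES" ]; then
  promtool check rules "$ALERT_RULES"
else
  echo "promtool or $ALERT_RULES not available; skipping rule check"
fi

# 2. Which alerts are pending or firing in Prometheus?
curl --silent --max-time 5 "$PROM_URL/api/v1/alerts" \
  || echo "prometheus not reachable at $PROM_URL"

# 3. Which alerts has Alertmanager received?
if command -v amtool > /dev/null; then
  amtool alert query --alertmanager.url="$AM_URL" \
    || echo "alertmanager not reachable at $AM_URL"
else
  echo "amtool not found; skipping alertmanager query"
fi
```

If the alert shows up in step 3 but no email arrives, the remaining suspects are the routing and receiver settings in alertmanager.yml and the SMTP credentials.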
