Site Reliability Engineering
- Site Reliability Engineering (SRE) is the set of practices Google follows to run its production systems. Refer Here for the books.
- Refer Here for SRE concepts
Observability of Kubernetes using Prometheus and Grafana Stack
- We will set up observability using Prometheus, Grafana, and Loki.
- We will set up a Kubernetes cluster on AKS. Most cloud providers also offer Prometheus and Grafana as add-ons.
- Set up AKS
RG=rg-aks-obsv
AKS=aks-obsv
LOCATION=eastus
NAMESPACE=monitoring
az group create -n $RG -l $LOCATION
az aks create -g $RG -n $AKS --node-count 3 --node-vm-size Standard_B2ms \
--enable-managed-identity
az aks get-credentials -g $RG -n $AKS --overwrite-existing
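- Optionally confirm that the nodes are Ready before installing anything. The helper below is a minimal sketch: `count_ready_nodes` is a hypothetical name, and it assumes the default column layout of `kubectl get nodes --no-headers` (NAME, STATUS, ...). With `--node-count 3` above, a healthy cluster should report 3.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints how many nodes have a STATUS of exactly "Ready".
# Nodes shown as "Ready,SchedulingDisabled" are deliberately not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against the cluster (needs kubectl access):
#   kubectl get nodes --no-headers | count_ready_nodes
```

If the count stays below the expected node count, `kubectl describe node <name>` usually shows why a node is not Ready.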
- Set up Prometheus using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
kubectl create namespace $NAMESPACE
cat > values.kps.yaml <<'EOF'
grafana:
  adminUser: admin
  adminPassword: admin123
  service: { type: LoadBalancer }
  persistence:
    enabled: true
    type: pvc
    size: 10Gi
    storageClassName: managed-csi
prometheus:
  prometheusSpec:
    retention: 15d
    retentionSize: "2GiB"
    walCompression: true
    enableFeatures:
      - exemplar-storage
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: managed-csi
          accessModes: ["ReadWriteOnce"]
          resources: { requests: { storage: 5Gi } }
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: managed-csi
          accessModes: ["ReadWriteOnce"]
          resources: { requests: { storage: 1Gi } }
EOF
helm install kps prometheus-community/kube-prometheus-stack -n $NAMESPACE -f values.kps.yaml
kubectl get pods -n $NAMESPACE
kubectl get svc -n $NAMESPACE
GRAFANA_LB=$(kubectl -n $NAMESPACE get svc kps-grafana -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Grafana: http://$GRAFANA_LB (admin/admin123)"
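- The LoadBalancer IP can take a minute or two to be assigned, so the lookup above may come back empty at first. A minimal retry sketch follows; `wait_for_output` and `grafana_lb_ip` are hypothetical helper names, and the attempt count and delay in the usage line are arbitrary choices.

```shell
# Hypothetical helper: run a command repeatedly until it prints something
# non-empty, or give up after the given number of attempts.
#   $1 = attempts, $2 = delay in seconds between attempts, rest = command
wait_for_output() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out=$("$@" 2>/dev/null) || out=""
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Same jsonpath lookup as above, wrapped in a function so it can be retried.
grafana_lb_ip() {
  kubectl -n "${NAMESPACE:-monitoring}" get svc kps-grafana \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage (needs cluster access): try 30 times, 10 seconds apart.
#   GRAFANA_LB=$(wait_for_output 30 10 grafana_lb_ip) && echo "Grafana: http://$GRAFANA_LB"
```

The helper returns a non-zero status when no value appears in time, so the `echo` in the usage line only runs once an IP is available.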
- Now let's set up log collection using Loki and Promtail
helm install loki grafana/loki-simple-scalable \
--namespace $NAMESPACE
helm install promtail grafana/promtail \
--namespace $NAMESPACE \
--set "loki.serviceName=loki-write" \
--set "loki.servicePort=3100"
- Let's deploy the sample app built for monitoring
kubectl create ns demo
IMAGE=shaikkhajaibrahim/observapp:latest
# Deployment & Service (uses your pushed image)
cat > fastapi-k8s.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-obs
  namespace: demo
spec:
  replicas: 2
  selector:
    matchLabels: { app: fastapi-obs }
  template:
    metadata:
      labels: { app: fastapi-obs }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: app
          image: $IMAGE
          env:
            - name: OTEL_SERVICE_NAME
              value: fastapi-obs
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://otel-collector.otel:4318
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: http/protobuf
            - name: OTEL_TRACES_SAMPLER
              value: parentbased_traceidratio
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "1.0" # sample everything in lab; reduce in prod
            - name: OTEL_PYTHON_LOG_CORRELATION
              value: "true"
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: fastapi-obs
  namespace: demo
spec:
  selector: { app: fastapi-obs }
  ports:
    - port: 80
      targetPort: 8000
EOF
kubectl apply -f fastapi-k8s.yaml
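- kube-prometheus-stack discovers scrape targets through ServiceMonitor and PodMonitor resources, so the `prometheus.io/*` annotations on the pod may not be picked up on their own. A minimal sketch of a ServiceMonitor for the demo app follows. The file name `fastapi-servicemonitor.yaml`, the resource name, and the 15s interval are assumptions, not part of the original setup. The empty `selector` is used because the Service above carries no labels.

```shell
# Write a ServiceMonitor manifest for the fastapi-obs Service (a sketch, under the
# assumptions stated above). Quoted 'EOF' keeps the shell from expanding anything.
cat > fastapi-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fastapi-obs
  namespace: demo
spec:
  # Empty selector: match every Service in this ServiceMonitor's namespace (demo).
  selector: {}
  endpoints:
    - targetPort: 8000   # container port, since the Service port has no name
      path: /metrics
      interval: 15s      # assumed scrape interval
EOF

# Apply it once the cluster is reachable:
#   kubectl apply -f fastapi-servicemonitor.yaml
```

Because `serviceMonitorSelectorNilUsesHelmValues` is set to `false` in `values.kps.yaml`, Prometheus should pick up ServiceMonitors that do not carry the Helm release label. The target can then be checked on the Targets page of the Prometheus UI.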
- Set up a steady traffic generator
kubectl -n demo run looper --image=curlimages/curl -i --rm -- \
sh -lc 'while true; do curl -s fastapi-obs.demo/hello >/dev/null; sleep 0.2; done'
We have built a sample application for observability