Observability
- In 2016, Google published a book about how it runs production systems, titled Site Reliability Engineering
- Site Reliability Engineering Overview
- Golden Signals
- SLIs, SLOs and SLAs
- Observability
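To make the SLI/SLO relationship concrete, here is a minimal Python sketch. It assumes an availability SLI defined as good requests divided by total requests, and it uses the SRE notion of an error budget (the failure rate the SLO permits). The function names and the request counts are hypothetical and chosen only for illustration.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent (negative if the SLO is breached)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo   # failure rate the SLO tolerates
    actual_failure = 1.0 - sli    # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


# Hypothetical numbers: 1,000,000 requests, 400 failures, 99.9% SLO.
sli = availability_sli(999_600, 1_000_000)
budget = error_budget_remaining(sli, slo=0.999)
print(f"SLI={sli:.4%}, error budget remaining={budget:.0%}")
```

An SLA would typically be a contractual commitment set looser than the internal SLO, so the same calculation can be repeated with the SLA target to see how far the service is from a contractual breach.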
Prometheus and Grafana
- Prometheus Architecture
- Installing Prometheus on Kubernetes
- Helm Chart
- Using the Prometheus Operator: Refer Here
- PromQL Cheatsheet
- Grafana: for installation Refer Here, and for Helm charts Refer Here
- Getting logs from Kubernetes
- Fluentd
- Loki
- Elastic Beats
- Prometheus Operator usage: Refer Here
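As a small illustration of PromQL in practice, the sketch below builds the URL for an instant query against the Prometheus HTTP API (`/api/v1/query`). The server address `http://localhost:9090` and the counter name `http_requests_total` are assumptions for the example, not values taken from these notes. The code only constructs the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a PromQL instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Per-job request rate over the last 5 minutes.
# http_requests_total is a hypothetical counter exposed by some application.
promql = "sum(rate(http_requests_total[5m])) by (job)"
url = instant_query_url("http://localhost:9090", promql)
print(url)
```

The same PromQL expression can be pasted into the Prometheus expression browser or used as a Grafana panel query.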
Making Applications Observable
- Getting Metrics from application into Prometheus
- Exporters: to get metrics from well-known servers
- In the application code, expose a Prometheus metrics endpoint
- Logs:
- Most enterprise applications generate logs
- If the logs are written to files, use a log agent (e.g. Fluentd) to collect them
- Developers can also write code that sends logs directly to a centralized log server
- Traces:
- For sending traces from your application to almost any tool, there is a standard called OpenTelemetry
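The "expose a Prometheus metrics endpoint in the application code" option listed under metrics above can be sketched with the Python standard library alone. This is a minimal illustration, not the official client: real applications would normally use a Prometheus client library. The metric name `app_requests_total`, the helper names, and port 8000 are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

_lock = threading.Lock()
_requests_total = 0  # hypothetical counter tracked by the application


def record_request() -> None:
    """Increment the counter; call this wherever the app handles a request."""
    global _requests_total
    with _lock:
        _requests_total += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    with _lock:
        value = _requests_total
    return (
        "# HELP app_requests_total Total requests handled by the app.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve_metrics(port: int = 8000) -> None:
    """Blocking call: serve /metrics so that Prometheus can scrape it."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


record_request()
record_request()
print(render_metrics())
```

Calling `serve_metrics()` (for example in a background thread) would make the counter available at `/metrics`, and Prometheus would then need a scrape target pointing at that address.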

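For the logging options above, a common pattern is to emit structured (JSON) log lines so that an agent such as Fluentd or a centralized log server can parse them without custom rules. The sketch below uses only the standard `logging` module. The logger name `orders`, the field names, and the in-memory buffer (standing in for stdout or a log file) are assumptions for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stands in for stdout or a log file
log = build_logger("orders", buffer)
log.info("order %s created", 42)
print(buffer.getvalue(), end="")
```

To send logs directly to a central server instead of a stream, the `StreamHandler` could be swapped for one of the handlers in `logging.handlers`, such as `SysLogHandler` or `HTTPHandler`, depending on what the log server accepts.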