Kubernetes classroom Notes – 13/July/2024

Site Reliability Engineering

Refer to the SRE books published by Google.
SRE is Google's engineering approach to running production systems.
These engineering ideas were widely adopted by Google's customers and by other enterprises, and SRE is now an established job role.
Refer to the presentation on SRE.

Observability

Observability relies on collecting three major kinds of information about applications:
- metrics: numerical values measured over time (CPU, memory, latency, error rate)
- logs: text records of events
  - levels:
    - information
    - warning
    - error
    - debug (verbosity levels)
- traces: records of the path a request takes through the application's components
We integrate the above with an actionable alerting system.
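To make log levels, a metric, and an alert condition concrete, here is a minimal sketch using only the Python standard library. The logger name `demo-app`, the helper names `build_logger`, `error_rate` and `should_alert`, and the 5% threshold are all assumptions made for illustration; they are not taken from any of the tools in these notes.

```python
import io
import logging


def build_logger(stream, level=logging.INFO):
    """Return a logger that writes 'LEVEL message' lines to the given stream."""
    logger = logging.getLogger("demo-app")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(level)       # records below this level are dropped
    logger.propagate = False     # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def error_rate(errors, total):
    """Metric: fraction of failed requests (0.0 when there is no traffic)."""
    return errors / total if total else 0.0


def should_alert(rate, threshold=0.05):
    """Alert condition: fire only when the rate exceeds the (assumed) threshold."""
    return rate > threshold


buffer = io.StringIO()
log = build_logger(buffer, level=logging.INFO)
log.debug("cache miss for key=user:42")  # dropped: DEBUG is below INFO
log.info("request handled in 12 ms")
log.warning("retrying upstream call")
log.error("upstream call failed")

rate = error_rate(errors=8, total=100)
print(buffer.getvalue(), end="")
print(f"error_rate={rate:.2f} alert={should_alert(rate)}")
```

Raising the level to `logging.WARNING` would also drop the informational line, which is how verbosity is usually tuned per environment.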
Centralized log aggregation tools:
- Elasticsearch (with Logstash and Beats)
- Splunk
- Fluentd
- Datadog (SaaS product)
Metrics:
- New Relic
- Metricbeat => Elasticsearch
- Nagios & Zabbix
Tracing (APM):
- AppDynamics
- Elastic APM
How to achieve observability:
- Fluentd
- Prometheus
- Grafana
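Prometheus collects metrics by scraping a plain-text endpoint exposed by the application. The sketch below renders one counter in that text exposition format by hand, just to show what a scrape returns. The function name `render_counter` and the metric `http_requests_total` with its labels are hypothetical; a real application would normally use a Prometheus client library rather than formatting the text itself.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to the counter value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts, keyed by label set.
samples = {
    (("method", "GET"), ("status", "200")): 120,
    (("method", "GET"), ("status", "500")): 3,
}
text = render_counter("http_requests_total", "Total HTTP requests handled.", samples)
print(text, end="")
```

Grafana would then query Prometheus for series such as this one and plot them on a dashboard.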

By continuous learner
