Kubernetes classroom Notes – 13/July/2024

Site Reliability Engineering

  • Refer Here for the SRE books published by Google.
  • SRE is an engineering discipline that describes how Google runs production systems.
  • These engineering ideas were widely adopted by Google's customers and by other enterprises, and SRE is now an established job role.
  • Refer Here for a presentation on SRE.

Observability

  • Observability relies on collecting three major kinds of information about applications:
    • metrics: numerical measurements sampled over time (CPU, memory, latency, error rate)
    • logs: timestamped text records of events
      • levels:
        • information
        • warning
        • error
        • debug (verbosity levels)
    • traces: records of the path a single request takes through the application and its services
  • We integrate the above with an actionable alerting system.
  • Centralized log aggregation tools:
    • Elasticsearch (with Logstash and Beats)
    • Splunk
    • Fluentd
    • Datadog (SaaS product)
  • Metrics:
    • New Relic
    • Metricbeat => Elasticsearch
    • Nagios & Zabbix
  • Tracing (APM)
    • AppDynamics
    • Elastic APM
  • How to achieve observability
    • Fluentd
    • Prometheus
    • Grafana
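
To make the three signals and the alerting idea above concrete, here is a minimal sketch in Python using only the standard library. It is not how Fluentd, Prometheus, or Grafana work internally; the in-memory stores and the names record_metric, span, handle_request, error_rate, and should_alert are hypothetical and exist only for illustration. In a real setup the metrics would be scraped by a metrics system, the logs shipped by a log collector, and the spans sent to a tracing backend.

```python
import logging
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory stores standing in for real metrics/tracing backends.
METRICS = defaultdict(list)  # metric name -> list of numeric samples
SPANS = []                   # finished trace spans

# Logs: text records with a level (debug, info, warning, error).
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("demo-app")


def record_metric(name, value):
    """Metric: store one numerical sample under a metric name."""
    METRICS[name].append(value)


@contextmanager
def span(name, trace_id):
    """Trace: time one step of a request and tag it with the request's trace id."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


def handle_request(fail=False):
    """Simulate one request and emit all three signals for it."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    with span("handle_request", trace_id):
        log.debug("request started trace_id=%s", trace_id)
        if fail:
            log.error("request failed trace_id=%s", trace_id)
            record_metric("errors", 1)
        else:
            log.info("request ok trace_id=%s", trace_id)
            record_metric("errors", 0)
    record_metric("latency_s", time.perf_counter() - start)
    return trace_id


def error_rate():
    """Fraction of recorded requests that failed (0.0 when nothing is recorded)."""
    samples = METRICS["errors"]
    return sum(samples) / len(samples) if samples else 0.0


def should_alert(threshold=0.5):
    """Actionable alert: fire only when the error rate exceeds the threshold."""
    return error_rate() > threshold
```

Calling handle_request() a few times, some with fail=True, fills METRICS and SPANS and writes log lines; should_alert() then reports whether the error rate has crossed the chosen threshold. The 0.5 default is an arbitrary value for the sketch, not a recommended setting.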

By continuous learner

devops & cloud enthusiastic learner
