DevOps Classroom Series – 01/May/2021

Site Reliability Engineering (SRE)

  • Refer Here for the Site Reliability Engineering notes
  • SRE is a set of principles based on how Google runs production systems
    • Engineering approach to operations
  • Basic problem statement
  • Functions of Site Reliability Engineering
    • Reducing Toil
    • Managing Risk
    • Handling Failures
  • For any application there are four golden signals
    • Latency: The time between sending a request and receiving a response
    • Traffic: The number of requests flowing across the network
    • Errors: Errors can reveal misconfigurations in infrastructure, bugs in application code, or broken dependencies
    • Saturation: The load on your network and server resources
  • Service Level Indicator
    • Success Rate: For every 5000 requests sent to the server, 4800 requests are successful
      • SLI: 96% of requests successful
    • Latency: Of the last 5000 requests, 4000 completed in under 0.5 seconds, a further 600 within 2 seconds, and a further 300 within 5 seconds
      • SLI: 80% of requests within 0.5 s, 92% within 2 s, 98% within 5 s
  • Service Level Objective:
    • The application will be up and available 99.5% of the time over a year
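
The SLI and SLO figures above are simple arithmetic, so a short Python sketch can make them concrete. The function names here (success_rate_sli, latency_sli, allowed_downtime_hours) are made up for illustration and are not part of any SRE tool; the year is assumed to be 365 days. The latency buckets are treated as non-overlapping, so the percentages are cumulative.

```python
def success_rate_sli(successful: int, total: int) -> float:
    """Percentage of requests that succeeded."""
    return 100.0 * successful / total


def latency_sli(buckets, total: int):
    """Cumulative percentage of requests at or under each latency bound.

    buckets: list of (upper_bound_seconds, count) pairs, sorted by bound,
    where each count covers only requests not already counted in a
    faster bucket.
    """
    result = []
    cumulative = 0
    for bound, count in buckets:
        cumulative += count
        result.append((bound, 100.0 * cumulative / total))
    return result


def allowed_downtime_hours(slo_percent: float, period_hours: float = 365 * 24) -> float:
    """Downtime permitted by an availability SLO over the period (assumed 365-day year)."""
    return period_hours * (100.0 - slo_percent) / 100.0


# Numbers taken from the notes above
print(success_rate_sli(4800, 5000))                                   # 96.0
print(latency_sli([(0.5, 4000), (2.0, 600), (5.0, 300)], 5000))       # 80%, 92%, 98%
print(allowed_downtime_hours(99.5))                                   # about 43.8 hours per year
```

Read this way, a 99.5% yearly availability SLO leaves roughly 43.8 hours of downtime per year before the objective is missed.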

Observability in Elastic Stack

  • Create an Elastic Cloud account: Refer Here
  • After setup, run the Spring PetClinic application by following the APM agent instructions
  • Exercise:
    • Send metrics and logs, and enable tracing, for the Spring PetClinic application
    • To install:
      • Create an Ubuntu Linux machine
      sudo apt update
      sudo apt install openjdk-11-jdk -y
      • Now run the Spring PetClinic application
      • Configure Heartbeat and Metricbeat to ship data to Elastic Cloud for viewing in Kibana
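
Before wiring up Heartbeat, it can help to see what an uptime check measures. The sketch below is not Heartbeat and does not talk to Elastic Cloud; it is a hand-rolled probe using only the Python standard library. The function name probe and the URL in the usage note are assumptions for illustration (Spring Boot applications listen on port 8080 by default, but your setup may differ).

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout_s: float = 5.0) -> dict:
    """Send one GET request and report whether the target is up and how long it took."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status
        status = exc.code
    except (urllib.error.URLError, OSError):
        # No answer at all: connection refused, timeout, DNS failure
        status = None
    latency_s = time.monotonic() - start
    return {
        "up": status is not None and status < 400,
        "status": status,
        "latency_s": latency_s,
    }
```

With the application running locally, calling probe("http://localhost:8080/") returns a dictionary with the up flag, the HTTP status, and the latency in seconds, which map onto the errors and latency golden signals.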
