DevOps Classroom Series – 23/Oct/2019

Primary Reference

SLI (Service Level Indicator)

  • A measured indicator of how well your application/service is doing, e.g. its availability or latency
  • Sample Indicators:
    • Percentage of home page requests over the last 5 minutes served in less than 300 ms
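
A latency indicator like the sample above can be computed as the share of requests that finish under the threshold. The sketch below is illustrative only: the function name `latency_sli` and the sample latencies are assumptions, not part of the original notes.

```python
def latency_sli(latencies_ms, threshold_ms=300.0):
    """Return the percentage of requests served faster than threshold_ms.

    latency_sli is a hypothetical helper name used for illustration.
    """
    if not latencies_ms:
        raise ValueError("no requests in the measurement window")
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return 100.0 * good / len(latencies_ms)


# Made-up latencies (in ms) for one 5-minute window.
window = [120, 250, 310, 90, 299, 450, 180, 200, 275, 150]
print(latency_sli(window))  # 80.0 (8 of 10 requests were under 300 ms)
```

In practice the latencies would come from a monitoring system rather than a hard-coded list.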

SLO (Service Level Objective)

  • A target for an SLI, bound over a period of time
  • Sample:
    • Latency of homepage over a year will be less than 300 ms for 99.9% of the requests

SLA (Service Level Agreement)

  • Sample:
    • Customer will be offered free credits if fewer than 99.5% of the requests over a year achieve a latency of less than 300 ms

Problem

  • We build systems and they fail at some point.
  • What’s the SRE approach towards failures?

Risks

  • You can make aggressive deployments as long as you are within the Error Budget.
  • If the Error Budget is exceeded, no more deployments are made.
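
The deployment rule above can be sketched as a simple gate. This is a minimal illustration: `can_deploy` and the numbers passed to it are hypothetical, and real teams usually encode the policy in their release tooling.

```python
def can_deploy(budget_minutes, consumed_minutes):
    """Allow deployments only while consumption stays within the budget.

    can_deploy is a hypothetical helper; both arguments are minutes of
    failure over the same period (for example, one year).
    """
    return consumed_minutes <= budget_minutes


# Hypothetical figures: a 525.6-minute yearly budget.
print(can_deploy(525.6, 120.0))  # True  (still within budget)
print(can_deploy(525.6, 530.0))  # False (budget exceeded, stop deploying)
```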

Error Budget

  • Allowed time in minutes or hours of failure.
  • Sample:
    • SLO : Latency will be less than 300 ms for 99.9% of requests over the year

    • Error Budget is what is left of the total time after removing the SLO: (100 - 99.9) * 365 * 24 * 60 / 100 = 525.6 minutes/year

    • SLO : Latency will be less than 300 ms for 99.99% of requests over the year

    • Error Budget: (100 - 99.99) * 365 * 24 * 60 / 100 = 52.56 minutes/year
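
The arithmetic above can be reproduced with a few lines of Python. The helper name `error_budget_minutes` is an assumption made for this sketch; the formula is the one used in the samples.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def error_budget_minutes(slo_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the allowed minutes of failure for a given SLO percentage.

    error_budget_minutes is a hypothetical helper name.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    return (100 - slo_percent) * period_minutes / 100


for slo in (99.9, 99.99):
    print(f"{slo}% -> {error_budget_minutes(slo):.2f} minutes/year")
# 99.9% -> 525.60 minutes/year
# 99.99% -> 52.56 minutes/year
```

Each extra nine in the SLO cuts the budget by a factor of ten, which is why tighter objectives leave far less room for risky deployments.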

Error Budget Burndown

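  • Tracks how much of the Error Budget has been used over time
  • Fast Burndowns: the budget is consumed quickly, e.g. by a major outage
  • Slow Burndowns: the budget is consumed gradually, e.g. by a small but persistent rate of failures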
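
One common way to reason about burndown is a burn rate: the fraction of the budget used divided by the fraction of the period elapsed. The sketch below assumes that definition; `burn_rate` and the sample figures are hypothetical and not taken from the original notes.

```python
def burn_rate(consumed_minutes, budget_minutes, elapsed_days, period_days=365):
    """Return how fast the error budget is being used relative to plan.

    A value of 1.0 means the budget would run out exactly at the end of
    the period; above 1.0 it runs out early, below 1.0 some is left over.
    burn_rate is a hypothetical helper name.
    """
    if budget_minutes <= 0 or elapsed_days <= 0:
        raise ValueError("budget_minutes and elapsed_days must be positive")
    budget_used = consumed_minutes / budget_minutes
    period_elapsed = elapsed_days / period_days
    return budget_used / period_elapsed


# Hypothetical yearly budget of 525.6 minutes (99.9% SLO).
fast = burn_rate(262.8, 525.6, elapsed_days=36.5)  # half the budget in 10% of the year
slow = burn_rate(52.56, 525.6, elapsed_days=73)    # 10% of the budget in 20% of the year
print(f"fast: {fast:.1f}x, slow: {slow:.1f}x")  # fast: 5.0x, slow: 0.5x
```

What counts as "fast" or "slow" is a policy choice; the cut-off is deliberately left out of this sketch.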

Toil

  • Repetitive manual work that can be automated
  • Prioritize automating toil that occurs frequently over toil that occurs rarely
