DevOps Classroom Series – 19/Feb/2020

SLI, SLO and SLAs

  • SLIs are the measurements on which SLOs are defined, and SLOs in turn help in coming up with SLAs
  • An SLA is an SLO with consequences
  • Examples
SLI: Home page is loaded within 3 seconds, measured over a period of 10 mins
SLO: Home page will be loaded within 3 seconds for 99.99% of the requests over a period of one month
SLA: If the home page is not loaded within 3 seconds for 99.95% of the requests over a period of one month, the customer will receive redeemable points
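To make the examples concrete, here is a minimal Python sketch of measuring the SLI from request latencies and comparing it against the SLO and SLA targets. The function names and the sample data are hypothetical and only for illustration; a real setup would read the latencies from a monitoring system.

```python
def latency_sli(latencies_seconds, threshold_seconds=3.0):
    """Return the percentage of requests served within the threshold."""
    if not latencies_seconds:
        raise ValueError("need at least one request to compute an SLI")
    good = sum(1 for t in latencies_seconds if t <= threshold_seconds)
    return 100.0 * good / len(latencies_seconds)


def meets_target(sli_percent, target_percent):
    """True when the measured SLI is at or above the target."""
    return sli_percent >= target_percent


# Hypothetical sample: 10,000 home page loads, 3 of them slower than 3 seconds.
samples = [0.8] * 9997 + [4.2] * 3
sli = latency_sli(samples)

print(f"SLI: {sli:.2f}%")
print("SLO (99.99%) met:", meets_target(sli, 99.99))
print("SLA (99.95%) met:", meets_target(sli, 99.95))
```

With this sample the SLO is missed while the SLA still holds, which is why the SLA target is usually set a little looser than the internal SLO.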
    

Who defines What?

Error Budgets

  • The amount of time the application is allowed to fail without breaching the SLA
SLA: 99% uptime for a period of one month (30 days)

Error Budget => 1 * 30 * 24 * (100 - 99) / 100 = 7.2 hours
  • The error budget has to be divided evenly across multiple known areas

  • If the error budget is used up, then SREs can impose restrictions on releasing any new features during that period.
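The arithmetic above can be sketched in Python as follows. The function names, the area names and the downtime figures are hypothetical, chosen only to illustrate the calculation; the 30-day month matches the example.

```python
def error_budget_hours(sla_percent, days=30):
    """Hours of allowed downtime for the given uptime SLA over `days` days."""
    total_hours = days * 24
    return total_hours * (100 - sla_percent) / 100


def split_evenly(budget_hours, areas):
    """Divide the budget evenly across the given areas."""
    share = budget_hours / len(areas)
    return {area: share for area in areas}


def features_frozen(sla_percent, downtime_hours, days=30):
    """True when the downtime so far has used up the whole error budget."""
    return downtime_hours >= error_budget_hours(sla_percent, days)


budget = error_budget_hours(99)
print(f"Error budget: {budget:.1f} hours")  # 7.2 hours for a 99% SLA

# Hypothetical areas; the real ones depend on the team and the application.
for area, hours in split_evenly(budget, ["deployments", "infrastructure", "dependencies"]).items():
    print(f"{area}: {hours:.1f} hours")

print("Freeze new features:", features_frozen(99, downtime_hours=8.0))
```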

Risks

  • In the actual environment where the application is deployed, there may be risks which impact our SLAs
    • If the service provider's availability is 99%, then the application's SLA cannot be greater than 99%
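A rough Python sketch of this risk is shown below. It rests on two assumptions that are not in the notes: every listed dependency is required for the application to work, and the dependencies fail independently of each other. The function names and the availability figures are hypothetical.

```python
from math import prod


def availability_ceiling(dependency_percents):
    """The application cannot be more available than its weakest required dependency."""
    return min(dependency_percents)


def serial_availability(dependency_percents):
    """Estimated availability when all dependencies are required.

    Assumes failures are independent, so the availabilities multiply.
    """
    return 100.0 * prod(p / 100.0 for p in dependency_percents)


# Hypothetical figures: a cloud provider at 99% and a database service at 99.9%.
dependencies = [99.0, 99.9]
print(f"Ceiling: {availability_ceiling(dependencies):.1f}%")
print(f"Estimate with both required: {serial_availability(dependencies):.3f}%")
```

Under these assumptions the estimate falls slightly below the weakest dependency, so an SLA promised to customers should leave room for that.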

Toil

  • Toil is an activity which is
    • Manual
    • Repetitive
  • Toil is an ideal candidate for automation, but we will not automate anything which doesn't add value
Scenario 1: Every year during Christmas, an engineer needs to restart 10 servers, which takes 20 mins

Automating this takes 20 hours of work from an SRE.

In this case, don't automate: at 20 mins saved per year, the automation would take 60 years to pay for itself.

Scenario 2: Every week, all the logs from the web server need to be exported to blob storage
 This activity takes 20 mins of an SRE's time

 Automating this takes 10 hours

 In this case, we would automate: the manual work adds up to about 17 hours a year, so the automation pays for itself in under a year.
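The two scenarios can be compared with a simple payback calculation, sketched below in Python. The function names and the two-year horizon are assumptions made for illustration; the notes do not give a fixed rule for when automation adds value.

```python
def payback_years(automation_hours, minutes_per_run, runs_per_year):
    """Years of manual work needed to equal the one-time automation effort."""
    saved_hours_per_year = minutes_per_run * runs_per_year / 60
    return automation_hours / saved_hours_per_year


def should_automate(automation_hours, minutes_per_run, runs_per_year, horizon_years=2):
    """Automate only if the effort pays back within the horizon (an assumed threshold)."""
    return payback_years(automation_hours, minutes_per_run, runs_per_year) <= horizon_years


# Scenario 1: yearly restart, 20 mins per run, 20 hours to automate.
print(f"Scenario 1 payback: {payback_years(20, 20, 1):.1f} years")
print("Scenario 1 automate:", should_automate(20, 20, 1))

# Scenario 2: weekly log export, 20 mins per run, 10 hours to automate.
print(f"Scenario 2 payback: {payback_years(10, 20, 52):.2f} years")
print("Scenario 2 automate:", should_automate(10, 20, 52))
```

The payback time ignores the cost of maintaining the automation and the risk of manual mistakes, so it is only a first estimate.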
