DevOps Classroom Series – 01/May/2021 – Direct DevOps from Quality Thought

Site Reliability Engineering (SRE)

Refer Here for the Site Reliability Engineering notes
SRE is a set of principles based on how Google runs production systems
- It applies an engineering approach to operations
Basic problem statement
Functions of Site Reliability Engineering
- Reducing Toil
- Managing Risk
- Handling Failures
For any application there are four golden signals
- Latency: The time taken to send a request and receive a response
- Traffic: The demand on the system, measured as the number of requests flowing across the network
- Errors: Failed requests, which can point to misconfigurations in infrastructure, bugs in application code or broken dependencies
- Saturation: The load on your network and server resources
Service Level Indicator
- Success Rate: For every 5000 requests sent to the server, 4800 requests are successful
  - SLI : 96% of requests successful
- Latency: Of the last 5000 requests, 4000 had latency under 0.5 seconds, a further 600 completed within 2 seconds and a further 300 within 5 seconds
  - SLI : 80% of requests within 0.5s, 92% within 2s, 98% within 5s
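
The SLI arithmetic above can be checked with a small shell sketch. The helper name `sli_percent` is made up for this illustration, and the latency buckets are treated as cumulative (each threshold also counts the faster requests), which is how the 80% and 92% figures are derived.

```shell
# sli_percent GOOD TOTAL -> percentage of good events, one decimal place
# (hypothetical helper for illustration, not part of any tool)
sli_percent() {
  awk -v good="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * good / total }'
}

total=5000

# Success rate: 4800 of 5000 requests succeeded
echo "success rate:    $(sli_percent 4800 "$total")%"

# Latency thresholds are cumulative: add the faster buckets to each slower one
echo "latency <= 0.5s: $(sli_percent 4000 "$total")%"
echo "latency <= 2s:   $(sli_percent $((4000 + 600)) "$total")%"
echo "latency <= 5s:   $(sli_percent $((4000 + 600 + 300)) "$total")%"
```

With the numbers from the notes this prints 96.0%, 80.0%, 92.0% and 98.0%.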
Service Level Objective:
- The application will be up and available 99.5% of the time over a year
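
To make the objective concrete, it helps to convert the percentage into the downtime it permits (often called the error budget). The sketch below assumes a 365-day year. The function name `allowed_downtime_hours` is invented for this example.

```shell
# allowed_downtime_hours SLO_PERCENT PERIOD_DAYS -> hours of downtime the SLO permits
# (hypothetical helper; assumes availability is measured over whole days)
allowed_downtime_hours() {
  awk -v slo="$1" -v days="$2" \
    'BEGIN { printf "%.1f\n", (100 - slo) / 100 * days * 24 }'
}

# 99.5% over a 365-day year leaves 0.5% of 8760 hours
echo "allowed downtime: $(allowed_downtime_hours 99.5 365) hours per year"
```

For a 99.5% objective this works out to 43.8 hours of downtime per year.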

Observability in Elastic Stack

Create an Elastic Cloud account (Refer Here)
After setup, run the Spring PetClinic application with the Elastic APM agent attached

Exercise:

Send metrics and logs, and enable tracing, for the Spring PetClinic application

To Install

Create an Ubuntu Linux machine and run the following commands

sudo apt update
sudo apt install openjdk-11-jdk -y
wget https://storage.googleapis.com/qtreferenceapplications/spring-petclinic-2.4.2.jar
wget https://repo1.maven.org/maven2/co/elastic/apm/elastic-apm-agent/1.23.0/elastic-apm-agent-1.23.0.jar

Now run the Spring PetClinic application with the APM agent attached
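
One way to launch the application with the agent is sketched below. It uses the two jar files downloaded above and the agent's standard `-javaagent` flag and `elastic.apm.*` system properties. The server URL, secret token and service name are placeholders, not values from these notes. Take the real ones from your own Elastic Cloud deployment.

```shell
# Placeholders (assumptions): replace with the values from your Elastic Cloud deployment
APM_SERVER_URL="${APM_SERVER_URL:-https://apm.example.invalid:443}"
APM_SECRET_TOKEN="${APM_SECRET_TOKEN:-changeme}"
SERVICE_NAME="${SERVICE_NAME:-spring-petclinic}"

# Start PetClinic with the Elastic APM Java agent attached
start_petclinic() {
  java \
    -javaagent:elastic-apm-agent-1.23.0.jar \
    -Delastic.apm.service_name="$SERVICE_NAME" \
    -Delastic.apm.server_urls="$APM_SERVER_URL" \
    -Delastic.apm.secret_token="$APM_SECRET_TOKEN" \
    -Delastic.apm.application_packages=org.springframework.samples.petclinic \
    -jar spring-petclinic-2.4.2.jar
}

# Only launch when both downloaded jars and a Java runtime are present
if [ -f spring-petclinic-2.4.2.jar ] && [ -f elastic-apm-agent-1.23.0.jar ] \
   && command -v java >/dev/null 2>&1; then
  start_petclinic
else
  echo "java or one of the jar files is missing; nothing started"
fi
```

Once the application is running and receiving requests, the service should appear under the chosen service name in the APM section of Kibana.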
Configure Heartbeat and Metricbeat to send their data to Elastic Cloud so that it can be viewed in Kibana
