Alerts
- Avoid Noisy Alerts
- Alert on SLIs, SLOs, and SLAs
- To make alerts meaningful, make your system observable
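One way to alert on an SLO without creating noise is to page on how fast the error budget is being consumed rather than on every individual error. The sketch below is illustrative only: the function names and the default threshold of 14.4 are assumptions chosen for the example, not values taken from these notes.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail under the SLO (about 0.001 for 99.9%)."""
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error ratio divided by the error budget.

    A value near 1.0 means the budget is being spent exactly as fast as the
    SLO allows; higher values mean it will run out early.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    return (failed / total) / error_budget(slo_target)


def should_page(failed: int, total: int, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when the burn rate crosses the (assumed) threshold."""
    return burn_rate(failed, total, slo_target) >= threshold


# Hypothetical one-hour window: 20 failures out of 1000 requests, 99.9% SLO.
print(should_page(failed=20, total=1000, slo_target=0.999))
```

With a 99.9% target, a single failure in 1000 requests stays around a burn rate of 1 and does not page, while 20 failures in the same window burn the budget roughly 20 times too fast and do.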
Observability
- Three Pillars of Observability
  - Structured Logs
  - Metrics
  - Traces
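As an illustration of the first pillar, a structured log line is a machine-parseable record (here JSON) rather than free text, so fields such as a trace ID can be queried later. This is a minimal sketch using only the Python standard library; the `JsonFormatter` class, the `build_logger` helper, the `context` key, and the field names are all hypothetical choices for the example.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} end up on the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str = "checkout", stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


log = build_logger()
log.info("payment failed",
         extra={"context": {"trace_id": "abc123", "latency_ms": 412}})
```

Carrying the same `trace_id` in logs and traces is what lets the pillars be correlated during an investigation.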
Incidents
- Create a role called Incident Commander
- The Incident Commander appoints an Operations Lead
- The Incident Commander appoints a Communications Lead
- After the incident is resolved, the Incident Commander is responsible for conducting a retrospective and sharing a post-mortem report.
Some Important Metrics
- MTTF (Mean Time To Failure)
- MTBF (Mean Time Between Failures)
- MTTR (Mean Time To Resolve)
- Refer Here for Slideshow of SRE
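The three metrics above can be computed from a list of incident start and resolution times. The sketch below assumes one common convention, since definitions vary between teams: MTTR is the mean incident duration, MTTF is the mean uptime between a resolution and the next failure, and MTBF is the mean time between the starts of consecutive failures. The `Incident` class and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started: datetime
    resolved: datetime


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time from the start of an incident to its resolution."""
    return timedelta(seconds=mean(
        (i.resolved - i.started).total_seconds() for i in incidents))


def mttf(incidents: list[Incident]) -> timedelta:
    """Mean uptime between one resolution and the next failure (needs >= 2 incidents)."""
    ordered = sorted(incidents, key=lambda i: i.started)
    return timedelta(seconds=mean(
        (nxt.started - prev.resolved).total_seconds()
        for prev, nxt in zip(ordered, ordered[1:])))


def mtbf(incidents: list[Incident]) -> timedelta:
    """Mean time between the starts of consecutive failures (needs >= 2 incidents)."""
    ordered = sorted(incidents, key=lambda i: i.started)
    return timedelta(seconds=mean(
        (nxt.started - prev.started).total_seconds()
        for prev, nxt in zip(ordered, ordered[1:])))


# Made-up sample data for illustration.
history = [
    Incident(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 1, 0)),
    Incident(datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 2, 0)),
    Incident(datetime(2024, 1, 4, 0, 0), datetime(2024, 1, 4, 0, 30)),
]
print("MTTR:", mttr(history))
print("MTTF:", mttf(history))
print("MTBF:", mtbf(history))
```

Under this convention each gap between failure starts is the preceding repair time plus the uptime that follows it, which is why MTBF is often described as roughly MTTF plus MTTR.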