Monitoring and Alerting
- Alert on SLIs or SLOs
- Turn off all other alerts
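
A minimal sketch of what alerting on an SLI against an SLO could look like. The `Slo` class, the `checkout-availability` name, the 99.9% target, and the event counts are all assumptions made up for illustration; they are not from these notes or from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical SLO: the fraction of events that must be good."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of events should be good


def sli(good_events: int, total_events: int) -> float:
    """Return the SLI as the ratio of good events to all events."""
    if total_events == 0:
        # Assumption: with no traffic we treat the objective as met.
        return 1.0
    return good_events / total_events


def should_alert(slo: Slo, good_events: int, total_events: int) -> bool:
    """Alert only when the measured SLI drops below the SLO target."""
    return sli(good_events, total_events) < slo.target


# Hypothetical SLO and measurements for one evaluation window.
availability = Slo(name="checkout-availability", target=0.999)
print(should_alert(availability, good_events=9_980, total_events=10_000))
print(should_alert(availability, good_events=9_995, total_events=10_000))
```

The first call reports a breach (99.8% is below the target) and the second does not (99.95% is above it). Anything that does not feed a check like this is a candidate for being switched off.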
Observability
- Three different things to observe:
  - Logs
  - Metrics
  - Traces
- The monitoring system shows high-level failures and lets you navigate from each failure to the related:
  - Logs
  - Metrics
  - Traces
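
The navigation from a high-level failure down to the three kinds of telemetry can be sketched roughly as below. This is an in-memory illustration only: `Signal`, `Failure`, `drill_down`, and the sample records are hypothetical names and data, and a real setup would query separate log, metric, and trace backends instead of one list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One hypothetical telemetry record: a log line, metric point, or trace span."""
    kind: str       # "log", "metric", or "trace"
    service: str
    timestamp: int  # seconds, on any shared clock
    detail: str


@dataclass(frozen=True)
class Failure:
    """A high-level failure shown by the monitoring system."""
    service: str
    start: int
    end: int


def drill_down(failure: Failure, signals: list[Signal]) -> dict[str, list[Signal]]:
    """Group the signals for the failing service and time window by kind."""
    related: dict[str, list[Signal]] = {"log": [], "metric": [], "trace": []}
    for s in signals:
        in_window = failure.start <= s.timestamp <= failure.end
        if s.service == failure.service and in_window and s.kind in related:
            related[s.kind].append(s)
    return related


# Made-up sample data for illustration.
signals = [
    Signal("log", "checkout", 105, "payment gateway timeout"),
    Signal("metric", "checkout", 110, "error_rate=0.12"),
    Signal("trace", "checkout", 112, "span checkout->payments took 30s"),
    Signal("log", "search", 108, "cache miss"),
]
related = drill_down(Failure("checkout", start=100, end=120), signals)
for kind, items in related.items():
    print(kind, [s.detail for s in items])
```

Matching on service and time window is the simplest possible correlation key; a shared request or trace ID would be a stricter alternative.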
Incident Management
- Clear-cut ways of working defined by SRE.
- The following roles are available to handle the incident:
  - The Incident Commander role is allocated when the incident is recorded.
  - The Incident Commander has the following activities:
    - Plan the work to resolve the incident, or delegate it to a Planning Lead (a new role created for the incident)
    - Carry out the operations to resolve the incident, or delegate them to an Operations Lead (a new role created for the incident)
    - Make the necessary communications, or delegate them to a Communications Lead (a new role created for the incident)
- Once the incident is resolved, create a postmortem document.
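
The role model above can be sketched as a small data structure. This is an illustrative assumption, not a prescribed tool: the `Duty` enum, the `Incident` class, its method names, and the people named in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Duty(Enum):
    """The three activities the Incident Commander owns by default."""
    PLANNING = "planning"
    OPERATIONS = "operations"
    COMMUNICATIONS = "communications"


@dataclass
class Incident:
    title: str
    commander: str  # allocated when the incident is recorded
    # Duty -> delegated lead; a missing entry means the commander keeps the duty.
    leads: dict[Duty, str] = field(default_factory=dict)
    resolved: bool = False

    def delegate(self, duty: Duty, person: str) -> None:
        """Create a lead role for this incident and hand the duty over."""
        self.leads[duty] = person

    def owner(self, duty: Duty) -> str:
        """Return whoever is currently responsible for the duty."""
        return self.leads.get(duty, self.commander)

    def resolve(self) -> str:
        """Mark the incident resolved and return a postmortem document title."""
        self.resolved = True
        return f"Postmortem: {self.title}"


# Hypothetical incident and people.
incident = Incident(title="Checkout outage", commander="alice")
incident.delegate(Duty.COMMUNICATIONS, "bob")
print(incident.owner(Duty.OPERATIONS))      # alice
print(incident.owner(Duty.COMMUNICATIONS))  # bob
print(incident.resolve())                   # Postmortem: Checkout outage
```

Defaulting every duty to the commander mirrors the notes: lead roles exist only once the commander chooses to delegate.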