DevOps Classroomnotes 06/Dec/2022

Monitoring

Consider a sample architecture for a movie ticketing application.
What can go wrong here?
- Network failures
- Power failures
- OS failures
  - Disks getting filled up
  - New OS updates/patches causing crashes
- Hardware issues
- Services crashing
  - Code issues
  - High CPU utilization
  - High memory utilization
  - High disk I/O
- Performance issues
  - Saturation: how "full" a resource or service is, i.e. how close it is to its capacity
  - Latency: time taken by a packet to reach the server (journey time)
  - High CPU and memory usage, disk issues
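Latency can be estimated from the client side with a small script. The sketch below (Python, standard library only) times how long it takes to open a TCP connection to a server. It is a rough proxy under stated assumptions, not a precise measurement: the TCP handshake costs about one round trip, so the result is roughly twice the one-way journey time, plus a DNS lookup when a hostname is passed. The function name `tcp_connect_latency` is hypothetical.

```python
import socket
import time


def tcp_connect_latency(host: str, port: int, timeout: float = 2.0) -> float:
    """Return the time (in milliseconds) taken to open a TCP connection.

    Hypothetical helper. Connection setup needs about one round trip
    (SYN -> SYN/ACK), so this approximates round-trip latency, not the
    one-way packet journey time. A hostname also adds DNS lookup time.
    """
    start = time.perf_counter()
    # create_connection raises OSError (e.g. ConnectionRefusedError or a
    # timeout) if the server cannot be reached within `timeout` seconds.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    # Demo against a throwaway local listener so no external network is needed.
    with socket.socket() as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen()
        port = server.getsockname()[1]
        print(f"{tcp_connect_latency('127.0.0.1', port):.3f} ms")
```

Against a real server you would pass its hostname and service port (for example 443 for HTTPS) and repeat the measurement several times, since a single sample is noisy.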
When things go wrong, what can be done to resolve them quickly?
- We can't avoid all failures, but we need to resolve them quickly.
Eliminating all problems in applications might not be possible, but broadly there are two approaches:
- Preventive approaches
- Reactive approaches
To achieve faster resolution, we need monitoring. We should monitor:
- Server/Infrastructure
  - Basic health check: is the server/infrastructure up or not?
  - Some metrics to collect
    - CPU Utilization
    - Memory Utilization
    - Free disk space
- Application Monitoring
  - Application home page accessible or not
  - Application logs
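The server/infrastructure and application checks listed above can be sketched with Python's standard library. This is a minimal illustration, not a production monitoring agent: all function names and alert thresholds are hypothetical, the memory check assumes Linux (`/proc/meminfo`), and the 1-minute load average from `os.getloadavg()` (Unix only) is used as a rough proxy for CPU pressure, which is not the same thing as a CPU utilization percentage.

```python
import os
import shutil
import socket
import urllib.request


def server_is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Basic infra health check: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def cpu_load_1min():
    """1-minute load average (Unix only); None where unavailable (e.g. Windows)."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return None


def memory_utilization_percent(meminfo: str = "/proc/meminfo"):
    """Used memory as a percentage, read from Linux's /proc/meminfo.

    Uses MemAvailable rather than MemFree, because MemFree ignores page cache
    the kernel can reclaim. Returns None if the file is missing or malformed.
    """
    try:
        with open(meminfo) as f:
            # Lines look like "MemTotal:       16318412 kB"
            fields = {line.split(":")[0]: int(line.split()[1]) for line in f}
    except (OSError, ValueError, IndexError):
        return None
    total, available = fields.get("MemTotal"), fields.get("MemAvailable")
    if not total or available is None:
        return None
    return 100.0 * (total - available) / total


def free_disk_percent(path: str = "/") -> float:
    """Free space on the filesystem that holds `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def homepage_accessible(url: str, timeout: float = 5.0) -> bool:
    """Application check: does the home page answer with HTTP 200?"""
    try:
        # HTTPError/URLError are OSError subclasses; ValueError = malformed URL.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        return False


def count_log_errors(log_path: str, keyword: str = "ERROR") -> int:
    """Application logs: count lines containing `keyword` (a naive scan)."""
    with open(log_path, errors="replace") as f:
        return sum(1 for line in f if keyword in line)


if __name__ == "__main__":
    # Hypothetical thresholds; tune them for your own servers.
    disk_free = free_disk_percent("/")
    mem_used = memory_utilization_percent()
    print(f"load(1m)={cpu_load_1min()} mem_used%={mem_used} disk_free%={disk_free:.1f}")
    if disk_free < 10:
        print("ALERT: disk almost full")
    if mem_used is not None and mem_used > 90:
        print("ALERT: high memory utilization")
```

In practice such checks would run on a schedule and raise alerts instead of printing; the `print` calls here only stand in for that step.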