Monitoring
- Consider the following application architecture
- The architecture above is a sample setup for running a movie ticketing application.
What can go wrong here?
- network failures
- power failures
- OS failures
- Disks getting filled up
- New OS updates/patches causing crashes
- Hardware issues
- Services crashing
- code issues
- High CPU utilization
- High memory utilization
- High I/O on disk
- Performance issues
- Saturation:
- Latency: time taken by a packet to reach the server (journey time)
- High CPU and Memory, Disk Issues
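
The latency item above can be approximated in a few lines of Python. This is a minimal sketch, not a definitive measurement tool: the function name `tcp_connect_latency` and its parameters are assumptions made for illustration. It times a TCP connection setup, which is roughly one round trip rather than the one-way journey time.

```python
import socket
import time


def tcp_connect_latency(host: str, port: int, timeout: float = 3.0) -> float:
    """Return the seconds taken to open a TCP connection to host:port.

    Hypothetical helper: connection setup time is used as a rough
    stand-in for network latency (about one round trip).
    """
    start = time.perf_counter()
    # create_connection raises OSError if the server cannot be reached.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start
```

Calling it repeatedly against the application server and watching for a rising trend is one simple way to spot a latency problem before users report it.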
When things go wrong, what can be done for quick resolution?
- We can't avoid all failures, but we need quick resolutions
- Eliminating every problem in an application might not be possible, but broadly there are two approaches
- Preventive approaches
- Reactive approaches
- To achieve faster resolution we need to monitor. We are supposed to monitor
- Server/Infrastructure
- Basic health check: is the server/infrastructure up or not
- Some metrics to collect
- CPU Utilization
- Memory utilization
- Free disk space
- Application Monitoring
- Application home page accessible or not
- Logs of applications
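
The checks listed above can be sketched with the Python standard library alone. This is a rough illustration under assumptions, not a replacement for a real monitoring tool: the function names `server_metrics` and `home_page_accessible` are hypothetical, and the 1-minute load average stands in for CPU utilization. Memory utilization is left out because the standard library has no portable call for it; a third-party package such as `psutil` is commonly used for that.

```python
import os
import shutil
import urllib.error
import urllib.request


def server_metrics(path: str = "/") -> dict:
    """Collect a few basic host metrics (hypothetical helper)."""
    usage = shutil.disk_usage(path)
    metrics = {
        "disk_free_percent": round(usage.free / usage.total * 100, 1),
        "cpu_count": os.cpu_count(),
    }
    # os.getloadavg() exists only on Unix-like systems and may raise OSError.
    if hasattr(os, "getloadavg"):
        try:
            metrics["load_avg_1min"] = os.getloadavg()[0]
        except OSError:
            pass
    return metrics


def home_page_accessible(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures and HTTP errors.
        return False
```

Run on a schedule, `server_metrics()` gives the infrastructure numbers and `home_page_accessible(...)` with the application's home page URL gives the basic application check. Alerting when free disk space drops below a chosen threshold is an example of the preventive approach.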
- Server/Infrastructure