Understand the Components of Monitoring
- Every monitoring architecture or setup works with the same base components. They are:
- Alerts/notifications:
- An alert/notification is an event that is triggered to inform the system admin or cloud engineer about a potential issue or an issue that has already happened.
- Notifications can be sent over different media: email, SMS, pager alerts, and mobile app push notifications.
- In AWS, alerts/notifications can be configured in a service called Amazon SNS (Simple Notification Service).
- Events:
- Any action, activity, or series of activities that occurs in a system is called an event.
- The ability to track each of these events is very important in monitoring software systems and applications.
- Logging:
- A log is a historical record of an event.
- A log contains the details of the event and the time when the event occurred.
- Logs are generally text files in formats such as .txt, .json, or .xml.
- Analyzing logs to improve troubleshooting is an important task for admins.
- Metrics:
- A metric is a measurement of a resource at a certain point in time.
- Examples: free disk space, memory consumption, CPU usage, etc.
- System Availability:
- Availability = (uptime)/(uptime+downtime)
- Incidents: An incident is an issue that has caused the failure of the system.
- Note: MTTR (Mean Time To Recover) is the average time taken to restore the system after an incident.
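
The availability formula and MTTR can be worked through with a small example. The following is a minimal sketch, not a production implementation: the `Incident` class, the function names, and the two outages in a 30-day window are all hypothetical and used for illustration only. It assumes incidents do not overlap and fall entirely inside the measured window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One outage: when the system went down and when it was restored (hypothetical type)."""
    started: datetime
    resolved: datetime

    @property
    def downtime(self) -> timedelta:
        return self.resolved - self.started


def total_downtime(incidents: list[Incident]) -> timedelta:
    """Sum the downtime of all incidents."""
    return sum((i.downtime for i in incidents), timedelta())


def availability(period: timedelta, incidents: list[Incident]) -> float:
    """Availability = uptime / (uptime + downtime), returned as a fraction of 1."""
    uptime = period - total_downtime(incidents)
    return uptime / period  # period is uptime + downtime


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recover: average downtime per incident."""
    return total_downtime(incidents) / len(incidents)


# Hypothetical data: two outages in a 30-day window.
incidents = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),  # 30 minutes
    Incident(datetime(2024, 1, 20, 2, 0), datetime(2024, 1, 20, 3, 30)),  # 90 minutes
]
window = timedelta(days=30)

print(f"Availability: {availability(window, incidents):.4%}")  # Availability: 99.7222%
print(f"MTTR: {mttr(incidents)}")                              # MTTR: 1:00:00
```

Here 120 minutes of downtime out of 43,200 minutes gives an availability of about 99.72%, and the two incidents average one hour to recover.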
AWS CloudWatch
- This is a service designed by AWS as an end-to-end monitoring solution, used for monitoring applications, servers, serverless applications, on-premises systems, and many more.
- CloudWatch has the ability
- to collect and store logs of the applications deployed in any environment
- to collect metrics and create dashboards
- to use SNS to send notifications of failures
- to capture built-in events
- to take corrective actions
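
As an illustration of how metrics and SNS notifications fit together, the sketch below builds the parameters for a CloudWatch alarm on EC2 CPU usage. It is a hedged example under stated assumptions: the helper names `build_cpu_alarm` and `create_alarm`, the instance ID, the topic ARN, and the 80% threshold are all hypothetical. The only AWS call is boto3's `put_metric_alarm`, which needs boto3 installed and valid AWS credentials, so it is kept in a separate function that is not called here.

```python
def build_cpu_alarm(instance_id: str, sns_topic_arn: str, threshold: float = 80.0) -> dict:
    """Build keyword arguments for the CloudWatch put_metric_alarm call (hypothetical helper)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",                 # namespace of built-in EC2 metrics
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                          # evaluate 5-minute averages
        "EvaluationPeriods": 2,                 # two consecutive breaches trigger the alarm
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],        # SNS topic that receives the notification
    }


def create_alarm(params: dict) -> None:
    """Send the alarm definition to CloudWatch. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical identifiers, for illustration only.
params = build_cpu_alarm(
    instance_id="i-0123456789abcdef0",
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
print(params["AlarmName"])  # high-cpu-i-0123456789abcdef0
```

Calling `create_alarm(params)` in an account where that instance and topic exist would register the alarm. Subscribers to the SNS topic would then be notified when average CPU usage stays above the threshold for two consecutive periods.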