Monitoring
System Monitoring
- Monitor Basic System Details
  - Is the server up?
  - Is the HTTP page responding?
  - Is the database query responding?
  - What is the free disk space at that moment?
  - What is the CPU utilization at that moment?
  - What is the network load at that moment?
  - What is the disk I/O activity at that moment?
- This monitoring can be owned entirely by us (DevOps/SRE)
- Tools:
  - Nagios
  - Zabbix
  - Icinga
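
A few of the basic checks listed in this section can be sketched with the Python standard library alone. This is only an illustration of the idea, not how Nagios, Zabbix, or Icinga implement their checks; the function names are made up for this example, and any host, port, or URL you pass in is your own assumption.

```python
import shutil
import socket
import urllib.error
import urllib.request


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (the server is 'up')."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_http_responding(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, HTTP error statuses, and malformed URLs all count as "not responding".
        return False


def free_disk_percent(path: str = "/") -> float:
    """Return the free disk space at path as a percentage of the total."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100.0
```

Each function answers one "at that moment" question; a monitoring tool would run such checks on a schedule and record the results.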
Application Monitoring
- Detailed monitoring of your application
  - How much memory is my application consuming?
  - What is the current number of concurrent users on my application?
  - What are my application's logs saying?
  - What are my application's traces telling me?
  - What are the failure patterns?
- Needs collaboration with Dev to accomplish this monitoring
- Tools:
  - Elastic Stack
  - Splunk
  - AppDynamics
  - Application Insights
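
As a rough sketch of what "failure patterns" in application logs can mean, the snippet below groups error messages that differ only in their numbers. The log format (`<timestamp> <LEVEL> <message>`) and the sample lines are invented for illustration; real applications and tools such as Elastic Stack or Splunk use their own formats and much richer parsing.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def failure_patterns(lines):
    """Count ERROR messages, masking digits so similar failures group together."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            # Replace numbers (ids, durations) with a placeholder to form a pattern.
            pattern = re.sub(r"\d+", "<n>", match.group("message"))
            counts[pattern] += 1
    return counts


# Made-up sample log lines, for illustration only.
sample = [
    "2024-01-01T10:00:00 INFO user 17 logged in",
    "2024-01-01T10:00:05 ERROR timeout after 30s calling payments",
    "2024-01-01T10:00:09 ERROR timeout after 45s calling payments",
    "2024-01-01T10:00:12 ERROR user 99 not found",
]
for pattern, count in failure_patterns(sample).most_common():
    print(count, pattern)
```

The two timeout lines collapse into one pattern, which is the kind of grouping that makes recurring failures visible.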
Terms
- Metric: A measurement, in some unit, of a system or application. E.g. CPU utilization, network in/out
- Chart: Metrics aggregated and plotted over time
- Logs:
  - System Logs
  - Application Logs
- Dashboard: A unified view of everything that matters.
- Alert: A notification about a concern with a system, raised to the responsible person(s)
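
To make these terms concrete, here is a small sketch that turns raw metric samples into chart points and raises an alert from the latest point. The names `Sample`, `to_chart`, and `check_alert`, the 60-second bucket, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One metric reading: a value in some unit at a point in time."""
    timestamp: int  # seconds since epoch
    value: float    # e.g. CPU utilization in percent


def to_chart(samples, bucket_seconds=60):
    """Aggregate samples into (bucket_start, average) points, ordered by time."""
    buckets = defaultdict(list)
    for s in samples:
        buckets[s.timestamp - s.timestamp % bucket_seconds].append(s.value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def check_alert(chart, threshold):
    """Return an alert message if the latest chart point exceeds the threshold."""
    if not chart:
        return None
    start, value = chart[-1]
    if value > threshold:
        return f"ALERT: value {value:.1f} above {threshold} in bucket starting {start}"
    return None


# Made-up CPU readings, for illustration only.
cpu = [Sample(0, 40.0), Sample(30, 60.0), Sample(60, 95.0), Sample(90, 97.0)]
chart = to_chart(cpu)
print(chart)
print(check_alert(chart, threshold=90.0))
```

A dashboard would show several such charts side by side, and the alert message would be routed to a person rather than printed.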
Expectations
- Health of the System
- Identify Failure Patterns
- Run analytics to suggest improvements to the customer experience.
SRE Expectations
- Reduce noise and alert on SLIs (Service Level Indicators) or SLOs (Service Level Objectives)
- Make your Monitoring Observable
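
One common way to alert on an SLO instead of on every raw metric is to track how fast the error budget is being spent. The sketch below assumes an availability SLI (good requests divided by total requests); the function names and the thresholds in the usage are hypothetical, and a real setup would evaluate this over defined time windows.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic: treat as meeting the objective
    return good_requests / total_requests


def burn_rate(sli: float, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    error_budget = 1.0 - slo
    return (1.0 - sli) / error_budget


def should_alert(good: int, total: int, slo: float, max_burn_rate: float) -> bool:
    """Alert only when the budget is burning faster than the allowed rate."""
    return burn_rate(availability_sli(good, total), slo) > max_burn_rate


# With a 99.9% SLO, 0.1% errors is on budget, while 1% errors burns it about 10x too fast.
print(should_alert(9_990, 10_000, slo=0.999, max_burn_rate=2.0))
print(should_alert(9_900, 10_000, slo=0.999, max_burn_rate=2.0))
```

Alerting on the burn rate, not on individual failed requests, is one way to cut noise: brief blips that stay within budget do not page anyone.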
Nagios
- Comes in two editions
  - Core (open source)
  - Enterprise (commercial, sold as Nagios XI)
- Nagios Core Installation:
  - Download the Nagios source code
  - Build the source code
  - Configure Nagios
  - Nagios uses make for the steps above
- Nagios
  - Plugins
  - Commands
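
A Nagios plugin is an executable that prints a status line and reports the state through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. A command definition in the Nagios configuration then tells Nagios how to invoke it. The sketch below is a hypothetical disk check following that convention; the function names, default thresholds, and output wording are assumptions for this example, not one of the official plugins.

```python
import shutil

# Exit codes that Nagios interprets as service states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path: str, warn_free_pct: float, crit_free_pct: float):
    """Return (exit_code, status_line) for the free space at path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    free_pct = usage.free / usage.total * 100.0
    # Text after '|' is performance data that Nagios can record.
    perfdata = f"free_pct={free_pct:.1f}%"
    if free_pct < crit_free_pct:
        return CRITICAL, f"DISK CRITICAL - {free_pct:.1f}% free on {path} | {perfdata}"
    if free_pct < warn_free_pct:
        return WARNING, f"DISK WARNING - {free_pct:.1f}% free on {path} | {perfdata}"
    return OK, f"DISK OK - {free_pct:.1f}% free on {path} | {perfdata}"


def main(argv) -> int:
    """Parse 'script path warn crit' style arguments, print the status, return the exit code."""
    path = argv[1] if len(argv) > 1 else "/"
    warn = float(argv[2]) if len(argv) > 2 else 20.0  # assumed default thresholds
    crit = float(argv[3]) if len(argv) > 3 else 10.0
    code, line = check_disk(path, warn, crit)
    print(line)
    return code
```

To use this as an actual plugin, the script would end with `sys.exit(main(sys.argv))` (after `import sys`) so that the return value becomes the process exit code that Nagios reads.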