Monitoring
A System (Hospital Information System)
- This is the system used to manage a multi-speciality hospital with different branches.
- Architecture
- Let us assume our job on this system is to figure out issues and respond when failures occur.
- To do this we use two approaches:
- Proactive
- Reactive
- Metrics:
- MTTF (Mean Time To Failure): The average time the system runs before it fails. This should be high.
- MTTR (Mean Time To Recover): The average time the team takes to recover from a failure. This should be low.
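
As a rough illustration of how these two averages could be computed, here is a minimal sketch. It assumes each incident is recorded as three timestamps (when the system came up, when it failed, when it was recovered). The record layout, the function names and the sample dates are all hypothetical, not taken from any real Hospital Information System.

```python
from datetime import datetime, timedelta


def mean_timedelta(deltas):
    """Average a non-empty list of timedelta values."""
    return sum(deltas, timedelta()) / len(deltas)


def mttf_and_mttr(incidents):
    """Return (MTTF, MTTR) for a list of incidents.

    Each incident is a hypothetical (up_at, failed_at, recovered_at) tuple:
    - time to failure = failed_at - up_at
    - time to recover = recovered_at - failed_at
    """
    time_to_failure = [failed - up for up, failed, _ in incidents]
    time_to_recover = [recovered - failed for _, failed, recovered in incidents]
    return mean_timedelta(time_to_failure), mean_timedelta(time_to_recover)


# Made-up incident log, for illustration only.
incidents = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 11, 0, 0), datetime(2024, 1, 11, 2, 0)),
    (datetime(2024, 1, 11, 2, 0), datetime(2024, 1, 31, 2, 0), datetime(2024, 1, 31, 3, 0)),
]

mttf, mttr = mttf_and_mttr(incidents)
print("MTTF:", mttf)
print("MTTR:", mttr)
```

In this made-up log the system ran 10 and 20 days before failing and took 2 hours and 1 hour to recover, so the goal of monitoring is to push the first average up and the second one down.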
Expectation
- We need a monitoring system so that our objective (high MTTF and low MTTR) can be achieved.
Principles
- Single Point of Failure (SPOF): A component or server that alone is responsible for some function, so its failure takes that function down. This is generally solved by redundancy or replication.
- Fault Tolerance: The ability of a system to deal with faults.
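
One simple way to picture redundancy removing a single point of failure is a caller that falls back to a replica when the primary fails. The sketch below is only an illustration under stated assumptions: replicas are modelled as plain Python callables, and the names `call_with_failover`, `broken_primary` and `healthy_secondary` are hypothetical.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful result.

    Raises RuntimeError only if every replica fails, i.e. the request
    survives the failure of any single replica.
    """
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


# Hypothetical replicas: the primary is down, the secondary still works.
def broken_primary(request):
    raise ConnectionError("primary is down")


def healthy_secondary(request):
    return f"handled {request}"


result = call_with_failover([broken_primary, healthy_secondary], "patient-lookup")
print(result)
```

With only `broken_primary` in the list the call would fail outright, which is the SPOF case. Adding a second replica lets the system tolerate that fault.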
Ways of Monitoring
- System Monitoring:
- At its simplest, this checks whether the application/server is up or down (heartbeat).
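
A heartbeat check can be sketched in several ways. The version below assumes "up" simply means that a TCP connection to the server's port succeeds within a timeout. That is one possible definition, not the only one, and the function names and target layout are hypothetical.

```python
import socket


def is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_all(targets, timeout=2.0):
    """Check every target and return {name: True/False}.

    targets maps a label to a (host, port) pair, for example
    {"his-web": ("his.example.internal", 443)} (a made-up host name).
    """
    return {name: is_up(host, port, timeout) for name, (host, port) in targets.items()}
```

A monitoring job would call `check_all` on a schedule and raise an alert for any target reported as down. This only shows that the port accepts connections, not that the application behind it is behaving correctly.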
