DevOps Classroom Series – 08/Jul/2020

Nagios

  • Nagios is an open-source tool for system monitoring.
  • The main purpose of system monitoring is to detect when a system is not working as intended and to notify the appropriate staff so the error can be resolved as early as possible.
  • System monitoring in Nagios is split into two major categories of objects
    • Hosts:
      • represent a physical or virtual device on your network
      • Eg: appserver1.lt.com, nswitch1, 192.168.0.11
    • Services
      • are particular functionalities on hosts
      • Eg: SSH, Tomcat, SQL Server etc.
  • Hosts can be grouped into host groups.
  • Each Nagios check results in one of 4 distinct states
    • OK
    • Warning
    • Critical
    • Unknown
  • Nagios performs all of its checks using plugins.
  • Plugins are external components to which Nagios passes information about what should be checked and which limits apply.
  • Plugins are responsible for performing the checks and analysing the results. The output of each check is a status (OK, Warning, Critical, Unknown).
  • Nagios comes with a set of standard plugins.
  • If you need to perform a specific check, it is easy to write your own plugins in any language.
  • Nagios is based on a clear object definition system. The different types of objects are:
    • Commands:
      • Definitions of how Nagios should perform particular types of checks
      • They are an abstraction layer on top of the actual plugins that perform the checks.
    • Time periods:
      • Date and time spans during which operations should/shouldn’t be performed
    • Hosts and host groups
    • Services: various functionalities or resources to monitor on a specific host. Eg: CPU usage, free disk space, webserver, dbserver, ftp
    • Contacts and contact groups: People or groups of people that should be notified.
    • Notifications: These define who should be notified of what.
    • Escalations: Extensions to notifications. Eg: a critical server being down for more than 4 hours should alert IT management, so that they can track the issue.
  • A useful feature of Nagios is its mature dependency system
    • Consider the following network: [Figure: network diagram]
    • Now let's assume one router in this network goes down (see figure). Monitoring would then generally end up showing that dbserver, webserver, switch2 and router2 are all down. [Figure: network diagram showing the failed router]
    • In Nagios we can define that a particular service depends on another service; if that other service fails, Nagios does not check the dependent services.
    • Nagios offers a consistent system of macro definitions. These are variables that can be put into all object definitions
  • Soft and hard states:
    • Nagios checks whether a particular service or host is working correctly and stores its status. To avoid reacting to random or temporary failures, Nagios uses soft and hard states.
    • In the infrastructure below, let's assume you want to restart the web server. [Figure: infrastructure diagram]
    • Restarting the web server makes web pages inaccessible for 10 seconds.
    • When the previous status of a check is unknown or differs from the new result, Nagios treats the new result as a soft state and repeats the check to confirm that the new state is permanent. Once confirmed, this permanent state is a hard state.
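
The soft and hard state behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not Nagios's actual implementation: the names `CheckState` and `record_result` are hypothetical, and the default of 3 attempts is an assumption chosen for the example (in Nagios the number of retries is configurable per host or service).

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Last known status of one host or service (illustrative only)."""
    status: str = "UNKNOWN"   # last result: OK, WARNING, CRITICAL or UNKNOWN
    state_type: str = "HARD"  # "SOFT" while a change is still being confirmed
    attempt: int = 1          # consecutive checks that returned `status`


def record_result(state: CheckState, new_status: str, max_attempts: int = 3) -> CheckState:
    """Fold one check result into the tracked state.

    Assumption: a changed result is soft until it has been seen
    `max_attempts` times in a row, after which it becomes hard.
    """
    if new_status != state.status:
        # The result changed: start confirming the new status from attempt 1.
        state.status = new_status
        state.attempt = 1
        state.state_type = "HARD" if max_attempts <= 1 else "SOFT"
    elif state.state_type == "SOFT":
        # Same result again while still unconfirmed: count the retry.
        state.attempt += 1
        if state.attempt >= max_attempts:
            state.state_type = "HARD"
    return state
```

In the web server restart scenario, a single CRITICAL result followed by OK never reaches the hard CRITICAL state in this sketch, so a short outage would not be treated as a permanent failure.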

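The plugin model described earlier (Nagios passes a plugin what to check and the limits, and the plugin reports OK, Warning, Critical or Unknown) can be illustrated with a minimal disk usage plugin in Python. The function names, message format and argument order here are hypothetical; the exit codes 0 to 3 follow the usual Nagios plugin convention for the four states.

```python
import shutil
from typing import List, Tuple

# Exit codes that Nagios plugins conventionally use to report a status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent: float, warn: float, crit: float) -> int:
    """Compare a measured value against the warning and critical limits."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path: str, warn: float, crit: float) -> Tuple[int, str]:
    """Measure disk usage for `path` and return (status code, one-line message)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not run, so the result is UNKNOWN.
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero total size"
    used_percent = 100.0 * usage.used / usage.total
    status = classify(used_percent, warn, crit)
    return status, f"DISK {LABELS[status]} - {used_percent:.1f}% used on {path}"


def main(argv: List[str]) -> int:
    """Hypothetical command line: PATH WARN_PERCENT CRIT_PERCENT."""
    if len(argv) != 3:
        print("DISK UNKNOWN - usage: PATH WARN_PERCENT CRIT_PERCENT")
        return UNKNOWN
    try:
        warn, crit = float(argv[1]), float(argv[2])
    except ValueError:
        print("DISK UNKNOWN - limits must be numbers")
        return UNKNOWN
    status, message = check_disk(argv[0], warn, crit)
    print(message)
    return status

# When run as a script, the entry point would be:
#     import sys; sys.exit(main(sys.argv[1:]))
```

Keeping the threshold comparison in `classify` separate from the measurement means the same logic could be reused for other resources, with only the measuring function changing.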