Nagios – Introduction

An Issue That Stops Production

  • Consider the simple diagram below, which shows a corporate network with a web server, db servers (primary and secondary), an email server and a DHCP server
  • Many failures can happen in this sample network. Let's assume an alert reports that the db servers and the web server are down. That sounds like a serious error
    • In this case the admin has to rush into the lab to identify the failure, fix it and resolve the issue
    • This is a time-consuming activity
    • The root cause could be that a switch on the production network is down, so the admins receive alerts saying the web server and db servers are down
  • Admins need a monitoring system that helps them identify the failure easily
  • Nagios can help diagnose these kinds of issues very easily, so let's get started with this journey of system monitoring using Nagios.

Nagios

  • Nagios is an open source tool for system monitoring.

  • It watches servers, services and other devices on your network and informs you if they are not working as expected

  • Monitoring in Nagios is split into two main categories

    • hosts: Physical or virtual devices on the network (servers, routers, switches, printers etc.)
    • services: Particular functionality running on a host (SSH, email services, web servers and databases)
  • Hosts in Nagios can be grouped into host groups for convenience

  • If you consider the above image, we can organize the hosts, host groups and services as follows
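
The relationship between hosts, host groups and services can also be sketched in a few lines of Python. This is only an illustration and not how Nagios stores its objects: the host names and group names below are made up for the sample network from the introduction, and `groups_of` and `services_in_group` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Host:
    """A physical or virtual device on the network."""
    name: str
    services: List[str] = field(default_factory=list)  # functionality running on the host


# Hosts from the sample network; names and services are illustrative only.
hosts: Dict[str, Host] = {
    "web01": Host("web01", ["HTTP", "SSH"]),
    "db-primary": Host("db-primary", ["Database", "SSH"]),
    "db-secondary": Host("db-secondary", ["Database", "SSH"]),
    "mail01": Host("mail01", ["Email", "SSH"]),
    "dhcp01": Host("dhcp01", ["DHCP"]),
}

# Host groups map a group name to its member hosts (hypothetical grouping).
# A host may appear in more than one group.
host_groups: Dict[str, List[str]] = {
    "web-servers": ["web01"],
    "db-servers": ["db-primary", "db-secondary"],
    "infrastructure": ["mail01", "dhcp01"],
    "production": ["web01", "db-primary", "db-secondary"],
}


def groups_of(host_name: str) -> List[str]:
    """Return every host group that contains the given host, sorted by name."""
    return sorted(g for g, members in host_groups.items() if host_name in members)


def services_in_group(group: str) -> List[Tuple[str, str]]:
    """Return (host, service) pairs for all hosts in a group."""
    return [(h, s) for h in host_groups[group] for s in hosts[h].services]
```

For example, `groups_of("web01")` lists both `production` and `web-servers`, which shows one host belonging to several groups.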

Nagios States

  • Nagios has 4 states
    • OK
    • Warning
    • Critical
    • Unknown
  • These work much like simple traffic signals that describe the health of a host or service. This is much simpler than reading graphs or analysing trends
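
Nagios plugins report these four states through their exit code: 0 for OK, 1 for Warning, 2 for Critical and 3 for Unknown. The sketch below maps an exit code to a state and to a traffic-light colour. The colour names and the choice to treat any other exit code as Unknown are assumptions made for this example, not documented Nagios behaviour.

```python
from enum import IntEnum


class State(IntEnum):
    """The four states, numbered like plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Traffic-light colours are an illustrative choice for this sketch.
COLOURS = {
    State.OK: "green",
    State.WARNING: "yellow",
    State.CRITICAL: "red",
    State.UNKNOWN: "grey",
}


def state_from_exit_code(code: int) -> State:
    """Translate a plugin exit code into a State.

    Assumption of this sketch: codes outside 0-3 are reported as UNKNOWN.
    """
    try:
        return State(code)
    except ValueError:
        return State.UNKNOWN


def colour_for(code: int) -> str:
    """Return the traffic-light colour for a plugin exit code."""
    return COLOURS[state_from_exit_code(code)]
```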

Plugins

  • Nagios performs all of its checks using plugins. Nagios tells each plugin what should be checked and what the warning and critical limits are.
  • Nagios comes with a standard set of plugins that let you check almost all the services commonly used in enterprises.
  • Nagios also provides an easy way to write our own plugins
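
To give a feel for what a custom plugin looks like, here is a minimal sketch of a disk-usage check written in Python. It is not one of the standard Nagios plugins: the function names, the default path and the default thresholds of 80% and 90% are all assumptions for this example. The only Nagios conventions it relies on are the exit codes 0 to 3 and a one-line status message on standard output.

```python
import shutil
import sys
from typing import Tuple


def classify(used_percent: float, warning: float, critical: float) -> Tuple[int, str]:
    """Compare a measured value against the warning and critical limits."""
    if used_percent >= critical:
        return 2, "CRITICAL"
    if used_percent >= warning:
        return 1, "WARNING"
    return 0, "OK"


def check_disk(path: str = "/", warning: float = 80.0, critical: float = 90.0) -> Tuple[int, str]:
    """Return (exit_code, message) for the disk usage of the given path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not run, so the state is UNKNOWN.
        return 3, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return 3, f"DISK UNKNOWN - {path} reports zero total size"
    used_percent = usage.used / usage.total * 100
    code, label = classify(used_percent, warning, critical)
    return code, f"DISK {label} - {used_percent:.1f}% used on {path}"


def main() -> int:
    """Print the status line and return the exit code for Nagios."""
    code, message = check_disk()
    print(message)
    return code

# When saved as a standalone plugin script, finish with: sys.exit(main())
```

The thresholds are passed in rather than hard-coded so that the same plugin can be reused with different limits for different hosts.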

Main Object Definitions of Nagios

  • Commands:
    • These define how Nagios should perform checks
    • Act as an abstraction over the actual plugins, which allow you to perform checks
  • Time Periods:
    • Date and time periods during which the operations should or should not be performed
    • Eg: Monday-Friday, 10:00 AM – 06:00 PM
  • Hosts and Host groups:
    • Already defined above; an individual device (physical or virtual machine) is generally a host
    • Hosts are grouped into host groups
    • One Host might be part of multiple groups
  • Services:
    • Functionality to monitor
    • Eg: CPU Utilization, Storage Space or Web Server
  • Contacts and contact groups:
    • A contact is a person who should be notified, along with the information about how to notify them
    • Just like hosts are grouped into host groups, contacts are also grouped into contact groups
  • Notifications:
    • These define who should be notified of what
    • Eg: Report all server failures to the admins during working hours, and notify the lead admins outside working hours
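
The way time periods, contact groups and notifications fit together can be sketched as follows. This models the two examples above (a Monday-Friday, 10:00 AM to 06:00 PM time period, and failures going to admins during working hours and to lead admins otherwise). It is not Nagios configuration syntax, and the contact names and function names are hypothetical.

```python
from datetime import datetime, time
from typing import Dict, List

# Time period: Monday-Friday, 10:00 AM to 06:00 PM.
WORK_DAYS = range(0, 5)  # datetime.weekday(): Monday is 0, Friday is 4
WORK_START = time(10, 0)
WORK_END = time(18, 0)

# Contact groups with made-up contact names.
CONTACT_GROUPS: Dict[str, List[str]] = {
    "admins": ["alice", "bob"],
    "lead-admins": ["carol"],
}


def in_working_hours(moment: datetime) -> bool:
    """Return True if the moment falls inside the working-hours time period."""
    return moment.weekday() in WORK_DAYS and WORK_START <= moment.time() < WORK_END


def contacts_to_notify(moment: datetime) -> List[str]:
    """Pick the contact group for a server failure detected at the given moment."""
    group = "admins" if in_working_hours(moment) else "lead-admins"
    return CONTACT_GROUPS[group]
```

A failure at 11:00 on a Monday would go to the `admins` group, while the same failure on a Saturday or at 19:00 would go to `lead-admins`.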

Soft and Hard States

  • Some failures are temporary and correct themselves. For example, restarting the web server brings some pages down for a few seconds, after which users no longer see pages failing to load.

  • To make it easier to tell whether a problem is temporary or permanent, soft states were introduced

  • A soft state is generally a temporary state and a hard state is permanent

  • Let's assume we are monitoring a web server and its current state is OK, that is, the web server is up and running

  • Now let's assume an admin has restarted the server. The next check fails, so the service enters a soft problem state

  • Nagios performs a configured number of soft-state checks before declaring a hard state. Let's assume this number is 3 and the check runs every 5 seconds. If the web server comes back into the running state within this time, the status returns to OK

  • If the web server fails to come back into the running state even after the 3 configured attempts, Nagios declares a hard problem state

  • This concept saves admins from unnecessary alerts or noise

  • In the next post in this series we will install Nagios
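
The soft and hard state walkthrough above can be sketched as a small state tracker. This is a simplified model, not Nagios's actual implementation: the class name `CheckTracker` and its fields are hypothetical, and only `max_check_attempts` mirrors the name of the real Nagios setting. It assumes 3 attempts, as in the example, and it only signals a notification when a problem becomes hard or when a hard problem recovers.

```python
from dataclasses import dataclass


@dataclass
class CheckTracker:
    """Tracks one service and decides when a state becomes hard."""
    max_check_attempts: int = 3   # failed checks needed before a hard state
    status: str = "OK"            # last reported status
    state_type: str = "HARD"      # "SOFT" or "HARD"
    failures: int = 0             # consecutive non-OK results

    def record(self, result: str) -> bool:
        """Record a check result; return True if admins should be notified."""
        was_hard_problem = self.state_type == "HARD" and self.status != "OK"
        if result == "OK":
            # Recovery: only worth a notification if a hard problem was announced.
            self.status, self.state_type, self.failures = "OK", "HARD", 0
            return was_hard_problem
        self.failures += 1
        self.status = result
        if self.failures < self.max_check_attempts:
            # Possibly temporary, so stay quiet and keep re-checking.
            self.state_type = "SOFT"
            return False
        # The problem persisted through all attempts, so it is now a hard state.
        self.state_type = "HARD"
        return not was_hard_problem
```

With the default of 3 attempts, a web server restart that fails one or two checks and then recovers never triggers a notification, while three failed checks in a row do.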

About continuous learner

devops & cloud enthusiastic learner