DevOps Classroom Series – 13/Jul/2020

Understand checks in Nagios

  • Nagios performs a check by running an external command (a plugin) and uses the command's return code, along with its output, to find out whether the check passed
  • Nagios schedules the checks itself, so it requires plugins to behave in a specific way. In particular, Nagios relies on the exit code of the command
Exit code       Status          Description
0               OK              Working correctly
1               WARNING         Working, but needs attention
2               CRITICAL        Not working correctly
3               UNKNOWN         Unable to determine status for host or service
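To make the exit-code contract concrete, here is a minimal sketch of what a plugin does internally. The function name `check_value` and its three positional arguments are hypothetical, chosen only for illustration; a real plugin would be a standalone script that calls `exit` with the same codes, and it would also validate its thresholds.

```shell
#!/bin/sh
# check_value VALUE WARN CRIT  (hypothetical helper, not a real Nagios plugin)
# Prints one status line and returns the exit code from the table above.
# Assumption: WARN and CRIT are integers and WARN <= CRIT.
check_value() {
    value=$1; warn=$2; crit=$3

    # Anything that is not a non-negative integer cannot be evaluated: UNKNOWN
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN - could not read value"; return 3 ;;
    esac

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}

check_value 42 80 90
echo "exit code: $?"
```

The first line of output is what Nagios would show as the check's status text; the exit code is what decides the state.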
  • Standard Nagios plugins usually accept the following parameters
    • -h, --help: print help
    • -V, --version: print the version of the plugin
    • -v, --verbose: print more detailed information
    • -w: threshold for the WARNING state
    • -c: threshold for the CRITICAL state
    • -H: hostname or IP address of the target
    • -4: use IPv4
    • -6: use IPv6
    • -p: TCP or UDP port to connect to
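As a rough sketch of how a custom plugin could accept the same option letters, the POSIX `getopts` builtin is enough. The function name `parse_args` and the usage message are made up for this example; the option letters follow the list above.

```shell
#!/bin/sh
# parse_args: hypothetical helper showing the conventional option letters.
# Returns 3 (UNKNOWN) when the arguments cannot be understood.
parse_args() {
    host=""; warn=""; crit=""; port=""
    OPTIND=1    # reset so the function can be called more than once
    while getopts "H:w:c:p:" opt; do
        case "$opt" in
            H) host=$OPTARG ;;
            w) warn=$OPTARG ;;
            c) crit=$OPTARG ;;
            p) port=$OPTARG ;;
            *) echo "UNKNOWN - usage: -H host -w warn -c crit -p port"; return 3 ;;
        esac
    done

    if [ -z "$host" ]; then
        echo "UNKNOWN - -H is required"; return 3
    fi
    echo "host=$host warn=$warn crit=$crit port=$port"
}

parse_args -H 172.31.5.24 -w 5 -c 10 -p 3306
```

Treating bad arguments as UNKNOWN fits the exit-code table: the plugin could not determine the status.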
  • Where are the plugins located?
cd /usr/local/nagios/libexec
ls -al

  • Let's run some of the low-level plugin commands by hand (for check_ping, -p is the number of ICMP packets to send, not a port)
./check_ping --help
./check_ping -H google.com -w 100,5% -c 200,10% -p 5
./check_ping -H 172.31.5.24 -w 100,5% -c 200,10% -p 5
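When running a plugin by hand it helps to see the exit code next to the output, since that is what Nagios acts on. The wrapper below is a small sketch; `status_name` and `run_check` are hypothetical helper names, and the `sh -c` line is only a stand-in for a real plugin so the example works on a machine without Nagios.

```shell
#!/bin/sh
# Map a plugin exit code to the status name from the table above.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT OF RANGE" ;;   # not one of the four documented codes
    esac
}

# Run any command, capture its output and exit code, and print both.
run_check() {
    rc=0
    output=$("$@" 2>&1) || rc=$?
    echo "$(status_name "$rc") (exit code $rc): $output"
}

# Stand-in for a plugin: prints one line and exits with 2
run_check sh -c 'echo "PING CRITICAL - Packet loss = 100%"; exit 2'

# With the real plugin, from /usr/local/nagios/libexec:
# run_check ./check_ping -H 172.31.5.24 -w 100,5% -c 200,10% -p 5
```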

  • Now let's look at the command definition
################################################################################

# This command checks to see if a host is "alive" by pinging it
# The check must result in a 100% packet loss or 5 second (5000ms) round trip
# average time to produce a critical error.
# Note: Five ICMP echo packets are sent (determined by the '-p 5' argument)

define command {

    command_name    check-host-alive
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
  • Nagios has a lot of standard macros, such as $USER1$ and $HOSTADDRESS$ used above. Refer Here
  • To check TCP or UDP connectivity
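To see what the macros in check-host-alive turn into, the sketch below imitates the substitution with `sed`. This is only an illustration of the idea, not how Nagios implements it. The value of `USER1` assumes the plugin directory shown earlier (it is normally set in resource.cfg), and `HOSTADDRESS` assumes a host whose address is 172.31.5.24; `expand_macros` is a made-up helper name.

```shell
#!/bin/sh
# Illustrative only: mimic how $NAME$ macros in a command_line are filled in.
# Both values below are assumptions for this sketch.
USER1="/usr/local/nagios/libexec"   # plugin directory, normally from resource.cfg
HOSTADDRESS="172.31.5.24"           # the 'address' of the host being checked

command_line='$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5'

expand_macros() {
    # Replace each $NAME$ token with the matching value.
    printf '%s\n' "$1" | sed \
        -e 's|\$USER1\$|'"$USER1"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$HOSTADDRESS"'|g'
}

expand_macros "$command_line"
```

The printed line is the same kind of command that was run by hand earlier, which is a handy way to debug a command definition.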
./check_tcp --help
./check_udp --help

  • If you want to check email server connectivity
./check_pop --help
./check_imap --help
./check_smtp --help
  • Checking FTP and DHCP servers
./check_ftp --help
./check_dhcp --help
  • Checking websites
./check_http --help
  • Checking database servers
# MySQL
./check_mysql --help
./check_mysql_query --help

# PostgreSQL
./check_pgsql --help

# Oracle
./check_oracle --help
  • Checking swap space
./check_swap --help
  • Checking disk space
./check_disk --help
# remote shares
./check_disk_smb --help
  • Third party plugins: Refer Here

  • Building a Nagios command or understanding an existing command

    • Understand the command line of the plugin
    • Then write the Nagios command definition
    define command {
        command_name   check_myown_mysql
        command_line   $USER1$/check_tcp -H $HOSTADDRESS$ -p 3306 -c 10 -w 5 
    }
    
    define command {
        command_name   check_myown_pgsql
        command_line   $USER1$/check_tcp -H $HOSTADDRESS$ -p 5432 -c 10 -w 5 
    }
    
    • Later we use these commands in service definitions, which are applied to hosts or host groups
    define host {
        use             linux-server
        host_name       dbserver1
        alias           dbserver1
        address         172.31.5.24
    }
    
    define hostgroup {
        hostgroup_name  mysqldb-servers
        alias           mysql db Servers
        members         dbserver1
    }
    
    define service {
    
        use                     local-service           ; Name of service template to use
        hostgroup_name          mysqldb-servers
        service_description     check mysql up or not
        check_command           check_myown_mysql
    }
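After adding command, host, hostgroup and service definitions like these, it is worth validating the configuration before restarting Nagios. The sketch below wraps the `nagios -v` pre-flight check. The two paths are assumptions based on a source install under /usr/local/nagios (matching the libexec path used earlier), and `verify_config` is a made-up helper name.

```shell
#!/bin/sh
# Assumed paths for an install under /usr/local/nagios; adjust as needed.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

verify_config() {
    if [ ! -x "$NAGIOS_BIN" ]; then
        echo "nagios binary not found at $NAGIOS_BIN"
        return 3
    fi
    # -v parses the main config and all object definitions and reports
    # errors and warnings without starting the daemon.
    "$NAGIOS_BIN" -v "$NAGIOS_CFG"
}

verify_config || echo "verification did not pass; do not restart Nagios yet"
```

A typo such as a check_command that names an undefined command should show up here as an error rather than at restart time.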
    
    
    
    • For the Nagios configuration files I created, Refer Here
    • When a check ends in a CRITICAL or WARNING state, then depending on your configuration, Nagios sends emails to all members of the configured contacts and contact groups
    • I have configured email notifications using SendGrid and installed sendmail; Refer Here for the configuration

How can I use Nagios for advanced checks?

  • Let's assume you are asked to check free disk space and the processes running on remote servers.
  • Consider the setup in the image: four monitored hosts named tomcat, nginx, db1, and db2.
  • Checking whether the hosts (tomcat, nginx, db1, db2) are alive, and whether port 8080 on tomcat, port 80 on nginx, and port 3306 on db1 and db2 are responding, is straightforward.
  • Now I want to find the free disk space on db1 and db2, and the user load on tomcat and nginx. With the plugins seen so far, this is possible only if Nagios logs in to each machine over SSH and runs Linux commands there, which is not feasible.
  • To solve these kinds of problems with Nagios, we need to understand
    • Active and Passive Checks and NRDP
    • NRPE and NSClient++
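The reason an agent is needed becomes clearer when you look at what a disk check actually measures. The sketch below reads the usage of a mount point with `df`, which only reports on the machine where it runs, so a script like this has to execute on db1 or db2 themselves. The helper name `disk_used_percent` is hypothetical, and the stock check_disk plugin already does this job properly; this is only to illustrate the locality problem that NRPE addresses by running plugins on the remote host.

```shell
#!/bin/sh
# disk_used_percent MOUNTPOINT  (hypothetical helper)
# Prints the used space of the filesystem holding MOUNTPOINT as an integer percentage.
disk_used_percent() {
    # df -P prints one line per filesystem in the portable layout;
    # column 5 is the capacity in use, for example "42%".
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

used=$(disk_used_percent /)
echo "DISK - / is ${used}% used"
```

Run from the Nagios server, this would only ever describe the Nagios server's own disk.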
