Understanding checks in Nagios
- Nagios performs a check by running an external command (a plugin) and uses the command's return code, together with its output, to find out whether the check passed
- Nagios schedules the checks itself, so plugins must behave in a specific way. In particular, Nagios relies on the exit code of the command
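| Exit code | Status   | Description                                    |
|-----------|----------|------------------------------------------------|
| 0         | OK       | Working correctly                              |
| 1         | WARNING  | Working, but needs attention                   |
| 2         | CRITICAL | Not working correctly                          |
| 3         | UNKNOWN  | Unable to determine status for host or service |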
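To make the exit code convention concrete, here is a minimal sketch of how a custom plugin could map a measured value to one of the four states. The `check_value` helper and its thresholds are hypothetical and are not part of the standard plugins. It only illustrates the convention of printing one line of output and returning 0, 1, 2 or 3.

```shell
#!/bin/sh
# check_value is a hypothetical helper, not part of the standard plugins.
# Usage: check_value <value> <warn> <crit>   (non-negative integers)
check_value() {
    value=$1; warn=$2; crit=$3
    # Anything that is not a plain non-negative integer cannot be evaluated
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN - cannot read value '$value'"; return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}

# Example run with a value below both thresholds
check_value 42 80 90
echo "exit code: $?"
```

In a real plugin script the function's return value would be passed on with `exit`, since Nagios reads the exit code of the process.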
- Standard Nagios plugins usually accept the following parameters
- -h, --help: print help
- -V, --version: print the version of the plugin
- -v: print more detailed information
- -w: limits (thresholds) for the WARNING state
- -c: limits (thresholds) for the CRITICAL state
- -H: hostname or IP address
- -4: use IPv4
- -6: use IPv6
- -p: TCP or UDP port to connect to (some plugins reuse this flag, for example check_ping uses -p for the number of packets)
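If you write your own plugin, accepting the same short options keeps it consistent with the standard ones. The following is a rough sketch using the shell's built-in `getopts`. The `parse_args` function and its variable names are hypothetical. Note that `getopts` handles only short options, so `--help` and `--version` are left out here.

```shell
#!/bin/sh
# parse_args is a hypothetical option parser for a custom plugin.
# It fills the variables host, warn, crit, port, verbose and ipver.
parse_args() {
    OPTIND=1   # reset so the function can be called more than once
    host=""; warn=""; crit=""; port=""; verbose=0; ipver=""
    while getopts "H:w:c:p:v46" opt; do
        case "$opt" in
            H) host=$OPTARG ;;
            w) warn=$OPTARG ;;
            c) crit=$OPTARG ;;
            p) port=$OPTARG ;;
            v) verbose=$((verbose + 1)) ;;
            4) ipver=4 ;;
            6) ipver=6 ;;
            *) echo "UNKNOWN - invalid option"; return 3 ;;
        esac
    done
    # A host is mandatory in this sketch
    if [ -z "$host" ]; then
        echo "UNKNOWN - -H <host> is required"; return 3
    fi
    return 0
}

parse_args -H 172.31.5.24 -w 5 -c 10 -p 3306
echo "host=$host port=$port warn=$warn crit=$crit"
```

Returning 3 (UNKNOWN) on bad arguments follows the exit code table above: the plugin could not determine a status.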
- Where the plugins are located
cd /usr/local/nagios/libexec
ls -al
- Let's run some of the plugins directly from the command line
./check_ping --help
./check_ping -H google.com -w 100,5% -c 200,10% -p 5
./check_ping -H 172.31.5.24 -w 100,5% -c 200,10% -p 5
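The `-w` and `-c` values for check_ping are pairs of the form `<rta>,<pl>%`, where `<rta>` is the round trip average in milliseconds and `<pl>` is the packet loss percentage. The sketch below shows roughly how such a pair could be applied to a measurement. `ping_state` is a hypothetical helper. The greater-than-or-equal comparison and the integer-only inputs are assumptions for illustration, not a copy of the plugin's logic.

```shell
#!/bin/sh
# ping_state is a hypothetical helper, not part of check_ping.
# Usage: ping_state <rta_ms> <loss_pct> <warn "rta,pl%"> <crit "rta,pl%">
ping_state() {
    rta=$1; loss=$2
    # Split "100,5%" into rta=100 and pl=5
    w_rta=${3%%,*}; w_pl=${3#*,}; w_pl=${w_pl%\%}
    c_rta=${4%%,*}; c_pl=${4#*,}; c_pl=${c_pl%\%}
    # Drop any decimal part so integer comparison works (3000.0 -> 3000)
    w_rta=${w_rta%%.*}; c_rta=${c_rta%%.*}
    # Assumption: a state is reached when either limit is met or exceeded
    if [ "$rta" -ge "$c_rta" ] || [ "$loss" -ge "$c_pl" ]; then
        echo "CRITICAL"; return 2
    elif [ "$rta" -ge "$w_rta" ] || [ "$loss" -ge "$w_pl" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

# 20 ms average and no loss, checked against the thresholds used above
ping_state 20 0 "100,5%" "200,10%"
```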
- Now let's look at a command definition
################################################################################
# This command checks to see if a host is "alive" by pinging it
# The check must result in a 100% packet loss or 5 second (5000ms) round trip
# average time to produce a critical error.
# Note: Five ICMP echo packets are sent (determined by the '-p 5' argument)
define command {
    command_name    check-host-alive
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
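Before Nagios runs this command it replaces the macros in `command_line` with real values. The sketch below imitates that substitution with `sed` so you can see the final command line. `expand_macros` is a hypothetical helper. The plugin directory and host address passed to it are assumptions taken from the examples in this post, not values read from a Nagios installation.

```shell
#!/bin/sh
# expand_macros is a hypothetical helper that imitates macro substitution.
# Usage: expand_macros <command_line> <USER1 value> <HOSTADDRESS value>
expand_macros() {
    printf '%s\n' "$1" | sed \
        -e 's|\$USER1\$|'"$2"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$3"'|g'
}

# Single quotes keep the shell from touching the $...$ macros
template='$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5'
expand_macros "$template" /usr/local/nagios/libexec 172.31.5.24
```

The printed line can be pasted into a shell to run the check by hand, which is a quick way to debug a command definition.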
- Nagios provides many standard macros, such as the $USER1$ and $HOSTADDRESS$ macros used above. Refer Here
- If we want to check TCP or UDP connectivity
./check_tcp --help
./check_udp --help
- If you want to check email server connectivity
./check_pop --help
./check_imap --help
./check_smtp --help
- Checking FTP and DHCP servers
./check_ftp --help
./check_dhcp --help
- Check websites
./check_http --help
- Checking database servers
# MySQL
./check_mysql --help
./check_mysql_query --help
# PostgreSQL
./check_pgsql --help
# Oracle
./check_oracle --help
- Checking swap space
./check_swap --help
- Checking disk space
./check_disk --help
# remote shares
./check_disk_smb --help
- Third party plugins: Refer Here
- Building a Nagios command or understanding an existing command
- Understand the command line of the plugin
- Then write the Nagios command definition
define command {
    command_name    check_myown_mysql
    command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p 3306 -c 10 -w 5
}

define command {
    command_name    check_myown_pgsql
    command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p 5432 -c 10 -w 5
}
- Later we will use these commands in services, applied to hosts or host groups
define host {
    use                     linux-server
    host_name               dbserver1
    alias                   dbserver1
    address                 172.31.5.24
}

define hostgroup {
    hostgroup_name          mysqldb-servers
    alias                   mysql db Servers
    members                 dbserver1
}

define service {
    use                     local-service           ; Name of service template to use
    hostgroup_name          mysqldb-servers
    service_description     check mysql up or not
    check_command           check_myown_mysql
}
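After adding definitions like these, it helps to have Nagios verify the configuration before restarting it. The sketch below wraps the `nagios -v` verification run in a small guard. The `verify_config` function is hypothetical. Both paths are assumptions based on the /usr/local/nagios prefix used earlier, so adjust them to your installation.

```shell
#!/bin/sh
# Assumed locations for a source install under /usr/local/nagios
NAGIOS_BIN=/usr/local/nagios/bin/nagios
NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg

# verify_config is a hypothetical wrapper around "nagios -v <main config file>"
verify_config() {
    if [ ! -x "$1" ]; then
        echo "nagios binary not found at $1"
        return 1
    fi
    # -v makes Nagios parse and verify the configuration without starting
    "$1" -v "$2"
}

verify_config "$NAGIOS_BIN" "$NAGIOS_CFG" || echo "configuration check did not pass"
```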
- For the Nagios configuration files I created, Refer Here
- When a check fails with a CRITICAL or WARNING state, depending on your configuration, an email is sent to all the members of the configured contact(s)/contact groups
- I have configured email notifications using SendGrid and installed sendmail. Refer Here for the configuration
How can I use Nagios for advanced checks
- Let's assume you are asked to check free disk space and running processes on remote servers.
- Consider below image
- Checking whether the hosts (tomcat, nginx, db1, db2) are alive, and whether port 8080 on tomcat, port 80 on nginx and port 3306 on db1 and db2 are responding, is straightforward.
- Now I want to find out the free disk space on db1 and db2 and the user load on tomcat and nginx. That would only be possible if Nagios logged into the machines over SSH and ran Linux commands there, which is not feasible.
- To solve these kinds of problems with Nagios, we need to understand
- Active and Passive Checks and NRDP
- NRPE and NSClient++