Nagios, Getting Started.

The Industry Standard In IT Infrastructure Monitoring

Who Is Using Nagios

Agenda

• What is Nagios

• What can you do with Nagios

• Features

• Basics: Architecture, Terminology

• Monitoring

• State Types

• Active / Passive Checks

• Reports

What is Nagios Core

Open Source system and network monitoring application

With Nagios you can

• Monitor your entire IT infrastructure

• Spot problems before they occur

• Know immediately when problems arise

• Share availability data with stakeholders

• Detect security breaches

• Plan and budget for IT upgrades

• Reduce downtime and business losses

Features

• Monitoring of network services: SMTP, POP3, HTTP, PING, and more

• Monitoring of host resources: processor load, disk usage, and more

• Simple plugin design that allows users to easily develop their own service checks

• Parallelized service checks

Features

• Ability to define network host hierarchy/groups, allowing detection of and distinction between hosts that are down and those that are unreachable

• Contact notifications when service or host problems occur and get resolved, via email, pager, or user-defined methods

• Ability to define event handlers to be run during service or host events for proactive problem resolution
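The difference between a host that is down and one that is merely unreachable can be illustrated with a short Python sketch. It assumes a simplified rule: a host that fails its check is DOWN if it has no parents or at least one parent is UP, and UNREACHABLE if every parent has failed. The function name `classify_hosts` and the host names in the usage note are hypothetical, and the parent graph is assumed to contain no cycles; this is not how Nagios itself is implemented.

```python
def classify_hosts(parents, check_ok):
    """Classify hosts as UP, DOWN, or UNREACHABLE (simplified model).

    parents:  dict mapping host -> list of parent hosts (acyclic).
    check_ok: dict mapping host -> True if its host check succeeded.
    """
    status = {}

    def resolve(host):
        if host in status:
            return status[host]
        if check_ok[host]:
            status[host] = "UP"
        else:
            host_parents = parents.get(host, [])
            # A failed host is DOWN only if something upstream still works.
            if not host_parents or any(resolve(p) == "UP" for p in host_parents):
                status[host] = "DOWN"
            else:
                status[host] = "UNREACHABLE"
        return status[host]

    for host in check_ok:
        resolve(host)
    return status
```

For a chain switch -> router -> web where only the switch answers, the router is reported as DOWN and the web server behind it as UNREACHABLE, so only the router needs attention.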

Features

• Automatic log file rotation

• Support for implementing redundant monitoring hosts

• Optional web interface for viewing current network status, notification and problem history, log file, and more

Basics

Definitions

• Host

• Service

• Contacts

• Commands

• Time Period

• Event Handlers

Basics

Host

Defines a physical server, workstation, device, etc. that resides on your network.

Basics

Host

define host{
    host_name               remotehost
    alias                   Some Remote Host
    address                 192.168.1.50
    contacts                admin
    max_check_attempts      3
    check_period            24x7
    notification_interval   60
    notification_period     24x7
}

Basics

Service

• A service is something that runs on, or is measured for, a host.

• It can be an actual service on the host (POP, SMTP, HTTP, etc.)

• Or a metric associated with the host (response to a ping, number of logged-in users, free disk space, etc.)
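As an illustration of a metric-style service, here is a minimal sketch of a free-disk-space check written in Python. It follows the plugin convention of returning exit code 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN) together with one line of status text. The function names and the 20% / 10% thresholds are assumptions chosen for the example, not Nagios defaults.

```python
import shutil

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_free_space(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to (exit_code, status_line).

    warn_pct and crit_pct are illustrative thresholds only.
    """
    if free_pct < crit_pct:
        return CRITICAL, f"DISK CRITICAL - {free_pct:.1f}% free"
    if free_pct < warn_pct:
        return WARNING, f"DISK WARNING - {free_pct:.1f}% free"
    return OK, f"DISK OK - {free_pct:.1f}% free"


def check_disk(path="/"):
    """Measure free space on `path` and return (exit_code, status_line)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return evaluate_free_space(100.0 * usage.free / usage.total)
```

To use this as a real plugin, a script would print the status line and pass the exit code to `sys.exit()`, since Nagios reads the state from the process exit status.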

Basics

Service

define service{
    host_name               linux-server
    service_description     check-disk-sda1
    check_command           check-disk!/dev/sda1
    max_check_attempts      5
    check_interval          5
    retry_interval          3
    check_period            24x7
    notification_interval   30
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          admins
}

Contacts

Identify someone who should be contacted in the event of a problem.

define contact{
    contact_name                    admin
    alias                           admin
    host_notifications_enabled      1
    service_notifications_enabled   1
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-by-email
    host_notification_commands      host-notify-by-email
    email                           [email protected] [email protected]
}

Basics

Commands

define command{
    name            check_http
    command_name    check_http
    command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}

define host{
    ..
    address         192.168.1.50
    ..
}

Basics

define service{
    ..
    check_command   check-disk!/dev/sda1
    ..
}
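The way the three definitions above fit together can be sketched in Python: the part of check_command before the first "!" names the command, the parts after it become $ARG1$, $ARG2$, and so on, and $HOSTADDRESS$ comes from the host's address. This is a simplified model of macro substitution, not Nagios's own code. The function name `expand_command` is hypothetical, and the $USER1$ value in the usage note is an assumed plugin directory.

```python
import re


def expand_command(command_line, check_command, macros):
    """Expand $NAME$ macros in a command_line (simplified model).

    check_command: e.g. "check-disk!/dev/sda1"; text after each "!" is an argument.
    macros:        dict of macro name -> value, e.g. {"HOSTADDRESS": "192.168.1.50"}.
    Returns (command_name, expanded_command_line).
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values[f"ARG{index}"] = arg
    # In this sketch, macros with no known value expand to an empty string.
    expanded = re.sub(
        r"\$([A-Z0-9_]+)\$",
        lambda match: values.get(match.group(1), ""),
        command_line,
    )
    return name, expanded
```

For example, calling it with the command line `$USER1$/check_http -I $HOSTADDRESS$ $ARG1$`, a check_command of `check_http!-p 8080`, and macros for USER1 and HOSTADDRESS yields a single shell command with the address and port option filled in.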

Time Period

Valid times for notifications and service checks.

define timeperiod{
    timeperiod_name nonworkhours
    alias           Non-Work Hours
    sunday          00:00-24:00
    monday          00:00-09:00,17:00-24:00
    tuesday         00:00-09:00,17:00-24:00
    wednesday       00:00-09:00,17:00-24:00
    thursday        00:00-09:00,17:00-24:00
    friday          00:00-09:00,17:00-24:00
    saturday        00:00-24:00
}
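To make the time range syntax concrete, the following Python sketch evaluates whether a given moment falls inside the nonworkhours period defined above. It is an illustrative model that handles only weekday entries with HH:MM-HH:MM ranges; the names `NONWORKHOURS`, `parse_ranges`, and `in_timeperiod` are hypothetical.

```python
from datetime import datetime

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

WEEKDAY_HOURS = "00:00-09:00,17:00-24:00"
NONWORKHOURS = {day: WEEKDAY_HOURS for day in DAYS[:5]}
NONWORKHOURS.update({"saturday": "00:00-24:00", "sunday": "00:00-24:00"})


def to_minutes(hhmm):
    """Convert "17:00" to minutes since midnight (1020)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_ranges(spec):
    """Turn "00:00-09:00,17:00-24:00" into [(0, 540), (1020, 1440)]."""
    ranges = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        ranges.append((to_minutes(start), to_minutes(end)))
    return ranges


def in_timeperiod(period, when):
    """Return True if `when` falls inside one of the day's ranges."""
    spec = period.get(DAYS[when.weekday()])
    if spec is None:
        return False
    minute = when.hour * 60 + when.minute
    return any(start <= minute < end for start, end in parse_ranges(spec))
```

With this model, a notification at 10:00 on a Monday would be outside nonworkhours, while one at 18:00 the same day or at any time on a Saturday would be inside it.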

Basics

Event handlers are optional system commands (scripts or executables) that are run whenever a host or service state change occurs.

Typical uses:

• Restarting a failed service

• Entering a trouble ticket into a helpdesk system

• Logging event information to a database

• Cycling power on a host

Event Handlers

Event handlers are executed when a service or host:

• Is in a SOFT problem state

• Initially goes into a HARD problem state

• Initially recovers from a SOFT or HARD problem state

Event Handlers

define service{
    ..
    event_handler           command_name
    event_handler_enabled   [0/1]
    ..
}
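An event handler command typically receives the service state, the state type, and the attempt number (available through the $SERVICESTATE$, $SERVICESTATETYPE$, and $SERVICEATTEMPT$ macros) and decides what to do. The Python sketch below shows one possible policy for a "restart a failed service" handler. The policy itself (restart on the third SOFT attempt, or on a HARD state as a fallback) and the name `decide_action` are assumptions for illustration.

```python
def decide_action(state, state_type, attempt):
    """Decide what an event handler should do for a service event.

    state:      "OK", "WARNING", "UNKNOWN", or "CRITICAL"
    state_type: "SOFT" or "HARD"
    attempt:    current check attempt number (1-based)
    Returns "restart" or "none" (illustrative policy only).
    """
    if state != "CRITICAL":
        # Recoveries and less severe problems need no action here.
        return "none"
    if state_type == "SOFT" and attempt == 3:
        # Try to fix the problem before it turns into a HARD state.
        return "restart"
    if state_type == "HARD":
        # Fallback in case the earlier restart attempt did not help.
        return "restart"
    return "none"
```

Acting during the SOFT state is what makes the remedy proactive: if the restart works, the service recovers before contacts are notified.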

Other Blocks

• contactgroup

• servicegroup

• servicedependency

• serviceescalation

• serviceextinfo

• hostdependency

• hostescalation

• hostextinfo

Basics

Monitoring Services

Nagios can be used to monitor Public and Private Services

• Private services: CPU load, memory usage, disk usage, logged-in users, running processes

• Publicly available services that are provided by Linux servers: HTTP, FTP, SSH, SMTP

Monitoring Private Services

• Plugins/Addons are mostly used for monitoring private services.

• The NRPE (Nagios Remote Plugin Executor) addon is installed on the target servers.

• It is an addon that allows you to execute plugins on remote Linux/Unix hosts.

Monitoring Private Services

• NSCA addon (Nagios Service Check Acceptor)

• Allows you to send passive check results from remote Linux/Unix hosts to the Nagios daemon running on the monitoring server.

• This is very useful in distributed and redundant/failover monitoring setups.

Monitoring Public Services

• Check Nagios Exchange for existing plugins first

• Walk through

• Create the host in a file within the configuration directory (cfg_dir).

• Define a service for each process/service that needs to be monitored.

• Services use pre-defined or custom-defined commands.

• Define contacts who will receive notifications and take action.

State Types

• Whether a state is SOFT or HARD depends on the max_check_attempts variable

• A SOFT state is logged when:

• A check returns a non-OK result but has not yet been retried max_check_attempts times

• A service or host recovers from a soft error. This is considered a soft recovery.

• A HARD state is logged when:

• A check returns a non-OK result and has been retried max_check_attempts times

• A host or service transitions from one hard error state to another error state (e.g. WARNING to CRITICAL)

• For example, a host going from running to down

• A service check results in a non-OK state and its corresponding host is either DOWN or UNREACHABLE

• A host or service recovers from a hard error state. This is considered a hard recovery.

• On a HARD state change, contacts are notified of the host or service problem or recovery.
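The SOFT and HARD rules can be traced with a small Python sketch that labels a sequence of check results. It is a simplified model: it counts consecutive non-OK results against max_check_attempts and ignores the rule about the host being DOWN or UNREACHABLE. The function name `classify_state_types` is hypothetical, and the sketch assumes monitoring starts from a HARD OK state.

```python
def classify_state_types(results, max_check_attempts):
    """Label each check result as SOFT or HARD (simplified model).

    results: check results in order, e.g. ["OK", "CRITICAL", "CRITICAL"].
    Returns a list of (result, state_type) tuples.
    """
    labelled = []
    attempt = 0          # consecutive non-OK results, capped at max_check_attempts
    prev_type = "HARD"   # assumption: we start from a HARD OK state
    for result in results:
        if result == "OK":
            # A recovery inherits the type of the problem it recovers from;
            # an OK that follows an OK is simply HARD.
            state_type = prev_type if attempt > 0 else "HARD"
            attempt = 0
        else:
            attempt = min(attempt + 1, max_check_attempts)
            state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        labelled.append((result, state_type))
        prev_type = state_type
    return labelled
```

With max_check_attempts set to 3, two CRITICAL results in a row are SOFT and the third is HARD, which is the point at which notifications would go out. A single CRITICAL followed by OK is a soft recovery and notifies no one.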

Active / Passive Checks

Active Checks

● Initiated by the Nagios process

● Run on a regularly scheduled basis

Active / Passive Checks

Passive Checks

● Passive checks are initiated and performed by external applications/processes

● Passive check results are submitted to Nagios for processing

● Used for services that are:

● Asynchronous in nature

● Located behind a firewall and cannot be checked actively from the monitoring host
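One way an external process can submit a passive service result is by writing a PROCESS_SERVICE_CHECK_RESULT line to the Nagios external command file. The Python sketch below builds such a line in the format `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The function names are hypothetical, and the location of the command file depends on the installation, so it is passed in rather than assumed.

```python
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one external-command line for a passive service check result.

    return_code follows the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit_passive_result(command_file, line):
    """Append the line to the external command file (path is site-specific)."""
    with open(command_file, "a") as handle:
        handle.write(line)
```

The host and service names in the line must match an existing host_name and service_description, otherwise Nagios has nothing to attach the result to.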

Nagios in Action

Demo Time : http://nagioscore.demos.nagios.com/

Reports

• Availability Report: uptime report for hosts and services

• Trends Report: graphical breakdown of the state of a particular host or service

• Alert History Report: record of historical alerts

• Alert Summary Report

• Alert Histogram Report: frequency graph of host and service alerts

• Notification Report: historical record of notifications sent to contacts
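The idea behind an availability figure can be shown with a few lines of Python: given how long a host or service spent in each state, availability is the share of time spent in an OK or UP state. This is only a sketch of the arithmetic. The function name `availability` is hypothetical, and the real report also accounts for details such as scheduled downtime that are left out here.

```python
def availability(intervals):
    """Return the percentage of time spent in an OK or UP state.

    intervals: list of (state, seconds) tuples covering the report period.
    """
    total = sum(seconds for _, seconds in intervals)
    if total == 0:
        return 0.0
    good = sum(seconds for state, seconds in intervals if state in ("OK", "UP"))
    return 100.0 * good / total
```

For instance, a host that was UP for 900 seconds and DOWN for 100 seconds in the reporting window has an availability of 90%.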

Summary

• Infrastructure monitoring

• Anomaly and outage detection

• Automatic problem remediation

• Scheduled downtime

• Outage Alerts

• Alert Escalations

• Historical Reporting

• Maintenance Planning

Advice for Beginners

• Relax - it's going to take some time.

• Use the quickstart instructions.

• Read the documentation.

• Visit the Nagios Support Forum at http://support.nagios.com/forum/.


Next Steps

• Get your hands dirty

• Get training: live or self-paced (http://www.nagios.com/services/training)

• Get certified: Nagios Certified Professional, Nagios Certified Administrator (http://www.nagios.com/services/certification)

• Use it to monitor your infrastructure.

References

• Nagios Documentation: http://www.nagios.org/documentation/

• Nagios Online Demo: http://nagioscore.demos.nagios.com/

• Slideshare: http://www.slideshare.net/nagiosinc/jeff-ly-case-study-nagios-nu-skin?qid=a6fcfc12-af4d-4e4f-a766-b11615a49f4d&v=qf1&b=&from_search=1

• NRPE Blog: http://blog.roozbehk.com/post/25059446631/nrpe-monitoring-linux-remote-hosts-nagios

Thank You
