Enhancing Nagios with Python Plugins
Maurice Maneschi
Associate Director, Risk Management Systems
Oakvale Capital Limited
Jul 03, 2015
Presentation Outline
● Risk Management Systems
● What is Nagios
● Why Python
● What is a plugin
● Specific risks being monitored
● Analysing reports and logs
● Where to next
Risk Management Systems
● A division of five staff
● Supporting three key applications
● Running on eight servers
● Depending on 15+ other boxes spread over three LANs
● Five key vendors
● Divisional goals
– Key goal is application management
– Some customer support
– Product innovation
– Project management
– No time for nasty surprises
What is Nagios
● Host, service and network monitoring program
● Open source
● Written in C
● Runs on Linux, with its web interface served by Apache
● Configured with the hosts of a network
– How the hosts are networked
– What key services are on the hosts: “PING”, SMTP, HTTP, etc.
● Application polls these at specified intervals
– From the results of the polls, determines the state of hosts, services and networks
– Alerts sent by email
– Escalation, reporting, statistics and more
Why Python
● Flexible
● Efficient
● Manageable
● Numerous, diverse libraries
● Cross-platform
● Huge number of code samples available online
What is a plugin
● Executable file
– Takes parameters (preferably)
– Prints a short status message
● Returns an exit status of
– 0 – all OK
– 1 – warning
– 2 – critical
– 3 – unknown
● Stateless
● Executable Python script
● Code the test
● Print the status line
● Return a status
● Easy!
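The three steps above (code the test, print the status line, return a status) can be sketched as a small Python plugin. This is a hypothetical example rather than one of the talk's own scripts: the disk-usage test, the option names and the default thresholds are illustrative assumptions. It takes its thresholds as parameters and keeps no state between runs.

```python
#!/usr/bin/env python3
"""Minimal Nagios plugin sketch: report how full a filesystem is.

Hypothetical example: the disk-usage test, option names and default
thresholds are illustrative, not taken from the talk.
"""
import argparse
import shutil
import sys

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement onto a Nagios state, where higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(argv=None):
    # Parameters rather than hard-coded values let one script serve many checks.
    parser = argparse.ArgumentParser(description="Check filesystem usage")
    parser.add_argument("-p", "--path", default="/")
    parser.add_argument("-w", "--warning", type=float, default=80.0,
                        help="warning threshold, percent used")
    parser.add_argument("-c", "--critical", type=float, default=90.0,
                        help="critical threshold, percent used")
    args = parser.parse_args(argv)

    # 1. Code the test.
    try:
        usage = shutil.disk_usage(args.path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    pct_used = 100.0 * usage.used / usage.total

    # 2. Print one short status line.
    status = evaluate(pct_used, args.warning, args.critical)
    print(f"DISK {LABELS[status]} - {args.path} is {pct_used:.1f}% used")

    # 3. Return the status; the caller turns it into the exit code.
    return status


if __name__ == "__main__":
    sys.exit(main())
```

Saved under a hypothetical name such as `check_disk_pct.py`, it would be run as `python3 check_disk_pct.py -p /var -w 85 -c 95`, and Nagios reads only the printed line and the exit code.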
Specific risks being monitored
● Customer email to the help desk system has stopped
– Users email issues directly into our help desk system for prioritisation, action and eventually billing
– Spam periodically breaks the import agent
– It's proprietary, so no fix is in sight
– Nagios watches the queue using POP3
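A queue watch of this kind might look like the following sketch using the standard `poplib` module. It assumes that a healthy import agent keeps draining the mailbox, so a growing message count signals that the import has stalled. The host, account and thresholds are hypothetical placeholders, and treating an unreachable mailbox as UNKNOWN (rather than CRITICAL) is a design choice, not something the talk specifies.

```python
#!/usr/bin/env python3
"""Sketch: alert when mail piles up in the help desk's POP3 mailbox.

Assumption: while the import agent is healthy it keeps draining the
mailbox, so a growing message count means the import has stalled.
Host name, account and thresholds are hypothetical placeholders.
"""
import poplib
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def classify(count, warn, crit):
    """More waiting messages means a more serious backlog."""
    if count >= crit:
        return CRITICAL
    if count >= warn:
        return WARNING
    return OK


def check_queue(host, user, password, port=110, warn=5, crit=20, timeout=30):
    try:
        conn = poplib.POP3(host, port, timeout=timeout)
        try:
            conn.user(user)
            conn.pass_(password)
            count, _size = conn.stat()  # (message count, mailbox size in bytes)
        finally:
            conn.quit()  # read-only: nothing is marked for deletion
    except (OSError, poplib.error_proto) as exc:
        # Could not reach or log in to the mailbox: the check itself failed.
        return UNKNOWN, f"POP3 UNKNOWN - {exc}"
    status = classify(count, warn, crit)
    return status, f"POP3 {LABELS[status]} - {count} messages waiting"


if __name__ == "__main__":
    status, message = check_queue("mail.example.com", "helpdesk", "secret")
    print(message)
    sys.exit(status)
```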
● The rate feed is missing some rates
– Rates feed into our system from Reuters via MS Excel
– Some rates are critical, and human intervention is required if they are missing
– Other rates are important, but are just tracked when missing
– Nagios watches the MS Excel worksheet that lists the “unreliable rates”
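The critical-versus-important split described above can be sketched as a pure function. It assumes the worksheet cells have already been read into a dict mapping rate name to value, with `None` or `""` for a missing rate; the talk mentions pyexcelerator, but the reading step is left out here. The rate names and which ones count as critical are hypothetical.

```python
"""Sketch: classify missing rates read from the rate feed worksheet.

Assumptions: the worksheet cells have already been read (the talk mentions
pyexcelerator; any Excel reader would do) into a dict mapping a rate name
to its value, with None or "" for a missing rate. The rate names and the
split between critical and important rates are hypothetical.
"""
OK, WARNING, CRITICAL = 0, 1, 2

CRITICAL_RATES = frozenset({"AUDUSD", "EURUSD"})   # a human must step in
IMPORTANT_RATES = frozenset({"NZDUSD", "USDJPY"})  # tracked when missing


def check_rates(values, critical=CRITICAL_RATES, important=IMPORTANT_RATES):
    watched = critical | important
    missing = sorted(name for name in watched if values.get(name) in (None, ""))
    if not missing:
        return OK, "RATES OK - all monitored rates present"
    # Any missing critical rate escalates the whole check.
    if any(name in critical for name in missing):
        status, label = CRITICAL, "CRITICAL"
    else:
        status, label = WARNING, "WARNING"
    return status, f"RATES {label} - missing {', '.join(missing)}"
```

Only a missing critical rate pages someone; a missing important rate shows as a warning, which Nagios records so that it can still be tracked in its reports.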
● Rates must be inserted regularly
– Insertion process has numerous dependencies
– Moving target – causes of failure change over time
– Focus on the end point – are the rates in the database?
– Nagios queries the databases and alerts on old or missing rates
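Checking the end point rather than each dependency can be as simple as asking how old the newest rate is. This sketch assumes a hypothetical table `rates(name, inserted_at)` holding Unix timestamps and uses `sqlite3` as a stand-in for the production database so that it runs anywhere. Because it only uses the DB-API cursor interface, a pymssql connection should work the same way, subject to the real schema.

```python
"""Sketch: alert when the newest rate in the database is too old.

Assumptions: a hypothetical table rates(name, inserted_at) with Unix
timestamps in inserted_at. sqlite3 stands in for the production database
so the sketch runs anywhere; the function only uses the DB-API cursor
interface, so a pymssql connection should work the same way.
"""
import sqlite3
import time

OK, WARNING, CRITICAL = 0, 1, 2


def check_rate_age(conn, warn_secs=15 * 60, crit_secs=60 * 60, now=None):
    now = time.time() if now is None else now
    cur = conn.cursor()
    cur.execute("SELECT MAX(inserted_at) FROM rates")
    (latest,) = cur.fetchone()
    if latest is None:
        # Focus on the end point: an empty table is as bad as a stale one.
        return CRITICAL, "RATES CRITICAL - no rates in database"
    age = now - latest
    if age >= crit_secs:
        status, label = CRITICAL, "CRITICAL"
    elif age >= warn_secs:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    return status, f"RATES {label} - newest rate is {age / 60:.1f} min old"


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE rates (name TEXT, inserted_at REAL)")
    demo.execute("INSERT INTO rates VALUES (?, ?)", ("AUDUSD", time.time() - 300))
    print(check_rate_age(demo))
```

Passing the warning and critical ages as parameters is what allows one script to be reused across several rate checks.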
● External source of dealing information
– Fed in through the FIX protocol
– Numerous failure points being monitored on a (Windows) server
– Monitor process must check in with Nagios every 10 minutes
– Using passive and active checks
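Checking in with Nagios means submitting a passive result. One way to do this is to write a `PROCESS_SERVICE_CHECK_RESULT` external command to the Nagios command file, as in the sketch below. It assumes the code runs on the Nagios host itself; a monitor on a Windows server would need a transport such as NSCA to get its result there. It also assumes `COMMAND_FILE` is the usual path for a source install; the real path is the `command_file` setting in `nagios.cfg`. The host and service names are hypothetical.

```python
"""Sketch: hand a passive check result to Nagios.

Assumptions: this runs on the Nagios host itself (a monitor on a Windows
server would need a transport such as NSCA to get its result there), and
COMMAND_FILE is the usual location for a source install; the real path is
the command_file setting in nagios.cfg. Host and service names are
hypothetical.
"""
import time

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_result(host, service, status, output, timestamp=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if timestamp is None else timestamp)
    # Nagios reads one external command per line, so flatten the output.
    output = " ".join(output.splitlines())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{status};{output}\n"


def submit(line, command_file=COMMAND_FILE):
    # The command file is a named pipe that Nagios reads from.
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    submit(format_passive_result("fixserver", "FIX feed", 0, "FIX OK - session up"))
```

On the Nagios side, the matching service would typically accept passive results and set `check_freshness 1` with `freshness_threshold 600`. If no result then arrives for ten minutes, Nagios runs the service's active `check_command` instead, which is one way to pair passive and active checks.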
● Quick passive check
● Successful backups
● Successful scheduled tasks
● Database comparisons
● Common errors
– Password server on the web site
– Known failure point on an MS Excel worksheet
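Successful backups and scheduled tasks can often be checked by looking at what they leave behind. This sketch assumes that every successful run creates or updates some file (a dump, a log or a marker file) and alerts when that file is missing or older than a window. The path and the 26-hour window for a nightly job are hypothetical.

```python
"""Sketch: check that a backup or scheduled task left a fresh file behind.

Assumption: every successful run creates or updates some file (a dump,
a log, a marker file). The path and the 26-hour window for a nightly
job are hypothetical.
"""
import os
import sys
import time

OK, CRITICAL = 0, 2


def check_file_fresh(path, max_age_secs=26 * 3600, now=None):
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        # No file at all usually means the job never ran.
        return CRITICAL, f"FILE CRITICAL - {exc}"
    hours = age / 3600
    if age > max_age_secs:
        return CRITICAL, f"FILE CRITICAL - {path} is {hours:.1f} hours old"
    return OK, f"FILE OK - {path} is {hours:.1f} hours old"


if __name__ == "__main__":
    status, message = check_file_fresh("/var/backups/nightly.dump")
    print(message)
    sys.exit(status)
```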
Extra enhancements to Nagios
● High-level view of system health
● Audio alerts and SMSes from UTbox.net
● Status screen on a monitor PC
● Syslogd for firewall logs
● Script reuse for rate checks
● Ad hoc system problems
– Currently tracking WAN failures
Analysing reports and logs
● Screen saver often sufficient
● Summary views
Where to next
● Low-spec PC
● Nagios is in several distro repositories
– I compile from the source
● Allow at least a day to configure Nagios
– Don't expect to install it and switch it on
● Tuning Nagios is an ongoing job
Further information
● Nagios: http://www.nagios.org
● Python: http://www.python.org
– pyexcelerator, pymssql, FreeTDS from SourceForge
● Oakvale Capital: http://www.oakvale.com
● Code samples: http://www.redwaratah.com/wiki/index.php?title=Nagios_and_Python
● Maurice Maneschi: [email protected]