Complex deployments require monitoring. It usually starts with a shell script that occasionally pings a remote server. Then you add “hey, I just rebooted” emails to init.d. You might write some tests using wget, curl, or expect (or even a browser automation tool like Selenium) to verify functionality. It can quickly get out of hand.
Let’s start off with a list of monitoring tools. I’m not endorsing anything here, just building a list and hoping for feedback:
- Nagios – very popular open source network monitoring tool
- Zenoss – newer network monitoring tool, built on Zope
- Hyperic – commercial, with a free basic version
- OpenNMS – open source network management platform
SNMP (Simple Network Management Protocol) is a standard protocol for checking network status. Everything from switches to SAN arrays can use SNMP to report status. You can also wrap JMX beans in SNMP. And you can write scripts that verify complex functionality and publish the results as SNMP data, then let your network monitoring tools check that status, send emails and pages, or take whatever other action is needed.
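As one illustration of that last idea, here is a minimal sketch of a functional check published over SNMP. It assumes the Net-SNMP agent (snmpd) and its `pass` directive, which hands requests for an OID subtree to an external program and reads back three lines: the OID, the type, and the value. The OID `.1.3.6.1.4.1.99999.1`, the URL `http://localhost:8080/health`, the expected text "OK", and the convention of 0 for healthy and 1 for failed are all hypothetical placeholders, not anything the tools above prescribe.

```python
#!/usr/bin/env python3
"""Sketch of a Net-SNMP "pass" script that publishes one functional check.

Assumed snmpd.conf entry (the OID and script path are placeholders):
    pass .1.3.6.1.4.1.99999.1 /usr/local/bin/app_check.py
"""
import http.client
import sys
import urllib.request

# Placeholder subtree; use your own organization's enterprise OID instead.
BASE_OID = ".1.3.6.1.4.1.99999.1"
STATUS_OID = BASE_OID + ".1"

# Hypothetical endpoint and marker text that the functional check looks for.
CHECK_URL = "http://localhost:8080/health"
EXPECTED_TEXT = "OK"


def run_check(url=CHECK_URL, expected=EXPECTED_TEXT, timeout=5.0):
    """Return 0 if the page loads and contains `expected`, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and HTTP error statuses all land here.
        return 1
    return 0 if expected in body else 1


def respond(argv, check=run_check):
    """Return the lines snmpd expects on stdout for one pass request."""
    if len(argv) < 2:
        return []
    mode, oid = argv[0], argv[1]
    if mode == "-s":
        # This status value is read-only, so refuse SET requests.
        return ["not-writable"]
    if (mode == "-g" and oid == STATUS_OID) or (mode == "-n" and oid == BASE_OID):
        # GET on our OID, or GETNEXT from the subtree root: OID, type, value.
        return [STATUS_OID, "integer", str(check())]
    # Unknown OID or end of our subtree: printing nothing tells snmpd so.
    return []


if __name__ == "__main__":
    for line in respond(sys.argv[1:]):
        print(line)
```

With the script made executable and snmpd reloaded, `snmpget -v2c -c public localhost .1.3.6.1.4.1.99999.1.1` should return the check result (assuming a `public` read community is configured), and any monitoring tool that polls SNMP can then alert when the value becomes 1. Keeping the check logic in `run_check` and the protocol handling in `respond` makes it easy to swap in a more involved functional test without touching the SNMP side.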