Providing Open Source High-Availability Software for Linux and other OSes since 1999.

This web page is no longer maintained. Information presented here exists only to avoid breaking historical links.
The Project stays maintained, and lives on: see the Linux-HA Reference Documentation.
To get rid of this notice, you may want to browse the old wiki instead.

1 February 2010 Heartbeat 3.0.2 released see the Release Notes

18 January 2009 Pacemaker 1.0.7 released see the Release Notes

16 November 2009 LINBIT new Heartbeat Steward see the Announcement

Last site update:
2020-10-29 01:15:36

Monitoring and Administration Software for Linux

  • lm-sensors: hardware health monitoring project.

  • safte-monitor. SAF-TE stands for SCSI Accessed Fault-Tolerant Enclosure. If you have a SAF-TE compatible storage enclosure, safte-monitor lets you read the enclosure configuration, including the number of fans, power supplies, and slots, as well as the mapping of slots to SCSI IDs. It reads disk enclosure status information from SAF-TE capable enclosures. SAF-TE is a component of SES (SCSI Enclosure Services), which is common on most SCSI disk enclosures these days. saftemon can monitor multiple SAF-TE devices and will automatically probe and detect them. The information retrieved includes power supply and fan status, temperature, audible alarm, drive faults, array critical / failed / rebuilding state, and door lock status. saftemon logs changes in the status of these enclosure elements to syslog and can optionally execute an alert helper program with details of the component failure; this could, for example, send a pager message. Temperature alert limits can also be set.

    The SAF-TE spec is on the web, and an addendum is at this location. More information about the specification is available here.

  • Linux RAS project. Ambitious. Mailing list information can be found on the web.

  • Nmon graphs lots of things for Linux and AIX machines.

  • Linux RAM ECC monitoring with a corresponding Mailing list.

  • Chris Brady's x86 Memory Testing program (memtest86). This ships with newer versions of SuSE Linux.

  • Mon: service monitoring daemon.

  • HAPM: another service monitoring daemon. High Availability Port Monitor (HAPM) is a local port status checker: a simple, light, and fast daemon that checks TCP/UDP ports. If one or more monitored ports (per IP) go down, HAPM kills the primary Heartbeat.

  • OpenNMS is an open-source project dedicated to the creation of an enterprise grade network management platform.

  • Spumoni enables any program which can be queried via local commands to be health-checked via SNMP. This allows admins to use enterprise-level monitoring programs such as OpenNMS, Tivoli, OpenView, MRTG and RRDTool for even non-SNMP-enabled applications.

  • Monit: Monit is a utility for monitoring daemons or similar programs running on a Unix system. It will start specified programs if they are not running and restart programs that are not responding.

  • VACM: VA Cluster Manager. VACM provides cluster monitoring and control at a very fundamental level.

  • PIKT: Problem Informant Killer Tool

  • NOCOL/SNIPS - system and network monitoring tool

  • Big Brother - Systems and Network monitor. It monitors both network and system information.

  • Big Sister - a real time system and network health monitoring application

  • Nagios - Network monitor (formerly Netsaint)

  • MAT - Monitoring and Administration Tool. MAT is an easy-to-use, network-enabled UNIX configuration and monitoring tool. It provides an integrated tool for many common system administration tasks, including backups and replication.

  • WebRAT - a web based administration tool, to administer several nodes on a network, from a central host (administration server). The main purpose of WebRAT is to administer a network with many nodes, remotely. The more nodes on the network, the more irreplaceable WebRAT will seem.

  • xCAT (Extreme Cluster Administration Toolkit) - A tool kit that can be used for the deployment and administration of (primarily high-performance) Linux clusters.

  • dwatch - daemon watching program -- last updated in 2001 -- appears to be dead
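
The local port check that the HAPM entry above describes can be illustrated with a short sketch. This is not HAPM's code: the function names tcp_port_is_up and down_ports, the example host and port list, and the one-second timeout are all assumptions made for illustration, and only TCP is covered. A real monitor would act on a failed check (HAPM, per the description above, kills the primary Heartbeat); this sketch only reports it.

```python
import socket


def tcp_port_is_up(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or routing failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def down_ports(targets, timeout=1.0):
    """Return the (host, port) pairs in targets that do not accept connections."""
    return [(host, port) for (host, port) in targets
            if not tcp_port_is_up(host, port, timeout)]


if __name__ == "__main__":
    # Hypothetical list of monitored services; adjust to the local setup.
    monitored = [("127.0.0.1", 22), ("127.0.0.1", 80)]
    failed = down_ports(monitored)
    if failed:
        # A real monitor would trigger its failover action here.
        print("ports down:", failed)
    else:
        print("all monitored ports are up")
```

A connection attempt is only a coarse health signal: a port can accept connections while the service behind it is hung, which is why tools such as Mon and Monit also support protocol-level checks.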