
Linux-HA provides basic high-availability (failover[1]) capabilities on a wide range of platforms, supporting many thousands of mission critical sites all over the globe.
Linux-HA is used on a huge number of platforms, ranging from ARM processors through mainframes. We test extensively on ia32 platforms (test runs lasting 8 hours to 5 days), and perform basic testing of each release on every platform supported by SUSE Linux[2].
Beginning with release 2[3], we have extended our exhaustive testing procedure to OpenPower[4] platforms.
Linux-HA is portable to many platforms, and we treat portability bugs seriously. Patches to fix portability bugs are welcomed.
Linux-HA has no special shared disk requirements.
It supports the following data sharing configurations:
replication (DRBD[5], or application-specific)
SCSI RAID controllers supporting clustering (IBM ServeRAID[6], ICP Vortex)
The only requirement we place on a shared disk is that it supports mount and umount. In particular, Linux-HA does not rely on SCSI reservations (or their equivalent).
Linux-HA is highly portable, and runs on many platforms. It is best supported (and works best) on Linux - virtually any version. The build system creates RPM[7]s and Debian packages automatically, and it is also integrated into the Gentoo Linux build system, among others. Linux-HA is provided natively with SUSE Linux[2], Conectiva[8] Linux, TurboLinux[9], Debian[10], Gentoo[11] and a few other Linux distributions, and is a standard part of many Linux-based products.
It also works fully on FreeBSD and Solaris.
Heartbeat[12] will run in whatever memory your OS and applications need, plus about 3 megabytes. Although it is very lightweight, Linux-HA locks itself and its libraries into memory.
The RPMs document this best. If you don't want to install some of these libraries, many of the dependencies can be eliminated automatically by rebuilding from source. The only two slightly unusual mandatory dependencies are glib (currently 1.2) and libnet >= 1.1.
The STONITH[13] plugins create a variety of dependencies on the libraries they need - but you don't actually need most or any of them for any given installation, and Autoconf[14] will not create modules you don't have the libraries for.
Linux-HA will run on any kernel that doesn't have a major scheduler bug.
For Linux, that means basically anything but the Red Hat[15] 2.4.18-2.4.20 kernels.
It has no kernel dependencies or hooks.
For the 1.x releases, the maximum number of nodes[16] is two. Version 2[3] has multi-node support (tested with 16 nodes).
Heartbeat[17] currently comes with the following administrative tools:
hb_standby[18]: puts the current node[19] into standby mode
hb_takeover[20]: puts the other node into standby mode (in 1.3.0, 1.2.1)
cl_status[21]: provides status information through a command-line tool
Heartbeat[17] monitors node death, and IP connectivity through ipfail[22]. Version 2[3] includes built-in resource[23] monitoring. At the current time, many people use mon[24] to handle additional monitoring.
Linux-HA can support virtually any application that can withstand a crash and be restarted robustly every time, and which can somehow access a good copy of its state data from either machine.
People do a huge variety of things. If you want it, someone has probably already done it. It supports most applications immediately without writing any scripts. See our SuccessStories[25] page for information on how a few selected reference customers use Linux-HA.
Linux-HA provides configurable automated notification whenever resources[23] move from one machine to another, through the MailTo[26] resource agent. You can easily write your own if you don't like ours. Additionally, you can run an SNMP agent which will send out SNMP traps when nodes fail.
Linux-HA's processor usage is usually negligible, typically much less than 1 percent. If you configure ultra-fast failover[1] times (< 1 second), then this amount will go up with the required faster heartbeat rates.
This is a difficult question to answer since it depends on where you start. As a rule, good HA systems add about one "9" to your system's availability, when appropriately configured. This general rule applies to Linux-HA as well. That is, if your availability before HA clustering[27] was 99.9%, then the resulting availability of your system ought to be about 99.99%. One can improve on this through good administrative procedures and higher degrees of redundancy.
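To make the "one more 9" rule of thumb concrete, the following is a minimal sketch that converts an availability percentage into expected downtime per year. The function names are hypothetical and chosen for illustration only; the calculation assumes a 365-day year and treats "adding a 9" as cutting the unavailable fraction by a factor of ten.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected yearly downtime, in minutes, for a given availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


def add_one_nine(availability_percent: float) -> float:
    """Rule of thumb (an assumption): divide the unavailable fraction by ten."""
    unavailable = 100.0 - availability_percent
    return 100.0 - unavailable / 10.0


if __name__ == "__main__":
    before = 99.9
    after = add_one_nine(before)
    print(f"{before:.2f}% -> {downtime_minutes_per_year(before):.1f} min/year")
    print(f"{after:.2f}% -> {downtime_minutes_per_year(after):.1f} min/year")
```

Under these assumptions, 99.9% corresponds to roughly 526 minutes (a little under nine hours) of downtime per year, and 99.99% to roughly 53 minutes.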
When properly configured, Linux-HA can detect failure in less than a second. It is fairly common that people configure a failure detection time of a few seconds.
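The relationship between heartbeats and failure detection time can be sketched as a simple timeout check. This is only an illustration of the general technique, not Heartbeat's actual implementation; the class `DeadtimeDetector`, its methods, and the node names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeadtimeDetector:
    """Declare a node dead when no heartbeat has arrived within `deadtime` seconds."""

    deadtime: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a heartbeat from `node` arrived at time `now` (seconds)."""
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        """Return, sorted by name, the nodes whose last heartbeat is older than deadtime."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.deadtime
        )


if __name__ == "__main__":
    detector = DeadtimeDetector(deadtime=2.0)
    detector.heartbeat("node-a", now=0.0)
    detector.heartbeat("node-b", now=0.0)
    detector.heartbeat("node-a", now=1.0)  # node-b stays silent from here on
    print(detector.dead_nodes(now=2.5))
```

In a scheme like this, the detection time is bounded below by the chosen deadtime, and the deadtime in turn has to span several heartbeat intervals to tolerate a lost packet. That is why sub-second detection requires a faster heartbeat rate.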
Linux-HA version 1 does not have a monitoring GUI. Starting with release 2.0.5, Linux-HA comes with an easy to use GUI[28] for configuring, monitoring and controlling it.
You can monitor Linux-HA through SNMP using your favorite SNMP-enabled systems management tool.
Linux-HA does not provide one at this time, but you could add one easily if you felt strongly about it. It would take around 30 lines of shell script.
Since ssh[29] does such a good job, and the security implications are significant, we haven't yet been motivated to provide such a facility.
You can remotely administer nodes with ssh or Webmin[30]. Webmin has a Heartbeat[12] module.
Rebooting is supported through the STONITH[13] plugins that we provide. Appropriate hardware is required.
No.
Release 2[3], Release 2 Fact Sheet[31]
| [1] | http://en.wikipedia.org/wiki/Failover |
| [2] | http://en.wikipedia.org/wiki/SUSE |
| [3] | http://www.linux-ha.org/NewHeartbeatDesign |
| [4] | http://www-1.ibm.com/servers/eserver/openpower/ |
| [5] | http://www.linux-ha.org/DRBD |
| [6] | http://www.linux-ha.org/ServeRAID |
| [7] | http://en.wikipedia.org/wiki/RPM_Package_Manager |
| [8] | http://en.wikipedia.org/wiki/Conectiva |
| [9] | http://en.wikipedia.org/wiki/Turbolinux |
| [10] | http://en.wikipedia.org/wiki/Debian |
| [11] | http://en.wikipedia.org/wiki/Gentoo_Linux |
| [12] | http://www.linux-ha.org/HeartbeatProgram |
| [13] | http://www.linux-ha.org/STONITH |
| [14] | http://en.wikipedia.org/wiki/Autoconf |
| [15] | http://en.wikipedia.org/wiki/Red_Hat_Linux |
| [16] | http://www.linux-ha.org/ClusterNode |
| [17] | http://www.linux-ha.org/Heartbeat |
| [18] | http://www.linux-ha.org/hb_standby |
| [19] | http://www.linux-ha.org/node |
| [20] | http://www.linux-ha.org/hb_takeover |
| [21] | http://www.linux-ha.org/cl_status |
| [22] | http://www.linux-ha.org/ipfail |
| [23] | http://www.linux-ha.org/resource |
| [24] | http://www.linux-ha.org/mon |
| [25] | http://www.linux-ha.org/SuccessStories |
| [26] | http://www.linux-ha.org/MailTo |
| [27] | http://en.wikipedia.org/wiki/Computer_cluster |
| [28] | http://www.linux-ha.org/GuiGuide |
| [29] | http://en.wikipedia.org/wiki/Ssh |
| [30] | http://en.wikipedia.org/wiki/Webmin |
| [31] | http://www.linux-ha.org/FactSheetv2 |
This information provided courtesy of the Linux-HA project at http://linux-ha.org/