Resource Agents
From Linux-HA
A resource agent is a standardized interface for a cluster resource. It translates a standard set of operations into steps specific to the resource or application, and interprets their results as success or failure.
Resource Agents have been managed as a separate Linux-HA sub-project since their 1.0 release, which coincided with the Heartbeat 2.99 release. Previously, they were part of the then-monolithic Heartbeat project and had no collective name. Later, the Linux-HA Resource Agents and the RHCS Resource Agents sub-projects were merged. The joint upstream repository is now https://github.com/ClusterLabs/resource-agents
Pacemaker supports three types of Resource Agents.
This page is about the OCF Resource Agents bundled in the resource-agents package (also known as cluster-agents on Debian-based distributions), which you should install together with Heartbeat (or Corosync) and Pacemaker.
Supported Operations
Operations which a resource agent may perform on a resource instance include:
- start: enable or start the given resource
- stop: disable or stop the given resource
- monitor: check whether the given resource is running (and/or doing useful work), return status as running or not running
- validate-all: validate the resource's configuration
- meta-data: return information about the resource agent itself (used by GUIs and other management utilities, and documentation tools)
- some more; see OCF Resource Agents and the Pacemaker documentation for details.
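The operations listed above can be pictured as a dispatch from an action name to a handler that reports its result through the exit code. The following is a minimal sketch, not an agent shipped in the resource-agents package: the state file, the function names and the abbreviated meta-data text are assumptions made for illustration. The numeric exit codes are the ones defined by the OCF resource agent API.

```python
import os
import tempfile

# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical state file: the "resource" counts as running while it exists.
STATE_FILE = os.path.join(tempfile.gettempdir(), "example-agent.state")

# Heavily abbreviated placeholder; a real agent prints a full XML description.
META_DATA = '<resource-agent name="example-agent"/>'


def monitor():
    """Report whether the resource is running."""
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


def start():
    """Start the resource, then confirm the result with monitor."""
    with open(STATE_FILE, "w"):
        pass
    return monitor()


def stop():
    """Stop the resource; stopping an already stopped resource succeeds."""
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return OCF_SUCCESS


def validate_all():
    """Check the configuration: here, only that the state directory is writable."""
    writable = os.access(os.path.dirname(STATE_FILE), os.W_OK)
    return OCF_SUCCESS if writable else OCF_ERR_GENERIC


def meta_data():
    """Print information about the agent itself."""
    print(META_DATA)
    return OCF_SUCCESS


ACTIONS = {
    "start": start,
    "stop": stop,
    "monitor": monitor,
    "validate-all": validate_all,
    "meta-data": meta_data,
}


def run_action(action):
    """Map an action name to its handler and return the handler's exit code."""
    handler = ACTIONS.get(action)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler()
```

A script built on this sketch would finish with something like sys.exit(run_action(sys.argv[1])), so that the caller sees the result as the process exit status.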
Implementation
Most resource agents are coded as shell scripts. This, however, is by no means a necessity: the defined interface is language-agnostic.
They are synchronous in nature: you invoke an operation, and you are expected to wait until it completes. Certain operations (notably start, stop and monitor) may take considerable time to complete, from seconds to many minutes in some cases.
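Because an operation blocks until it completes, a caller normally pairs each invocation with a timeout. The sketch below illustrates that idea with the Python standard library only. It is not how Pacemaker or Heartbeat run agents internally, and the function name run_operation is an assumption made for this example.

```python
import subprocess
import sys


def run_operation(command, timeout_seconds):
    """Run one agent operation and block until it exits or the timeout expires.

    Returns the exit code, or None if the operation did not finish in time
    (subprocess.run kills the child process in that case).
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Stand-in for a slow operation: a child process that sleeps for 10 seconds.
slow_operation = [sys.executable, "-c", "import time; time.sleep(10)"]
print(run_operation(slow_operation, timeout_seconds=0.5))
```

Since some operations can legitimately take many minutes, the timeout has to be chosen per resource rather than fixed globally.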
Source Code Repository
Source code for Resource Agents is maintained in the Git repository at https://github.com/ClusterLabs/resource-agents
Available Resource Agents (release 3.9.2, current as of 2011-10-27)
| anything | Manages an arbitrary service |
This is a generic OCF RA to manage almost anything. | |
| AoEtarget | Manages ATA-over-Ethernet (AoE) target exports |
This resource agent manages an ATA-over-Ethernet (AoE) target using vblade. It exports any block device, or file, as an AoE target using the specified Ethernet device, shelf, and slot number. | |
| apache | Manages an Apache web server instance |
This is the resource agent for the Apache web server. This resource agent operates both version 1.x and version 2.x Apache servers. The start operation ends with a loop in which monitor is repeatedly called to make sure that the server started and that it is operational. Hence, if the monitor operation does not succeed within the start operation timeout, the apache resource will end with an error status. The monitor operation by default loads the server status page which depends on the mod_status module and the corresponding configuration file (usually /etc/apache2/mod_status.conf). Make sure that the server status page works and that the access is allowed *only* from localhost (address 127.0.0.1). See the statusurl and testregex attributes for more details. See also http://httpd.apache.org/ | |
| AudibleAlarm | Emits audible beeps at a configurable interval |
Resource script for AudibleAlarm. It sets an audible alarm running by beeping at a set interval. | |
| ClusterMon | Runs crm_mon in the background, recording the cluster status to an HTML file |
This is a ClusterMon Resource Agent. It writes the current cluster status to an HTML file. | |
| conntrackd | This resource agent manages conntrackd |
Master/Slave OCF Resource Agent for conntrackd | |
| CTDB | CTDB Resource Agent |
This resource agent manages CTDB, allowing one to use Clustered Samba in a Linux-HA/Pacemaker cluster. You need a shared filesystem (e.g. OCFS2) on which the CTDB lock will be stored. Create /etc/ctdb/nodes containing a list of private IP addresses of each node in the cluster, then configure this RA as a clone. To have CTDB manage Samba, set ctdb_manages_samba="yes". Note that this option will be deprecated in future, in favour of configuring a separate Samba resource. For more information see http://linux-ha.org/wiki/CTDB_(resource_agent) | |
| db2 | Resource Agent that manages IBM DB2 LUW databases in Standard role as primitive or in HADR roles as master/slave configuration. Multiple partitions are supported. |
Resource Agent that manages IBM DB2 LUW databases in Standard role as primitive or in HADR roles in master/slave configuration. Multiple partitions are supported. Standard mode: An instance including all or selected databases is made highly available. Configure each partition as a separate primitive resource. HADR mode: A single database in HADR configuration is made highly available by automating takeover operations. Configure a master/slave resource with notifications enabled and an additional monitoring operation with role "Master". In case of HADR be very deliberate in specifying intervals/timeouts. The detection of a failure including promote must complete within HADR_PEER_WINDOW. In addition to honoring requirements for crash recovery etc. for your specific database, use the following relations as guidance: "monitor interval" < HADR_PEER_WINDOW - (approx. 30 sec); "promote timeout" < HADR_PEER_WINDOW + (approx. 20 sec). For further information and examples consult http://www.linux-ha.org/wiki/db2_(resource_agent) | |
| Delay | Waits for a defined timespan |
This script is a test resource for introducing delay. | |
| drbd | Manages a DRBD resource (deprecated) |
Deprecation warning: This agent is deprecated and may be removed from a future release. See the ocf:linbit:drbd resource agent for a supported alternative. -- This resource agent manages a Distributed Replicated Block Device (DRBD) object as a master/slave resource. DRBD is a mechanism for replicating storage; please see the documentation for setup details. | |
| Dummy | Example stateless resource agent |
This is a Dummy Resource Agent. It does absolutely nothing except keep track of whether it's running or not. Its purpose in life is for testing and to serve as a template for RA writers. NB: Please pay attention to the timeouts specified in the actions section below. They should be meaningful for the kind of resource the agent manages. They should be the minimum advised timeouts, but they shouldn't/cannot cover _all_ possible resource instances. So, try to be neither overly generous nor too stingy, but moderate. The minimum timeouts should never be below 10 seconds. | |
| eDir88 | Manages a Novell eDirectory directory server |
Resource script for managing an eDirectory instance. Manages a single instance of eDirectory as an HA resource. The "multiple instances" feature of eDirectory has been added in version 8.8. This script will not work for any version of eDirectory prior to 8.8. This RA can be used to load multiple eDirectory instances on the same host. It is very strongly recommended to put eDir configuration files (as per the eDir_config_file parameter) on local storage on each node. This is necessary for this RA to be able to handle situations where the shared storage has become unavailable. If the eDir configuration file is not available, this RA will fail, and heartbeat will be unable to manage the resource. Side effects include STONITH actions, unmanageable resources, etc... Setting a high action timeout value is _very_ _strongly_ recommended. eDir with IDM can take in excess of 10 minutes to start. If heartbeat times out before eDir has had a chance to start properly, mayhem _WILL ENSUE_. The LDAP module seems to be one of the very last to start. So this script will take even longer to start on installations with IDM and LDAP if the monitoring of IDM and/or LDAP is enabled, as the start command will wait for IDM and LDAP to be available. | |
| ethmonitor | Monitors network interfaces |
Monitor the vitality of a local network interface. You may set up this RA as a clone resource to monitor the network interfaces on different nodes, with the same interface name. This is not related to the IP address or the network on which an interface is configured. You may use this RA to move resources away from a node which has a faulty interface, or to prevent moving resources to such a node. This gives you independent control of the resources, without involving cluster intercommunication. But it requires your nodes to have more than one network interface. The resource configuration requires a monitor operation, because the monitor does the main part of the work. In addition to the resource configuration, you need to configure some location constraints, based on a CIB attribute value. The name of the attribute value is configured in the 'name' option of this RA. Example constraint configuration: location loc_connected_node my_resource_grp \ rule $id="rule_loc_connected_node" -INF: ethmonitor eq 0 The ethmonitor works in 3 different modes to test the interface vitality: 1. call ip to see if the link status is up (if link is down -> error); 2. call ip and watch the RX counter (if packets arrive within a certain time -> success); 3. call arping to check whether any of the IPs found in the local ARP cache answers an ARP REQUEST (one answer -> success); 4. otherwise, return error. | |
| Evmsd | Controls clustered EVMS volume management (deprecated) |
Deprecation warning: EVMS is no longer actively maintained and should not be used. This agent is deprecated and may be removed from a future release. -- This is an Evmsd Resource Agent. | |
| EvmsSCC | Manages EVMS Shared Cluster Containers (SCCs) (deprecated) |
Deprecation warning: EVMS is no longer actively maintained and should not be used. This agent is deprecated and may be removed from a future release. -- Resource script for EVMS shared cluster container. It runs evms_activate on one node in the cluster. | |
| exportfs | Manages NFS exports |
Exportfs uses the exportfs command to add/remove NFS exports. It does NOT manage the NFS server daemon. It depends on Linux-specific NFS implementation details, so it is not yet considered portable to other platforms. | |
| Filesystem | Manages filesystem mounts |
Resource script for Filesystem. It manages a Filesystem on a shared storage medium. The standard monitor operation of depth 0 (also known as probe) checks if the filesystem is mounted. If you want deeper tests, set OCF_CHECK_LEVEL to one of the following values: 10: read first 16 blocks of the device (raw read). This doesn't exercise the filesystem at all, but the device on which the filesystem lives. This is a no-op for non-block devices such as NFS, SMBFS, or bind mounts. 20: test if a status file can be written and read. The status file must be writable by root. This is not always the case with an NFS mount, as NFS exports usually have the "root_squash" option set. In such a setup, you must either use read-only monitoring (depth=10), export with "no_root_squash" on your NFS server, or grant world write permissions on the directory where the status file is to be placed. | |
| fio | fio IO load generator |
fio is a generic I/O load generator. This RA allows start/stop of fio instances to simulate load on a cluster without configuring complex services. | |
| ICP | Manages an ICP Vortex clustered host drive |
Resource script for ICP. It manages an ICP Vortex clustered host drive as an HA resource. | |
| ids | Manages an Informix Dynamic Server (IDS) instance |
OCF resource agent to manage an IBM Informix Dynamic Server (IDS) instance as a high-availability resource. | |
| IPaddr | Manages virtual IPv4 addresses (portable version) |
This script manages IP alias IP addresses. It can add an IP alias, or remove one. | |
| IPaddr2 | Manages virtual IPv4 addresses (Linux specific version) |
This Linux-specific resource manages IP alias IP addresses. It can add an IP alias, or remove one. In addition, it can implement Cluster Alias IP functionality if invoked as a clone resource. | |
| IPsrcaddr | Manages the preferred source address for outgoing IP packets |
Resource script for IPsrcaddr. It manages the preferred source address modification. | |
| iscsi | Manages a local iSCSI initiator and its connections to iSCSI targets |
OCF Resource Agent for iSCSI. Add (start) or remove (stop) iSCSI targets. | |
| iSCSILogicalUnit | Manages iSCSI Logical Units (LUs) |
Manages iSCSI Logical Unit. An iSCSI Logical unit is a subdivision of an SCSI Target, exported via a daemon that speaks the iSCSI protocol. | |
| iSCSITarget | iSCSI target export agent |
Manages iSCSI targets. An iSCSI target is a collection of SCSI Logical Units (LUs) exported via a daemon that speaks the iSCSI protocol. | |
| jboss | Manages a JBoss application server instance |
Resource script for JBoss. It manages a JBoss instance as an HA resource. | |
| LinuxSCSI | Enables and disables SCSI devices through the kernel SCSI hot-plug subsystem (deprecated) |
Deprecation warning: This agent makes use of Linux SCSI hot-plug functionality which has been superseded by SCSI reservations. It is deprecated and may be removed from a future release. See the scsi2reservation and sfex agents for alternatives. -- This is a resource agent for LinuxSCSI. It manages the availability of a SCSI device from the point of view of the Linux kernel. It makes Linux believe the device has gone away, and it can make it come back again. | |
| LVM | Controls the availability of an LVM Volume Group |
Resource script for LVM. It manages a Linux Volume Manager (LVM) volume as an HA resource. | |
| lxc | Manages LXC containers |
Allows LXC containers to be managed by the cluster. If the container is running "init" it will also perform an orderly shutdown. It is 'assumed' that the 'init' system will do an orderly shutdown if presented with a 'kill -PWR' signal. On a 'sysvinit' this would require the container to have an inittab file containing "p0::powerfail:/sbin/init 0". I have absolutely no idea how this is done with 'upstart' or 'systemd', YMMV if your container is using one of them. | |
| MailTo | Notifies recipients by email in the event of resource takeover |
This is a resource agent for MailTo. It sends email to a sysadmin whenever a takeover occurs. | |
| ManageRAID | Manages RAID devices |
Manages starting, stopping and monitoring of RAID devices which are preconfigured in /etc/conf.d/HB-ManageRAID. | |
| ManageVE | Manages an OpenVZ Virtual Environment (VE) |
This OCF compliant resource agent manages OpenVZ VEs and thus requires a proper OpenVZ installation including a recent vzctl util. | |
| mysql | Manages a MySQL database instance |
Resource script for MySQL. May manage a standalone MySQL database, a clone set with externally managed replication, or a complete master/slave replication setup. | |
| mysql-proxy | Manages a MySQL Proxy daemon |
This script manages MySQL Proxy as an OCF resource in a high-availability setup. Tested with MySQL Proxy 0.7.0 on Debian 5.0. | |
| named | Manages a named server - (not yet available in 3.9.2) |
Resource script for named (Bind) server. It manages named as an HA resource. | |
| nfsserver | Manages an NFS server |
Nfsserver helps to manage the Linux nfs server as a failover-able resource in Linux-HA. It depends on Linux specific NFS implementation details, so is considered not portable to other platforms yet. | |
| nginx | Manages an Nginx web/proxy server instance |
This is the resource agent for the Nginx web/proxy server. This resource agent does not monitor POP or IMAP servers, as we don't know how to determine meaningful status for them. The start operation ends with a loop in which monitor is repeatedly called to make sure that the server started and that it is operational. Hence, if the monitor operation does not succeed within the start operation timeout, the nginx resource will end with an error status. The default monitor operation will verify that nginx is running. The level 10 monitor operation by default will try and fetch the /nginx_status page - which is commented out in sample nginx configurations. Make sure that the /nginx_status page works and that the access is restricted to localhost (address 127.0.0.1) plus whatever places _outside the cluster_ you want to monitor the server from. See the status10url and status10regex attributes for more details. The level 20 monitor operation will perform a more complex set of tests from a configuration file. The level 30 monitor operation will run an external command to perform an arbitrary monitoring operation. | |
| oracle | Manages an Oracle Database instance |
Resource script for oracle. Manages an Oracle Database instance as an HA resource. | |
| oralsnr | Manages an Oracle TNS listener |
Resource script for Oracle Listener. It manages an Oracle Listener instance as an HA resource. | |
| pgsql | Manages a PostgreSQL database instance |
Resource script for PostgreSQL. It manages a PostgreSQL instance as an HA resource. | |
| pingd | Monitors connectivity to specific hosts or IP addresses ("ping nodes") (deprecated) |
Deprecation warning: This agent is deprecated and may be removed from a future release. See the ocf:pacemaker:pingd resource agent for a supported alternative. -- This is a pingd Resource Agent. It records (in the CIB) the current number of ping nodes a node can connect to. | |
| portblock | Blocks and unblocks access to TCP and UDP ports |
Resource script for portblock. It is used to temporarily block ports using iptables. In addition, it may allow for faster TCP reconnects for clients on failover. Use that if there are long lived TCP connections to an HA service. This feature is enabled by setting the tickle_dir parameter and only in concert with action set to unblock. Note that the tickle ACK function is new as of version 3.0.2 and hasn't yet seen widespread use. | |
| postfix | Manages a highly available Postfix mail server instance |
This script manages Postfix as an OCF resource in a high-availability setup. | |
| proftpd | OCF Resource Agent compliant FTP script. |
This script manages ProFTPD in an Active-Passive setup. | |
| Pure-FTPd | Manages a Pure-FTPd FTP server instance |
This script manages Pure-FTPd in an Active-Passive setup | |
| Raid1 | Manages a software RAID1 device on shared storage |
Resource script for RAID1. It manages a software Raid1 device on a shared storage medium. | |
| Route | Manages network routes |
Enables and disables network routes. Supports host and net routes, routes via a gateway address, and routes using specific source addresses. This resource agent is useful if a node's routing table needs to be manipulated based on node role assignment. Consider the following example use case: - One cluster node serves as an IPsec tunnel endpoint. - All other nodes use the IPsec tunnel to reach hosts in a specific remote network. Then, here is how you would implement this scheme making use of the Route resource agent: - Configure an ipsec LSB resource. - Configure a cloned Route OCF resource. - Create an order constraint to ensure that ipsec is started before Route. - Create a colocation constraint between the ipsec and Route resources, to make sure no instance of your cloned Route resource is started on the tunnel endpoint itself. | |
| rsyncd | Manages an rsync daemon |
This script manages an rsync daemon. | |
| rsyslog | rsyslog resource agent - (not yet available in 3.9.2) |
This script manages an rsyslog instance as an HA resource. | |
| SAPDatabase | Manages any SAP database (based on Oracle, MaxDB, or DB2) |
Resource script for SAP databases. It manages a SAP database of any type as an HA resource. | |
| SAPInstance | Manages a SAP instance as an HA resource. |
Usually a SAP system consists of one database and one or more SAP instances (sometimes called application servers). One SAP Instance is defined by having exactly one instance profile. The instance profiles can usually be found in the directory /sapmnt/SID/profile. Each instance must be configured as its own resource in the cluster configuration. The resource agent supports the following SAP versions: - SAP WebAS ABAP Release 6.20 - 7.30 - SAP WebAS Java Release 6.40 - 7.30 - SAP WebAS ABAP + Java Add-In Release 6.20 - 7.30 (Java is not monitored by the cluster in that case). When using a SAP Kernel 6.40 please check and implement the actions from the section "Manual postprocessing" from SAP note 995116 (http://sdn.sap.com). All operations of the SAPInstance resource agent are done by using the startup framework called SAP Management Console or sapstartsrv that was introduced with SAP kernel release 6.40. Find more information about the SAP Management Console in SAP note 1014480. Using this framework defines a clear interface for how the Heartbeat cluster sees the SAP system. The options for monitoring the SAP system are also much better than other methods like just watching the ps command for running processes or doing some pings to the application. sapstartsrv uses SOAP messages to request the status of running SAP processes. Therefore it can actually ask a process itself what its status is, independent from other problems that might exist at the same time. sapstartsrv knows 4 status colours: - GREEN = everything is fine - YELLOW = something is wrong, but the service is still working - RED = the service does not work - GRAY = the service has not been started. The SAPInstance resource agent will interpret GREEN and YELLOW as OK. That means that minor problems will not be reported to the Heartbeat cluster. This prevents the cluster from doing an unwanted failover. The statuses RED and GRAY are reported as NOT_RUNNING to the cluster. Depending on the status the cluster expects from the resource, it will do a restart, failover or just nothing. | |
| scsi2reservation | SCSI-2 reservation |
The scsi-2-reserve resource agent is a placeholder for SCSI-2 reservation. A healthy instance of the scsi-2-reserve resource indicates ownership of the specified SCSI device. This resource agent depends on scsi_reserve from the scsires package, which is Linux specific. | |
| SendArp | Broadcasts unsolicited ARP announcements |
This RA can be used _instead_ of the IPaddr2 or IPaddr RA to send gratuitous ARP for an IP address on a given interface, without adding the address to that interface. For example, you might for some reason want to send gratuitous ARP for addresses managed by IPaddr2 or IPaddr on an additional interface. | |
| ServeRAID | Enables and disables shared ServeRAID merge groups |
Resource script for ServeRAID. It enables/disables shared ServeRAID merge groups. | |
| sfex | Manages exclusive access to shared storage using Shared Disk File EXclusiveness (SF-EX) |
Resource script for SF-EX. It manages a shared storage medium exclusively. | |
| slapd | Manages a Stand-alone LDAP Daemon (slapd) instance - (not yet available in 3.9.2) |
Resource script for Stand-alone LDAP Daemon (slapd). It manages a slapd instance as an OCF resource. | |
| SphinxSearchDaemon | Manages the Sphinx search daemon. |
This is a searchd Resource Agent. It manages the Sphinx Search Daemon. | |
| Squid | Manages a Squid proxy server instance |
The resource agent of Squid. This manages a Squid instance as an HA resource. | |
| Stateful | Example stateful resource agent |
This is an example resource agent that implements two states. | |
| symlink | Manages a symbolic link |
This resource agent manages a symbolic link (symlink). It is primarily intended to manage configuration files which should be enabled or disabled based on where the resource is running, such as cron job definitions and the like. | |
| SysInfo | Records various node attributes in the CIB |
This is a SysInfo Resource Agent. It records (in the CIB) various attributes of a node. Sample Linux output: arch: i686 os: Linux-2.4.26-gentoo-r14 free_swap: 1999 cpu_info: Intel(R) Celeron(R) CPU 2.40GHz cpu_speed: 4771.02 cpu_cores: 1 cpu_load: 0.00 ram_total: 513 ram_free: 117 root_free: 2.4 Sample Darwin output: arch: i386 os: Darwin-8.6.2 cpu_info: Intel Core Duo cpu_speed: 2.16 cpu_cores: 2 cpu_load: 0.18 ram_total: 2016 ram_free: 787 root_free: 13 Units: free_swap: Mb ram_*: Mb root_free: Gb cpu_speed (Linux): bogomips cpu_speed (Darwin): Ghz | |
| syslog-ng | Syslog-ng resource agent |
This script manages a syslog-ng instance as an HA resource. | |
| tomcat | Manages a Tomcat servlet environment instance |
Resource script for Tomcat. It manages a Tomcat instance as a cluster resource. | |
| VIPArip | Manages a virtual IP address through RIP2 |
Virtual IP Address by RIP2 protocol. This script manages an IP alias in a different subnet with quagga/ripd. It can add an IP alias, or remove one. | |
| VirtualDomain | Manages virtual domains through the libvirt virtualization framework |
Resource agent for a virtual domain (a.k.a. domU, virtual machine, virtual environment etc., depending on context) managed by libvirtd. | |
| vmware | Manages VMware Server 2.0 virtual machines |
OCF compliant script to control VMware Server 2.0 virtual machines. | |
| WAS | Manages a WebSphere Application Server instance |
Resource script for WAS. It manages a WebSphere Application Server (WAS) as an HA resource. | |
| WAS6 | Manages a WebSphere Application Server 6 instance |
Resource script for WAS6. It manages a WebSphere Application Server (WAS6) as an HA resource. | |
| WinPopup | Sends an SMB notification message to selected hosts |
Resource script for WinPopup. It sends a WinPopup message to a sysadmin's workstation whenever a takeover occurs. | |
| Xen | Manages Xen unprivileged domains (DomUs) |
Resource Agent for the Xen Hypervisor. Manages Xen virtual machine instances by mapping cluster resource start and stop to Xen create and shutdown, respectively. A note on names: we will try to extract the name from the config file (the xmfile attribute). If you use a simple assignment statement, then you should be fine. Otherwise, if there's some Python acrobatics involved, such as dynamically assigning names depending on other variables (and we will try to detect this), then please set the name attribute. You should also do that if there is any chance of a pathological situation where a config file might be missing, for example if it resides on shared storage. If all else fails, we finally fall back to the instance id to preserve backward compatibility. Para-virtualized guests can also be migrated by enabling the meta_attribute allow-migrate. | |
| Xinetd | Manages an Xinetd service |
Resource script for Xinetd. It starts/stops services managed by xinetd. Note that the xinetd daemon itself must be running: we are not going to start it or stop it ourselves. Important: in case the services managed by the cluster are the only ones enabled, you should specify the -stayalive option for xinetd or it will exit on Heartbeat stop. Alternatively, you may enable some internal service such as echo. |
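The db2 entry in the table above gives two timing relations for HADR mode. They can be turned into a small pre-flight check, as in the sketch below. The function name and its parameters are assumptions made for this example, and the 30 and 20 second margins are the approximate values quoted in that entry, not exact limits.

```python
def check_hadr_timing(hadr_peer_window, monitor_interval, promote_timeout):
    """Return the violated guidance relations as a list (empty if none).

    All arguments are in seconds. The margins are approximate values
    taken from the db2 agent description.
    """
    problems = []
    if not monitor_interval < hadr_peer_window - 30:
        problems.append("monitor interval should be < HADR_PEER_WINDOW - 30")
    if not promote_timeout < hadr_peer_window + 20:
        problems.append("promote timeout should be < HADR_PEER_WINDOW + 20")
    return problems


# Example: a 300 second peer window leaves ample room for both values.
print(check_hadr_timing(hadr_peer_window=300, monitor_interval=20, promote_timeout=120))
```

An empty list means both relations hold. As the db2 entry notes, they are guidance in addition to the crash recovery requirements of your specific database.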
