
Heartbeat, through the LRM[1], supports four different types of resource agents: OCF resource agents, LSB resource agents, Heartbeat resource agents, and STONITH resource agents.
The first two of these are based on externally defined standards, and the latter two are unique to this project.
When writing new resource agents, we typically recommend writing them as OCF resource agents, since these are the most powerful and general of the four types. The exception is when you need to operate a STONITH[2] device, in which case we recommend that you write a STONITH device handler.
The OCF specification is basically an extension of the definitions for an LSBResourceAgent[3].
OCF Resource Agents are those found in /usr/lib/ocf/resource.d/{provider} .
The OCF Spec (as it relates to ResourceAgent[4]s) can be found at: http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD[5]
Anything found in the /usr/lib/ocf/resource.d/heartbeat directory is provided as part of the Heartbeat package, which supports OCF resource agents as of version 2.0.0. When creating your own agents, you are encouraged to create a new directory under /usr/lib/ocf/resource.d/ and use provider={your subdirectory name}. For example, if you want to name your provider dubrouski and create a resource named serge, you would make a directory called /usr/lib/ocf/resource.d/dubrouski and name your resource script /usr/lib/ocf/resource.d/dubrouski/serge.
For convenience, many of the return codes, defaults and other OCF utility functions can be included by custom OCF agents from /usr/lib/heartbeat/ocf-shellfuncs.
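As a rough illustration of including those shared definitions, the sketch below sources the file when it is present and otherwise falls back to hand-written values. The fallback values are the exit codes described elsewhere on this page; the variable names (OCF_SUCCESS, OCF_NOT_RUNNING and so on) are assumed to match what the installed file defines, so check your own copy before relying on them.

```shell
#!/bin/sh
# Path taken from the text above; it may differ between installations.
SHELLFUNCS=/usr/lib/heartbeat/ocf-shellfuncs

# Pull in the shared definitions when the file is available.
if [ -r "$SHELLFUNCS" ]; then
    . "$SHELLFUNCS"
fi

# Fallbacks for systems without the file. ":=" only assigns when the
# variable is still unset or empty, so sourced values are kept.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_ERR_ARGS:=2}"
: "${OCF_ERR_INSTALLED:=5}"
: "${OCF_ERR_CONFIGURED:=6}"
: "${OCF_NOT_RUNNING:=7}"

echo "monitor should exit $OCF_NOT_RUNNING for a cleanly stopped resource"
```

Using named codes rather than bare numbers makes it harder to return the wrong value by accident.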
Beware: the Linux-HA implementation has been somewhat extended beyond the OCF specs, but none of those changes is incompatible with the OCF specification.
When writing/testing your ResourceAgent[4], you may find the ocf-tester script to be very useful. It comes in the heartbeat package.
Normal OCF Resource Agents are required to have these actions:
start - start the resource. Exit 0 when the resource is correctly running (i.e. providing the service), and anything else except 7 if it failed.
stop - stop the resource. Exit 0 when the resource is correctly stopped and anything else except 7 if it failed.
monitor - monitor the health of a resource. Exit 0 if the resource is running, 7 if it is stopped and anything else if it is failed. Note that the monitor script should test the state of the resource on the localhost.
meta-data - provide information about this resource as an XML snippet. Exit with 0.
Note: OCF specs have strict definitions of what exit codes actions must return. We follow these specifications, and exiting with the wrong exit code will cause Heartbeat[6] to behave in ways you will likely find puzzling and annoying. In particular, Heartbeat[6] needs to distinguish a completely stopped resource from one which is in some erroneous and indeterminate state.
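The required actions and their exit codes can be sketched as follows. This is a minimal illustration, not a production agent: the resource serge is hypothetical and counts as running while a state file exists, the state parameter and all serge_* function names are invented for the example, and the meta-data output is abbreviated and only assumed to follow the general shape of the OCF resource agent API.

```shell
#!/bin/sh
# Sketch of an OCF-style agent for a hypothetical resource "serge".

# Return codes from the OCF spec; real agents usually take these from
# ocf-shellfuncs instead of defining them by hand.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Hypothetical "state" parameter, with a fallback location.
STATE_FILE="${OCF_RESKEY_state:-${TMPDIR:-/tmp}/serge.state}"

serge_monitor() {
    # 0 = running, 7 = cleanly stopped; any other code would mean "failed".
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

serge_start() {
    # Nothing to do if the resource is already running.
    if serge_monitor; then
        return $OCF_SUCCESS
    fi
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

serge_stop() {
    # rm -f succeeds even when the file is already gone.
    rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

serge_meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<resource-agent name="serge">
  <version>0.1</version>
  <longdesc lang="en">Demonstration agent backed by a state file.</longdesc>
  <shortdesc lang="en">Demo agent</shortdesc>
  <parameters>
    <parameter name="state" unique="1">
      <longdesc lang="en">Path of the state file.</longdesc>
      <shortdesc lang="en">State file</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20"/>
    <action name="stop" timeout="20"/>
    <action name="monitor" timeout="20" interval="10"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
EOF
    return $OCF_SUCCESS
}

serge_main() {
    case "$1" in
        start)     serge_start ;;
        stop)      serge_stop ;;
        monitor)   serge_monitor ;;
        meta-data) serge_meta_data ;;
        *)         return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# Act as a script only when an action was given on the command line.
if [ $# -gt 0 ]; then
    serge_main "$1"
    exit $?
fi
```

Saved as an executable file, this could be exercised by hand with, for example, `./serge monitor; echo "result: $?"`, which should report 7 before a start and 0 after one.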
OCF Resource Agents should support the following action:
validate-all - validate the set of configuration parameters given in the environment. Exit with 0 if the parameters are valid, 2 if they are not valid, 6 if the resource is not configured, and 5 if the software the RA is supposed to run cannot be found.
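One possible shape for a validate-all action is sketched below. The parameters ip and binary are hypothetical, as is the function name, and the address check is deliberately loose (it does not verify that each number is at most 255).

```shell
#!/bin/sh
# Hypothetical parameters: OCF_RESKEY_ip (required) and OCF_RESKEY_binary
# (the program this agent would run; defaults to "sh" for the demo).
serge_validate_all() {
    # 6: the resource has not been configured at all.
    if [ -z "${OCF_RESKEY_ip:-}" ]; then
        echo "ip parameter is not set" >&2
        return 6
    fi
    # 2: a parameter was given but is not valid (loose dotted-quad check).
    if ! printf '%s\n' "$OCF_RESKEY_ip" |
            grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        echo "ip parameter is not a dotted-quad address" >&2
        return 2
    fi
    # 5: the software the agent is supposed to run cannot be found.
    if ! command -v "${OCF_RESKEY_binary:-sh}" >/dev/null 2>&1; then
        echo "required program not found" >&2
        return 5
    fi
    # 0: everything looks valid.
    return 0
}
```

The checks run in order from "not configured" to "not installed", so the first problem found decides the exit code.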
Additional requirements (not part of the OCF specs) are placed on agents that will be used for cloned[7] and multi-state[8] resources.
promote - promote the local instance of a resource to the master/primary state. Should exit 0.
demote - demote the local instance of a resource to the slave/secondary state. Should exit 0.
notify - used by heartbeat[9] to send the agent pre and post notification events telling the resource what is about to take place or what has just taken place. Must exit 0.
Some actions specified in the OCF specs are not used by Heartbeat[6] at the moment:
reload - reload the configuration of the resource instance without disrupting the service
recover - a variant of the start action, this should try to recover a resource locally.
In addition to having more actions, your OCF resource agent is permitted to take parameters to tell it which instance of the resource it is being asked to control, and any simple configuration parameters it might need to tell it what to do or exactly how it should be done.
These are passed in to the script as environment variables, with the special prefix OCF_RESKEY_. So, if you need to be given a parameter which the user thinks of as ip it will be passed to the script as OCF_RESKEY_ip.
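As a small sketch of this convention, the function below reads the ip parameter from its environment variable. The function name is hypothetical, and the subshell at the end merely simulates what the cluster manager does when it calls an agent.

```shell
#!/bin/sh
# Hypothetical helper: report the value of the "ip" parameter.
serge_show_ip() {
    # The user's "ip" parameter arrives as OCF_RESKEY_ip.
    if [ -z "${OCF_RESKEY_ip:-}" ]; then
        echo "ip parameter not set"
        return 1
    fi
    echo "managing address ${OCF_RESKEY_ip}"
}

# Simulate the caller: set the variable for one invocation only.
( OCF_RESKEY_ip=10.10.10.1; export OCF_RESKEY_ip; serge_show_ip )
# prints: managing address 10.10.10.1
```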
The most common problems when implementing OCF ResourceAgent[4]s are:
ResourceAgent[4] DL321.pdf[10]
LSB Resource Agents are those found in /etc/init.d. Generally they are provided by the OS/distribution and in order to be used with version 2 of Heartbeat, must conform to the LSB Spec.
The LSB Spec (as it relates to init scripts) can be found at: http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html[11]
Many distributions claim LSB compliance but ship with broken init scripts. The most common problems are:
NOTE: Parameters and options cannot be passed to LSB Resource Agents.
Assuming some_service is configured correctly and currently not active, the following sequence will help you determine if it is LSB compatible:
/etc/init.d/some_service start ; echo "result: $?"
Did the command print result: 0 (in addition to the regular output)?
/etc/init.d/some_service status ; echo "result: $?"
Did the command print result: 0 (in addition to the regular output)?
/etc/init.d/some_service start ; echo "result: $?"
Did the command print result: 0 (in addition to the regular output)?
/etc/init.d/some_service stop ; echo "result: $?"
Did the command print result: 0 (in addition to the regular output)?
/etc/init.d/some_service status ; echo "result: $?"
Did the script indicate the service was not running?
Did the command print result: 3 (in addition to the regular output)?
/etc/init.d/some_service stop ; echo "result: $?"
Did the command print result: 0 (in addition to the regular output)?
This step is not readily testable and relies on manual inspection of the script.
The script can optionally use one of the other codes listed in the LSB spec to indicate that it is active but failed. In such a case, this tells Heartbeat that before moving the resource to another node, it should stop it on the existing one first. Making use of these extra exit codes is encouraged.
If the answer to any of the above questions is no, then the init script is not LSB compliant.
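The testable part of the sequence above can be automated along the following lines. This is only a sketch: the function names expect_rc and lsb_check are invented here, it checks exit codes only (it does not inspect what the script prints), and it really starts and stops the service, so it should be run only where that is acceptable.

```shell
#!/bin/sh
# expect_rc EXPECTED SCRIPT ACTION
# Run one init script action and compare its exit code with EXPECTED.
expect_rc() {
    rc=0
    "$2" "$3" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ne "$1" ]; then
        echo "FAIL: $3 returned $rc, expected $1"
        return 1
    fi
    echo "ok: $3 returned $rc"
}

# lsb_check SCRIPT
# Walk through the sequence described above; stop at the first mismatch.
# Assumes the service is configured correctly and not currently active.
lsb_check() {
    expect_rc 0 "$1" start  &&   # start a stopped service
    expect_rc 0 "$1" status &&   # status of a running service
    expect_rc 0 "$1" start  &&   # start an already running service
    expect_rc 0 "$1" stop   &&   # stop a running service
    expect_rc 3 "$1" status &&   # status of a stopped service
    expect_rc 0 "$1" stop        # stop an already stopped service
}
```

A typical call would be `lsb_check /etc/init.d/some_service`. A non-zero return means at least one answer was "no".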
If you are using version 2 of Heartbeat and have specified crm yes, then your options at this point are to:
write an OCFResourceAgent[12] based on the existing init script
If you are using version 1 of Heartbeat or have not specified crm yes, then the script may still work as long as it follows the rules for HeartbeatResourceAgent[13]s.
The Classic Heartbeat[6] resource[14] agents (scripts) are basically LSB[15] init scripts with slightly odd status operations. The only operations which Heartbeat[16] performs on the resource scripts are start, stop and status.
These operations are as follows:
start - Activate the given resource[14].
According to the LSB, it is never an error to start an already active resource. Exit with 0 on success, nonzero on failure. Heartbeat will only start a resource if it wants it to be running on the current machine, and status shows it's not already running. Heartbeat will never start the same resource at the same time in different nodes in the cluster.
stop - Deactivate the given resource.
Performed when we want to make sure a resource is not running. Although there are occasions when we check to see if a resource is running before stopping it, during shutdown, we will stop all resources whether or not we think they're running.
According to the LSB[15], stopping a resource which is already stopped is always permissible. Heartbeat[16] will DEFINITELY stop resources it doesn't know are running. Stop failures can result in the machine being rebooted to clear up the error. Note that some Red Hat init scripts are not LSB-compliant and complain when trying to stop resources which are not running.
status - Determine running status of the given resource.
The status operation must report status correctly, AND it must print either OK or running when the resource is active; it CANNOT print either of those when the resource is inactive. For the status operation, we ignore the return code.
This sounds quite odd, but it is a historical holdover for compatibility with earlier versions of Linux, which did not reliably give proper status exit codes but did reliably print OK or running.
Heartbeat calls the status operation in many places. We do it before starting any resource, and also (IIRC) when releasing resources.
After repeated stop failures, we will do a status on the resource. If the status reports that the resource is still running, then we will reboot the machine to make sure things are really stopped. Note that this behaviour applies only to the old resource manager of v1 (haresources-based) clusters. CRM/v2 clusters use STONITH.
Start, stop and status operations are NEVER overlapped on a given resource on a given machine. You don't have to worry about concurrency of an operation on a resource.
Unlike an LSBResourceAgent[3], a HeartbeatResourceAgent can be passed a list of positional parameters. The parameters go before the operation name, like this:
IPaddr 10.10.10.1 start
The haresources[17] line which corresponds to this set of parameters is:
IPaddr::10.10.10.1
and invoked with the start operation.
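The calling convention and the unusual status rule can be sketched together as follows. Both function names are hypothetical, the "resource" is only a marker file, and the assumption that every `::` in a haresources entry separates one positional parameter goes beyond the single-parameter example shown above.

```shell
#!/bin/sh
# Where the demo keeps its marker files (an assumption of this sketch).
STATE_DIR="${TMPDIR:-/tmp}"

# Hypothetical classic Heartbeat-style agent: parameters first, operation last.
hb_demo() {
    op=""
    # After the loop, op holds the last argument, i.e. the operation name.
    for op in "$@"; do :; done
    addr=$1
    state="$STATE_DIR/hb-demo-$addr.state"
    case "$op" in
        start)  touch "$state" ;;
        stop)   rm -f "$state" ;;
        status)
            # Heartbeat looks at the printed text, not the exit code:
            # print "running" only when active, and never when inactive.
            if [ -f "$state" ]; then
                echo "running"
            else
                echo "stopped"
            fi
            ;;
        *)
            echo "usage: hb_demo ADDRESS {start|stop|status}" >&2
            return 1
            ;;
    esac
}

# Turn a haresources entry into the leading words of the command line.
haresources_to_args() {
    printf '%s\n' "$1" | sed 's/::/ /g'
}

haresources_to_args "IPaddr::10.10.10.1"
# prints: IPaddr 10.10.10.1
```

With these definitions, `hb_demo 10.10.10.1 start` followed by `hb_demo 10.10.10.1 status` prints running, and after `hb_demo 10.10.10.1 stop` the status operation prints stopped.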
Heartbeat[6] looks for resource[14] scripts in /etc/ha.d/resource.d and /etc/init.d.
ResourceAgent[4], HeartbeatProgram[16], haresources[17], LSBResourceAgent[3], OCFResourceAgent[12]
STONITH resource agents are not yet documented.
However, the following short description might be helpful.
STONITH resource agents are mapped from STONITH[2] devices, so to write a STONITH Resource Agent, one has to write a STONITH device driver, and the corresponding resource agent will magically appear in the system.
Because these resource agents are mapped from STONITH[2] devices, the APIs don't look very much like other resource agents.
If you have a host-reset device you wish to use, and we don't already support it, there are two basic approaches to writing a STONITH Resource Agent:
If you write a 'C' plugin, it is locked into memory and is suitable for use with cluster filesystems without danger. 'C' plugins are slightly harder to write than script plugins, but there are numerous worked examples in the source code.
If you write a script plugin, it is fully as functional and easy to configure as a 'C' plugin, with the slight disadvantage of being risky to use with a cluster filesystem in a low-memory situation on Linux.
For now, all I can say about these is: Use the Source, Luke.
--
| [1] | http://www.linux-ha.org/LRM |
| [2] | http://www.linux-ha.org/STONITH |
| [3] | http://www.linux-ha.org/LSBResourceAgent |
| [4] | http://www.linux-ha.org/ResourceAgent |
| [5] | http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD |
| [6] | http://www.linux-ha.org/Heartbeat |
| [7] | http://www.linux-ha.org/v2/Concepts/Clones |
| [8] | http://www.linux-ha.org/v2/Concepts/MultiState |
| [9] | http://www.linux-ha.org/heartbeat |
| [10] | http://www.linux-ha.org/_cache/ResourceAgentSpecs__DL321.pdf |
| [11] | http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html |
| [12] | http://www.linux-ha.org/OCFResourceAgent |
| [13] | http://www.linux-ha.org/HeartbeatResourceAgent |
| [14] | http://www.linux-ha.org/resource |
| [15] | http://www.linux-ha.org/LSB |
| [16] | http://www.linux-ha.org/HeartbeatProgram |
| [17] | http://www.linux-ha.org/haresources |
This information provided courtesy of the Linux-HA project at http://linux-ha.org/