OCF Resource Agents

Background
The OCF specification is basically an extension of the definitions for LSB Resource Agents.

OCF Resource Agents are the scripts found in /usr/lib/ocf/resource.d/{provider}, where {provider} is the name of a provider subdirectory.

The OCF Spec (as it relates to Resource Agents) can be found at http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD

Comprehensive documentation is available in the OCF Resource Agent Developer's Guide.

Writing your own OCF Resource Agent mini Howto
Anything found in /usr/lib/ocf/resource.d/heartbeat is provided as part of the resource-agents (resp. cluster-agents) package, which you should install together with Heartbeat and Pacemaker. When creating your own agents, you are encouraged to create a new directory under /usr/lib/ocf/resource.d/ and use provider={your subdirectory name}. For example, if you want to name your provider dubrouski and create a resource named serge, you would make a directory called /usr/lib/ocf/resource.d/dubrouski and name your resource script /usr/lib/ocf/resource.d/dubrouski/serge.
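The directory layout described above can be sketched as a small helper. This is only an illustration: the function name install_agent is hypothetical, and OCF_ROOT is assumed to default to /usr/lib/ocf as in the paths above.

```shell
#!/bin/sh
# Sketch: copy an agent script into its own provider directory.
# OCF_ROOT defaults to /usr/lib/ocf as in the text; point it at a scratch
# directory to try this without root privileges.
install_agent() {
    # $1 = agent script to install, $2 = provider name, $3 = resource name
    dest_dir="${OCF_ROOT:-/usr/lib/ocf}/resource.d/$2"
    mkdir -p "$dest_dir" || return 1
    cp "$1" "$dest_dir/$3" || return 1
    # The cluster executes the agent directly, so it must be executable.
    chmod 755 "$dest_dir/$3"
}

# Example from the text (commented out because it normally needs root):
# install_agent ./serge dubrouski serge
```

With the defaults, the commented call would place the script at /usr/lib/ocf/resource.d/dubrouski/serge.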

For convenience, many of the return codes, defaults and other OCF utility functions are available to be included by custom OCF agents from /usr/lib/heartbeat/ocf-shellfuncs.
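A minimal sketch of how an agent might pull in these helpers is shown here. The variable SHELLFUNCS is a hypothetical name used for illustration, and the fallback assignments are an assumption added so the sketch also runs where the helper file is not installed; the numeric values are the standard OCF return codes.

```shell
#!/bin/sh
# Path as given in the text; SHELLFUNCS is a hypothetical override so the
# location can be adjusted if your packages install the file elsewhere.
SHELLFUNCS="${SHELLFUNCS:-/usr/lib/heartbeat/ocf-shellfuncs}"

if [ -r "$SHELLFUNCS" ]; then
    . "$SHELLFUNCS"
fi

# Fallbacks: only take effect if the helper file did not define the codes.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_ERR_ARGS:=2}"
: "${OCF_ERR_UNIMPLEMENTED:=3}"
: "${OCF_ERR_PERM:=4}"
: "${OCF_ERR_INSTALLED:=5}"
: "${OCF_ERR_CONFIGURED:=6}"
: "${OCF_NOT_RUNNING:=7}"
```

Using the symbolic names instead of bare numbers makes it harder to return the wrong code by accident.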

Beware: The Linux-HA implementation has been somewhat extended beyond the OCF specification, but none of those extensions is incompatible with it.

When writing and testing your OCF Resource Agent, you may find the ocf-tester script very useful. It comes in the resource-agents package (resp. cluster-agents, on Debian-based distributions).

Actions
Normal OCF Resource Agents are required to have these actions:
 * start - start the resource. Exit 0 when the resource is correctly running (i.e. providing the service) and anything else except 7 if it failed.
 * stop - stop the resource. Exit 0 when the resource is correctly stopped and anything else except 7 if it failed.
 * monitor - monitor the health of a resource. Exit 0 if the resource is running, 7 if it is stopped and anything else if it has failed. Note that the monitor script should test the state of the resource on the local host.
 * meta-data - provide information about this resource as an XML snippet. Exit with 0.

Note: OCF specs have strict definitions of what exit codes actions must return. We follow these specifications, and exiting with the wrong exit code will cause the cluster to behave in ways you will likely find puzzling and annoying. In particular, the cluster needs to distinguish a completely stopped resource from one which is in some erroneous and indeterminate state.
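The required actions and their exit codes can be sketched as follows. This is an illustration under stated assumptions, not a production agent: the "resource" is only a state file, and STATE_FILE and the agent_* function names are hypothetical. The XML follows the general shape used by agents in the resource-agents package and should be checked against the specification before use.

```shell
#!/bin/sh
# Minimal agent sketch. The "resource" is just a state file (hypothetical);
# a real agent would start, stop and probe an actual service here.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/serge-demo.state}"

agent_monitor() {
    # 0 = running, 7 = cleanly stopped
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

agent_start() {
    # Starting an already running resource is a success, not an error.
    agent_monitor && return $OCF_SUCCESS
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    # Report success only once the resource is verifiably up.
    agent_monitor || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

agent_stop() {
    # Stopping an already stopped resource is a success as well.
    rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
    agent_monitor && return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

agent_meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="serge">
<version>1.0</version>
<longdesc lang="en">Demo agent that tracks a state file.</longdesc>
<shortdesc lang="en">State-file demo agent</shortdesc>
<parameters/>
<actions>
<action name="start" timeout="20" />
<action name="stop" timeout="20" />
<action name="monitor" timeout="20" interval="10" />
<action name="meta-data" timeout="5" />
</actions>
</resource-agent>
EOF
    return $OCF_SUCCESS
}

agent_main() {
    case "$1" in
        start)     agent_start ;;
        stop)      agent_stop ;;
        monitor)   agent_monitor ;;
        meta-data) agent_meta_data ;;
        *)         return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# A real agent file would end by dispatching on its first argument:
#   agent_main "$1"; exit $?
```

Note that monitor is the only action here that ever returns 7, which keeps "cleanly stopped" distinguishable from "failed".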

OCF Resource Agents should support the following action:
 * validate-all - validate the set of configuration parameters given in the environment. Exit with 0 if the parameters are valid, 2 if they are not valid, 6 if the resource is not configured, and 5 if the software the RA is supposed to run cannot be found.
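The four validate-all outcomes can be sketched like this. The names are assumptions for illustration: SERVICE_BIN stands in for whatever binary the agent manages, ip is an example parameter, and the digits-and-dots check is deliberately crude.

```shell
#!/bin/sh
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6

agent_validate_all() {
    # 5: the software the agent is supposed to run cannot be found.
    # SERVICE_BIN is hypothetical; it defaults to "sh" so the sketch runs anywhere.
    command -v "${SERVICE_BIN:-sh}" >/dev/null 2>&1 || return $OCF_ERR_INSTALLED

    # 6: a required parameter (here the example "ip") is not configured.
    [ -n "${OCF_RESKEY_ip:-}" ] || return $OCF_ERR_CONFIGURED

    # 2: the parameter is set but not valid (crude check: digits and dots only).
    case "$OCF_RESKEY_ip" in
        *[!0-9.]*) return $OCF_ERR_ARGS ;;
    esac

    return $OCF_SUCCESS
}
```

A real agent would replace the pattern match with checks that suit its own parameters.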

Additional requirements (not part of the OCF specs) are placed on agents that will be used for cloned and multi-state resources.
 * promote - promote the local instance of a resource to the master/primary state. Should exit 0.
 * demote - demote the local instance of a resource to the slave/secondary state. Should exit 0.
 * notify - used by the cluster to send the agent pre and post notification events telling the resource what is about to take place or has just taken place. Must exit 0.
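A notify handler might look roughly like the sketch below. It assumes the cluster passes the notification type and operation in the environment variables OCF_RESKEY_CRM_meta_notify_type and OCF_RESKEY_CRM_meta_notify_operation, as described in the Pacemaker documentation; check the documentation for your version before relying on these names. The function name and the messages are hypothetical.

```shell
#!/bin/sh
agent_notify() {
    # Combine type (pre/post) and operation (start/stop/promote/demote).
    event="${OCF_RESKEY_CRM_meta_notify_type:-unknown}-${OCF_RESKEY_CRM_meta_notify_operation:-unknown}"
    case "$event" in
        pre-promote)  echo "$event: an instance is about to be promoted" ;;
        post-promote) echo "$event: an instance has been promoted" ;;
        *)            echo "$event: nothing to do" ;;
    esac
    # notify must always exit 0, whatever the event was.
    return 0
}
```

The handler reacts only to the events it cares about and still returns 0 for everything else.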

Optional actions (for usage details, see the Pacemaker documentation):
 * reload - reload the configuration (non-unique parameters only) of the resource instance without disrupting the service.
 * migrate_from / migrate_to - perform live migration of a resource.
 * recover - a variant of the start action; this should try to recover a resource locally (currently not used by Pacemaker).

Parameters
In addition to having more actions, your OCF resource agent is permitted to take parameters to tell it which instance of the resource it is being asked to control, and any simple configuration parameters it might need to tell it what to do or exactly how it should be done.

These are passed in to the script as environment variables, with the special prefix OCF_RESKEY_. So, if you need to be given a parameter which the user thinks of as ip, it will be passed to the script as OCF_RESKEY_ip.
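Reading parameters inside the agent can be sketched as follows. The ip parameter is the example from the text; the port parameter, its default of 8080, and the describe_instance function are hypothetical and exist only to show how a default can be applied.

```shell
#!/bin/sh
# Apply a default for an optional parameter the user did not set.
: "${OCF_RESKEY_port:=8080}"

describe_instance() {
    # Parameters arrive as plain environment variables with the OCF_RESKEY_ prefix.
    printf '%s:%s\n' "${OCF_RESKEY_ip:-}" "$OCF_RESKEY_port"
}

# When testing by hand, parameters can be supplied on the command line, e.g.:
#   OCF_RESKEY_ip=192.168.1.10 /usr/lib/ocf/resource.d/dubrouski/serge monitor
```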

Debugging your OCF Resource Agent
The most common problems when implementing OCF Resource Agents are:
 * Not implementing the monitor operation at all
 * Not observing the correct exit status codes for start/stop/monitor actions
 * Starting a started resource returns an error (this violates the OCF spec)
 * Stopping a stopped resource returns an error (this violates the OCF spec)
 * Returning 0 for a start/stop action when the resource is not yet completely started/stopped
 * Returning early from start/stop
 * Invalid XML output for the meta-data command
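Several of the problems above can be caught with a small harness that calls each action and compares exit codes. The sketch below is a hypothetical helper (run_action and check_agent are not part of any package) and is far less thorough than ocf-tester. It only checks that start and stop are idempotent and that monitor reports 0 and 7 at the right times.

```shell
#!/bin/sh
# Run one action of the agent at path $1 and print its exit code.
run_action() {
    rc=0
    "$1" "$2" >/dev/null 2>&1 || rc=$?
    echo "$rc"
}

# Walk through a fixed sequence of action:expected-exit-code pairs.
check_agent() {
    agent="$1"
    failures=0
    for step in start:0 start:0 monitor:0 stop:0 stop:0 monitor:7; do
        action="${step%%:*}"
        expected="${step##*:}"
        actual="$(run_action "$agent" "$action")"
        if [ "$actual" != "$expected" ]; then
            echo "FAIL: $action returned $actual, expected $expected"
            failures=$((failures + 1))
        fi
    done
    # Succeeds only if every step returned the expected code.
    [ "$failures" -eq 0 ]
}

# Example (path from the text):
# check_agent /usr/lib/ocf/resource.d/dubrouski/serge
```

Any parameters the agent needs must be exported as OCF_RESKEY_ variables before calling check_agent.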