
Providing Open Source High-Availability Software for Linux and other OSes since 1999.


This web page is no longer maintained. Information presented here exists only to avoid breaking historical links.
The Project stays maintained, and lives on: see the Linux-HA Reference Documentation.

1 February 2010 Heartbeat 3.0.2 released see the Release Notes

18 January 2009 Pacemaker 1.0.7 released see the Release Notes

16 November 2009 LINBIT new Heartbeat Steward see the Announcement

Last site update:
2010-03-16 06:07:20

Resource Agent Specifications

Contents

  1. Resource Agent Specifications
    1. Types of Resource Agents
    2. OCF Resource Agent Specifications
    3. Writing your own OCF Resource Agent
      1. Actions
      2. Parameters
      3. Debugging your OCF Resource Agent
      4. See Also
    4. LSB Resource Agent Specifications
      1. Background
      2. Init Script (LSB) Compatibility Checks
      3. See Also
    5. R1 Heartbeat Resource Agent Specifications
  2. Heartbeat Resource Agents
    1. start operation
    2. stop operation
    3. status operation
  3. Concurrency
  4. Parameters
  5. Location
    1. See Also
    2. STONITH Resource Agent Specifications

Types of Resource Agents

Heartbeat (through the LRM) supports four different types of resource agents. These are:

  • OCF (Open Cluster Framework) Resource Agents
  • LSB (Linux Standard Base) init scripts
  • R1 Heartbeat resource agents
  • STONITH resource agents

The first two of these are based on externally defined standards, and the latter two are unique to this project.

For new resource agents, we typically recommend the OCF type, as it is the most powerful and general of the four. The exception is operating a STONITH device, in which case we recommend that you write a STONITH device handler instead.

OCF Resource Agent Specifications

The OCF specification is basically an extension of the definitions for an LSBResourceAgent.

OCF Resource Agents are those found in /usr/lib/ocf/resource.d/{provider} .

The OCF Spec (as it relates to ResourceAgents) can be found at: http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD

Writing your own OCF Resource Agent

Everything found in /usr/lib/ocf/resource.d/heartbeat is provided as part of the Heartbeat package, which supports OCF resource agents as of version 2.0.0. When creating your own agents, you are encouraged to create a new directory under /usr/lib/ocf/resource.d/ and use provider={your subdirectory name}. For example, if you want to name your provider dubrouski and create a resource named serge, you would make a directory called /usr/lib/ocf/resource.d/dubrouski and name your resource script /usr/lib/ocf/resource.d/dubrouski/serge.
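The naming scheme can be sketched in shell. The helper name agent_path is hypothetical and not part of Heartbeat; the provider and resource names are the ones from the example.

```shell
#!/bin/sh
# Sketch: build the path of a custom OCF agent from the OCF root
# directory, the provider name, and the agent name.
# agent_path is a hypothetical helper, not a Heartbeat function.
agent_path() {
    printf '%s/resource.d/%s/%s\n' "$1" "$2" "$3"
}

# Provider "dubrouski", resource "serge", as in the text.
agent_path /usr/lib/ocf dubrouski serge

# Installing the script would then look roughly like this
# (commented out because it needs root and a real script named serge):
# mkdir -p /usr/lib/ocf/resource.d/dubrouski
# install -m 0755 serge "$(agent_path /usr/lib/ocf dubrouski serge)"
```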

For convenience, many of the return codes, defaults and other OCF utility functions can be included by custom OCF agents from /usr/lib/heartbeat/ocf-shellfuncs.

Beware: the Linux-HA implementation extends the OCF specification somewhat, but none of these extensions is incompatible with it.

When writing/testing your ResourceAgent, you may find the ocf-tester script to be very useful. It comes in the heartbeat package.

Actions

Normal OCF Resource Agents are required to have these actions:

  • start - start the resource. Exit 0 when the resource is correctly running (i.e. providing the service), and anything else except 7 if it failed.

  • stop - stop the resource. Exit 0 when the resource is correctly stopped and anything else except 7 if it failed.

  • monitor - monitor the health of a resource. Exit 0 if the resource is running, 7 if it is stopped, and anything else if it has failed. Note that the monitor script should test the state of the resource on the local host.

  • meta-data - provide information about this resource as an XML snippet. Exit with 0.

Note: OCF specs have strict definitions of what exit codes actions must return. We follow these specifications, and exiting with the wrong exit code will cause Heartbeat to behave in ways you will likely find puzzling and annoying. In particular, Heartbeat needs to distinguish a completely stopped resource from one which is in some erroneous and indeterminate state.
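The required actions and their exit codes can be sketched as a minimal agent. This is an illustration under stated assumptions, not a Heartbeat-provided script: the "resource" is only a state file, every demo_* name and STATE_FILE are hypothetical, and the layout of the meta-data XML is an assumption modeled on the OCF resource agent API.

```shell
#!/bin/sh
# Minimal OCF-style agent sketch. All demo_* names are hypothetical.

OCF_SUCCESS=0            # action succeeded / resource is running
OCF_ERR_GENERIC=1        # generic failure
OCF_ERR_UNIMPLEMENTED=3  # action not supported by this agent
OCF_NOT_RUNNING=7        # resource is cleanly stopped

STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/demo-agent.state}"

demo_monitor() {
    # 0 when running, 7 when cleanly stopped.
    [ -f "$STATE_FILE" ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

demo_start() {
    # Starting a started resource is not an error.
    demo_monitor && return $OCF_SUCCESS
    touch "$STATE_FILE" 2>/dev/null || return $OCF_ERR_GENERIC
    # Report success only once the resource really runs; never return 7.
    demo_monitor || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

demo_stop() {
    # Stopping a stopped resource is not an error.
    rm -f "$STATE_FILE" 2>/dev/null || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

demo_meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="demo">
  <version>0.1</version>
  <longdesc lang="en">Demo agent that tracks a state file.</longdesc>
  <shortdesc lang="en">Demo agent</shortdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="20"/>
    <action name="stop" timeout="20"/>
    <action name="monitor" timeout="20" interval="10"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
EOF
    return $OCF_SUCCESS
}

demo_agent() {
    case "$1" in
        start)     demo_start ;;
        stop)      demo_stop ;;
        monitor)   demo_monitor ;;
        meta-data) demo_meta_data ;;
        *)         return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# A real agent would finish with:  demo_agent "$1"; exit $?
```

Keeping each action in its own function makes it easy to see that start and stop are idempotent and that only monitor ever returns 7.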

OCF Resource Agents should support the following action:

  • validate-all - validate the set of configuration parameters given in the environment. Exit with 0 if the parameters are valid, 2 if they are not valid, 6 if the resource is not configured, and 5 if the software the RA is supposed to run cannot be found.
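A validate-all action along these lines might look like the sketch below. The parameter name ip, the DEMO_BINARY variable, and the crude "digits and dots only" address check are all assumptions made for illustration.

```shell
#!/bin/sh
# validate-all sketch. demo_validate_all, the "ip" parameter and
# DEMO_BINARY are hypothetical names.

OCF_SUCCESS=0
OCF_ERR_ARGS=2        # parameters are not valid
OCF_ERR_INSTALLED=5   # required software cannot be found
OCF_ERR_CONFIGURED=6  # resource is not configured

demo_validate_all() {
    # 6: the required parameter was never configured.
    [ -n "${OCF_RESKEY_ip:-}" ] || return $OCF_ERR_CONFIGURED

    # 2: configured, but contains characters other than digits and dots.
    case "$OCF_RESKEY_ip" in
        *[!0-9.]*) return $OCF_ERR_ARGS ;;
    esac

    # 5: the program this agent would run is not installed.
    command -v "${DEMO_BINARY:-sh}" >/dev/null 2>&1 || return $OCF_ERR_INSTALLED

    return $OCF_SUCCESS
}
```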

Additional requirements (not part of the OCF specs) are placed on agents that will be used for cloned and multi-state resources.

  • promote - promote the local instance of a resource to the master/primary state. Should exit 0.

  • demote - demote the local instance of a resource to the slave/secondary state. Should exit 0.

  • notify - used by Heartbeat to send the agent pre- and post-notification events telling the resource what is about to take place or has just taken place. Must exit 0.
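As a rough sketch of these three actions, the functions below track the role of a hypothetical resource in a file. ROLE_FILE and the demo_* names are assumptions; a real agent would change the state of the actual service and would inspect the notification details it is given.

```shell
#!/bin/sh
# Multi-state action sketch. All names are hypothetical.

ROLE_FILE="${ROLE_FILE:-${TMPDIR:-/tmp}/demo-agent.role}"

demo_promote() {
    # Move the local instance to the master/primary state.
    echo master > "$ROLE_FILE" || return 1
    return 0
}

demo_demote() {
    # Move the local instance to the slave/secondary state.
    echo slave > "$ROLE_FILE" || return 1
    return 0
}

demo_notify() {
    # This sketch ignores the notification itself; it must still return 0.
    return 0
}

demo_role() {
    # Print the recorded role, or "stopped" if none was recorded.
    cat "$ROLE_FILE" 2>/dev/null || echo stopped
}
```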

Some actions specified in the OCF specs are not used by Heartbeat at the moment:

  • reload - reload the configuration of the resource instance without disrupting the service

  • recover - a variant of the start action, this should try to recover a resource locally.

Parameters

In addition to having more actions, your OCF resource agent is permitted to take parameters to tell it which instance of the resource it is being asked to control, and any simple configuration parameters it might need to tell it what to do or exactly how it should be done.

These are passed in to the script as environment variables, with the special prefix OCF_RESKEY_. So, if you need to be given a parameter which the user thinks of as ip it will be passed to the script as OCF_RESKEY_ip.
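A short sketch of how an agent might read such a parameter. The function name describe_ip is hypothetical, and the variable is exported by hand here only to simulate what the cluster manager does before calling the agent.

```shell
#!/bin/sh
# Sketch: reading the user parameter "ip" from the environment.
# describe_ip is a hypothetical helper.
describe_ip() {
    ip="${OCF_RESKEY_ip:-}"
    if [ -z "$ip" ]; then
        echo "parameter ip is not set"
        return 6   # resource is not configured
    fi
    echo "managing address $ip"
    return 0
}

# Simulate the environment the agent would be started with.
OCF_RESKEY_ip=10.10.10.1
export OCF_RESKEY_ip
describe_ip
```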

Debugging your OCF Resource Agent

The most common problems when implementing OCF ResourceAgents are:

  • Not implementing the monitor operation at all
  • Not observing the correct exit status codes for start/stop/monitor actions
  • Starting a started resource returns an error (this violates the OCF spec)
  • Stopping a stopped resource returns an error (this violates the OCF spec)
  • Returning 0 for a start/stop action when the resource is not completely started/stopped
  • Invalid XML output for the meta-data command

See Also

ResourceAgent DL321.pdf

LSB Resource Agent Specifications

Background

LSB Resource Agents are those found in /etc/init.d. Generally they are provided by the OS/distribution and in order to be used with version 2 of Heartbeat, must conform to the LSB Spec.

The LSB Spec (as it relates to init scripts) can be found at: http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

Many distributions claim LSB compliance but ship with broken init scripts. The most common problems are:

  • Not implementing the status operation at all
  • Not observing the correct exit status codes for start/stop/status actions
  • Starting a started resource returns an error (this violates the LSB spec)
  • Stopping a stopped resource returns an error (this violates the LSB spec)

NOTE: Parameters and options cannot be passed to LSB Resource Agents.

Init Script (LSB) Compatibility Checks

Assuming some_service is configured correctly and currently not active, the following sequence will help you determine if it is LSB compatible:

  1. Start (stopped)
    • /etc/init.d/some_service start ; echo "result: $?"

    • Did the service start?
    • Did the command print result: 0 (in addition to the regular output)?

  2. Status (running)
    • /etc/init.d/some_service status ; echo "result: $?"

    • Did the script accept the command?
    • Did the script indicate the service was running?
    • Did the command print result: 0 (in addition to the regular output)?

  3. Start (running)
    • /etc/init.d/some_service start ; echo "result: $?"

    • Is the service still running?
    • Did the command print result: 0 (in addition to the regular output)?

  4. Stop (running)
    • /etc/init.d/some_service stop ; echo "result: $?"

    • Was the service stopped?
    • Did the command print result: 0 (in addition to the regular output)?

  5. Status (stopped)
    • /etc/init.d/some_service status ; echo "result: $?"

    • Did the script accept the command?
    • Did the script indicate the service was not running?

    • Did the command print result: 3 (in addition to the regular output)?

  6. Stop (stopped)
    • /etc/init.d/some_service stop ; echo "result: $?"

    • Is the service still stopped?
    • Did the command print result: 0 (in addition to the regular output)?

  7. Status (failed)
    • This step is not readily testable and relies on manual inspection of the script.
      The script can optionally use one of the other codes listed in the LSB spec to indicate that it is active but failed. In such a case, this tells Heartbeat that before moving the resource to another node, it should stop it on the existing one first. Making use of these extra exit codes is encouraged.

If the answer to any of the above questions is no, then the init script is not LSB compliant.
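The exit-code part of this sequence can be automated with a small sketch like the one below. The names expect_rc and lsb_check are hypothetical. It only compares exit codes, so the questions about whether the service really started or stopped still need manual inspection, and it assumes the service is configured and currently stopped.

```shell
#!/bin/sh
# expect_rc SCRIPT ACTION WANTED
# Run one action and compare its exit code with the expected one.
expect_rc() {
    rc=0
    "$1" "$2" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq "$3" ]; then
        echo "ok:   $2 returned $rc"
        return 0
    fi
    echo "FAIL: $2 returned $rc, expected $3"
    return 1
}

# lsb_check SCRIPT
# Walk through steps 1 to 6 and succeed only if every exit code matched.
lsb_check() {
    failures=0
    expect_rc "$1" start 0  || failures=$((failures + 1))  # 1. start (stopped)
    expect_rc "$1" status 0 || failures=$((failures + 1))  # 2. status (running)
    expect_rc "$1" start 0  || failures=$((failures + 1))  # 3. start (running)
    expect_rc "$1" stop 0   || failures=$((failures + 1))  # 4. stop (running)
    expect_rc "$1" status 3 || failures=$((failures + 1))  # 5. status (stopped)
    expect_rc "$1" stop 0   || failures=$((failures + 1))  # 6. stop (stopped)
    [ "$failures" -eq 0 ]
}

# Usage, with the placeholder service name from the text:
# lsb_check /etc/init.d/some_service
```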

If you are using version 2 of Heartbeat and have specified crm yes, then your options at this point are to:

  1. fix the init script
  2. write an OCFResourceAgent based on the existing init script

If you are using version 1 of Heartbeat or have not specified crm yes, then the script may still work as long as it follows the rules for HeartbeatResourceAgents.

See Also

ResourceAgent

R1 Heartbeat Resource Agent Specifications

Heartbeat Resource Agents

The Classic Heartbeat resource agents (scripts) are basically LSB init scripts - with slightly odd status operations. The only operations on the resource scripts which Heartbeat performs are:

  • start
  • stop
  • status

These operations are as follows:

start operation

Activate the given resource.

According to the LSB, it is never an error to start an already active resource. Exit with 0 on success, nonzero on failure. Heartbeat will only start a resource if it wants it to be running on the current machine, and status shows it's not already running. Heartbeat will never start the same resource at the same time on different nodes in the cluster.

stop operation

Deactivate the given resource.

Performed when we want to make sure a resource is not running. Although there are occasions when we check to see if a resource is running before stopping it, during shutdown, we will stop all resources whether or not we think they're running.

According to the LSB, stopping a resource which is already stopped is always permissible. Heartbeat will DEFINITELY stop resources it doesn't know are running. Stop failures can result in the machine being rebooted to clear up the error. Note that some Red Hat init scripts are not LSB-compliant and complain when trying to stop resources which are not running.

status operation

Determine running status of the given resource.

The status operation must report status correctly, AND it must print either OK or running when the resource is active, and it MUST NOT print either of those when the resource is inactive. For the status operation, we ignore the return code.

This sounds quite odd, but it is a historical hangover kept for compatibility: earlier versions of Linux didn't reliably give proper status exit codes, but they did reliably print OK or running.
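A sketch of what this convention implies for a script author. Both function names are hypothetical, the resource is represented by a marker file, and the pattern match in is_active is an assumption about how a caller could interpret the output, not Heartbeat's actual code.

```shell
#!/bin/sh
# r1_status MARKER_FILE
# Print "running" when the hypothetical resource is active, and a word
# that contains neither "OK" nor "running" when it is not.
r1_status() {
    if [ -f "$1" ]; then
        echo "running"
    else
        echo "stopped"
    fi
}

# is_active OUTPUT
# Interpret status output by its text alone; the exit code of the
# status operation plays no part.
is_active() {
    case "$1" in
        *OK*|*running*) return 0 ;;
        *)              return 1 ;;
    esac
}
```

Note that under a match like this, a script printing "not running" for an inactive resource would be treated as active, which is why the inactive message must avoid both words.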

Heartbeat calls the status operation in many places. We do it before starting any resource, and also (IIRC) when releasing resources.

After repeated stop failures, we will do a status on the resource. If the status reports that the resource is still running, then we will reboot the machine to make sure things are really stopped. Note that this behaviour applies only to the old resource manager of v1 (haresources based) clusters. CRM/v2 clusters use stonith.

Concurrency

Start, stop and status operations are NEVER overlapped on a given resource on a given machine. You don't have to worry about concurrency of an operation on a resource.

Parameters

Unlike an LSBResourceAgent, a HeartbeatResourceAgent can be passed a list of positional parameters. The parameters go before the operation name, like this:

IPaddr 10.10.10.1 start

The haresources line which corresponds to this set of parameters is:

IPaddr::10.10.10.1

and invoked with the start operation.
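The mapping from a haresources entry to a command line, and the way a script might separate its parameters from the trailing operation, can be sketched as follows. Both helper names are hypothetical.

```shell
#!/bin/sh
# haresources_to_command ENTRY OPERATION
# Replace the "::" separators with spaces and append the operation.
haresources_to_command() {
    printf '%s %s\n' "$(printf '%s\n' "$1" | sed 's/::/ /g')" "$2"
}

# r1_parse PARAM... OPERATION
# Inside a resource script: everything before the last argument is a
# parameter, the last argument is the operation.
r1_parse() {
    params=""
    while [ "$#" -gt 1 ]; do
        params="${params:+$params }$1"
        shift
    done
    op="${1:-}"
    echo "op=$op params=$params"
}

haresources_to_command IPaddr::10.10.10.1 start   # prints: IPaddr 10.10.10.1 start
r1_parse 10.10.10.1 start                         # prints: op=start params=10.10.10.1
```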

Location

Heartbeat looks for resource scripts in /etc/ha.d/resource.d and /etc/init.d.
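A lookup over these two directories might be sketched like this. The function name is hypothetical, and the search order (resource.d first, then init.d) is an assumption based on the order in which the directories are listed.

```shell
#!/bin/sh
# find_resource_script NAME DIR...
# Print the first executable NAME found in the given directories,
# searched in the order they are passed.
find_resource_script() {
    name="$1"
    shift
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Usage:
# find_resource_script IPaddr /etc/ha.d/resource.d /etc/init.d
```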

See Also

ResourceAgent, HeartbeatProgram, haresources, LSBResourceAgent, OCFResourceAgent

STONITH Resource Agent Specifications

Are not yet documented :-(

However, the following short description might be helpful.

STONITH resource agents are mapped from STONITH devices, so to write a STONITH Resource Agent, one has to write a STONITH device driver, and the corresponding resource agent will magically appear in the system.

Because these resource agents are mapped from STONITH devices, the APIs don't look very much like other resource agents.

If you have a host-reset device you wish to use, and we don't already support it, there are two basic approaches to writing a STONITH Resource Agent:

  • 'C' STONITH plugin
  • Script ("external") plugin

'C' plugins are locked into memory and are suitable for use with cluster filesystems without danger. They are slightly harder to write than script plugins, but there are numerous worked examples in the source code.

A script plugin is fully as functional and as easy to configure as a 'C' plugin, with the slight disadvantage of being risky to use with a cluster filesystem in a low-memory situation on Linux.

For now, all I can say about these is Use The Source Luke ;-) -- :-(