
Node-granularity fencing in the new ClusterResourceManager framework

This is a summary of the discussion on the linux-ha-dev mailing list[1].

The goal is to make fencing[2] as much like a regular resource[3] as possible. This goal is reached except for one special action which the STONITH[4] resources need to support; see below.

Who initiates STONITH requests

STONITH requests are always initiated from the DesignatedCoordinator[5].

How are the STONITH controllers configured in the ClusterInformationBase

The STONITH controllers are configured in the resources section of the CIB[6] as resources of the class stonith. All normal constraints for resource placement et cetera apply.

For sanity, a stonith-class resource may not require node fencing itself.
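As a rough illustration of the two configuration rules above, here is a minimal sketch of a sanity check over one resource entry. It assumes a hypothetical in-memory model (`ResourceDef`, `check_resource`) rather than the real CIB schema or any Linux-HA API; only the `node_fencing` attribute and the `stonith` class come from this page.

```python
from dataclasses import dataclass


# Hypothetical in-memory model of one entry in the CIB resources section.
# Only the fields discussed on this page are modelled.
@dataclass
class ResourceDef:
    rid: str             # resource id
    rclass: str          # "ocf", "lsb", "heartbeat" or "stonith"
    node_fencing: str    # "yes" or "no"


def check_resource(res: ResourceDef) -> list[str]:
    """Return the configuration errors for one resource (empty if sane)."""
    errors = []
    if res.node_fencing not in ("yes", "no"):
        errors.append(f"{res.rid}: node_fencing must be 'yes' or 'no'")
    # Sanity rule from the text: a stonith-class resource may not itself
    # require node fencing.
    if res.rclass == "stonith" and res.node_fencing == "yes":
        errors.append(f"{res.rid}: stonith-class resources may not require node fencing")
    return errors
```

For example, `check_resource(ResourceDef("fence-apc", "stonith", "yes"))` reports one error, while the same entry with `node_fencing="no"` passes.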

Who owns the STONITH device

The STONITH device is controlled via a StonithAgent[7], which is a special resource agent running under the control of the LocalResourceManager[8]; see LocalResourceManager/FencingOperations[9].

Because the STONITH controller is internally a regular resource, just of a special class, the regular node placement rules apply. This limits access to the STONITH device to the nodes which can actually reach it: most likely either a single node for a serial STONITH device, or a wildcard (any node) for most network power switches.

Because all requests are made through this single node, we also avoid running into the limitation that some network power switches only allow a single session at a time.

As explained on the StonithAgents[10] page, we learn at start time which nodes a given STONITH device can control.
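The controller behaviour described above (learn the controllable nodes at start, gate every request through one place) can be sketched as follows. This is a hypothetical model, not the StonithAgent interface: the device is abstracted as two callables, `list_hosts` and `reset_host`, and a lock stands in for the single-session restriction.

```python
import threading


class StonithController:
    """Hypothetical sketch of a STONITH controller resource.

    The real device driver is abstracted as two callables, since the
    actual StonithAgent interface is not specified here.
    """

    def __init__(self, list_hosts, reset_host):
        self._list_hosts = list_hosts
        self._reset_host = reset_host
        self._lock = threading.Lock()  # one request at a time: single session
        self.hosts = frozenset()

    def start(self):
        # Learn at start time which nodes this device can control.
        self.hosts = frozenset(self._list_hosts())

    def can_fence(self, node):
        return node in self.hosts

    def fence(self, node):
        if not self.can_fence(node):
            raise ValueError(f"device cannot control {node}")
        # All requests pass through this controller, so serializing them
        # here respects switches that only allow a single session.
        with self._lock:
            return self._reset_host(node)
```

Keeping the lock inside the controller means callers never need to know whether the particular switch tolerates concurrent sessions.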

Monitoring the STONITH device

Because the STONITH controller, through which all further requests to a given STONITH device are gated, is a regular resource, it is also subject to monitoring. We can thus find out immediately (and not only when we want to use it) that a STONITH device has become unusable, inform the administrator, and re-allocate the STONITH controller elsewhere.
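A simplified sketch of that monitor-then-relocate decision is shown below. The function `check_controller` and its `monitor` callable are hypothetical; in the real framework the monitor operation runs on the node hosting the controller and relocation is left to the normal resource recovery, whereas here candidate nodes are simply probed in order.

```python
from typing import Callable, Optional


def check_controller(current: str,
                     allowed_nodes: list[str],
                     monitor: Callable[[str], bool]) -> tuple[Optional[str], list[str]]:
    """Return (node that should run the controller, alerts for the admin).

    monitor(node) is a hypothetical probe: True if the STONITH device is
    usable from that node.
    """
    alerts = []
    if monitor(current):
        return current, alerts
    alerts.append(f"STONITH device unusable from {current}")
    # Try the other nodes the placement rules allow for this controller.
    for node in allowed_nodes:
        if node != current and monitor(node):
            alerts.append(f"STONITH controller relocated to {node}")
            return node, alerts
    alerts.append("no node can reach the STONITH device")
    return None, alerts
```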

When to STONITH

Whether or not a STONITH dependency is needed in the TransitionGraph[11] is of course decided by the PolicyEngine[12], based on the resource parameters.

How do we determine whether a given resource needs to wait/block on node fencing

For regular resources, whether or not they need node-granularity fencing is controlled via the mandatory node_fencing="(yes|no)" attribute in the CIB.

For OCF agents, the GUI or administrator should derive the default for this attribute from the resource agent metadata (available in the lrm_agent section of the CIB); for heartbeat and lsb agents, it should default to yes for safety.
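The defaulting rule can be written down in a few lines. In this sketch `default_node_fencing` and its `metadata_default` argument are hypothetical; how the value is actually read from the lrm_agent section is not shown. Falling back to `yes` for an OCF agent whose metadata says nothing is an added assumption, chosen to match the "safe by default" spirit of the text.

```python
from typing import Optional


def default_node_fencing(agent_class: str, metadata_default: Optional[str] = None) -> str:
    """Pick the default value for the node_fencing attribute.

    metadata_default: value advertised in an OCF agent's metadata
    (hypothetical parameter; extraction from the CIB is not shown).
    """
    if agent_class == "ocf" and metadata_default in ("yes", "no"):
        return metadata_default
    # heartbeat and lsb agents carry no such metadata, so use the safe
    # choice. Assumption: OCF agents without a hint also default to "yes".
    return "yes"
```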

Which nodes need to be STONITHed

We need to compute the maximal set of nodes eligible to run a given resource, assuming that all nodes were up right now and no other resources were running, and compare it with the list of nodes which actually are up and healthy. Every eligible node that is not known to be up and healthy needs to be killed.
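The computation above reduces to a set difference. In this minimal sketch the placement constraints are abstracted as a hypothetical predicate `allowed`; `eligible_nodes` and `nodes_to_fence` are illustrative names, not PolicyEngine functions.

```python
from typing import Callable, Iterable


def eligible_nodes(all_nodes: Iterable[str], allowed: Callable[[str], bool]) -> set[str]:
    # Maximal eligible set: every configured node the placement constraints
    # allow, pretending all nodes are up and nothing else is running.
    return {n for n in all_nodes if allowed(n)}


def nodes_to_fence(eligible: set[str], up_and_healthy: set[str]) -> set[str]:
    # Any eligible node not known to be up and healthy needs fencing.
    return eligible - up_and_healthy
```

For instance, with nodes `n1` to `n4`, a constraint excluding `n4`, and only `n1` and `n4` up, the nodes to fence are `n2` and `n3`.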

STONITH in response to stop failures

Another scenario where a node may be STONITHed is a failed stop operation. Before we can recover the resource on another node, we must clean up by force.

Whether a failed stop operation causes the node the resource is running on to be STONITHed shall be controlled by a failstop_type=(ignore|block|stonith) attribute of either the resource or a resource depending on it.

ignore should only be used for self-fencing resources; the default must be either stonith or block for all others. As with the node_fencing attribute, the default should be retrieved from the resource agent metadata.
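The following is a minimal sketch of how `failstop_type` might be evaluated. The text does not say how the settings of a resource and of the resources depending on it are combined; taking the strictest one (`ignore` < `block` < `stonith`) is an assumption made here, and all function names are hypothetical.

```python
# Assumed strictness ordering used to combine settings.
SEVERITY = {"ignore": 0, "block": 1, "stonith": 2}


def effective_failstop(own: str, dependents: list[str]) -> str:
    """Strictest failstop_type among the resource and its dependents (assumed rule)."""
    return max([own, *dependents], key=SEVERITY.__getitem__)


def check_failstop(failstop_type: str, self_fencing: bool) -> None:
    if failstop_type not in SEVERITY:
        raise ValueError(f"unknown failstop_type {failstop_type!r}")
    # ignore is only acceptable for resources that fence themselves.
    if failstop_type == "ignore" and not self_fencing:
        raise ValueError("ignore is only allowed for self-fencing resources")


def on_stop_failure(resource: str, node: str, failstop_type: str) -> str:
    """Describe the reaction to a failed stop of `resource` on `node`."""
    if failstop_type == "stonith":
        return f"fence {node}, then recover {resource} elsewhere"
    if failstop_type == "block":
        return f"block recovery of {resource}"
    return f"recover {resource} elsewhere"  # ignore: resource fences itself
```

Under this rule, a resource set to `block` with one dependent set to `stonith` would get the node fenced, which is exactly the kind of interaction raised in the next paragraph.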

LarsMarowskyBree[13] still wonders what should happen if a lower-priority resource has stonith set and fails to stop while a higher-priority resource (not depending on the first) is happily running on the same node: if we follow the wishes of the lower-priority resource, we degrade the service level of the higher-priority resource.

STONITHing failed nodes in general

Yet another scenario is that a STONITH-induced reboot of a failed node may cure an intermittent fault of the node, and thus reduce the MeanTimeToRepair[14] and the time we spend in a partially degraded mode. Even if no resource actively requires the node to be shot, it may still be desirable for this reason.

Whether or not a potentially failed node is shot for this reason shall be controlled by a global always_stonith_failed_nodes flag; whether or not a given resource actually has to wait until this has succeeded is controlled via the other parameters discussed above.
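The split between "who gets shot" and "who must wait" can be sketched like this. `fencing_plan` is a hypothetical helper; `required_by_resources` stands for the failed nodes that some resource's `node_fencing` or `failstop_type` setting already requires to be fenced, computed elsewhere.

```python
def fencing_plan(failed_nodes: set[str],
                 required_by_resources: set[str],
                 always_stonith_failed_nodes: bool) -> tuple[set[str], set[str]]:
    """Return (nodes to shoot, nodes whose fencing resources must wait for)."""
    must_wait_for = set(required_by_resources)
    to_shoot = set(must_wait_for)
    if always_stonith_failed_nodes:
        # Shoot every failed node in the hope that a reboot cures it, but
        # do not make any resource wait for nodes it does not require fenced.
        to_shoot |= failed_nodes
    return to_shoot, must_wait_for
```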

How to handle STONITH failures

If STONITH for a given node fails, we of course retry indefinitely, but in the meantime we block all resources which depend on it.

A manual override needs to be possible; the admin needs to be able to manually confirm that a given node (or set of nodes) is really down, so that the cluster can proceed.
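A minimal sketch of the retry loop with a manual override follows, assuming hypothetical callables `try_fence` and `admin_confirmed_down` for the fencing attempt and the administrator's confirmation. `recover_after_fencing` keeps the dependent resources in a `blocked` set until either path succeeds; it is simplified and ignores resources blocked for other reasons.

```python
import time
from typing import Callable


def fence_until_done(node: str,
                     try_fence: Callable[[str], bool],
                     admin_confirmed_down: Callable[[str], bool],
                     retry_delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> tuple[str, int]:
    """Retry fencing `node` until it succeeds or an administrator manually
    confirms the node is down. Returns (outcome, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        if try_fence(node):
            return "fenced", attempts
        if admin_confirmed_down(node):
            return "confirmed down by admin", attempts
        sleep(retry_delay)


def recover_after_fencing(node: str, dependents: set[str], blocked: set[str], **kwargs):
    """Block every resource depending on fencing `node` until it is done.

    Simplified: unblocking ignores resources blocked for other reasons."""
    blocked |= dependents      # updates the caller's set in place
    outcome = fence_until_done(node, **kwargs)
    blocked -= dependents
    return outcome
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` applies.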

See also

ResourceFencing[15]


References

[1]http://www.linux-ha.org/ContactUs
[2]http://www.linux-ha.org/fencing
[3]http://www.linux-ha.org/resource
[4]http://www.linux-ha.org/STONITH
[5]http://www.linux-ha.org/DesignatedCoordinator
[6]http://www.linux-ha.org/CIB
[7]http://www.linux-ha.org/StonithAgent
[8]http://www.linux-ha.org/LocalResourceManager
[9]http://www.linux-ha.org/LocalResourceManager/FencingOperations
[10]http://www.linux-ha.org/StonithAgents
[11]http://www.linux-ha.org/TransitionGraph
[12]http://www.linux-ha.org/PolicyEngine
[13]http://www.linux-ha.org/LarsMarowskyBree
[14]http://www.linux-ha.org/MeanTimeToRepair
[15]http://www.linux-ha.org/ResourceFencing


This information provided courtesy of the Linux-HA project at http://linux-ha.org/