
This is the summary of the discussion on linux-ha-dev[1].
The goal is to make fencing[2] as much like a regular resource[3] as possible. This goal is reached, except for one special action which the STONITH[4] resources need to support; see below.
STONITH requests are always initiated from the DesignatedCoordinator[5].
The STONITH controllers are configured in the resources section of the CIB[6] as resources of the class stonith. All normal constraints for resource placement et cetera apply.
For sanity, a stonith-class resource may not require node fencing itself.
The STONITH device is controlled via a StonithAgent[7], which is a special resource agent running under the control of the LocalResourceManager[8]; see LocalResourceManager/FencingOperations[9].
As the STONITH controller is internally a regular resource, just of a special class, the regular node placement rules apply. This limits access to the STONITH device to the nodes which can actually reach it - this will likely be either a single node for a serial STONITH device or a wildcard for most network power switches.
As all requests are made through this single node, we also avoid the limitation that some network power switches only allow a single session to connect to them.
As explained on the StonithAgents[10] page, we learn at start time which nodes a given STONITH device can control.
As the STONITH controller, through which all further requests to a given STONITH device are gated, is a regular resource, it is also subject to monitoring. We can thus find out immediately (and not only at the time we want to use it) that a STONITH device has become unusable, inform the administrator, and re-allocate the STONITH controller somewhere else.
Whether or not a STONITH dependency is needed in the TransitionGraph[11] is of course decided by the PolicyEngine[12] via the resource parameters.
For regular resources, whether or not they need node-granularity fencing is controlled via the mandatory node_fencing="(yes|no)" attribute in the CIB.
The default for this attribute should be set by the GUI/administrator from the Resource Agent metadata for OCF agents (available in the CIB in the lrm_agent section), and for safety default to yes for heartbeat or lsb agents.
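The defaulting rule above could be sketched as follows. This is a minimal illustration, not project code: the function name `default_node_fencing`, the dictionary shape of the metadata, and the fallback to yes when an OCF agent's metadata carries no usable hint are all assumptions.

```python
def default_node_fencing(agent_class, metadata=None):
    """Suggest a default for the node_fencing attribute ("yes" or "no").

    agent_class: "ocf", "heartbeat" or "lsb".
    metadata: hypothetical dict parsed from the agent's metadata, if any.
    """
    if agent_class == "ocf" and metadata:
        value = metadata.get("node_fencing")
        if value in ("yes", "no"):
            return value
    # heartbeat and lsb agents carry no metadata: default to the safe choice.
    # Assumption: OCF agents without a usable hint are treated the same way.
    return "yes"
```

For example, `default_node_fencing("lsb")` yields "yes", while an OCF agent whose metadata says "no" keeps that value.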
We need to compute the maximum set of eligible nodes for a given resource - assuming that all nodes were up right now and no other resources were running - and contrast this with the list of nodes which actually are up and healthy. Every eligible node which is not on that list needs killing.
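The comparison described above amounts to a set difference. The sketch below assumes a hypothetical in-memory representation in which each resource is a dict with `node_fencing` and `eligible_nodes` keys; how the PolicyEngine actually derives the eligible set is not covered here.

```python
def nodes_to_fence(resources, healthy_nodes):
    """Return the set of nodes that must be fenced.

    resources: iterable of dicts with "node_fencing" ("yes"/"no") and
               "eligible_nodes" (the maximum set, as if every node were up).
    healthy_nodes: nodes that are actually up and healthy right now.
    """
    healthy = set(healthy_nodes)
    doomed = set()
    for res in resources:
        if res["node_fencing"] != "yes":
            continue  # this resource does not need node-granularity fencing
        # Eligible but not known to be healthy: must be shot before recovery.
        doomed |= set(res["eligible_nodes"]) - healthy
    return doomed
```

With a resource eligible on n1, n2 and n3 and only n1 and n3 healthy, the result is the single node n2.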
Another scenario where a node may be STONITHed is a failed stop operation. Before we can recover the resource on another node, we must clean up by force.
Whether a failed stop operation causes the node the resource is running on to be STONITHed shall be controlled by a failstop_type=(ignore|block|stonith) attribute of either the resource or a resource depending on it.
ignore should only be used for self-fencing resources; the default must be either stonith or block for all others. As for the node_fencing attribute, the default should be retrieved from the resource agent metadata.
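One way to read the three failstop_type values is sketched below. The exact semantics of block and ignore are not spelled out in this summary, so the interpretation in the code (block holds the resource until someone intervenes, ignore recovers without fencing) is an assumption, as are the function names and the choice of block as the last-resort default.

```python
FAILSTOP_TYPES = ("ignore", "block", "stonith")

def resolve_failstop_type(configured=None, metadata_default=None):
    """Pick the effective failstop_type for a resource."""
    # Assumption: fall back to "block" when neither the CIB nor the
    # resource agent metadata supplies a value.
    value = configured or metadata_default or "block"
    if value not in FAILSTOP_TYPES:
        raise ValueError("unknown failstop_type: %r" % (value,))
    return value

def after_failed_stop(failstop_type):
    """Return (fence_node, may_recover_elsewhere) for a failed stop."""
    if failstop_type == "stonith":
        return (True, True)    # clean up by force, then recover
    if failstop_type == "block":
        return (False, False)  # do nothing until someone intervenes
    if failstop_type == "ignore":
        return (False, True)   # self-fencing resource: just recover
    raise ValueError("unknown failstop_type: %r" % (failstop_type,))
```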
LarsMarowskyBree[13] still wonders what happens if a lower priority resource has stonith set and fails to stop, while a higher priority resource (not depending on the first) is happily running along on that node; if we follow the wish of the lower priority resource, we affect the service level of the higher priority resource...
Yet another scenario is that a STONITH induced reboot of a failed node may cure an intermittent fault of the node and thus reduce the MeanTimeToRepair[14] and the time we spend in a partially degraded mode. Even if no resource actively requires the node to be shot, it may still be desirable because of this.
Whether or not a potentially failed node is shot because of this shall be controlled by a global always_stonith_failed_nodes flag; whether or not a given resource has to actually wait until this has succeeded is controlled via the other parameters discussed above.
If STONITH for a given node fails, we of course retry indefinitely, but in the meantime we block all resources which depend on this.
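The interplay between the global flag and the per-resource setting might look like this. It is a sketch under assumptions: the function name and the dict-based resource representation are hypothetical, and only the node_fencing attribute is considered when deciding who has to wait.

```python
def plan_failed_node(always_stonith_failed_nodes, resources):
    """Decide whether to shoot a failed node and which resources must wait.

    resources: dicts with "id" and "node_fencing" for the resources that
               would be recovered from the failed node.
    Returns (shoot, ids_of_resources_that_wait_for_the_fencing_result).
    """
    must_wait = [r["id"] for r in resources if r["node_fencing"] == "yes"]
    # The node is shot if any resource requires it, or if the global flag
    # asks for failed nodes to be rebooted regardless.
    shoot = bool(always_stonith_failed_nodes) or bool(must_wait)
    return shoot, must_wait
```

With the flag set and only resources that have node_fencing="no", the node is still shot, but nothing waits for the outcome.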
A manual override needs to be possible; the admin needs to be able to manually confirm that a given node (or set of nodes) is really down, so that the cluster can proceed.
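The retry and override behaviour can be illustrated with a small loop. Everything here is hypothetical scaffolding: `try_stonith` and `admin_confirmed_down` stand in for whatever mechanism issues the request and records the administrator's confirmation, and `on_retry` is a hook where a real implementation would pause between attempts.

```python
import itertools

def fence_node(node, try_stonith, admin_confirmed_down,
               on_retry=lambda attempt: None):
    """Retry STONITH until it succeeds or the admin confirms the node is down.

    Dependent resources stay blocked for as long as this function has not
    returned. Returns (how, attempts), where how is "stonith" or "manual".
    """
    for attempt in itertools.count(1):
        # Check the manual override first so the admin can break the loop.
        if admin_confirmed_down(node):
            return ("manual", attempt - 1)
        if try_stonith(node):
            return ("stonith", attempt)
        on_retry(attempt)  # e.g. log the failure and sleep before retrying
```

The loop has no upper bound on purpose, matching the indefinite retry described above; only a successful STONITH or a manual confirmation lets the cluster proceed.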
ResourceFencing[15]
| [1] | http://www.linux-ha.org/ContactUs |
| [2] | http://www.linux-ha.org/fencing |
| [3] | http://www.linux-ha.org/resource |
| [4] | http://www.linux-ha.org/STONITH |
| [5] | http://www.linux-ha.org/DesignatedCoordinator |
| [6] | http://www.linux-ha.org/CIB |
| [7] | http://www.linux-ha.org/StonithAgent |
| [8] | http://www.linux-ha.org/LocalResourceManager |
| [9] | http://www.linux-ha.org/LocalResourceManager/FencingOperations |
| [10] | http://www.linux-ha.org/StonithAgents |
| [11] | http://www.linux-ha.org/TransitionGraph |
| [12] | http://www.linux-ha.org/PolicyEngine |
| [13] | http://www.linux-ha.org/LarsMarowskyBree |
| [14] | http://www.linux-ha.org/MeanTimeToRepair |
| [15] | http://www.linux-ha.org/ResourceFencing |
This information provided courtesy of the Linux-HA project at http://linux-ha.org/