(I'm braindumping, structural cleanup will follow a bit later.)

Introduction

Resources that can be in more than one state are of special interest to a Cluster Resource Manager, because they model important applications very well: replicated databases, drbd, the SAP CI Enqueue Server, failover firewalls, and so on. In particular, we consider resources that can be in one of two states: master/slave, primary/secondary, rw-master/ro-shadow, and so on.

These resources are an extension to ClusterResourceManager/MultipleIncarnationResources[1]; the reader is assumed to be familiar with that page and its concepts.

Basic model and notes

Multiple incarnations allow a resource to run on several nodes at once. As an extension, we now support running each of these incarnations in one of two states.

As a restriction, we do not support resources that can be in more than two states. (Mostly because I simply couldn't find any meaningful scenarios for this so far, or any resources which support it.)

We limit (see below) the number of incarnations promoted to the active state. From this it follows that all other started incarnations are shadows.

Additional resource parameters

Start-up sequence

We assume that a started incarnation always comes up in shadow mode first. We can later promote it to active, and then demote it to shadow again.

(It makes no sense for a resource to come up in active mode directly: until it was demoted, that could too easily violate the limit on active incarnations. None of the resources looked at so far does this.)
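The start-up sequence above can be read as a small state machine. The following is only a sketch of that reading; the names `State`, `TRANSITIONS`, `apply_operation` and `run` are invented for illustration, and the rule that an active incarnation is demoted before it is stopped is an assumption, not something this page specifies.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    SHADOW = "started/shadow"
    ACTIVE = "started/active"


# operation -> (state required before, state reached afterwards)
TRANSITIONS = {
    "start": (State.STOPPED, State.SHADOW),   # never straight to active
    "promote": (State.SHADOW, State.ACTIVE),
    "demote": (State.ACTIVE, State.SHADOW),
    # Assumption: an active incarnation is demoted first, then stopped.
    "stop": (State.SHADOW, State.STOPPED),
}


def apply_operation(state, operation):
    """Return the state after `operation`, or raise if it is not allowed."""
    required, result = TRANSITIONS[operation]
    if state is not required:
        raise ValueError(f"cannot {operation} from {state.value}")
    return result


def run(operations, state=State.STOPPED):
    """Apply a sequence of operations, starting from `state`."""
    for operation in operations:
        state = apply_operation(state, operation)
    return state
```

For example, `run(["start", "promote", "demote"])` ends in `State.SHADOW`, while trying to `promote` a stopped incarnation raises `ValueError`.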

Capability to become active

Before promoting a resource incarnation to active, we first need to verify that it can be promoted. For this we once more need an extended status operation, which should tell us whether the resource is currently stopped, started/shadow, started/active, failed/something, etc.

This status should tell us also

Dependency handling

We introduce a special flag to depend on the state of a resource in the rsc_to_rsc constraint, so the admin can easily depend on, for example, drbd being in active mode.
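As a rough illustration of such a state-dependent constraint, here is a sketch of how it might be evaluated. The class `RscToRsc`, its field `required_state`, and the function `satisfied_on` are hypothetical names made up for this example; the actual flag name and constraint syntax are not defined on this page. The sketch also assumes the dependency is checked per node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RscToRsc:
    """Hypothetical in-memory form of an rsc_to_rsc constraint."""
    dependent: str                        # resource that has the dependency
    target: str                           # resource it depends on
    required_state: Optional[str] = None  # hypothetical flag; None = any started state


def satisfied_on(constraint, node_states):
    """Check the constraint on one node.

    node_states maps resource name -> state on that node
    ('stopped', 'shadow' or 'active'); missing resources count as stopped.
    """
    state = node_states.get(constraint.target, "stopped")
    if state == "stopped":
        return False
    if constraint.required_state is None:
        return True
    return state == constraint.required_state


# A filesystem that needs drbd to be in active mode on the same node.
fs_needs_drbd = RscToRsc("filesystem", "drbd", required_state="active")
```

With this, `satisfied_on(fs_needs_drbd, {"drbd": "shadow"})` is false and `satisfied_on(fs_needs_drbd, {"drbd": "active"})` is true, whereas a constraint without the flag is satisfied by either started state.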

Error recovery

This is a superset of the error recovery for multiple incarnations.

Running too many active resources

Again, we have several options:

Internal split-brain

How do we handle internal split-brain[4] scenarios, in which one or more masters or slaves cannot talk to the other(s), while we (by virtue of our redundant cluster communication media) can still talk to all of them? How is that error reported to us, and what is the recovery strategy? Do we simply stop and invalidate/disconnect the slaves (or lower-priority incarnations) and continue running in degraded mode?

Some discussion between lge[5], AndrewBeekhof[6] and LarsMarowskyBree[7] suggests that this might be handled partially by the post notify after a monitor failure. The other part is provided by "no news is good news": as long as we don't deliver that monitor or stop notification, an incarnation has to assume that the other side is healthy and up, and thus that it is experiencing a split-brain if it cannot talk to the other side itself. This would allow the incarnations to detect a split-brain and then respond to it in a resource-specific manner.

(One example of how this might be handled in the resource agent is in the proof-of-concept code for the drbd agent.)
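The "no news is good news" inference can be written down as a small decision function. This is only a sketch of the reasoning, not the drbd agent's code; the function name `classify_peer`, its parameters, and the returned labels are all invented for illustration.

```python
def classify_peer(can_reach_peer, failure_notified):
    """Classify a peer incarnation from one incarnation's point of view.

    can_reach_peer   -- whether this incarnation can talk to the peer itself
    failure_notified -- whether the cluster has delivered a monitor-failure
                        or stop notification for that peer
    """
    if failure_notified:
        # The cluster told us the peer failed or was stopped.
        return "peer-down"
    if can_reach_peer:
        return "healthy"
    # No news is good news: without a notification the peer must be assumed
    # up, so being unable to reach it ourselves means split-brain.
    return "split-brain"
```

What the incarnation does once it sees `"split-brain"` (for example disconnecting, or invalidating its data) is resource-specific and deliberately left out of the sketch.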

Complexity analysis

demote and promote are fairly straightforward; the complexity lies in evaluating which incarnations can or cannot become active.
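To make that evaluation a little more concrete, here is a deliberately naive sketch of choosing promotions and demotions under the limit on active incarnations. The function `plan_promotions`, the parameter name `max_active`, and the dictionary keys are hypothetical. A real policy would presumably rank candidates by preference; this sketch simply takes them in list order.

```python
def plan_promotions(incarnations, max_active):
    """Return (to_promote, to_demote) as lists of node names.

    incarnations -- list of dicts with hypothetical keys:
                    'node', 'state' ('shadow' or 'active'), 'promotable' (bool)
    max_active   -- hypothetical name for the limit on active incarnations
    """
    active = [i["node"] for i in incarnations if i["state"] == "active"]
    # Too many active incarnations: demote the surplus (here, the last ones).
    to_demote = active[max_active:]
    # Fill any free slots with shadows whose status says they can be promoted.
    free_slots = max(0, max_active - len(active))
    candidates = [i["node"] for i in incarnations
                  if i["state"] == "shadow" and i["promotable"]]
    to_promote = candidates[:free_slots]
    return to_promote, to_demote
```

Even in this toy form, the promote/demote calls themselves are trivial; everything interesting is hidden in how `promotable` is determined and how candidates are ordered.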

Open Issues


References

[1]http://www.linux-ha.org/ClusterResourceManager/MultipleIncarnationResources
[2]http://www.linux-ha.org/SyncMaster
[3]http://www.linux-ha.org/PolicyEngine
[4]http://www.linux-ha.org/SplitBrain
[5]http://www.linux-ha.org/lge
[6]http://www.linux-ha.org/AndrewBeekhof
[7]http://www.linux-ha.org/LarsMarowskyBree
[8]http://www.linux-ha.org/AlanRobertson


This information provided courtesy of the Linux-HA project at http://linux-ha.org/