Linux-HA Logo

The algorithm

This is a rough sketch of the algorithm to be used. It is transferred from highly sophisticated tools and rigourous mathematical descriptions (aka drawings on handkerchiefs) to this mediocre format and thus will need a few rounds to settle down.

To recall, the purpose of this algorithm is to take the current cluster state, the configuration, and to determine the next steps.

The algorithm is basically divided into three phases. The first phase can be thought of as normalization of the input data. For the division between the second and third phase, keep in mind that we are solving two aspects: the location of the resources, and the ordering in which these actions are performed. By dividing of this into two distinct phases, the complexity of each is greatly reduced.

Phase 1

In phase one, we read in all the information from the ClusterInformationBase[1] XML description provided to us and convert them into our internal data structures:

However, during the building of this internal data structure, we also translate certain complex configuration directives into RISC-style primitives. This is actually a good place for PolicyPlugins[2] to be inserted!

Examples of such translations might include, but is not fully covered yet:

Maybe some of these also happen later, who knows ;-)

Phase 2

In phase two, we determine the location of the resources.

We traverse the graph according to the resource placement priority (so that a top-level resource with a weight of 100 will get it's wishes satisfied before a less important resource), following the placement dependencies, ignoring the ordering ones.

This is a graph colouring problem. We follow the locational dependencies from top to bottom, setting all yet-uncoloured resources to our current colour, and narrowing down the set of nodes associated with the colour at the same time.

At the end of this phase, all resources will have been assigned a colour, and each colour will in fact correspond to a node on which to run the resources of this colour.

However, if the current cluster state does not allow a certain subtree of the resources to be allocated, it may also be an empty set. (If all nodes are going offline, or if a high priority resource pushes off a low priority one after a failure.)

Phase 3

In phase three, we finally determine the ordering in which the operations have to be performed. Thus, everytime I say dependency in the following, I'm referring to the ordering, not to the location.

This time, we do not traverse the graph according to resource priority, but least number of dependencies. (Or should we use the resource priority once more, using the number of references as a tie-breaker only?)

We are actively building the TransitionGraph[4] in this step - we stop resources which are running where they should not be running, and start them on nodes where they should be (if any). We insert STONITH operations into the tree as needed.

Done

We write out the TransitionGraph[4] and hand it back to the CRM.

OpenIssues

Please, do discuss these issues with me by e-mail instead of the Wiki ;)

See also

TransitionGraph[4], ClusterResourceManager[6], NodeFencing[7], ClusterInformationBase[1].


References

[1]http://www.linux-ha.org/ClusterInformationBase
[2]http://www.linux-ha.org/PolicyPlugins
[3]http://www.linux-ha.org/AlanRobertson
[4]http://www.linux-ha.org/TransitionGraph
[5]http://www.linux-ha.org/ResourceInstance
[6]http://www.linux-ha.org/ClusterResourceManager
[7]http://www.linux-ha.org/NodeFencing


This information provided courtesy of the Linux-HA project at http://linux-ha.org/