
How Do I Monitor Connectivity in Version 2

pingd, An ipfail Replacement

pingd is a replacement for ipfail that, like the rest of version 2, allows connectivity monitoring (and resource placement based on relative connectivity) to work in clusters with any number of nodes.

The role of pingd is to detect changes to a node's connectivity and ensure that updates of this information to the CIB[1] occur (at least effectively) simultaneously.

To locate your resources on the node(s) with the greatest connectivity, an admin needs to use the information placed in the CIB by pingd. This is achieved with the creation of resource location[2] constraints that reference the attribute created by pingd. See "Using pingd Output in Location Constraints" below.

Configuration Methods

There are two options for configuring pingd:

  1. The first is to add a respawn directive to ha.cf, e.g.:

    • respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
      See "pingd Usage Information" below for details on the meaning of the options passed to pingd.

  2. The other option is to create a clone[3] using the pingd OCFResourceAgent[4]. Since RAs are started as root, you need to add a line like "apiauth pingd uid=root" to your ha.cf[5] or, even better, add the parameter user=hacluster to your RA configuration so that the implicit apiauth directive is used, as with the respawn method. With version 2.0.6 you also have to add the pidfile parameter, pointing to a file the chosen user has write permission for (e.g. /tmp/pingd-default).

Both methods need the ping nodes listed in your ha.cf[5] file. The host_list parameter of the RA can only select a subset of those nodes; it cannot add other hosts.

Both methods also require the addition of one or more location constraints to the CIB. See "Using pingd Output in Location Constraints" below.

The advantage of using the resource agent is that you can:

An equivalent resource for the respawn directive above would be:

<clone id="pingd-clone">
  <meta_attributes id="pingd-clone-ma">
    <attributes>
      <nvpair id="pingd-clone-1" name="globally_unique" value="false"/>
    </attributes>
  </meta_attributes>
  <primitive id="pingd-child" provider="heartbeat" class="ocf" type="pingd">
    <operations>
      <op id="pingd-child-monitor" name="monitor" interval="20s" timeout="60s" prereq="nothing"/>
      <op id="pingd-child-start" name="start" prereq="nothing"/>
    </operations>
    <instance_attributes id="pingd_inst_attr">
      <attributes>
         <nvpair id="pingd-1" name="dampen" value="5s"/>
         <nvpair id="pingd-2" name="multiplier" value="100"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>
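If you follow the advice above and run pingd as hacluster rather than root, the instance_attributes block of the clone could be replaced by something like the following. This is a sketch, assuming the RA parameters are named user, pidfile and host_list as described above. The added nvpair ids and the 192.168.1.1 address are hypothetical placeholders, and whatever host_list names must already be a ping node in ha.cf.

```xml
<instance_attributes id="pingd_inst_attr">
  <attributes>
    <nvpair id="pingd-1" name="dampen" value="5s"/>
    <nvpair id="pingd-2" name="multiplier" value="100"/>
    <!-- run as hacluster so the implicit apiauth entry applies -->
    <nvpair id="pingd-3" name="user" value="hacluster"/>
    <!-- needed with 2.0.6: a PID file the chosen user can write -->
    <nvpair id="pingd-4" name="pidfile" value="/tmp/pingd-default"/>
    <!-- hypothetical: restrict pingd to one of the ping nodes from ha.cf -->
    <nvpair id="pingd-5" name="host_list" value="192.168.1.1"/>
  </attributes>
</instance_attributes>
```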

NOTE: Changing the attribute's location in the CIB (the --attr-set and --attr-section options), while possible, is discouraged, because you may end up with multiple copies of the attribute for each node, causing the cluster to behave differently than expected.

pingd Usage Information

usage: pingd [-V?p:a:d:s:S:h:Dm:]
        --help (-?)                     This text
        --daemonize (-D)                Run in daemon mode
        --pid-file (-p) <filename>      File in which to store the process' PID
                                        * Default=/tmp/pingd.pid
        --attr-name (-a) <string>       Name of the node attribute to set
                                        * Default=pingd
        --attr-set (-s) <string>        Name of the set in which to set the attribute
                                        * Default=cib-bootstrap-options
        --attr-section (-S) <string>    Which part of the CIB to put the attribute in
                                        * Default=status
        --ping-host (-h) <single_host_name> Monitor a subset of the ping nodes listed in ha.cf
                                            (can be specified multiple times)
        --attr-dampen (-d) <integer>        How long to wait for no further changes to occur before
                                            updating the CIB with a changed attribute
        --value-multiplier (-m) <integer>   For every connected node, add <integer> to the value set in the CIB
                                            * Default=1

Using pingd Output in Location Constraints

Example pingd Configuration
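The example below assumes at least five ping nodes in ha.cf and pingd running with a multiplier of 100 and an attribute name of default_ping_set, which matches the values in the table that follows. It is a sketch of the clone's instance_attributes, assuming the RA's name parameter sets the attribute name (the equivalent of pingd's -a option); the nvpair ids are hypothetical.

```xml
<!-- roughly equivalent ha.cf line:
     respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a default_ping_set -->
<instance_attributes id="pingd_inst_attr">
  <attributes>
    <nvpair id="pingd-1" name="dampen" value="5s"/>
    <!-- each reachable ping node adds 100 to the attribute -->
    <nvpair id="pingd-2" name="multiplier" value="100"/>
    <!-- assumed RA parameter for the attribute name -->
    <nvpair id="pingd-3" name="name" value="default_ping_set"/>
  </attributes>
</instance_attributes>
```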

Example CIB Contents

Node      Connected Ping Nodes   default_ping_set Value
c001n01   5                      500
c001n02   4                      400
c001n03   5                      500

Example Constraint

<rsc_location id="my_resource_connected" rsc="my_resource">
    <rule id="my_resource_connected_rule" score_attribute="default_ping_set">
       <expression id="my_resource_connected_expr_defined" attribute="default_ping_set" operation="defined"/>
    </rule>
</rsc_location>

With only the above constraint, the scores for running my_resource would be:

Node      Connected Ping Nodes   default_ping_set Value   Combined Score
c001n01   5                      500                      500
c001n02   4                      400                      400
c001n03   5                      500                      500

If we also had the following constraint:

<rsc_location id="my_resource_preferred" rsc="my_resource">
    <rule id="my_resource_prefer_c001n01" score="100">
       <expression id="my_resource_prefer_c001n01_expr" attribute="#uname" operation="eq" value="c001n01"/>
    </rule>
    <rule id="my_resource_prefer_c001n02" score="200">
       <expression id="my_resource_prefer_c001n02_expr" attribute="#uname" operation="eq" value="c001n02"/>
    </rule>
    <rule id="my_resource_prefer_c001n03" score="300">
       <expression id="my_resource_prefer_c001n03_expr" attribute="#uname" operation="eq" value="c001n03"/>
    </rule>
    <rule id="my_resource_never" score="-INFINITY" boolean_op="or">
       <expression id="my_resource_never_c001n04_expr" attribute="#uname" operation="eq" value="c001n04"/>
       <expression id="my_resource_never_c001n05_expr" attribute="#uname" operation="eq" value="c001n05"/>
    </rule>
</rsc_location>

Then the updated scores for running the resource would be:

Node      Connected Ping Nodes   default_ping_set Value   Combined Score
c001n01   5                      500                      600
c001n02   4                      400                      600
c001n03   5                      500                      800

At this point, if the resource was not running or default resource stickiness[6] was set to zero, then the resource would be started on c001n03, with c001n01 and c001n02 equally preferred as backups.

However, if the resource was running on c001n02 and resource_stickiness was set to 1000, the stickiness would be added to c001n02's score (400 + 200 + 1000 = 1600), and the updated scores would be:

Node      Connected Ping Nodes   default_ping_set Value   Combined Score
c001n01   5                      500                      600
c001n02   4                      400                      1600
c001n03   5                      500                      800

and the resource would be left running on c001n02.
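The stickiness used in this case could be set cluster-wide in crm_config. This is a sketch, assuming a Heartbeat 2.0.x-style CIB in which the option is called default_resource_stickiness (some later releases spell it default-resource-stickiness); the nvpair id is hypothetical, so check the option name against the annotated DTD[6] for your version.

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <!-- added to the score of the node currently running the resource -->
      <nvpair id="cib-bootstrap-options-stickiness"
              name="default_resource_stickiness" value="1000"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```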

Alternatively, if resource_stickiness was set to 100, then the scores would look like this:

Node      Connected Ping Nodes   default_ping_set Value   Combined Score
c001n01   5                      500                      600
c001n02   4                      400                      700
c001n03   5                      500                      800

and the resource would be moved to c001n03.

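This should also adequately demonstrate the importance of correctly setting resource stickiness relative to the pingd --value-multiplier and to the scores in your other location constraints.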

Quickstart - Only Run my_resource on Nodes with Access to at Least One Ping Node

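Add your ping nodes and a pingd respawn directive (such as the one shown under "Configuration Methods" above) to ha.cf.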

Add this constraint to the CIB:

It is sometimes desirable to shut a particular service down if ping connectivity is lost. This rule will prohibit the service from running anywhere that there is no ping connectivity to the outside world, and all nodes with some connectivity are treated the same, regardless of how many ping nodes are accessible.

<rsc_location id="my_resource:connected" rsc="my_resource">
  <rule id="my_resource:connected:rule" score="-INFINITY" boolean_op="or">
    <expression id="my_resource:connected:expr:undefined"
      attribute="pingd" operation="not_defined"/>
    <expression id="my_resource:connected:expr:zero"
      attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>

Of course, if you have configured the pingd[7] daemon to set an attribute name other than its default (pingd), then you need to change the attribute name in both expressions above from pingd to whatever name you have configured the pingd[7] daemon to use.

Attention: this will stop the resource everywhere if the ping node(s) really do go down, or if heartbeat loses connectivity to them for some other reason (firewalls, et cetera). Consider using the CIB/Idioms/PingdAttrAsScore[8] idiom instead, which expresses a positive preference for the node with the best connectivity.

The respawn rule for pingd from above can be used, or virtually any other method of starting pingd with a non-zero --value-multiplier. If a node can reach at least one ping node, the resource can run there; if not, it can't. It's that simple.

Quickstart - Run my_resource on the Node with the Best Connectivity

This is probably one of the better ways to use pingd. In this method, the pingd attribute value becomes the score for the rule, so the --value-multiplier you choose will depend heavily on the scores you give to other criteria. Unlike the previous rule, this one will not stop a resource completely if all nodes lose connectivity to the outside world.

It is often desirable to use the value of the attribute that pingd[7] sets directly as the score for a particular rule.

If you set the pingd[7] scaling factor to 100, then having access to one node is worth 100, 2 nodes is worth 200, and so on.

This way, if all else is equal, the node with the highest ping connectivity will be selected. If two or more eligible nodes have the same score, then they will be given equal weight according to the rule below.

<rsc_location id="my_resource:connected" rsc="my_resource">
  <rule id="my_resource:connected:rule" score_attribute="pingd" >
    <expression id="my_resource:connected:expr:defined"
      attribute="pingd" operation="defined"/>
  </rule>
</rsc_location>

Of course, if you have configured the pingd[7] daemon to set an attribute name other than its default (pingd), then you need to change both the score_attribute and the expression's attribute above from pingd to whatever name you have configured the pingd[7] daemon to use.
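The two quickstart rules can also be combined in a single constraint, so that my_resource is banned from nodes with no connectivity and otherwise placed on the node with the best connectivity. This is a sketch that assumes the default attribute name pingd; the constraint, rule and expression ids are hypothetical. Like the first quickstart, it will stop the resource everywhere if every node loses all ping connectivity.

```xml
<rsc_location id="my_resource:connectivity" rsc="my_resource">
  <!-- forbid nodes where the pingd attribute is unset or zero -->
  <rule id="my_resource:connectivity:ban" score="-INFINITY" boolean_op="or">
    <expression id="my_resource:connectivity:ban:undefined"
      attribute="pingd" operation="not_defined"/>
    <expression id="my_resource:connectivity:ban:zero"
      attribute="pingd" operation="lte" value="0"/>
  </rule>
  <!-- everywhere else, use the pingd value itself as the score -->
  <rule id="my_resource:connectivity:prefer" score_attribute="pingd">
    <expression id="my_resource:connectivity:prefer:defined"
      attribute="pingd" operation="defined"/>
  </rule>
</rsc_location>
```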


References

[1] http://www.linux-ha.org/CIB
[2] http://www.linux-ha.org/v2/dtd1.0/annotated#rsc%2Blocation
[3] http://www.linux-ha.org/v2/Concepts/Clones
[4] http://www.linux-ha.org/OCFResourceAgent
[5] http://www.linux-ha.org/ha.cf
[6] http://www.linux-ha.org/v2/dtd1.0/annotated#default%2Bresource%2Bstickiness
[7] http://www.linux-ha.org/pingd
[8] http://www.linux-ha.org/CIB/Idioms/PingdAttrAsScore


This information provided courtesy of the Linux-HA project at http://linux-ha.org/