Ciblint
ciblint
The ciblint program examines your CIB in detail, looking for inconsistencies, possible errors, and things you might not have noticed, and prints whatever it finds. Not everything it reports is an error, but each item is probably worth investigating until you are sure you understand it. The version below works with current versions of Pacemaker.
Source to ciblint
media:Ciblint.gz
You may have to change the value of the HA_LIBHBDIR constant in the code to match your installation. This should eventually be fixed once the program is put under source control somewhere.
ciblint usage message
usage: ./ciblint [-C] -f cib-file
./ciblint [-C] -L
./ciblint [-w] (-A|--list-meta_attributes-config-options)
./ciblint [-w] (-l|--list-crm-config-options)
./ciblint [-w] -h --help
  -f cib-filename               analyze CIB from this XML file
  -L --live-cib                 analyze live CIB gotten via cibadmin -Q
  -C --ignore-non-defaults      don't print messages for non-default crm_config values
  -w --wiki-format              print usage or crm-config-options in wiki format
  -l --list-crm_config-options  print all valid names for use in <nvpair> sections
                                inside the <crm_config> section
  -A --list-meta_attributes-config-options
                                print all valid names for use in <nvpair> sections
                                inside <meta_attributes> sections
CIB file can either have a status section or not. Either is acceptable.
This program is a work-in-progress, but for many CIBs it's probably useful now.
It currently looks for a number of classes of possible errors, including these:
- Non-unique 'id' strings for the given <tag>
- 'id' strings (outside the status section) which are not globally unique
- Incorrect <nvpair> 'name's or 'value's
- Duplicate <nvpair> 'name's in a list
- Incorrect XML attribute names or values
- references to non-existent resources
- references to non-existent nodes
- invalid values for the data type (integer, boolean, enum, etc.) involved
- resources you're not monitoring
- non-negative values for default-resource-failure-stickiness
- STONITH not enabled
- No STONITH resources configured
- Use of for-testing-only ssh or external/ssh STONITH resource agents
- validation of <nvpair> names and values in <meta_attributes> sections
- validation of class and type for <primitive> resources
- validation of <attributes> names and values in <nvpair>s for <primitive> resources
- check if clone form resource names are used for non-clone resources
- ensure that clone form resource names have integral clone numbers
- check for ids with ":" characters in them in <primitive>, <group>, <clone>, or <master_slave> tags
- check for resources with non-zero failcounts
- check for <rule> tags with both score and score_attribute
- check for missing values and attributes in expressions
- make sure attributes mentioned in the CIB are defined somewhere in the cluster
More documentation can be found online at http://linux-ha.org/ciblint
The section above is the result of running ciblint -w -h
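To make two of the checks listed above more concrete, here is a minimal Python sketch of how duplicate 'id' strings and references to non-existent resources might be detected. It is not taken from the ciblint source. The CIB fragment, the function names, and the choice to look only at the rsc attribute of <rsc_location> constraints are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed CIB fragment (illustration only).
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <primitive id="vip1" class="ocf" provider="heartbeat" type="IPaddr"/>
      <primitive id="app1" class="lsb" type="app1"/>
      <primitive id="app1" class="lsb" type="app1-copy"/>
    </resources>
    <constraints>
      <rsc_location id="loc-1" rsc="app2" node="xen-d" score="100"/>
    </constraints>
  </configuration>
</cib>
"""

# Resource tags named in the list of checks above.
RESOURCE_TAGS = {"primitive", "group", "clone", "master_slave"}


def config_scope(root):
    """Return the <configuration> element, so the status section is skipped."""
    scope = root.find("configuration")
    return root if scope is None else scope


def duplicate_ids(root):
    """Return the sorted 'id' values used by more than one element."""
    ids = (el.get("id") for el in config_scope(root).iter())
    counts = Counter(i for i in ids if i is not None)
    return sorted(i for i, n in counts.items() if n > 1)


def dangling_location_refs(root):
    """Return (constraint id, resource id) pairs naming unknown resources."""
    scope = config_scope(root)
    known = {el.get("id") for el in scope.iter() if el.tag in RESOURCE_TAGS}
    return [
        (c.get("id"), c.get("rsc"))
        for c in scope.iter("rsc_location")
        if c.get("rsc") not in known
    ]
```

Calling duplicate_ids(ET.fromstring(SAMPLE_CIB)) on the fragment above reports the repeated id app1, and dangling_location_refs reports that constraint loc-1 refers to app2, which is not configured. A real checker would also need to cover the other constraint types and the status section rules described above.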
Special Notes
For ciblint to do the most thorough checking, it needs to run some harmless lrmadmin commands as root. Currently it attempts to do that using sudo, which may result in one or more password prompts.
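If you want to know in advance whether those prompts will appear (for example, before running ciblint from a script), one possible approach is to ask sudo itself. The sketch below is not part of ciblint. The function name is hypothetical, and it relies only on sudo's standard -n (non-interactive) option, which fails instead of prompting.

```python
import subprocess


def can_sudo_without_prompt(run=subprocess.run):
    """Return True if sudo can run a command right now without asking
    for a password, False if it would prompt or is not installed.

    The 'run' parameter exists only so the check can be exercised
    without invoking the real sudo.
    """
    try:
        # 'sudo -n true' exits non-zero rather than prompting.
        result = run(
            ["sudo", "-n", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No sudo binary on this system.
        return False
    return result.returncode == 0
```

If this returns False, running sudo -v beforehand caches your credentials for a short time, so a following ciblint run is less likely to stop and ask for a password.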
Sample Output
INFO: CIB has non-default value for expected-quorum-votes [3]. Default value is [2]
Explanation of expected-quorum-votes option: The number of nodes
expected to be in the cluster
Used to calculate quorum in openais based clusters.
INFO: CIB has non-default value for startup-fencing [false]. Default value is [true]
Explanation of startup-fencing option: STONITH unseen nodes
Advanced Use Only! Not using the default is very unsafe!
INFO: CIB has non-default value for pe-input-series-max [5000]. Default value is [-1]
Explanation of pe-input-series-max option: The number of other
PE inputs to save
Zero to disable, -1 to store unlimited.
INFO: CIB has non-default value for dc-deadtime [5s]. Default value is [60s]
Explanation of dc-deadtime option: How long to wait for a
response from other nodes during startup.
The "correct" value will depend on the speed/load of your
network and the type of switches used.
INFO: CIB has non-default value for no-quorum-policy [ignore]. Default value is [stop]
Explanation of no-quorum-policy option: What to do when the
cluster does not have quorum
What to do when the cluster does not have quorum Allowed values:
stop, freeze, ignore, suicide
INFO: CIB has non-default value for cluster-recheck-interval [15m]. Default value is [15min]
Explanation of cluster-recheck-interval option: Polling interval
for time based changes to options, resource parameters and
constraints.
The Cluster is primarily event driven, however the configuration
can have elements that change based on time. To ensure these
changes take effect, we can optionally poll the cluster's status
for changes. Allowed values: Zero disables polling. Positive
values are an interval in seconds (unless other SI units are
specified. eg. 5min)
INFO: CIB has non-default value for batch-limit [10]. Default value is [30]
Explanation of batch-limit option: The number of jobs that the
TE is allowed to execute in parallel
The "correct" value will depend on the speed and load of your
network and cluster nodes.
WARNING: external/ssh STONITH resource NOT approved for production
INFO: Resource vip1 running on node xen-e
INFO: Resource ping-1:0 running on node xen-e
WARNING: Resource stateful-1:0 not running anywhere.
INFO: Resource ping-1:2 running on node xen-d
INFO: Resource app1 running on node xen-d
INFO: Resource d2 running on node xen-d
INFO: Resource migrator running on node xen-d
INFO: Resource ping-1:1 running on node xen-f
WARNING: Resource FencingChild not running anywhere.
INFO: Resource stateful-1:1 running on node xen-f
INFO: Resource stateful-1:2 running on node xen-d
INFO: Resource d1 running on node xen-d
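Because every message in the output above starts with a severity word followed by a colon, the output is easy to post-process. The following Python sketch, which is an illustration and not something shipped with ciblint, groups the output into (severity, message) pairs and folds the explanation lines into the message they follow. The function names are hypothetical, and the assumption that every severity is a single upper-case word is based only on the INFO and WARNING lines shown in the sample.

```python
import re
from collections import Counter

# A message starts with an upper-case severity word, then ": ".
MESSAGE_RE = re.compile(r"^([A-Z]+): (.*)$")

# A few lines copied from the sample output above.
SAMPLE = """\
INFO: CIB has non-default value for batch-limit [10]. Default value is [30]
Explanation of batch-limit option: The number of jobs that the
TE is allowed to execute in parallel
WARNING: external/ssh STONITH resource NOT approved for production
INFO: Resource vip1 running on node xen-e
WARNING: Resource stateful-1:0 not running anywhere.
"""


def parse_messages(text):
    """Return a list of (severity, message) tuples.

    Lines that do not start a new message are treated as continuation
    lines and appended to the previous message.
    """
    messages = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = MESSAGE_RE.match(line)
        if match:
            messages.append([match.group(1), match.group(2)])
        elif messages:
            messages[-1][1] += " " + line
    return [tuple(m) for m in messages]


def severity_counts(text):
    """Count how many messages of each severity appear in the output."""
    return Counter(severity for severity, _ in parse_messages(text))
```

For example, severity_counts(SAMPLE) counts two INFO and two WARNING messages, which makes it simple to have a script pay attention only when ciblint reports warnings.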
