833e5946fe
A powerful metric to watch on a Swift cluster is the number of handoff partitions on each drive of a storage node. A build-up of handoff partitions on a particular server could indicate a disk problem or a bottleneck somewhere in the cluster. It also shows when would be a good time to rebalance the ring, as you'd want to do that when existing backend data movement is at a minimum. This makes it a great visualisation of the health of a cluster, and that is what this check plugin provides.

Each instance of the check takes the following values:

  ring: <path to a Swift ring file>
  devices: <path to the directory of mount points>
  granularity: <either server or device>

To determine primary vs handoff partitions on a drive, the Swift ring needs to be consulted. If a storage node stores more than one ring, an instance should be defined for each. You give Swift a bunch of disks; these disks are placed in what Swift calls the 'devices' location, which is a directory containing a mount point for each mounted Swift drive. Finally, you can decide on the granularity, which defaults to `server` if not defined.

Only two metrics are created by this check:

  swift.partitions.primary_count
  swift.partitions.handoff_count

In addition to the hostname dimension, a ring dimension is also set, allowing the handoff vs primary partitions of each ring to be graphed. When the granularity is set to device, an additional dimension is added to the metric: the device name (the name of the device's mount point). This allows each device in a server to be graphed and monitored if finer granularity is required.

Because we need to consult the Swift ring, there is a runtime requirement on the Python swift module being installed, but this isn't required for the unit tests. Making it a runtime dependency means that when the check is loaded it will log an error and then exit if it can't import the swift module.

This is the second of two Swift check plugins I've been working on.
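As a sketch of the instance values described above, a configuration file might look like the following (the ring and devices paths are illustrative defaults; adjust them to match the deployment):

```yaml
init_config: null
instances:
  # One instance per ring stored on this node.
  - ring: /etc/swift/object.ring.gz
    devices: /srv/node
    granularity: server
  # A second instance for another ring, reporting
  # per-device metrics via the extra device dimension.
  - ring: /etc/swift/container.ring.gz
    devices: /srv/node
    granularity: device
```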
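The primary-vs-handoff classification boils down to: for each partition found on a device, ask the ring which devices are primaries for that partition; if the current device is not among them, the partition is a handoff. A minimal, dependency-free sketch of that logic follows; the `part_to_devices` mapping here is a stand-in for a real ring lookup (e.g. swift's Ring.get_part_nodes), not the plugin's actual code:

```python
from collections import defaultdict

def classify_partitions(on_disk, part_to_devices):
    """Count primary vs handoff partitions per device.

    on_disk: mapping of device name -> iterable of partition
             numbers found on that device's mount point.
    part_to_devices: mapping of partition number -> set of device
             names the ring assigns as primaries (stand-in for a
             real swift Ring.get_part_nodes() lookup).
    """
    counts = defaultdict(lambda: {'primary': 0, 'handoff': 0})
    for dev, parts in on_disk.items():
        for part in parts:
            primaries = part_to_devices.get(part, set())
            kind = 'primary' if dev in primaries else 'handoff'
            counts[dev][kind] += 1
    return dict(counts)

# Example: partition 7 is still present on sdb even though the ring
# no longer assigns it there, so it counts as a handoff on sdb.
on_disk = {'sdb': [1, 7], 'sdc': [2]}
ring = {1: {'sdb'}, 2: {'sdc'}, 7: {'sdc'}}
print(classify_partitions(on_disk, ring))
# -> {'sdb': {'primary': 1, 'handoff': 1}, 'sdc': {'primary': 1, 'handoff': 0}}
```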
For more details see my blog post[1]

[1] - https://oliver.net.au/?p=358

Change-Id: Ie91add9af39f2ab0e5b575390c0c6355563c0bfc