masakari-specs/specs/xena/approved/host-monitor-by-consul.rst

host monitor by consul

https://blueprints.launchpad.net/masakari-monitors/+spec/host-monitor-by-consul

Problem description

Usually, a cloud platform has a management network, a tenant network and a storage network. Compute nodes may have management, tenant and storage interfaces connected to these three networks.

Currently, the Masakari host monitor uses pacemaker and pacemaker-remote to monitor host connectivity. In practice, it monitors host heartbeats through the management interface. Once a host's management connectivity is detected as down, it sends a notification to Masakari to trigger the host failure recovery workflow.

This approach is flawed, especially when management connectivity is down while tenant and storage connectivity are still up. In that case users can still access their VMs without interruption, so there is no need to send a notification.

Proposed change

This spec introduces a new host monitor that uses consul agents to monitor host connectivity via the management, tenant and storage interfaces.

The low-level architecture for host monitoring is shown below:

[Figure: low-level architecture for host monitoring]

Each host runs three consul agents, bound to the management, tenant and storage interfaces respectively. Together they form three independent consul clusters.

Consul agents run in server mode on controller nodes and in client mode on compute nodes.
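To make the three-agents-per-host layout concrete, the following Python sketch builds one consul agent configuration per network for a single host. It is an illustration under stated assumptions, not the driver's actual deployment tooling: the HTTP ports (8500/8510/8520), the data directory paths and the helper names `agent_config` and `host_configs` are hypothetical. The keys `node_name`, `data_dir`, `bind_addr`, `server`, `retry_join`, `bootstrap_expect` and `ports` are standard consul agent configuration options.

```python
import json

# Hypothetical HTTP API port per network. Gossip and RPC listeners bind to each
# agent's own interface address, so they can keep consul's defaults; the HTTP
# and DNS listeners bind to client_addr (127.0.0.1 by default), so the three
# agents on one host must not share them.
HTTP_PORTS = {"management": 8500, "tenant": 8510, "storage": 8520}


def agent_config(node_name, network, bind_addr, is_controller, servers):
    """Return a consul agent config dict for one network on one host.

    servers: addresses of the controller nodes on the same network, used to
    join this agent to that network's independent cluster.
    """
    config = {
        "node_name": node_name,
        "data_dir": "/var/lib/consul-" + network,  # hypothetical path
        "bind_addr": bind_addr,          # gossip only over this network's interface
        "server": is_controller,         # server mode on controllers, client on computes
        "retry_join": list(servers),
        "ports": {"http": HTTP_PORTS[network], "dns": -1},  # DNS disabled (assumption)
    }
    if is_controller:
        # Servers wait for all expected servers before electing a leader.
        config["bootstrap_expect"] = len(servers)
    return config


def host_configs(node_name, addrs, is_controller, servers_by_net):
    """Build the three per-network agent configs for one host."""
    return {
        net: agent_config(node_name, net, addrs[net], is_controller,
                          servers_by_net[net])
        for net in HTTP_PORTS
    }


if __name__ == "__main__":
    cfgs = host_configs(
        "compute-1",
        {"management": "10.0.0.11", "tenant": "10.1.0.11", "storage": "10.2.0.11"},
        is_controller=False,
        servers_by_net={
            "management": ["10.0.0.2"],
            "tenant": ["10.1.0.2"],
            "storage": ["10.2.0.2"],
        },
    )
    print(json.dumps(cfgs["tenant"], indent=2))
```

Each returned dict could be written to a JSON file and passed to a separate `consul agent -config-file=...` process. Only the HTTP and DNS listeners are adjusted here; a real deployment may need to adjust other local listeners as well.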

For example, in the consul cluster over management connectivity, all agents bind to the management interface and are responsible for running checks and keeping services in sync.

Consul is built on top of Serf, which provides a full gossip protocol used for several purposes: membership, failure detection and event broadcast. Consul uses this gossip protocol to manage membership. If an agent is detected as disconnected, the failure is quickly broadcast to the rest of the cluster.
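As an illustration of how a host monitor could read this gossip-based membership view, the sketch below queries a local agent's `GET /v1/agent/members` HTTP endpoint and maps each member to 'up' or 'down'. A `Status` of 1 means "alive" in Serf's member status enumeration; treating every other status (leaving, left, failed) as 'down' is an assumption of this sketch, as are the helper names `fetch_members` and `connectivity`.

```python
import json
import urllib.request

# In Serf's member status enumeration, 1 means "alive"
# (0 none, 2 leaving, 3 left, 4 failed).
STATUS_ALIVE = 1


def fetch_members(http_port, host="127.0.0.1", timeout=5.0):
    """Return the member list reported by one local consul agent."""
    url = f"http://{host}:{http_port}/v1/agent/members"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def connectivity(members):
    """Map each member name to 'up' if alive, otherwise 'down' (assumption)."""
    return {
        m["Name"]: "up" if m["Status"] == STATUS_ALIVE else "down"
        for m in members
    }
```

For example, `connectivity(fetch_members(8500))` would return the management-network view if the management agent's HTTP API listened on port 8500.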

Host-monitor periodically retrieves the health data of all consul members from the local consul agents. It picks out each node's management, tenant and storage health state separately and combines them.
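The combining step can be sketched as follows, assuming each network's membership has already been reduced to a `{node_name: 'up' | 'down'}` map (for example from `/v1/agent/members` on each local agent). The helper name `combine` and the rule that a node absent from one network's member list counts as 'down' on that network are assumptions, not part of the spec.

```python
NETWORKS = ("management", "tenant", "storage")


def combine(states_by_network):
    """Merge per-network {node: 'up'|'down'} maps into per-node state tuples.

    Each tuple is ordered (management, tenant, storage). A node missing from
    one network's map is treated as 'down' on that network (assumption).
    """
    nodes = set()
    for states in states_by_network.values():
        nodes.update(states)
    return {
        node: tuple(states_by_network.get(net, {}).get(node, "down")
                    for net in NETWORKS)
        for node in sorted(nodes)
    }
```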

It then sends notifications according to the host states defined in the table below:

management   tenant   storage   actions
up           up       down      recovery
up           down     down      recovery
down         up       down      recovery
down         down     down      recovery

  • 'up' represents connectivity up.
  • 'down' represents connectivity down.
  • 'recovery' represents host recovery.
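The decision table can be encoded as a simple lookup. This is a minimal sketch assuming state tuples in (management, tenant, storage) order; the names `RECOVERY_STATES`, `action_for` and `hosts_to_recover` are hypothetical, and treating every combination not listed in the table as "no action" is an assumption. Under that reading, the case from the problem description (management down, tenant and storage up) produces no notification.

```python
# Host states that trigger recovery, in (management, tenant, storage) order,
# copied from the table above. Any other combination means no action here.
RECOVERY_STATES = {
    ("up", "up", "down"),
    ("up", "down", "down"),
    ("down", "up", "down"),
    ("down", "down", "down"),
}


def action_for(state):
    """Return 'recovery' for a listed state, otherwise None."""
    return "recovery" if tuple(state) in RECOVERY_STATES else None


def hosts_to_recover(node_states):
    """Return the sorted names of nodes whose combined state needs recovery."""
    return sorted(node for node, state in node_states.items()
                  if action_for(state) == "recovery")
```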

Alternatives

None

Data model impact

None

REST API impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

None

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

Work Items

  • Masakari host-monitor driver based on consul
  • Masakari documentation updates

Dependencies

  • Requires that consul agents are installed and running to monitor the hosts' management, tenant and storage connectivity.

Testing

Unit tests will be needed.

Documentation Impact

The admin configuration documentation needs to be updated.

References

https://www.consul.io/

History

Revisions

Release Name   Description
Victoria       Introduced
Xena           Re-proposed