Initial draft specification of introspective instance monitoring.
Specifically VM Heartbeat Monitoring via the QEMU Guest Agent. Implements-blueprint: introspective-instance-monitoring Change-Id: Ie41d92651128b41967c1118bbcdaf3656c498801 Signed-off-by: Greg Waines <greg.waines@windriver.com>
This commit is contained in:
parent
c6c09dbe3c
commit
f4abd4319c
|
@ -0,0 +1,240 @@
|
||||||
|
..
|
||||||
|
|
||||||
|
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
|
||||||
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||||
|
|
||||||
|
..
|
||||||
|
|
||||||
|
==================================
|
||||||
|
Introspective Instance Monitoring
|
||||||
|
==================================
|
||||||
|
|
||||||
|
https://blueprints.launchpad.net/masakari/+spec/introspective-instance-monitoring
|
||||||
|
|
||||||
|
Currently, Masakari instance monitoring is strictly non-intrusive black-box
|
||||||
|
type monitoring through qemu and libvirt. There are however a number of
|
||||||
|
internal instance/VM faults (kernel scheduling and IO, application health),
|
||||||
|
that if detected by Masakari, could be recovered by existing Masakari auto-recovery
|
||||||
|
mechanisms; increasing the overall availability of the instance/VM. This blueprint
|
||||||
|
introduces the capability of performing introspective instance monitoring of VMs, in
|
||||||
|
order to detect, report and optionally recover VMs from internal VM faults. Specifically,
|
||||||
|
VM Heartbeat Monitoring via the QEMU Guest Agent is introduced by this spec, in order
|
||||||
|
to indirectly detect some of these internal VM faults.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Problem description
|
||||||
|
===================
|
||||||
|
|
||||||
|
Currently, Masakari instance monitoring is a strictly non-intrusive black-box
|
||||||
|
type monitoring through qemu and libvirt. This detects a number of faults
|
||||||
|
for which Masakari's auto-recovery mechanisms can be used to recover the
|
||||||
|
instance/VM.
|
||||||
|
|
||||||
|
However, there are a number of internal instance/VM faults not detected by
|
||||||
|
this black-box monitoring, that if detected by Masakari, could be recovered
|
||||||
|
by these same Masakari auto-recovery mechanisms. This includes faults such as
|
||||||
|
hung Guest OS, failure of the Guest OS to schedule Application process(es), failure
|
||||||
|
to route basic IO within the Guest, Application-specific process failures or data
|
||||||
|
corruption, etc. . The exact scope of the proposed monitoring of this blueprint
|
||||||
|
is described at the end of the 'Proposed change' section.
|
||||||
|
|
||||||
|
Monitoring of Internal instance/VM faults requires that the Guest VM
|
||||||
|
supports software to respond to this monitoring. In the following proposal,
|
||||||
|
the Guest VM must support the QEMU Guest Agent. Because not all VMs will support
|
||||||
|
this software, the monitoring of internal instance/VM faults, by the OpenStack Host,
|
||||||
|
must be optionally enabled per VM or per VM image.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Proposed change
|
||||||
|
===============
|
||||||
|
|
||||||
|
This blueprint introduces introspective instance monitoring; specifically, VM
|
||||||
|
Heartbeat Monitoring via the QEMU Guest Agent. Any VM Heartbeat fault will be
|
||||||
|
reported through the Masakari instance-alerter to registered API drivers
|
||||||
|
(e.g. masakari-api).
|
||||||
|
|
||||||
|
The high-level architecture for Introspective Instance Monitoring is shown below::
|
||||||
|
|
||||||
|
+--------------------+ instance +-------------+ + - - - - - - +
|
||||||
|
| instance-alerter |<------------| Masakari | | |
|
||||||
|
|- - - - - - - - - - | fault | VM | F U T U R E
|
||||||
|
| driver abstraction | | Heartbeat | | |
|
||||||
|
| layer | | Agent |
|
||||||
|
+--------------------+ +-------------+ + - - - - - - +
|
||||||
|
| | ^ ^
|
||||||
|
other <--+ | | |
|
||||||
|
apis | | +----------------+
|
||||||
|
v | |
|
||||||
|
+--------------------+ | |
|
||||||
|
| masakari-api | v v
|
||||||
|
+--------------------+ +-------------+
|
||||||
|
| | Libvirtd |
|
||||||
|
v +-------------+
|
||||||
|
+--------------------+ ^
|
||||||
|
| masakari-engine | | unix socket
|
||||||
|
+--------------------+ v
|
||||||
|
| +-------------+
|
||||||
|
| (recovery) | QEMU |
|
||||||
|
v +-------------+
|
||||||
|
+--------------------+ ^
|
||||||
|
| | |
|
||||||
|
| OpenStack | +--------------------------------------+
|
||||||
|
| | | VM | virtio serial device |
|
||||||
|
+--------------------+ | v |
|
||||||
|
| +--------------------+ |
|
||||||
|
| | QEMU | |
|
||||||
|
| | Guest Agent | |
|
||||||
|
| | ( guest-ping{} ) | |
|
||||||
|
| +--------------------+ |
|
||||||
|
| |
|
||||||
|
| +-------------+ |
|
||||||
|
| +-------------+ | |
|
||||||
|
| | | | |
|
||||||
|
| | Application | | |
|
||||||
|
| | | + |
|
||||||
|
| +-------------+ |
|
||||||
|
+--------------------------------------+
|
||||||
|
|
||||||
|
|
||||||
|
VM Heartbeat and Healthcheck Monitoring will leverage the QEMU feature, Guest
|
||||||
|
Agent [1], for both the transport level
|
||||||
|
communication between OpenStack Host and the Guest VM, and the built-in
|
||||||
|
guest ping command (guest-ping{}). A QEMU Guest Agent
|
||||||
|
daemon, built as part of QEMU, is installed and run inside the Guest and
|
||||||
|
implements support for QMP commands that are sent to
|
||||||
|
the guest. Specifically the QEMU Guest Agent daemon
|
||||||
|
connects to a virtio-serial device (/dev/virtio-ports/org.qemu.guest_agent.0),
|
||||||
|
feeds the input to a QMP JSON parser, and when a command is received, invokes
|
||||||
|
the QAPI generated dispatch routine. In the case of VM Heartbeat Monitoring,
|
||||||
|
the QEMU Guest Agent command, 'guest-ping', will be used as the heartbeat challenge
|
||||||
|
request from the Host.
|
||||||
|
|
||||||
|
On the host, OpenStack Nova already supports an image property,
|
||||||
|
hw_qemu_guest_agent, that can be used to specify that the VM should
|
||||||
|
be created with the QEMU guest agent virto-serial-interface. The Masakari
|
||||||
|
VM Heartbeat Agent will discover VMs with hw_qemu_guest_agent enabled
|
||||||
|
by monitoring the files representing the socket identifiers for the QEMU Guest
|
||||||
|
Agents' virtual-serial-interfaces.
|
||||||
|
|
||||||
|
libvirt-qemu provides a virDomainQemuAgentCommand() for sending commands
|
||||||
|
to a selected VM's QEMU guest agent. This command opens the unix socket to
|
||||||
|
the VM's virtio-serial-interface, sends the command, waits to receive the response
|
||||||
|
and closes the socket. The command fails if the unix socket is openned by
|
||||||
|
another process, i.e. another process is sending a command to the same VM.
|
||||||
|
|
||||||
|
Masakari VM Heartbeat Agent will leverage virDomainQemuAgentCommand() provided
|
||||||
|
by libvirtd to send the heartbeat challenge requests (i.e. the QEMU Guest Agent's
|
||||||
|
guest-ping command) to the VM(s) and report any detected faults to the masakari
|
||||||
|
instance-alerter.
|
||||||
|
|
||||||
|
The Masakari VM Heartbeat Agent, on the host, will initiate VM Heartbeating as soon
|
||||||
|
as it discovers the VM has QEMU Guest Agent communication enabled. However, in order
|
||||||
|
to deal with arbitrary boot times for VMs/Guests, which may delay the Guests ability
|
||||||
|
to start responding to the heartbeat challenges, the Masakari VM Heartbeat Agent will
|
||||||
|
not enable reporting of heartbeat failures until after the first successful heartbeat
|
||||||
|
response is received from the VM/Guest.
|
||||||
|
|
||||||
|
This functionality will support a flag in masakari.conf for overall enabling/disabling of
|
||||||
|
introspective-instance-monitoring. It will also support parameters for configuring
|
||||||
|
default heartbeat period and default consecutive heartbeat miss threshold (before
|
||||||
|
declaring fault); in future, flavor extraspecs could be used for VMs to specify
|
||||||
|
specific values for these.
|
||||||
|
|
||||||
|
At a high-level, the scope of this heartbeat monitoring is that the QEMU Guest Agent
|
||||||
|
is running within the VM. However, just the fact that a Heartbeat message can get
|
||||||
|
from the Host to the QEMU Guest Agent inside the VM and back, inherently validates
|
||||||
|
that a lot of basic Guest Kernel functionality is working; i.e. the Guest OS is not
|
||||||
|
hung or failed, the QEMU heartbeat message was properly routed through basic linux
|
||||||
|
socket IO, etc. . In the future, the heartbeating can be extended to
|
||||||
|
do more than just reply/ack the message ... i.e. basic sanity / health tests on key
|
||||||
|
applications within the VM can be done.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Alternatives
|
||||||
|
------------
|
||||||
|
|
||||||
|
Could simply leverage the virtual hardware watchdog of QEMU/KVM
|
||||||
|
[2] for Instance monitoring.
|
||||||
|
|
||||||
|
However, VM Heartbeat Monitoring:
|
||||||
|
|
||||||
|
- provides notification of the Heartbeat status to higher-level cloud
|
||||||
|
entities through instance-alerter, such as Masakari, Mistral and/or Vitrage,
|
||||||
|
|
||||||
|
* which depending on the backend can result in VM auto-recovery (Masakari) or
|
||||||
|
deduced-state updates in Nova for the VM and resulting Aodh Event generation
|
||||||
|
due to the VM state change (Vitrage).
|
||||||
|
|
||||||
|
- in the future can be extended to provide a higher-level (i.e. application-level)
|
||||||
|
heartbeating
|
||||||
|
|
||||||
|
* i.e. if the Heartbeat requests are being answered by the Application running
|
||||||
|
within the VM
|
||||||
|
|
||||||
|
- in the future can be extended to provide more than just heartbeating, as the
|
||||||
|
Application can use it to trigger a variety of audits,
|
||||||
|
|
||||||
|
- in the future can be extended to provide a mechanism for the Application within the
|
||||||
|
VM to report a Health Status / Info back to the Host / Cloud.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Limitation
|
||||||
|
----------
|
||||||
|
|
||||||
|
Only VMs supporting the QEMU Guest Agent can be monitored by the functionality of
|
||||||
|
this proposal.
|
||||||
|
|
||||||
|
|
||||||
|
Implementation
|
||||||
|
==============
|
||||||
|
|
||||||
|
Assignee(s)
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Primary assignee:
|
||||||
|
greg-waines
|
||||||
|
|
||||||
|
|
||||||
|
Milestones
|
||||||
|
----------
|
||||||
|
|
||||||
|
Target Milestone for completion:
|
||||||
|
Rocky-2
|
||||||
|
|
||||||
|
|
||||||
|
Work Items
|
||||||
|
----------
|
||||||
|
|
||||||
|
- Masakari VM Heartbeat Agent on the Compute
|
||||||
|
|
||||||
|
* discovery of VMs with QEMU Guest Agent communication enabled,
|
||||||
|
|
||||||
|
* high-level logic for Heartbeat / Healthcheck monitoring,
|
||||||
|
|
||||||
|
* reporting of faults to masakari instance-alerter.
|
||||||
|
|
||||||
|
- tox and/or tempest test suite updates
|
||||||
|
|
||||||
|
- masakari documentation updates
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Dependencies
|
||||||
|
============
|
||||||
|
|
||||||
|
- requires that VMs are installed with and running the QEMU Guest Agent [1]
|
||||||
|
built as part of QEMU.
|
||||||
|
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
[1] http://wiki.qemu.org/Features/GuestAgent
|
||||||
|
|
||||||
|
[2] https://libvirt.org/formatdomain.html#elementsWatchdog
|
||||||
|
|
Loading…
Reference in New Issue