
Initial draft specification of introspective instance monitoring.

Specifically VM Heartbeat Monitoring via the QEMU Guest Agent.

Implements-blueprint: introspective-instance-monitoring
Change-Id: Ie41d92651128b41967c1118bbcdaf3656c498801
Signed-off-by: Greg Waines <greg.waines@windriver.com>
Greg Waines 1 year ago
parent
commit
f4abd4319c
1 changed files with 240 additions and 0 deletions
specs/rocky/approved/introspective-instance-monitoring.rst (+240, -0)

@@ -0,0 +1,240 @@
1
+..
2
+
3
+This work is licensed under a Creative Commons Attribution 3.0 Unported License.
4
+http://creativecommons.org/licenses/by/3.0/legalcode
5
+
6
+..
7
+
8
+==================================
9
+ Introspective Instance Monitoring
10
+==================================
11
+
12
+https://blueprints.launchpad.net/masakari/+spec/introspective-instance-monitoring
13
+
+Currently, Masakari instance monitoring is strictly non-intrusive, black-box
+monitoring through QEMU and libvirt.  There are, however, a number of
+internal instance/VM faults (kernel scheduling and IO, application health)
+that, if detected by Masakari, could be recovered by the existing Masakari
+auto-recovery mechanisms, increasing the overall availability of the
+instance/VM.  This blueprint introduces introspective instance monitoring of
+VMs in order to detect, report and optionally recover VMs from internal VM
+faults.  Specifically, this spec introduces VM Heartbeat Monitoring via the
+QEMU Guest Agent in order to indirectly detect some of these internal VM
+faults.
23
+
24
+
25
+
26
+Problem description
27
+===================
28
+
+Currently, Masakari instance monitoring is strictly non-intrusive, black-box
+monitoring through QEMU and libvirt.  This detects a number of faults from
+which Masakari's auto-recovery mechanisms can recover the instance/VM.
33
+
+However, there are a number of internal instance/VM faults that this
+black-box monitoring does not detect and that, if detected by Masakari, could
+be recovered by the same Masakari auto-recovery mechanisms.  These include a
+hung Guest OS, failure of the Guest OS to schedule Application process(es),
+failure to route basic IO within the Guest, and Application-specific process
+failures or data corruption.  The exact scope of the monitoring proposed by
+this blueprint is described at the end of the 'Proposed change' section.
41
+
+Monitoring of internal instance/VM faults requires software in the Guest VM
+that responds to this monitoring.  In the following proposal, the Guest VM
+must run the QEMU Guest Agent.  Because not all VMs will include this
+software, monitoring of internal instance/VM faults by the OpenStack Host
+must be optionally enabled per VM or per VM image.
47
+
48
+
49
+
50
+Proposed change
51
+===============
52
+
53
+This blueprint introduces introspective instance monitoring; specifically, VM
54
+Heartbeat Monitoring via the QEMU Guest Agent.  Any VM Heartbeat fault will be
55
+reported through the Masakari instance-alerter to registered API drivers
56
+(e.g. masakari-api).
57
+
58
+The high-level architecture for Introspective Instance Monitoring is shown below::
59
+
+   +--------------------+   instance  +-------------+    + - - - - - - +
+   | instance-alerter   |<------------|  Masakari   |    |             |
+   |- - - - - - - - - - |   fault     |     VM      |      F U T U R E
+   | driver abstraction |             |  Heartbeat  |    |             |
+   |       layer        |             |    Agent    |
+   +--------------------+             +-------------+    + - - - - - - +
+              |    |                         ^                  ^
+     other <--+    |                         |                  |
+     apis          |                         | +----------------+
+                   v                         | |
+   +--------------------+                    | |
+   |    masakari-api    |                    v v
+   +--------------------+             +-------------+
+            |                         |  Libvirtd   |
+            v                         +-------------+
+   +--------------------+                    ^
+   |   masakari-engine  |                    | unix socket
+   +--------------------+                    v
+            |                         +-------------+
+            | (recovery)              |    QEMU     |
+            v                         +-------------+
+   +--------------------+                    ^
+   |                    |                    |
+   |      OpenStack     |       +--------------------------------------+
+   |                    |       | VM         | virtio serial device    |
+   +--------------------+       |            v                         |
+                                |       +--------------------+         |
+                                |       |   QEMU             |         |
+                                |       |   Guest Agent      |         |
+                                |       |   ( guest-ping{} ) |         |
+                                |       +--------------------+         |
+                                |                                      |
+                                |         +-------------+              |
+                                |       +-------------+ |              |
+                                |       |             | |              |
+                                |       | Application | |              |
+                                |       |             | +              |
+                                |       +-------------+                |
+                                +--------------------------------------+
99
+
100
+
+VM Heartbeat and Healthcheck Monitoring will leverage the QEMU Guest Agent
+feature [1] for both the transport-level communication between the OpenStack
+Host and the Guest VM and the built-in guest ping command (guest-ping{}).  A
+QEMU Guest Agent daemon, built as part of QEMU, is installed and run inside
+the Guest and implements support for QMP commands that are sent to the guest.
+Specifically, the QEMU Guest Agent daemon connects to a virtio-serial device
+(/dev/virtio-ports/org.qemu.guest_agent.0), feeds the input to a QMP JSON
+parser and, when a command is received, invokes the QAPI-generated dispatch
+routine.  For VM Heartbeat Monitoring, the QEMU Guest Agent command
+'guest-ping' will be used as the heartbeat challenge request from the Host.
113
+
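As an illustration of the challenge/response described above, the following minimal sketch builds the guest-ping command and checks a reply. It assumes the guest agent's JSON wire format, in which guest-ping takes no arguments and a successful reply is an object with a "return" member; the helper names build_guest_ping and is_ping_ack are hypothetical and not part of Masakari or QEMU.

```python
import json


def build_guest_ping() -> str:
    """Serialize the heartbeat challenge as a guest agent JSON command."""
    return json.dumps({"execute": "guest-ping"})


def is_ping_ack(raw_reply) -> bool:
    """Return True only for a well-formed, error-free guest-ping reply."""
    try:
        reply = json.loads(raw_reply)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON: treat as a missed heartbeat.
        return False
    # A successful command carries a "return" member; failures carry "error".
    return isinstance(reply, dict) and "return" in reply and "error" not in reply
```

Treating anything other than a clean "return" object as a miss keeps the host-side logic conservative: a garbled or error reply is counted the same way as no reply at all.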
+On the host, OpenStack Nova already supports an image property,
+hw_qemu_guest_agent, that specifies that the VM should be created with the
+QEMU Guest Agent virtio-serial interface.  The Masakari VM Heartbeat Agent
+will discover VMs with hw_qemu_guest_agent enabled by monitoring the socket
+files that represent the QEMU Guest Agents' virtio-serial interfaces.
120
+
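The discovery step could be sketched as a periodic scan for guest agent socket files. Everything specific here is an assumption made for illustration: the directory and file-name pattern in DEFAULT_PATTERN, the use of polling rather than file-system notifications, and the function names discover_agent_sockets and diff_sockets.

```python
import glob
import os
import stat

# Assumption: illustrative location and naming of the per-VM agent sockets;
# the real path depends on how Nova and libvirt are configured on the host.
DEFAULT_PATTERN = "/var/lib/libvirt/qemu/org.qemu.guest_agent.0.*.sock"


def discover_agent_sockets(pattern=DEFAULT_PATTERN):
    """Return the set of paths matching the pattern that are unix sockets."""
    found = set()
    for path in glob.glob(pattern):
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # The file vanished between glob() and stat(); the VM went away.
            continue
        if stat.S_ISSOCK(mode):
            found.add(path)
    return found


def diff_sockets(previous, current):
    """Return (added, removed) socket paths between two scans."""
    return current - previous, previous - current
```

Calling discover_agent_sockets() on each monitoring cycle and passing the result to diff_sockets() yields the VMs for which heartbeating should start or stop.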
+libvirt-qemu provides virDomainQemuAgentCommand() for sending commands to a
+selected VM's QEMU Guest Agent.  This function opens the unix socket to the
+VM's virtio-serial interface, sends the command, waits to receive the
+response and closes the socket.  The call fails if the unix socket is already
+opened by another process, i.e. another process is sending a command to the
+same VM.
126
+
+The Masakari VM Heartbeat Agent will use virDomainQemuAgentCommand(), provided
+by libvirt-qemu, to send the heartbeat challenge requests (i.e. the QEMU Guest
+Agent's guest-ping command) to the VM(s) and will report any detected faults
+to the Masakari instance-alerter.
131
+
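A single heartbeat attempt might look like the sketch below. It assumes the libvirt Python bindings, where virDomainQemuAgentCommand() is exposed as libvirt_qemu.qemuAgentCommand(domain, cmd, timeout, flags); the function names, the injectable send parameter and the 5-second default timeout are hypothetical choices made so the logic can be exercised without a hypervisor.

```python
import json


def libvirt_agent_sender(domain, command_json, timeout_s):
    """Send one command through libvirt's QEMU guest agent passthrough."""
    # Imported lazily so this sketch loads on hosts without libvirt installed.
    import libvirt_qemu
    return libvirt_qemu.qemuAgentCommand(domain, command_json, timeout_s, 0)


def heartbeat_once(domain, send=libvirt_agent_sender, timeout_s=5):
    """Return True if the guest answered one guest-ping challenge."""
    try:
        raw = send(domain, json.dumps({"execute": "guest-ping"}), timeout_s)
        reply = json.loads(raw)
    except Exception:
        # Deliberately broad for the sketch: covers libvirt errors (agent
        # unreachable, timeout, socket in use) as well as malformed replies.
        return False
    return isinstance(reply, dict) and "return" in reply
```

With a real connection, domain would be a libvirt virDomain object, for example one returned by conn.lookupByName(). Because the call fails when another process holds the agent socket, a single False result is better treated as one missed heartbeat than as a fault.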
+The Masakari VM Heartbeat Agent, on the host, will initiate VM heartbeating as
+soon as it discovers that the VM has QEMU Guest Agent communication enabled.
+However, to allow for arbitrary boot times of VMs/Guests, which may delay the
+Guest's ability to respond to heartbeat challenges, the Masakari VM Heartbeat
+Agent will not enable reporting of heartbeat failures until the first
+successful heartbeat response is received from the VM/Guest.
138
+
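The rule of not reporting until the first successful response can be sketched as a small per-VM state tracker. The class name HeartbeatTracker, the default threshold of 3 and the choice to report a fault only once per outage are assumptions for illustration; the consecutive-miss threshold itself is the configurable parameter this spec proposes.

```python
class HeartbeatTracker:
    """Per-VM heartbeat state: armed only after the first successful reply."""

    def __init__(self, miss_threshold=3):
        if miss_threshold < 1:
            raise ValueError("miss_threshold must be at least 1")
        self.miss_threshold = miss_threshold
        self.armed = False           # becomes True after the first success
        self.misses = 0              # consecutive misses since last success
        self.fault_reported = False  # avoids re-reporting the same outage

    def record(self, ok):
        """Record one heartbeat result; return True when a fault is due."""
        if ok:
            self.armed = True
            self.misses = 0
            self.fault_reported = False
            return False
        if not self.armed:
            # Guest is probably still booting: ignore misses for now.
            return False
        self.misses += 1
        if self.misses >= self.miss_threshold and not self.fault_reported:
            self.fault_reported = True
            return True
        return False
```

Misses recorded before the first success never count toward the threshold, so a slow-booting guest cannot trigger recovery; once the guest has answered, the counter resets on every success.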
+This functionality will support a flag in masakari.conf for enabling or
+disabling introspective instance monitoring as a whole.  It will also support
+parameters for configuring the default heartbeat period and the default
+threshold of consecutive missed heartbeats before a fault is declared.  In
+the future, flavor extra specs could be used to specify per-VM values for
+these parameters.
144
+
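To make the configuration surface concrete, the sketch below reads such settings from INI-style text. The section name, the option names (enabled, heartbeat_period, heartbeat_miss_threshold) and the default values are all hypothetical, since this spec does not name them, and the standard-library configparser is used only as a stand-in for whatever option handling Masakari actually adopts.

```python
import configparser
from dataclasses import dataclass

# Assumption: hypothetical section name; the spec does not define one.
SECTION = "introspective_instance_monitoring"


@dataclass(frozen=True)
class HeartbeatSettings:
    enabled: bool = False    # overall enable/disable flag
    period_s: float = 10.0   # default heartbeat period, in seconds
    miss_threshold: int = 3  # consecutive misses before declaring a fault


def load_settings(text):
    """Parse INI-style text and return validated heartbeat settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    defaults = HeartbeatSettings()
    if not parser.has_section(SECTION):
        return defaults
    section = parser[SECTION]
    settings = HeartbeatSettings(
        enabled=section.getboolean("enabled", fallback=defaults.enabled),
        period_s=section.getfloat("heartbeat_period",
                                  fallback=defaults.period_s),
        miss_threshold=section.getint("heartbeat_miss_threshold",
                                      fallback=defaults.miss_threshold),
    )
    if settings.period_s <= 0 or settings.miss_threshold < 1:
        raise ValueError("heartbeat period and miss threshold must be positive")
    return settings
```

Leaving the feature disabled by default in this sketch mirrors the requirement that introspective monitoring be optional, since not every guest runs the QEMU Guest Agent.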
+At a high level, this heartbeat monitoring directly verifies only that the
+QEMU Guest Agent is running within the VM.  However, the fact that a heartbeat
+message can travel from the Host to the QEMU Guest Agent inside the VM and
+back inherently validates that much basic Guest kernel functionality is
+working, e.g. the Guest OS is not hung or failed and the heartbeat message was
+properly routed through basic Linux socket IO.  In the future, the
+heartbeating can be extended to do more than reply to the message; for
+example, basic sanity/health tests could be run on key applications within
+the VM.
153
+
154
+
155
+
156
+
157
+Alternatives
158
+------------
159
+
+The virtual hardware watchdog of QEMU/KVM [2] could simply be used for
+instance monitoring.
162
+
163
+However, VM Heartbeat Monitoring:
164
+
+- provides notification of the heartbeat status, through the instance-alerter,
+  to higher-level cloud entities such as Masakari, Mistral and/or Vitrage,
+
+   * which, depending on the backend, can result in VM auto-recovery
+     (Masakari) or in deduced-state updates in Nova for the VM and the
+     resulting Aodh event generation due to the VM state change (Vitrage),
+
+- can be extended in the future to provide higher-level (i.e.
+  application-level) heartbeating,
+
+   * i.e. with the heartbeat requests being answered by the Application
+     running within the VM,
+
+- can be extended in the future to provide more than just heartbeating, as
+  the Application can use it to trigger a variety of audits,
+
+- can be extended in the future to provide a mechanism for the Application
+  within the VM to report health status/information back to the Host/Cloud.
183
+
184
+
185
+
186
+Limitation
187
+----------
188
+
189
+Only VMs supporting the QEMU Guest Agent can be monitored by the functionality of
190
+this proposal.
191
+
192
+
193
+Implementation
194
+==============
195
+
196
+Assignee(s)
197
+-----------
198
+
199
+Primary assignee:
200
+  greg-waines
201
+
202
+
203
+Milestones
204
+----------
205
+
206
+Target Milestone for completion:
207
+  Rocky-2
208
+
209
+
210
+Work Items
211
+----------
212
+
213
+- Masakari VM Heartbeat Agent on the Compute
214
+
215
+   * discovery of VMs with QEMU Guest Agent communication enabled,
216
+
217
+   * high-level logic for Heartbeat / Healthcheck monitoring,
218
+
219
+   * reporting of faults to masakari instance-alerter.
220
+
221
+- tox and/or tempest test suite updates
222
+
223
+- masakari documentation updates
224
+
225
+
226
+
227
+Dependencies
228
+============
229
+
+- Requires that VMs have the QEMU Guest Agent [1], built as part of QEMU,
+  installed and running.
232
+
233
+
234
+References
235
+==========
236
+
237
+[1] http://wiki.qemu.org/Features/GuestAgent
238
+
239
+[2] https://libvirt.org/formatdomain.html#elementsWatchdog
240
+
