Libvirt: Use the virtlogd daemon

If the *serial console* feature is enabled on a compute node with
``[serial_console].enabled = True`` it deactivates the logging of the boot
messages. From a REST API perspective, this means that the two APIs
``os-getConsoleOutput`` and ``os-getSerialConsole`` are mutually exclusive.
Both APIs can be valuable for cloud operators in the case when something goes
wrong during the launch of an instance. This blueprint wants to lift the XOR
relationship between those two REST APIs.

blueprint libvirt-virtlogd

Change-Id: I9a1fbf005b0f48df90093a346579d9ddc64f7846
310
specs/newton/approved/libvirt-virtlogd.rst
Normal file
@@ -0,0 +1,310 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================================
Libvirt: Use the virtlogd daemon
================================

https://blueprints.launchpad.net/nova/+spec/libvirt-virtlogd

If the *serial console* feature is enabled on a compute node with
``[serial_console].enabled = True``, the boot messages of its guests are no
longer logged. From a REST API perspective, this means that the two APIs
``os-getConsoleOutput`` and ``os-getSerialConsole`` are mutually exclusive.
Both APIs can be valuable for cloud operators when something goes wrong
during the launch of an instance. This blueprint wants to lift the XOR
relationship between those two REST APIs.


Problem description
===================

The problem can be seen in the method ``_create_serial_console_devices``
in the libvirt driver. The simplified logic is::

    def _create_serial_console_devices(self, guest, instance, flavor,
                                       image_meta):
        if CONF.serial_console.enabled:
            # interactive serial console, but no boot log
            console = vconfig.LibvirtConfigGuestSerial()
            console.type = "tcp"
            guest.add_device(console)
        else:
            # boot log, but no interactive serial console
            consolelog = vconfig.LibvirtConfigGuestSerial()
            consolelog.type = "file"
            guest.add_device(consolelog)

This ``if-else`` establishes the XOR relationship between having a log of
the guest's boot messages and getting a handle to the guest's serial console.
From a driver point of view, this means that only one of the methods
``get_console_output`` and ``get_serial_console`` can return a valid value.
These methods are used to satisfy the two REST APIs ``os-getConsoleOutput``
and ``os-getSerialConsole``.


Use Cases
----------

From an end user point of view, this means that, with the current state, it
is possible to get the console output of an instance on host A (serial console
is not enabled), but after a rebuild on host B (serial console is enabled) it
is no longer possible to get the console output. As an end user is not aware
of the host's configuration, this can be a confusing experience. Having
written that down, I wonder why the serial console was designed with a compute
node scope and not with an instance scope, but that is another discussion
which I don't want to have here.

After the implementation, deployers will have both means at hand if
something goes wrong during the launch of an instance: the persisted log in
case the instance crashed AND the serial console in case the instance launched
but has issues, for example when networking could not be established and SSH
access is therefore not possible. Deployers will also be impacted by a new
dependency on the hosts (see `Dependencies`_).

Developers won't be impacted.



Proposed change
===============

I'd like to switch from the log file to the ``virtlogd`` daemon. This logging
daemon was announced on the libvirt ML [1] and is available with libvirt
version 1.3.3 and Qemu 2.6.0. It handles the output from the guest's console
and writes it into the file ``/var/log/libvirt/qemu/guestname-serial0.log``
on the host, but truncates/rotates that log so that it doesn't exhaust the
host's disk space (this would solve an old bug [3]).

Nova would generate::

    <serial type="tcp">
      <source mode="connect" host="0.0.0.0" service="2445"/>
      <log file="/var/log/libvirt/qemu/guestname-serial0.log" append="on"/>
      <protocol type="raw"/>
      <target port="1"/>
    </serial>

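
As a rough illustration of the device definition above, the following sketch
builds the same ``<serial>`` element with Python's standard
``xml.etree.ElementTree`` module. The helper name ``build_serial_device`` and
its parameters are hypothetical and only serve this example; the libvirt
driver itself would presumably keep using its ``vconfig`` classes.

.. code-block:: python

    import xml.etree.ElementTree as ET


    def build_serial_device(host, port, log_path):
        """Build a TCP serial device which also logs to a file (sketch)."""
        serial = ET.Element("serial", type="tcp")
        ET.SubElement(serial, "source", mode="connect", host=host,
                      service=str(port))
        # The <log> child is what asks libvirt to persist the console output.
        ET.SubElement(serial, "log", file=log_path, append="on")
        ET.SubElement(serial, "protocol", type="raw")
        ET.SubElement(serial, "target", port="1")
        return serial


    device = build_serial_device(
        "0.0.0.0", 2445, "/var/log/libvirt/qemu/guestname-serial0.log")
    print(ET.tostring(device, encoding="unicode"))

The point of the sketch is that one device carries both the interactive
``tcp`` backend and the log file, so no second serial device is needed.
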

For providing the console log data, nova would need to read the console
log file from disk directly. As the log file gets rotated automatically,
we have to ensure that all necessary rotated log files get read to satisfy
the upper limit of the ``get_console_output`` driver API contract.

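
A minimal sketch of such a read is shown below. It assumes that the rotated
files sit next to the live log and are named ``<log>.0``, ``<log>.1`` and so
on, with ``.0`` being the most recently rotated one; this naming, the helper
name ``read_console_tail`` and the ``max_backups`` parameter are assumptions
made for this example and should be checked against the *virtlogd*
configuration in use.

.. code-block:: python

    import os


    def read_console_tail(log_path, max_bytes, max_backups=2):
        """Return up to max_bytes from the end of a rotated console log."""
        # Assumed order, newest data first: log, log.0, log.1, ...
        candidates = [log_path]
        candidates += ["%s.%d" % (log_path, i) for i in range(max_backups)]

        chunks = []
        remaining = max_bytes
        for path in candidates:
            if remaining <= 0 or not os.path.exists(path):
                break
            size = os.path.getsize(path)
            with open(path, "rb") as f:
                # Only read the tail of the file which is still needed.
                f.seek(max(0, size - remaining))
                data = f.read(remaining)
            chunks.append(data)
            remaining -= len(data)

        # Chunks were collected newest first; return them in written order.
        return b"".join(reversed(chunks))

The sketch walks backwards through the files only as far as needed, so a
small limit usually touches the live log file alone.
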


FAQ
---

#. Q: How is the migration/rebuild handled? There are 4 possible cases
   (based on the node's patch level):

   #. ``N -> N``: Neither source nor target node is patched. That's what
      we have today. Nothing to do.

   #. ``N -> N+1``: The target node is patched, which means it can make
      use of the output from *virtlogd*. Can we "import" the existing log
      of the source node into the *virtlogd* logs of the target node?

      A: The guest will keep its configuration from the source host
      and won't make use of the *virtlogd* service until it gets rebuilt.

   #. ``N+1 -> N``: The source node is patched and the instance gets
      migrated to a target node which cannot utilize the *virtlogd*
      output. If the serial console is enabled on the target node, do
      we throw away the log because we cannot update it on the target
      node?

      A: In the case of migration to an old host, we try to copy the
      existing log file across, and configure the guest with the
      ``type=tcp`` backend. This provides ongoing support for the
      interactive console. The log file will remain unchanged if possible.
      A failed copy operation should not prevent the migration of the guest.

   #. ``N+1 -> N+1``: Source and target node are patched. Will libvirt
      migrate the existing log from the source node too? That would
      solve another open bug [4].

#. Q: Could a stalling of the guest happen if *nova-compute* is reading the
   log file and *virtlogd* tries to write to the file but is blocked?

   A: No, *virtlogd* will ensure things are fully parallelized.

#. Q: The *virtlogd* daemon has a ``1:1`` relationship to a compute node.
   How well does it perform when, for example, hundreds of instances are
   running on one compute node?

   A: We could add an I/O rate limit to *virtlogd* so it refuses to read data
   too quickly from a single guest. This prevents a single guest from
   DoS'ing the host.

#. Q: Are there architecture dependencies? Right now, a nova-compute node on
   an s390 architecture depends on the *serial console* feature because it
   cannot provide the other console types (VNC, SPICE, RDP). It would
   therefore benefit from having both.

   A: No architecture dependencies.

#. Q: How are restarts of the *virtlogd* daemon handled? Do we lose
   information in the timeframe between stop and start?

   A: The *virtlogd* daemon will be able to re-exec() itself while keeping
   file handles open. This will ensure no data loss during restart of
   *virtlogd*.

#. Q: Do we need a version check of libvirt to detect if *virtlogd* is
   available on the host? Or is it sufficient to check if the folder
   ``/var/log/virtlogd/`` is present?

   A: We will do a version number check on libvirt to figure out if it is
   capable of using it.

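
The version check from the last answer could look roughly like the sketch
below. It relies on the integer encoding which libvirt uses for version
numbers (``major * 1,000,000 + minor * 1,000 + release``) and on the minimum
versions listed in `Dependencies`_. The constant and function names are
hypothetical and are not taken from the libvirt driver.

.. code-block:: python

    # Minimum versions named in this spec.
    MIN_LIBVIRT_VIRTLOGD = (1, 3, 3)
    MIN_QEMU_VIRTLOGD = (2, 6, 0)


    def version_to_tuple(encoded):
        """Decode libvirt's integer version into (major, minor, release)."""
        major, rest = divmod(encoded, 1000000)
        minor, release = divmod(rest, 1000)
        return (major, minor, release)


    def supports_virtlogd(libvirt_version, qemu_version):
        """Return True if both versions are new enough for virtlogd."""
        return (version_to_tuple(libvirt_version) >= MIN_LIBVIRT_VIRTLOGD and
                version_to_tuple(qemu_version) >= MIN_QEMU_VIRTLOGD)

For example, ``supports_virtlogd(1003003, 2006000)`` is true, while a host
with libvirt 1.3.2 (``1003002``) would keep using the current behavior.
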

Alternatives
------------

#. In the case where the *serial console* is enabled, we could establish a
   connection to the guest with it and execute ``tail /var/log/dmesg.log``
   and return that output in the driver's ``get_console_output`` method which
   is used to satisfy the ``os-getConsoleOutput`` REST API.

   **Counter-arguments:** We would need to save the authentication data for
   the guest, which would not be technically challenging, but the customers
   could be unhappy that Nova can access their guests at any time. A second
   argument is that the serial console access is blocking, which means that
   if user A uses the serial console of an instance, user B is not able to do
   the same.

#. We could remove the ``if-else`` and create both devices.

   **Counter-arguments:** This was tried in [2] and stopped because this could
   introduce a backwards incompatibility which could prevent the rebuild
   of an instance. The root cause for this is that there is an upper bound
   of 4 serial devices on a guest, and this upper bound could be exceeded if
   an instance which already has 4 serial devices gets rebuilt on a compute
   node which has patch [2].



Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

* The *virtlogd* service has to run for this functionality and should be
  monitored.
* This would also solve a long-standing bug which can exhaust the host's
  disk space (see [3]).

Developer impact
----------------

None



Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Markus Zoeller (https://launchpad.net/~mzoeller)


Work Items
----------

* (optional) get a gate job running which has the *serial console* activated
* add a version check to detect if libvirt supports the *virtlogd*
  functionality
* add the "happy path" which creates a guest device which uses *virtlogd*
* ensure "rebuild" uses the new functionality when migrated from an old host
* add reconfiguration of the guest when migrating from N+1 -> N hosts
  to keep backwards compatibility

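
For the last work item, the FAQ states that a failed copy of the log file
should not prevent the migration. A best-effort copy along those lines could
be sketched as follows. Local file paths stand in for the transfer between
hosts, and the function name ``copy_console_log_best_effort`` is hypothetical;
this is only meant to show the error handling, not the actual transport.

.. code-block:: python

    import logging
    import shutil

    LOG = logging.getLogger(__name__)


    def copy_console_log_best_effort(src, dst):
        """Copy the console log and report failure instead of raising."""
        try:
            shutil.copyfile(src, dst)
            return True
        except (OSError, shutil.Error) as exc:
            # A lost log is acceptable; an aborted migration is not.
            LOG.warning("Could not copy console log %s to %s: %s",
                        src, dst, exc)
            return False

The caller can ignore the return value and continue with the migration, or
use it to record that the console log of the instance starts fresh.
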


Dependencies
============

* Libvirt 1.3.3 which brings the *libvirt virtlogd logging daemon* as
  described in [1].
* Qemu 2.6.0


Testing
=======

The tempest tests which are annotated with
``CONF.compute_feature_enabled.console_output`` will have to work with
a setup which

* has the dependency on the *virtlogd daemon* resolved
* AND has the serial console feature enabled (AFAIK there is no job right
  now which has this enabled)

In addition, a new functional test for the live-migration case has to be
added.


Documentation Impact
====================

None


References
==========

[1] libvirt ML, "[libvirt] RFC: Building a virtlogd daemon":
    http://www.redhat.com/archives/libvir-list/2015-January/msg00762.html

[2] Gerrit, "libvirt: use log file and serial console at the same time":
    https://review.openstack.org/#/c/188058/

[3] Launchpad, "console.log grows indefinitely":
    https://bugs.launchpad.net/nova/+bug/832507

[4] Launchpad, "live block migration results in loss of console log":
    https://bugs.launchpad.net/nova/+bug/1203193

[5] A set of patches on the libvirt/qemu ML:

    * [PATCH 0/5] Initial patches to introduce a virtlogd daemon
    * [PATCH 1/5] util: add API for writing to rotating files
    * [PATCH 2/5] Import stripped down virtlockd code as basis of virtlogd
    * [PATCH 3/5] logging: introduce log handling protocol
    * [PATCH 4/5] logging: add client for virtlogd daemon
    * [PATCH 5/5] qemu: add support for sending QEMU stdout/stderr to virtlogd

[6] libvirt ML, "[libvirt] [PATCH v2 00/13] Introduce a virtlogd daemon":
    https://www.redhat.com/archives/libvir-list/2015-November/msg00412.html


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Newton
     - Introduced