Merge "Libvirt: Use the virtlogd deamon"
310
specs/newton/approved/libvirt-virtlogd.rst
Normal file
@@ -0,0 +1,310 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================================
Libvirt: Use the virtlogd daemon
================================

https://blueprints.launchpad.net/nova/+spec/libvirt-virtlogd

If the *serial console* feature is enabled on a compute node with
``[serial_console].enabled = True``, logging of the guest's boot messages
is deactivated. From a REST API perspective, this means that the two APIs
``os-getConsoleOutput`` and ``os-getSerialConsole`` are mutually exclusive.
Both APIs are valuable to cloud operators when something goes wrong during
the launch of an instance. This blueprint aims to lift the XOR relationship
between those two REST APIs.

Problem description
===================

The problem can be seen in the method ``_create_serial_console_devices``
in the libvirt driver. The simplified logic is::

    def _create_serial_console_devices(self, guest, instance, flavor,
                                       image_meta):
        if CONF.serial_console.enabled:
            # Interactive serial console over TCP; no boot log is written.
            console = vconfig.LibvirtConfigGuestSerial()
            console.type = "tcp"
            guest.add_device(console)
        else:
            # Boot messages go to a log file; no interactive console.
            consolelog = vconfig.LibvirtConfigGuestSerial()
            consolelog.type = "file"
            guest.add_device(consolelog)

This ``if-else`` establishes the XOR relationship between having a log of
the guest's boot messages and getting a handle to the guest's serial console.
From a driver point of view, only one of the methods ``get_serial_console``
and ``get_console_output``, which back the two REST APIs
``os-getSerialConsole`` and ``os-getConsoleOutput``, returns valid values
on a given host.

Use Cases
----------

From an end user's point of view, it is currently possible to get the console
output of an instance on host A (serial console not enabled), but after a
rebuild on host B (serial console enabled) the console output is no longer
available. As end users are not aware of the host's configuration, this can
be a confusing experience. Why the serial console was designed with a compute
node scope rather than an instance scope is a separate discussion that is out
of scope here.

After the implementation, deployers will have both means at hand if something
goes wrong during the launch of an instance: the persisted log in case the
instance crashed AND the serial console in case the instance launched but
has issues, for example when networking could not be established and SSH
access is therefore not possible. Deployers will also be affected by a new
dependency on the hosts (see `Dependencies`_).

Developers won't be impacted.


Proposed change
===============

I'd like to switch from the log file to the ``virtlogd`` daemon. This logging
daemon was announced on the libvirt ML [1] and is available with libvirt
version 1.3.3 and QEMU 2.6.0. It handles the output from the guest's console
and writes it into the file
``/var/log/libvirt/qemu/guestname-serial0.log`` on the host, but
truncates/rotates that log so that it doesn't exhaust the host's disk space
(this would solve an old bug [3]).

Nova would generate::

    <serial type="tcp">
      <source mode="connect" host="0.0.0.0" service="2445"/>
      <log file="/var/log/libvirt/qemu/guestname-serial0.log" append="on"/>
      <protocol type="raw"/>
      <target port="1"/>
    </serial>
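
As a rough illustration of the XML above, the following sketch builds the same
``<serial>`` element with Python's standard ``xml.etree.ElementTree`` module.
The helper name ``build_serial_with_log`` and its parameters are hypothetical;
this is not Nova's ``vconfig`` API, only a way to show where the ``<log>``
child sits relative to the TCP source.

```python
import xml.etree.ElementTree as ET


def build_serial_with_log(host, port, log_path, target_port=1):
    """Build a <serial type="tcp"> element that also logs to a file.

    Hypothetical helper for illustration; it is not part of Nova.
    """
    serial = ET.Element("serial", type="tcp")
    ET.SubElement(serial, "source", mode="connect", host=host,
                  service=str(port))
    # The <log> child asks libvirt to persist the console output as well.
    ET.SubElement(serial, "log", file=log_path, append="on")
    ET.SubElement(serial, "protocol", type="raw")
    ET.SubElement(serial, "target", port=str(target_port))
    return serial


# Serialize the element using the example values from the spec.
xml_text = ET.tostring(
    build_serial_with_log("0.0.0.0", 2445,
                          "/var/log/libvirt/qemu/guestname-serial0.log"),
    encoding="unicode")
```

The point of the sketch is that a single serial device carries both the
interactive TCP backend and the log file, so no additional serial device is
needed.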

For providing the console log data, nova would need to read the console
log file from disk directly. As the log file gets rotated automatically,
we have to ensure that all necessary rotated log files are read to satisfy
the upper limit of the ``get_console_output`` driver API contract.
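
Reading across rotated files could look roughly like the sketch below. It
assumes that rotated files are named ``<log>.0``, ``<log>.1``, and so on, with
``.0`` being the most recent backup; that naming, as well as the helper names
``read_last_bytes`` and ``read_console_log``, are assumptions made for
illustration and not something this spec defines.

```python
import os


def read_last_bytes(path, num_bytes):
    """Return up to the last ``num_bytes`` bytes of the file at ``path``."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(size - num_bytes, 0))
        return f.read(num_bytes)


def read_console_log(console_log, max_bytes):
    """Collect at most ``max_bytes`` of the newest console output.

    Assumption: rotated files are ``<console_log>.0``, ``<console_log>.1``,
    ... and a lower index means a more recent file.
    """
    remaining = max_bytes
    data = b""
    path = console_log
    index = 0
    while remaining > 0 and os.path.exists(path):
        chunk = read_last_bytes(path, remaining)
        # Older files hold earlier output, so their content is prepended
        # to keep the result in chronological order.
        data = chunk + data
        remaining -= len(chunk)
        path = "%s.%d" % (console_log, index)
        index += 1
    return data
```

The loop starts with the live log file and only walks back into older files
while the byte budget is not yet used up, so the common case touches a single
file.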


FAQ
---

#. Q: How is the migration/rebuild handled? There are four possible cases
   (based on the nodes' patch level):

   #. ``N -> N``: Neither source nor target node is patched. That's what
      we have today. Nothing to do.

   #. ``N -> N+1``: The target node is patched, which means it can make
      use of the output from *virtlogd*. Can we "import" the existing log
      of the source node into the *virtlogd* logs of the target node?

      A: The guest will keep its configuration from the source host
      and won't make use of the *virtlogd* service until it gets rebuilt.

   #. ``N+1 -> N``: The source node is patched and the instance gets
      migrated to a target node which cannot utilize the *virtlogd*
      output. If the serial console is enabled on the target node, do
      we throw away the log because we cannot update it on the target
      node?

      A: In the case of migration to an old host, we try to copy the
      existing log file across, and configure the guest with the
      ``type=tcp`` backend. This provides ongoing support for the
      interactive console. The log file will remain unchanged if possible.
      A failed copy operation should not prevent the migration of the guest.

   #. ``N+1 -> N+1``: Source and target node are patched. Will libvirt
      migrate the existing log from the source node too? That would
      solve another open bug [4].

#. Q: Could a stalling of the guest happen if *nova-compute* is reading the
   log file and *virtlogd* tries to write to the file but is blocked?

   A: No, *virtlogd* will ensure things are fully parallelized.

#. Q: The *virtlogd* daemon has a ``1:1`` relationship to a compute node.
   How well does it perform when, for example, hundreds of instances are
   running on one compute node?

   A: We could add an I/O rate limit to *virtlogd* so it refuses to read data
   too quickly from a single guest. This prevents a single guest from
   DoS'ing the host.

#. Q: Are there architecture dependencies? Right now, a nova-compute node on
   an s390 architecture depends on the *serial console* feature because it
   cannot provide the other console types (VNC, SPICE, RDP), which means it
   would benefit from having both.

   A: No architecture dependencies.

#. Q: How are restarts of the *virtlogd* daemon handled? Do we lose
   information in the timeframe between stop and start?

   A: The *virtlogd* daemon will be able to re-exec() itself while keeping
   file handles open. This will ensure no data loss during restart of
   *virtlogd*.

#. Q: Do we need a version check of libvirt to detect if *virtlogd* is
   available on the host? Or is it sufficient to check if the folder
   ``/var/log/virtlogd/`` is present?

   A: We will do a version number check on libvirt to figure out whether it
   is capable of using it.
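
The four migration cases from the first FAQ entry can be condensed into a
small decision function. This is only an illustrative sketch: the function
name ``console_log_migration_plan`` and the returned strings are hypothetical,
and the ``N+1 -> N+1`` case is deliberately left as the open question the
spec poses.

```python
def console_log_migration_plan(source_patched, target_patched):
    """Map the patch level of source and target node to the FAQ's answer.

    Hypothetical helper; "patched" means the node can use virtlogd.
    """
    if not source_patched and not target_patched:
        # N -> N: today's behavior.
        return "nothing to do"
    if not source_patched and target_patched:
        # N -> N+1: the guest keeps its old configuration.
        return "keep source configuration until rebuild"
    if source_patched and not target_patched:
        # N+1 -> N: a failed copy must not block the migration.
        return "copy log best-effort, reconfigure guest with type=tcp"
    # N+1 -> N+1: not answered in this spec.
    return "open question: does libvirt migrate the log?"
```

For example, ``console_log_migration_plan(True, False)`` corresponds to
migrating from a patched node to an old node.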

Alternatives
------------

#. In case the *serial console* is enabled, we could establish a
   connection to the guest with it, execute ``tail /var/log/dmesg.log``,
   and return that output in the driver's ``get_console_output`` method, which
   is used to satisfy the ``os-getConsoleOutput`` REST API.

   **Counter-arguments:** We would need to store the authentication data for
   the guest, which would not be technically challenging, but customers
   could be unhappy that Nova can access their guests at any time. A second
   argument is that serial console access is blocking, which means that
   if user A uses the serial console of an instance, user B is not able to do
   the same.

#. We could remove the ``if-else`` and create both devices.

   **Counter-arguments:** This was tried in [2] and stopped because it could
   introduce a backwards incompatibility which could prevent the rebuild
   of an instance. The root cause is that there is an upper bound
   of 4 serial devices on a guest, and this upper bound could be exceeded if
   an instance which already has 4 serial devices gets rebuilt on a compute
   node which has patch [2].


Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

* The *virtlogd* service has to run for this functionality and should be
  monitored.
* This would also solve a long-standing bug which can exhaust the host's disk
  space (see [3]).

Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Markus Zoeller (https://launchpad.net/~mzoeller)


Work Items
----------

* (optional) get a gate job running which has the *serial console* activated
* add a version check to determine whether libvirt supports the *virtlogd*
  functionality
* add a "happy path" which creates a guest device which uses *virtlogd*
* ensure "rebuild" uses the new functionality when migrated from an old host
* add reconfiguration of the guest when migrating from N+1 -> N hosts
  to keep backwards compatibility


Dependencies
============

* Libvirt 1.3.3, which brings the *libvirt virtlogd logging daemon* as
  described in [1].
* QEMU 2.6.0
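
A version check against these two minimums could be sketched as follows. The
constant and function names are hypothetical. The integer encoding
(``major * 1000000 + minor * 1000 + micro``) is the one libvirt uses when
reporting versions; how the values are obtained from a connection is left out
here on purpose.

```python
# Minimum versions taken from the dependency list of this spec.
MIN_LIBVIRT_VIRTLOGD = (1, 3, 3)
MIN_QEMU_VIRTLOGD = (2, 6, 0)


def version_to_int(version):
    """Encode (major, minor, micro) as major*1000000 + minor*1000 + micro."""
    major, minor, micro = version
    return major * 1000000 + minor * 1000 + micro


def supports_virtlogd(libvirt_version, qemu_version):
    """Return True if both integer-encoded versions meet the minimums.

    Hypothetical helper; both arguments use libvirt's integer encoding.
    """
    return (libvirt_version >= version_to_int(MIN_LIBVIRT_VIRTLOGD) and
            qemu_version >= version_to_int(MIN_QEMU_VIRTLOGD))
```

With the libvirt Python bindings, the two inputs would presumably come from
``getLibVersion()`` and ``getVersion()`` on an open connection, which return
versions in this integer form.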


Testing
=======

The tempest tests which are annotated with
``CONF.compute_feature_enabled.console_output`` will have to work with
a setup which

* has the dependency on the *virtlogd daemon* resolved.
* AND has the serial console feature enabled (AFAIK there is no job right
  now which has this enabled)

In addition, a new functional test for the live-migration case has to be
added.

Documentation Impact
====================

None

References
==========

[1] libvirt ML, "[libvirt] RFC: Building a virtlogd daemon":
    http://www.redhat.com/archives/libvir-list/2015-January/msg00762.html

[2] Gerrit; "libvirt: use log file and serial console at the same time":
    https://review.openstack.org/#/c/188058/

[3] Launchpad; "console.log grows indefinitely":
    https://bugs.launchpad.net/nova/+bug/832507

[4] Launchpad; "live block migration results in loss of console log":
    https://bugs.launchpad.net/nova/+bug/1203193

[5] A set of patches on the libvirt/qemu ML:

    * [PATCH 0/5] Initial patches to introduce a virtlogd daemon
    * [PATCH 1/5] util: add API for writing to rotating files
    * [PATCH 2/5] Import stripped down virtlockd code as basis of virtlogd
    * [PATCH 3/5] logging: introduce log handling protocol
    * [PATCH 4/5] logging: add client for virtlogd daemon
    * [PATCH 5/5] qemu: add support for sending QEMU stdout/stderr to virtlogd

[6] libvirt ML, "[libvirt] [PATCH v2 00/13] Introduce a virtlogd daemon":
    https://www.redhat.com/archives/libvir-list/2015-November/msg00412.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Newton
     - Introduced