This moves the specs that were completed in Ocata from the 'approved' directory to the 'implemented' directory and updates the redirects file. This also fixes a bug in the move-implemented-specs script such that we filter out completed blueprints which are not implemented, which was the case with blueprint cold-migration-with-target-ocata which was superseded. Change-Id: I476e2e2d4bb089d901b887ce7728fc29c35a0957
313 lines
11 KiB
ReStructuredText
313 lines
11 KiB
ReStructuredText
..
|
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
|
License.
|
|
|
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
|
|
|
================================
|
|
Libvirt: Use the virtlogd deamon
|
|
================================
|
|
|
|
https://blueprints.launchpad.net/nova/+spec/libvirt-virtlogd
|
|
|
|
If the *serial console* feature is enabled on a compute node with
|
|
``[serial_console].enabled = True`` it deactivates the logging of the
|
|
boot messages. From a REST API perspective, this means that the two APIs
|
|
``os-getConsoleOutput`` and ``os-getSerialConsole`` are mutually exclusive.
|
|
Both APIs can be valuable for cloud operators in the case when something
|
|
goes wrong during the launch of an instance. This blueprint wants to lift
|
|
the XOR relationship between those two REST APIs.
|
|
|
|
Problem description
|
|
===================
|
|
|
|
The problem can be seen in the method ``_create_serial_console_devices``
|
|
in the libvirt driver. The simplified logic is::
|
|
|
|
def _create_serial_console_devices(self, guest, instance, flavor,
|
|
image_meta):
|
|
if CONF.serial_console.enabled:
|
|
console = vconfig.LibvirtConfigGuestSerial()
|
|
console.type = "tcp"
|
|
guest.add_device(console)
|
|
else:
|
|
consolelog = vconfig.LibvirtConfigGuestSerial()
|
|
consolelog.type = "file"
|
|
guest.add_device(consolelog)
|
|
|
|
This ``if-else`` establishes the XOR relationship between having a log of
|
|
the guest's boot messages or getting a handle to the guest's serial console.
|
|
From a driver point of view, this means getting valid return values for the
|
|
method ``get_serial_console`` or ``get_console_output`` which are used to
|
|
satisfy the two REST APIs ``os-getConsoleOutput`` and ``os-getSerialConsole``.
|
|
|
|
Use Cases
|
|
----------
|
|
|
|
From an end user point of view, this means that, with the current state, it
|
|
is possible to get the console output of an instance on host A (serial console
|
|
is not enabled) but after a rebuild on host B (serial console is enabled) it
|
|
is not possible to get the console output. As an end user is not aware of the
|
|
host's configuration, this could be a confusing experience. Written that down
|
|
I'm wondering why the serial console was designed with a compute node scope
|
|
and not with an instance scope, but that's another discussion I don't want to
|
|
do here.
|
|
|
|
After the implementation, deployers will have both means by hand if there is
|
|
something wrong during the launch of an instance. The persisted log in case
|
|
the instance crashed AND the serial console in case the instance launched but
|
|
has issues, for example a failed establishing of networking so that SSH access
|
|
is not possible. Also, they will be impacted with a new dependency on the
|
|
hosts (see `Dependencies`_).
|
|
|
|
Developers won't be impacted.
|
|
|
|
|
|
Proposed change
|
|
===============
|
|
|
|
I'd like to switch from the log file to the ``virtlogd`` deamon. This logging
|
|
deamon was announced on the libvirt ML [1] and is available with libvirt
|
|
version 1.3.3 and Qemu 2.7.0. This logging deamon handles the output from the
|
|
guest's console and writes it into the file
|
|
``/var/log/libvirt/qemu/guestname-serial0.log`` on the host but
|
|
truncates/rotates that log so that it doesn't exhaust the hosts disk space
|
|
(this would solve an old bug [3]).
|
|
|
|
Nova would generate::
|
|
|
|
<serial type="tcp">
|
|
<source mode="connect" host="0.0.0.0" service="2445"/>
|
|
<log file="/var/log/libvirt/qemu/guestname-serial0.log" append="on"/>
|
|
<protocol type="raw"/>
|
|
<target port="1"/>
|
|
</serial>
|
|
|
|
For providing the console log data, nova would need to read the console
|
|
log file from disk directly. As the log file gets rotated automatically
|
|
we have to ensure that all necessary rotated log files get read to satisfy
|
|
the upper limit of the ``get_console_output`` driver API contract.
|
|
|
|
|
|
FAQ
|
|
---
|
|
|
|
#. How is the migration/rebuild handled? The 4 cases which are possible
|
|
(based on the node's patch level):
|
|
|
|
#. ``N -> N``: Neither source nor target node is patched. That's what
|
|
we have today. Nothing to do.
|
|
|
|
#. ``N -> N+1``: The target node is patched, which means it can make
|
|
use of the output from *virtlogd*. Can we "import" the existing log
|
|
of the source node into the *virtlogd* logs of the target node?
|
|
|
|
A: The guest will keep its configuration from the source host
|
|
and don't make use of the *virtlogd* service until it gets rebuilt.
|
|
|
|
#. ``N+1 -> N``: The source node is patched and the instance gets
|
|
migrated to a target node which cannot utilize the *virtlogd*
|
|
output. If the serial console is enable on the target node, do
|
|
we throw away the log because we cannot update it on the target
|
|
node
|
|
|
|
A: In the case of migration to an old host, we try to copy the
|
|
existing log file across, and configure the guest with the
|
|
``type=tcp`` backend. This provides ongoing support for interactive
|
|
console. The log file will remain unchanged if possible. A failed
|
|
copy operation should not prevent the migration of the guest.
|
|
|
|
#. ``N+1 -> N+1``: Source and target node are patched. Will libvirt
|
|
migrate the existing log from the source node too, which would
|
|
solve another open bug [4].
|
|
|
|
#. Q: Could a stalling of the guest happen if *nova-compute* is reading the
|
|
log file and *virtlogd* tries to write to the file but is blocked?
|
|
|
|
A: No, *virtlogd* will ensure things are fully parallelized
|
|
|
|
#. Q: The *virtlogd* deamon has a ``1:1`` relationship to a compute node.
|
|
It would be interesting how well it performs when, for example,
|
|
hundreds of instances are running on one compute node.
|
|
|
|
A: We could add a I/O rate limit to *virtlogd* so it refuses to read data
|
|
too quickly from a single guest. This prevents a single guest DOS'ing
|
|
the host.
|
|
|
|
#. Q: Are there architecture dependencies? Right now, a nova-compute node on a
|
|
s390 architecture depends on the *serial console* feature because it
|
|
cannot provide the other console types (VNC, SPICE, RDP). Which means it
|
|
would benefit from having both.
|
|
|
|
A: No architecture dependencies.
|
|
|
|
#. Q: How are restarts of the *virtlogd* deamon handled? Do we lose
|
|
information in the timeframe between stop and start?
|
|
|
|
A: The *virtlogd* daemon will be able to re-exec() itself while keeping
|
|
file handles open. This will ensure no data loss during restart of
|
|
*virtlogd*.
|
|
|
|
#. Q: Do we need a version check of libvirt to detect if the *virtlodg* is
|
|
available on the host? Or is it sufficient to check if the folder
|
|
``/var/log/virtlogd/`` is present?
|
|
|
|
A: We will do a version number check on libvirt to figure out if it is
|
|
capable to use it.
|
|
|
|
Alternatives
|
|
------------
|
|
|
|
#. In case where the *serial console* is enabled, we could establish a
|
|
connection to the guest with it and execute ``tail /var/log/dmesg.log``
|
|
and return that output in the driver's ``get_console_output`` method which
|
|
is used to satisfy the ``os-getConsoleOutput`` REST API.
|
|
|
|
**Counter-arguments:** We would need to save the authentication data to
|
|
the guest, which would not be technically challenging but the customers
|
|
could be unhappy that Nova can access their guests at any time. A second
|
|
argument is, that the serial console access is blocking, which means
|
|
if user A uses the serial console of an instance, user B is not able to do
|
|
the same.
|
|
|
|
#. We could remove the ``if-else`` and create both devices.
|
|
|
|
**Counter-arguments:** This was tried in [2] and stopped because this could
|
|
introduce a backwards incompatibility which could prevent the rebuild
|
|
of an instance. The root cause for this was, that there is an upper bound
|
|
of 4 serial devices on a guest, and this upper bound could be exceeded if
|
|
an instance which already has 4 serial devices gets rebuilt on a compute
|
|
node which would have patch [2].
|
|
|
|
|
|
Data model impact
|
|
-----------------
|
|
|
|
None
|
|
|
|
REST API impact
|
|
---------------
|
|
|
|
None
|
|
|
|
Security impact
|
|
---------------
|
|
|
|
None
|
|
|
|
Notifications impact
|
|
--------------------
|
|
|
|
None
|
|
|
|
Other end user impact
|
|
---------------------
|
|
|
|
None
|
|
|
|
Performance Impact
|
|
------------------
|
|
|
|
None
|
|
|
|
Other deployer impact
|
|
---------------------
|
|
|
|
* The *virtlogd* service has to run for this functionality and should be
|
|
monitored.
|
|
* This would also solve a long-running bug which can cause a host disc space
|
|
exhaustion (see [3]).
|
|
|
|
Developer impact
|
|
----------------
|
|
|
|
None
|
|
|
|
|
|
Implementation
|
|
==============
|
|
|
|
Assignee(s)
|
|
-----------
|
|
|
|
Primary assignee:
|
|
Markus Zoeller (https://launchpad.net/~mzoeller)
|
|
|
|
|
|
Work Items
|
|
----------
|
|
|
|
* (optional) get a gate job running which has the *serial console* activated
|
|
* add version check if libvirt supports the *virtlogd* functionality
|
|
* add "happy path" which creates a guest device which uses *virtlogd*
|
|
* ensure "rebuild" uses the new functionality when migrated from an old host
|
|
* add reconfiguration of the guest when migrating from N+1 -> N hosts
|
|
to keep backwards compatibility
|
|
|
|
|
|
Dependencies
|
|
============
|
|
|
|
* Libvirt 1.3.3 which brings the *libvirt virtlod logging deamon* as
|
|
described in [1].
|
|
* Qemu 2.7.0
|
|
|
|
|
|
Testing
|
|
=======
|
|
|
|
The tempest tests which are annotated with
|
|
``CONF.compute_feature_enabled.console_output`` will have to work with
|
|
a setup which
|
|
|
|
* has the dependency to the *virtlogd deamon* resolved.
|
|
* AND has the serial console feature enabled (AFAIK there is not job right
|
|
now which has this enabled)
|
|
|
|
* A new functional test for the live-migration case has to be added
|
|
|
|
Documentation Impact
|
|
====================
|
|
|
|
None
|
|
|
|
References
|
|
==========
|
|
|
|
[1] libvirt ML, "[libvirt] RFC: Building a virtlogd daemon":
|
|
http://www.redhat.com/archives/libvir-list/2015-January/msg00762.html
|
|
|
|
[2] Gerrit; "libvirt: use log file and serial console at the same time":
|
|
https://review.openstack.org/#/c/188058/
|
|
|
|
[3] Launchpad; " console.log grows indefinitely ":
|
|
https://bugs.launchpad.net/nova/+bug/832507
|
|
|
|
[4] Launchpad; "live block migration results in loss of console log":
|
|
https://bugs.launchpad.net/nova/+bug/1203193
|
|
|
|
[5] A set of patches on the libvirt/qemu ML:
|
|
|
|
* [PATCH 0/5] Initial patches to introduce a virtlogd daemon
|
|
* [PATCH 1/5] util: add API for writing to rotating files
|
|
* [PATCH 2/5] Import stripped down virtlockd code as basis of virtlogd
|
|
* [PATCH 3/5] logging: introduce log handling protocol
|
|
* [PATCH 4/5] logging: add client for virtlogd daemon
|
|
* [PATCH 5/5] qemu: add support for sending QEMU stdout/stderr to virtlogd
|
|
|
|
[6] libvirt ML, "[libvirt] [PATCH v2 00/13] Introduce a virtlogd daemon":
|
|
https://www.redhat.com/archives/libvir-list/2015-November/msg00412.html
|
|
|
|
History
|
|
=======
|
|
|
|
.. list-table:: Revisions
|
|
:header-rows: 1
|
|
|
|
* - Release Name
|
|
- Description
|
|
* - Newton
|
|
- Proposed and approved but blocked by https://bugs.launchpad.net/qemu/+bug/1599214
|
|
* - Ocata
|
|
- Re-proposed.
|