This reverts commit 5a10047f9d.
This gets us back to Ib0cf5d55750f13d0499a570f14024dca551ed4d4
which was meant to address an issue introduced
by Id188d48609f3d22d14e16c7f6114291d547a8986.
So we essentially had three changes:
1. Hard reboot would blow away volumes and vifs and then wait for the
   vifs to be plugged; this caused a problem for some vif types
   (linuxbridge was reported) because the event never came and we
   timed out.
2. To work around that, a second change was made to simply not wait for
vif plugging events.
3. Since #2 was a bit heavy-handed for a problem that didn't impact
openvswitch, another change was made to only wait for non-bridge vif
types, so we'd wait for OVS.
But it turns out that opendaylight is an OVS vif type and doesn't send
events for plugging the vif, only for binding the port (and we don't
re-bind the port during reboot). There is also a report of this being a
problem for other types of ports, see
If209f77cff2de00f694b01b2507c633ec3882c82.
So rather than try to special-case every possible vif type that could
be impacted by this, we are simply reverting the change so we no longer
wait for vif plugged events during hard reboot.
Note that if we went back to Id188d48609f3d22d14e16c7f6114291d547a8986
and tweaked that to not unplug/plug the vifs we wouldn't have this
problem either, and that change was really meant to deal with an
encrypted volume issue on reboot. But changing that logic is out of the
scope of this change. Alternatively, we could re-bind the port during
reboot but that could have other implications, or neutron could put
something into the port details telling us which vifs will send events
and which won't, but again that's all outside of the scope of this
patch.
Change-Id: Ib3f10706a7191c58909ec1938042ce338df4d499
Closes-Bug: #1755890
With the addition of multiattach we need to ensure that we
don't make brick calls to remove connections on volume detach
if that volume is attached to another instance on the same
node.
This patch adds a new helper method (_should_disconnect_target)
to the virt driver that will inform the caller if the specified
volume is attached multiple times to the current host.
The general strategy for this call is to fetch a current reference
of the specified volume and then:
1. Check if that volume has >1 active attachments
2. Fetch the attachments for the volume and extract the server_uuids
for each of the attachments.
3. Check the server_uuids against a list of all known server_uuids
on the current host. Increment a connection_count for each item
found.
If the connection_count is >1 we return `False` indicating that the
volume is being used by more than one attachment on the host and
we therefore should NOT destroy the connection.
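For illustration, a rough sketch of that counting logic (the method and
attribute names below are illustrative, not the exact driver code):

    def _should_disconnect_target(self, context, volume_id,
                                  host_instance_uuids):
        # Refresh the volume; a single attachment can never be shared.
        volume = self._volume_api.get(context, volume_id)
        if len(volume.get('attachments', {})) < 2:
            return True

        # Collect the server (instance) uuids for every attachment.
        attachments = self._volume_api.attachments(context, volume_id)
        server_uuids = [att['server_id'] for att in attachments]

        # Count how many attachments belong to instances on this host.
        connection_count = sum(
            1 for uuid in server_uuids if uuid in host_instance_uuids)

        # More than one attachment here means the target is still in
        # use on this host, so do NOT destroy the connection.
        return connection_count < 2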
*NOTE*
This scenario is very different from the `shared_targets`
case (for which we supply a property on the Volume object). The
`shared_targets` scenario is specifically for Volume backends that
present >1 Volumes using a single Target. This mechanism is meant
to provide a signal to consumers that locking is required for the
creation and deletion of initiator/target sessions.
Closes-Bug: #1752115
Change-Id: Idc5cecffa9129d600c36e332c97f01f1e5ff1f9f
(cherry picked from commit 139426d514)
We need this in a later change to pull volume attachment
information from cinder for the volume being detached so
that we can do some attachment counting for multiattach
volumes being detached from instances on the same host.
Change-Id: I751fcb7532679905c4279744919c6cce84a11eb4
Related-Bug: #1752115
(cherry picked from commit d2941bfd16)
Under certain failure scenarios, the libvirt definition for the volume
may have been removed from the instance while the associated storage
LUN on the compute host has not yet been fully cleaned up.
If a user makes another attempt to detach the volume, we should not
stop the process when the device is not found in the domain definition,
but should still try to disconnect the logical device from the host.
This commit makes the detach process attempt to disconnect the volume
even if the device is not attached to the guest.
Closes-Bug: #1727260
Change-Id: I4182642aab3fd2ffb1c97d2de9bdca58982289d8
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@redhat.com>
(cherry picked from commit ce531dd1b7)
In change Ib0cf5d55750f13d0499a570f14024dca551ed4d4, we stopped waiting
for vif plug events during hard reboot and start, because the behavior
of neutron plug events depended on the vif type and we couldn't rely on
the stale network info cache.
This refines the logic so that we skip waiting for vif plug events
only for the bridge vif type, instead of for all types. We also add a
flag to
_create_domain_and_network to indicate that we're in the middle of a
reboot and to expect to wait for plug events for all vif types except
the bridge vif type, regardless of the vif's 'active' status.
We only query network_info from neutron at the beginning of a reboot,
before we've unplugged anything, so the majority of the time, the vifs
will be active=True. The current logic in get_neutron_events will only
expect events for vifs with 'active' status False. This adds a way to
override that if we know we've already unplugged vifs as part of a
reboot.
Related-Bug: #1744361
Change-Id: Ib08afad3822f2ca95cfeea18d7f4fc4cb407b4d6
(cherry picked from commit aaf37a26d6)
The graphical console is an optional thing. But when it is enabled,
it is good to have some input devices.
On x86(-64) this is handled by a PS/2 keyboard/mouse. On ppc64 we have
a USB keyboard/mouse. On aarch64 we have nothing.
So make sure that a USB host controller is available and that a USB
keyboard is present. This also gives the USB tablet (the default
pointer_model device) a port to plug into.
Closes-bug: 1745340
Change-Id: I69a934d188446a1aa95ab33975dbe1d6e058ebf9
Not all volume types put a 'volume_id' entry in the
connection_info['data'] dict. This change uses a new
utility method to look up the volume_id in the connection_info
data dict and if not found there, uses the 'serial' value
from the connection_info, which we know at least gets set
when the DriverVolumeBlockDevice code attaches the volume.
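A minimal sketch of that lookup (the helper name here is illustrative):

    def get_volume_id(connection_info):
        # Not all volume drivers put 'volume_id' in the data dict.
        volume_id = connection_info.get('data', {}).get('volume_id')
        if volume_id is None:
            # Fall back to 'serial', which DriverVolumeBlockDevice sets
            # when it attaches the volume.
            volume_id = connection_info.get('serial')
        return volume_id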
This also has to update pre_live_migration since the connection_info
dict doesn't have a 'volume_id' key in it. It's unclear what
this code was expecting, or if it ever really worked, but since
an attached volume represented by a BlockDeviceMapping here has
a volume_id attribute, we can just use that. As that code path
was never tested, this updates related unit tests and refactors
the tests to actually use the type of DriverVolumeBlockDevice
objects that the ComputeManager would be sending down to the
driver pre_live_migration method. The hard-to-read squashed
dicts in the tests are also re-formatted so a human can actually
read them.
Change-Id: Ie02d298cd92d5b5ebcbbcd2b0e8be01f197bfafb
Closes-Bug: #1746609
Originally, in change Id188d48609f3d22d14e16c7f6114291d547a8986 we
added a re-initialization of volumes, encryptors, and vifs to hard
reboot. When creating the libvirt domain and network, we were waiting
for vif plug events from neutron when we replugged the vifs. Then, we
started seeing timeouts in the linuxbridge gate job because compute
was timing out waiting for plug events from neutron during a hard
reboot.
It turns out that the behavior of neutron plug events depends on what
vif type we're using and we're also using a stale network info_cache
throughout the hard reboot code path, so we can't be 100% sure we know
which vifs to wait for plug events from anyway. We coincidentally get
some info_cache refreshes from network-changed events from neutron,
but we shouldn't rely on that.
Ideally, we could do something like wait for an unplug event after we
unplug the vif, then refresh the network_info cache, then wait for the
plug event. BUT, in the case of the os-vif linuxbridge unplug method,
it is a no-op, so I don't think we could expect to get an unplug
event for it (and we don't see any network-vif-unplugged events sent
in the q-svc log for the linuxbridge job during a hard reboot).
Closes-Bug: #1744361
Change-Id: Ib0cf5d55750f13d0499a570f14024dca551ed4d4
We currently log that we timed out waiting for vif plugging callbacks
from neutron but we don't include detail about which vif_ids we were
waiting for. This adds the event_names to the log message to aid
in debugging.
Related-Bug: #1744361
Change-Id: I8b67f1049b6a968b1dd0d839a7f0b30aa1730eeb
Since libvirt doesn't allow us to hot-unplug mediated devices, we need to
short-circuit the suspend action if the instance has mediated devices
and set it back to the ACTIVE state.
Change-Id: I01147bb3c66d94fdecb76395e5205767a905d18a
StorPool is distributed data storage software running on standard x86
servers. StorPool aggregates the performance and capacity of all drives
into a shared pool of storage distributed among the servers. Within
this storage pool the user creates thin-provisioned volumes that are
exposed to the clients as block devices. StorPool consists of two parts
wrapped in one package - a server and a client. The StorPool server
allows a hypervisor to act as a storage node, while the StorPool client
allows a hypervisor node to access the storage pool and act as a compute
node. In OpenStack terms the StorPool solution allows each hypervisor
node to be both a storage and a compute node simultaneously.
This driver allows StorPool volumes defined in Cinder to be attached as
additional disks to a virtual machine.
Change-Id: I3d40009eb17d054f33b3ed641643e285ba094ec2
Implements: blueprint libvirt-storpool-volume-attach
When we reboot a guest (like when stopping/starting it), we recreate the guest
XML. Since we need to know the existing mediated devices before destroying
the guest, let's pass them when recreating the XML.
NOTE: I'm also amending the reno file to mention exactly what has been
tested and is fully functional, and what is not working, for all the
instance actions.
I'm also considering this change the last patch needed to make the
feature production-ready, as most of the remaining quirks can easily be
worked around by either rebuilding the instance (for example when you
resize) or just shelving the instance instead of suspending it.
Next changes in the series will address those issues but won't be identified
as part of the blueprint itself.
Change-Id: Idba99f3f9b4abe77c042a6edaf9f5fe5c75ac32c
Implements: blueprint add-support-for-vgpu
QEMU 2.6 and Libvirt 2.2.0 allow LUKS encrypted RAW files, block devices
and network devices (such as rbd) to be decrypted natively by QEMU. This
change enables the use of this feature within Nova when the appropriate
versions of QEMU and Libvirt are installed and the encryption provider
for the volume is of type 'luks'.
When these conditions are met a Libvirt secret is created when
connecting encrypted volumes to a compute host to hold the LUKS
passphrase used to unlock the volume. The presence of this Libvirt
secret is then used by the volume driver to generate the required
encryption XML for the disk. QEMU is then able to natively read from and
write to the encrypted disk, removing the need for the os-brick supplied
dm-crypt style encryptors, previously used to handle encrypted volumes.
When disconnecting a volume the presence of a Libvirt secret will result
in no attempt being made to detach an os-brick provided encryptor from
the volume. This will only occur when no secret for the volume is found
on the compute host. This will allow encrypted volumes attached prior to
this change to still be detached correctly.
Attempts to swap between volumes while using native QEMU decryption will
be blocked in the same manner as they are when using volumes that do not
provide a local block device. Both use cases still require additional
implementation work in Libvirt before being allowed within Nova.
LibvirtLiveMigrateData and LibvirtLiveMigrateBDMInfo are both extended
to support the following matrix of live migration (LM) scenarios:
- Pike using os-brick encryptors to Queens using os-brick encryptors
- Queens using os-brick encryptors to Queens using os-brick encryptors
- Queens using os-brick encryptors to Queens using native QEMU decrypt
- Queens using native QEMU decrypt to Queens using native QEMU decrypt
A new 'src_supports_native_luks' attribute has been added to
LibvirtLiveMigrateData in Queens to indicate that the source host is
capable of configuring QEMU LUKS decryption for a volume during LM.
The presence of this attribute in migrate_data during pre_live_migration
on the destination is then used to decide how encrypted volumes are
connected on that host ahead of LM starting.
When missing or False the os-brick encryptors will be attached, when
present and True the native QEMU decryption approach will be taken only
if the destination host also supports native LUKS decryption by QEMU.
The UUID of this secret is then stored in the individual
LibvirtLiveMigrateBDMInfo object associated with the volume and passed
back to the source host to be added to the volume configuration prior to
the start of the live migration.
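Roughly, the destination-side decision during pre_live_migration looks
like the following sketch (helper names are illustrative):

    src_native_luks = (
        migrate_data.obj_attr_is_set('src_supports_native_luks') and
        migrate_data.src_supports_native_luks)
    dest_native_luks = self._is_native_luks_available()  # illustrative

    if src_native_luks and dest_native_luks:
        # Create a libvirt secret holding the LUKS passphrase so QEMU
        # can decrypt the volume natively; skip the os-brick encryptor.
        ...
    else:
        # Attach the os-brick provided (dm-crypt style) encryptor, as
        # was done prior to this change.
        ...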
Implements: blueprint libvirt-qemu-native-luks
Change-Id: Ibfa64f18bbd2fb70db7791330ed1a64fe61c1355
Libvirt lacks a feature (due to the nature of a mediated device being
just a sysfs file) in that mediated devices are not persisted.
When you reboot your host, you lose all of them, which can be a big
pain for operators having allocated guests using vGPUs.
This change will iterate over all the instances, see if they have a
nested vGPU, check if the related mediated device exists, and if not,
rebalance between physical GPUs to find a suitable one that can fit it.
Note that because we don't persist the mediated device information in
Nova either, mediated devices can be created on different physical
devices than they were on before the reboot. That's not a big deal
since we only support one type at the moment, but that could become a
problem later as we would need to figure out which type the mediated
device was before the reboot.
Partially-Implements: blueprint add-support-for-vgpu
Change-Id: Ie6c9108808a461359d717d8a9e9399c8a407bfe9
This change introduces new utility methods for attaching and detaching
frontend volume encryptors. These methods centralise the optional
fetching of encryption metadata associated with a volume, fetching of the
required encryptor and calls to detach or attach the encryptor.
These new utility methods are called either after initially connecting
to or before disconnecting from a volume. This ensures encryptors are
correctly connected when swapping volumes for example, where previously
no attempt was made to attach an encryptor to the target volume.
The request context is provided to swap_volume and various other config
generation related methods to allow for the lookup of the relevant
encryption metadata if it is not provided.
Closes-bug: #1739593
Change-Id: Ica323b87fa85a454fca9d46ada3677f18fe50022
If an allocation asks for a VGPU, then libvirt will look at the
available mediated devices and use sysfs to create one of them if
needed.
Please note I documented in the relnote all the caveats we currently
have with mediated devices in libvirt, but I'll provide some
workarounds for those in the next changes.
Change-Id: Ibf210dd27972fed2651d6c9bd73a0bcf352c8bab
Partially-Implements: blueprint add-support-for-vgpu
When we changed the default value of the
workarounds.disable_libvirt_livesnapshot config option
to False in 980d0fcd75 earlier
in Queens, we were testing against the Pike UCA packages, which
have libvirt 3.6.0 and qemu 2.10. Live snapshots of a paused
instance work with those package versions as shown by the
test_create_image_from_paused_server test in Tempest.
However, if you just use the Ubuntu 16.04 packages for libvirt
(1.3.1) and qemu (2.5), that test fails and the live snapshot hangs
on the paused instance.
This change adds PAUSED to a list of power states that aren't
valid for live snapshot. We can eventually remove this when we
require (or add a conditional check for) libvirt>=3.6.0 and
qemu>=2.10.
Change-Id: If6c4dd6890ad6e2d00b186c6a9aa85f507b354e0
Closes-Bug: #1741667
Add multiattach support to the libvirt driver by updating the
xml configuration info if multiattach support is turned
on for a volume, and set the virt driver capability
'support_multiattach' to true. This capability is set to false
for all the other drivers.
Also the attach function in nova/virt/block_device.py is updated
to call out to Cinder for each attachment request for
multiattach volumes, which is needed so that Cinder can track
all attachments for a volume and be able to detach properly.
Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>
Partially-implements: blueprint multi-attach-volume
Change-Id: I947bf0ad34a48e9182a3dc016f47f0c9f71c9d7b
Nova assumes 'host-model' for KVM/QEMU setups.
On AArch64 it results in "libvirtError: unsupported configuration: CPU
mode 'host-model' for aarch64 kvm domain on aarch64 host is not
supported by hypervisor" message.
AArch64 lacks 'host-model' support because neither libvirt nor QEMU
are able to tell what the host CPU model exactly is. And there is no
CPU description code for ARM(64) at this point.
So instead we fall back to 'host-passthrough' to get VM instances
running. This will completely break live migration, *unless* all the
Compute nodes (running libvirtd) have *identical* CPUs.
Small summary: https://marcin.juszkiewicz.com.pl/2018/01/04/today-i-was-fighting-with-nova-no-idea-who-won/
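The fallback amounts to something like this sketch (the exact check
lives in the libvirt driver's CPU config code):

    from nova.objects import fields

    if (cpu_mode == 'host-model' and
            caps.host.cpu.arch == fields.Architecture.AARCH64):
        # Neither libvirt nor QEMU can describe the host CPU model on
        # AArch64, so pass the host CPU through instead.
        cpu_mode = 'host-passthrough'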
Closes-bug: #1741230
Co-authored-by: Kevin Zhao <Kevin.Zhao@arm.com>
Co-authored-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Change-Id: Iafb5f1790d68489db73b9f0549333108c6426a00
libvirt 3.4.0 supports reporting the number of available vGPUs by
including in a specific PCI device's XML the list of possible mediated
device types that the physical GPU can support.
See https://libvirt.org/drvnodedev.html#MDEVCap for reference
We use that information to populate a new Resource Class with the
corresponding number of available instances for a configured type.
NOTE: Since a pGPU can support multiple types, for Queens we will only
allow a single type to be passed in the configuration option and
accumulate the total number of possible vGPUs for that type by looping
over all the physical devices.
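Conceptually, the accumulation looks like this sketch (data structure
names are illustrative; the real code parses the libvirt nodedev XML):

    enabled_type = CONF.devices.enabled_vgpu_types[0]
    total_vgpus = 0
    for pgpu in mdev_capable_devices:
        for mdev_type in pgpu['types']:
            if mdev_type['type'] == enabled_type:
                # 'availableInstances' comes from the mdev capability
                # element of the PCI device XML.
                total_vgpus += mdev_type['availableInstances']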
Change-Id: I6ac601e633ab2b0a67b4802ff880865255188a93
Partially-Implements: blueprint add-support-for-vgpu
There are two ways of booting VM on AArch64 architecture:
1. UEFI
2. kernel+initrd
No one sane goes for the 2nd option, so hw_firmware_type=uefi has to be
set for every image. Otherwise they simply hang. So let's set it as the
default if no other value for hw_firmware_type is set.
If someone implements their own way then they can set hw_firmware_type
to their own value and add support for it to Nova.
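The defaulting is essentially the following sketch (property lookup and
constant names are illustrative):

    firmware = image_meta.properties.get('hw_firmware_type')
    if (guest_arch == fields.Architecture.AARCH64 and
            firmware is None):
        # AArch64 guests hang without UEFI, so default to it.
        firmware = fields.FirmwareType.UEFI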
Closes-Bug: #1740824
Co-authored-by: Kevin Zhao <Kevin.Zhao@arm.com>
Change-Id: I70ad5ecb420b7d469854e8743e38ba27fd204747
Because of where this message was in the finally block, it
would log that we successfully extracted a snapshot even
if we failed, for example if _live_snapshot fails because the guest
is deleted concurrently during the snapshot process.
This moves the info log message into the block of code where
we actually know that we successfully extracted the snapshot.
Change-Id: Ie12592c2daecbe764fa52dde6b0dafdbcafe102e
The last use of this was removed in change
Id188d48609f3d22d14e16c7f6114291d547a8986. It can now be removed.
Change-Id: I13de970c3beed29311d43991115a0c6d28ac14e0
_disconnect_volume was being passed disk_dev, which is the name of the
disk as presented to the guest. However, _disconnect_volume is only
concerned with unmounting the volume from the host, so this isn't
relevant.
A couple of volume drivers were using it for logging. We remove this
because it wasn't done consistently, is better done by the caller, and
isn't required as the information is available from other log
messages.
In the very minor associated debug log cleanup we also remove logging
of connection_info from one volume driver, which is a potential
security bug.
We also do some associated cleanup in a few volume driver tests which
were assuming that disconnect_volume was being passed disk_info rather
than disk_dev, and that connection_info['data'] is related to
disk_info, which it isn't.
Change-Id: I61a0bee9e71e9a67f6a7c04a7bfd6e77fe818a77
_connect_volume was being passed disk_info as an argument. However,
the purpose of _connect_volume is to mount a volume on the compute
host, and disk_info only contains metadata about how the disk will be
presented to a guest. It's therefore not relevant, so it's not
surprising that no drivers are using it.
Change-Id: I843b5c46f9f93a30e7121259feff17a8170a2e48
If the qemu guest agent is not responsive when setting the admin
password, an InternalError will be thrown by nova,
and the VM state will be set to ERROR.
In fact we did nothing to the VM; the qemu guest agent simply did not
answer our request, for example because it crashed or was halted.
In this kind of scenario, libvirt reports the error code
VIR_ERR_AGENT_UNRESPONSIVE to us.
We should check the error code returned by libvirt and
raise a NotImplementedError, which is already expected
in compute.manager.
Other cases, where the guest agent got our request
but failed to do its work, should behave as they used to.
This patch checks the return code from libvirt and,
if it is VIR_ERR_AGENT_UNRESPONSIVE,
raises NotImplementedError.
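In effect (a sketch; the exact exception translation in the driver may
differ):

    import libvirt

    try:
        guest.set_user_password(user, new_pass)
    except libvirt.libvirtError as ex:
        if ex.get_error_code() == libvirt.VIR_ERR_AGENT_UNRESPONSIVE:
            # The guest agent never answered and nothing was changed in
            # the VM, so let compute.manager treat this as unsupported
            # instead of putting the instance into ERROR.
            raise NotImplementedError(
                'QEMU guest agent is not responsive')
        raise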
Change-Id: I4603711c435e15593e4a979906a12b560737bd77
Signed-off-by: Chen Hanxiao <chenhx@certusnet.com.cn>
Prior to microversion 2.25, the migration api supported a
'disk_over_commit' parameter, which indicated that we should do disk
size check on the destination host taking into account disk
over-commit. Versions since 2.25 no longer have this parameter, but we
continue to support the older microversion.
This disk size check was broken when disk over commit was in use on
the host. In LibvirtDriver._assert_dest_node_has_enough_disk() we
correctly calculate the required space using allocated disk size
rather than virtual disk size when doing an over-committed check.
However, we were checking against 'disk_available_least' as reported
by the destination host. This value is: the amount of disk space which
the host would have if all instances fully allocated their storage. On
an over-committed host it will therefore be negative, despite there
actually being space on the destination host.
The check we actually want to do for disk over commit is: does the
target host have enough free space to take the allocated size of this
instance's disks? We leave checking over-allocation ratios to the
scheduler. Note that if we use disk_available_least here and the
destination host is over-allocated, this will always fail because free
space will be negative, even though we're explicitly ok with that.
Using disk_available_least would make sense for the non-overcommit
case, where the test would be: would the target host have enough free
space for this instance if the instance fully allocated all its
storage, and everything else on the target also fully allocated its
storage? As noted, we no longer actually run that test, though.
We fix the issue for legacy microversions by fixing the destination
host's reported disk space according to the given disk_over_commit
parameter.
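The fix boils down to picking which free-space figure to compare
against (a sketch; field names are illustrative):

    if disk_over_commit:
        # Compare against the actual free disk on the destination; we
        # are explicitly OK with over-allocation here, the scheduler
        # polices allocation ratios.
        available_gb = dst_compute_info['free_disk_gb']
    else:
        # Conservative figure: the space left if every instance fully
        # allocated its storage (can be negative on over-committed
        # hosts).
        available_gb = dst_compute_info['disk_available_least']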
Change-Id: I8a705114d47384fcd00955d4a4f204072fed57c2
Resolves-bug: #1708572
If /var/lib/nova/instances is mounted on a filesystem like tmpfs that
doesn't have support for O_DIRECT, "qemu-img convert" currently crashes
because it's unconditionally using the "-t none" flag.
This patch therefore:
- moves the _supports_direct_io() function out of the libvirt driver,
from nova/virt/libvirt/driver.py to nova/utils.py and makes it public.
- uses that function to decide whether to use -t none or -t writethrough
  when converting images with qemu-img.
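The conversion then picks the cache mode based on that check (a sketch;
the helper name follows the bullet above but may not match exactly):

    if utils.supports_direct_io(instances_path):
        cache_mode = 'none'          # O_DIRECT works on this filesystem
    else:
        cache_mode = 'writethrough'  # e.g. tmpfs, which lacks O_DIRECT
    utils.execute('qemu-img', 'convert', '-t', cache_mode,
                  '-O', out_format, source, dest)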
Closes-Bug: #1734784
Co-Authored-By: melanie witt <melwittt@gmail.com>
Change-Id: Ifb47de00abf3f83442ca5264fbc24885df924a19
When a user calls the volume-update API, we swap_volume in the libvirt
driver from the old volume attachment to the new volume attachment.
Currently, we're saving the domain XML with the old configuration prior
to updating the volume and upon a soft-reboot request, it results in an
error:
Instance soft reboot failed: Cannot access storage file <old path>
and falls back to a hard reboot, which is like pulling the power cord,
possibly resulting in file system inconsistencies.
This changes to saving the new, updated domain XML after the volume
swap.
Closes-Bug: #1713857
Change-Id: I166cde5ad8b00699e4ec02609f0d7b69236d855d
We call _hard_reboot during reboot, power_on, and
resume_state_on_host_boot. It functions essentially by tearing down as
much of an instance as possible before recreating it, which additionally
makes it useful to operators for attempting automated recovery of
instances in an inconsistent state.
The Libvirt driver would previously only call _destroy and
_undefine_domain when hard rebooting an instance. This would leave vifs
plugged, volumes connected, and encryptors attached on the host. It
also means that when we try to restart the instance, we assume all
these things are correctly configured. If they are not, the instance
may fail to start at all, or may be incorrectly configured when
starting.
For example, consider an instance with an encrypted volume after a
compute host reboot. When we attempt to start the instance, power_on
will call _hard_reboot. The volume will be coincidentally re-attached
as a side-effect of calling _get_guest_xml(!), but when we call
_create_domain_and_network we pass reboot=True, which tells it not to
reattach the encryptor, as it is assumed to be already attached. We
are therefore left presenting the encrypted volume data directly to
the instance without decryption.
The approach in this patch is to ensure we recreate the instance as
fully as possible during hard reboot. This means not passing
vifs_already_plugged and reboot to _create_domain_and_network, which
in turn requires that we fully destroy the instance first. This
addresses the specific problem given in the example, but also a whole
class of potential volume and vif related issues of inconsistent
state.
Because we now always tear down volumes, encryptors, and vifs, we are
relying on the tear down of these things to be idempotent. This
highlighted that detach of the luks and cryptsetup encryptors were not
idempotent. We depend on the fixes for those os-brick drivers.
Depends-On: I31d72357c89db53a147c2d986a28c9c6870efad0
Depends-On: I9f52f89b8466d03699cfd5c0e32c672c934cd6fb
Closes-bug: #1724573
Change-Id: Id188d48609f3d22d14e16c7f6114291d547a8986