This reverts commit e06ad602f3.
This gets us back to Ib0cf5d55750f13d0499a570f14024dca551ed4d4
which was meant to address an issue introduced
by Id188d48609f3d22d14e16c7f6114291d547a8986.
So we essentially had three changes:
1. Hard reboot would blow away volumes and vifs and then wait for the
vifs to be plugged; this caused a problem for some vif types (
linuxbridge was reported) because the event never came and we
timed out.
2. To work around that, a second change was made to simply not wait for
vif plugging events.
3. Since #2 was a bit heavy-handed for a problem that didn't impact
openvswitch, another change was made to only wait for non-bridge vif
types, so we'd wait for OVS.
But it turns out that opendaylight is an OVS vif type and doesn't send
events for plugging the vif, only for binding the port (and we don't
re-bind the port during reboot). There is also a report of this being a
problem for other types of ports, see
If209f77cff2de00f694b01b2507c633ec3882c82.
So rather than try to special-case every possible vif type that could
be impacted by this, we are simply reverting the change so we no longer
wait for vif plugged events during hard reboot.
Note that if we went back to Id188d48609f3d22d14e16c7f6114291d547a8986
and tweaked that to not unplug/plug the vifs we wouldn't have this
problem either, and that change was really meant to deal with an
encrypted volume issue on reboot. But changing that logic is out of the
scope of this change. Alternatively, we could re-bind the port during
reboot but that could have other implications, or neutron could put
something into the port details telling us which vifs will send events
and which won't, but again that's all outside of the scope of this
patch.
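For illustration only, here is a minimal standalone sketch of the
post-revert behavior (this is not the Nova libvirt driver code; the
function name and data shapes are invented):
    # Hypothetical sketch, not Nova code: which plug events to wait for.
    def events_to_wait_for(network_info, hard_reboot=False):
        if hard_reboot:
            # After this revert, a hard reboot never waits for
            # network-vif-plugged events, regardless of vif type.
            return []
        # Otherwise only wait for vifs neutron reports as not yet active.
        return [('network-vif-plugged', vif['id'])
                for vif in network_info if not vif.get('active', True)]
    print(events_to_wait_for([{'id': 'port-1', 'active': False}],
                             hard_reboot=True))  # -> []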
Change-Id: Ib3f10706a7191c58909ec1938042ce338df4d499
Closes-Bug: #1755890
If an instance is deleted before it is scheduled, the BDM
clean-up code uses the mappings from the build request as
they don't exist in the database yet.
When using the older attachment flow with reserve_volume,
there is no attachment_id bound to the block device mapping
and because it is not loaded from the database but rather from
the build request, accessing the attachment_id field raises an
'attachment_id not lazy-loadable' exception.
If we did a new-style attach, _validate_bdm will add the
attachment_id from Cinder. If we did not, then this patch
will make sure to set it to 'None' to avoid raising an
exception when checking if we have an attachment_id set in
the BDM clean-up code.
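As a hedged illustration of the guard described above (the helper name
and dict shape are made up; the real change is in Nova's BDM handling):
    # Hypothetical sketch: BDMs taken from a build request were never
    # saved, so attachment_id may be unset and cannot be lazy-loaded later.
    def ensure_attachment_id(bdm, validated_attachment_id=None):
        if validated_attachment_id is not None:
            # New-style attach: _validate_bdm got an attachment from Cinder.
            bdm['attachment_id'] = validated_attachment_id
        else:
            # Old reserve_volume flow: default to None so later checks on
            # bdm['attachment_id'] don't trigger a lazy-load error.
            bdm.setdefault('attachment_id', None)
        return bdm
    print(ensure_attachment_id({'volume_id': 'vol-1'}))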
Conflicts:
nova/tests/functional/wsgi/test_servers.py
Change-Id: I3cc775fc7dafe691b97a15e50ae2e93c92f355be
Closes-Bug: #1750666
(cherry picked from commit 16c2c8b3ee)
Detach volumes when deleting a BFV server pre-scheduling
If the user creates a volume-backed server from an existing
volume, the API reserves the volume by creating an attachment
against it. This puts the volume into 'attaching' status.
If the user then deletes the server before it's created in a
cell, by deleting the build request, the attached volume is
orphaned and requires admin intervention in the block storage
service.
This change simply pulls the BDMs off the BuildRequest when
we delete the server via the build request and does the same
local cleanup of those volumes as we would in a "normal" local
delete scenario where the instance was created in a cell but
doesn't have a host.
We don't have to worry about ports in this scenario since
ports are created on the compute, in a cell, and if we're
deleting a build request then we never got far enough to
create ports.
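A rough sketch of that flow follows; build_request, volume_api and the
attribute names stand in for the real objects and are not the actual
Nova API code:
    # Illustrative sketch: delete a server that only exists as a build
    # request by cleaning up its volumes locally, then dropping the request.
    def delete_server_pre_scheduling(context, build_request, volume_api):
        for bdm in build_request.block_device_mappings or []:
            if not bdm.get('volume_id'):
                continue
            if bdm.get('attachment_id'):
                # New-style flow: delete the attachment created at boot time.
                volume_api.attachment_delete(context, bdm['attachment_id'])
            else:
                # Old-style flow: unreserve the volume we reserved earlier.
                volume_api.unreserve_volume(context, bdm['volume_id'])
        build_request.destroy()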
Conflicts:
nova/tests/functional/wsgi/test_servers.py
Change-Id: I1a576bdb16befabe06a9728d7adf008fc0667077
Partial-Bug: #1404867
(cherry picked from commit 0652e4ab3d)
When creating a new instance and deleting it before it gets scheduled
with the old attachment flow (reserve_volume), the block device mappings
are not persisted to the database, which means that the cleanup fails
because it tries to look up attachment_id, which cannot be lazy-loaded.
This patch adds a (failing) functional test to check for this issue
which will be addressed in a follow-up patch.
Conflicts:
nova/tests/functional/wsgi/test_servers.py
Related-Bug: #1750666
Change-Id: I294c54e5a22dd6e5b226a4b00e7cd116813f0704
(cherry picked from commit 3120627d98)
Usually, when instance.host = None, it means the instance was never
scheduled. However, the exception handling routine in compute manager
[1] will set instance.host = None and set instance.vm_state = ERROR
if the instance fails to build on the compute host. If that happens, we
end up with an instance with host = None and vm_state = ERROR which may
have ports and volumes still allocated.
This adds some logic around deleting the instance when it may have
ports or volumes allocated.
1. If the instance is not in ERROR or SHELVED_OFFLOADED state, we
expect instance.host to be set to a compute host. So, if we find
instance.host = None in states other than ERROR or
SHELVED_OFFLOADED, we consider the instance to have failed
scheduling and not require ports or volumes to be freed, and we
simply destroy the instance database record and return. This is
the "delete while booting" scenario.
2. If the instance is in ERROR because of a failed build or is
SHELVED_OFFLOADED, we expect instance.host to be None even though
there could be ports or volumes allocated. In this case, run the
_local_delete routine to clean up ports and volumes and delete the
instance database record.
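A condensed, standalone sketch of this branching (state constants and
the helper callables are placeholders for the real compute API code):
    # Placeholder sketch of the deletion branching described above.
    ERROR = 'error'
    SHELVED_OFFLOADED = 'shelved_offloaded'
    def delete_instance(instance, cast_to_compute, local_delete,
                        destroy_db_record):
        if instance['host'] is not None:
            # Normal path: the owning compute host deletes the instance.
            cast_to_compute(instance)
        elif instance['vm_state'] not in (ERROR, SHELVED_OFFLOADED):
            # Case 1: "delete while booting" -- scheduling never finished,
            # so there are no ports or volumes to free.
            destroy_db_record(instance)
        else:
            # Case 2: failed build (ERROR) or SHELVED_OFFLOADED -- ports and
            # volumes may exist, so clean them up before destroying the row.
            local_delete(instance)
            destroy_db_record(instance)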
Co-Authored-By: Ankit Agrawal <ankit11.agrawal@nttdata.com>
Co-Authored-By: Samuel Matzek <smatzek@us.ibm.com>
Co-Authored-By: melanie witt <melwittt@gmail.com>
Closes-Bug: 1404867
Closes-Bug: 1408527
Conflicts:
nova/tests/unit/compute/test_compute_api.py
[1] https://github.com/openstack/nova/blob/55ea961/nova/compute/manager.py#L1927-L1929
Change-Id: I4dc6c8bd3bb6c135f8a698af41f5d0e026c39117
(cherry picked from commit b3f39244a3)
In certain cases, such as when an instance fails to be scheduled,
the volume may already have an attachment created (or the volume
has been reserved in the old flows).
This patch adds a test to check that these volume attachments
are deleted and removed once the instance has been deleted. It
also adds some functionality to allow checking when a volume
has been reserved in the Cinder fixtures.
This backported patch drops the tests for the new-attach flow
for Cinder as it does not exist in Pike.
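As a toy illustration of what "checking when a volume has been
reserved" might look like in a fake Cinder used by tests (this is not
the actual CinderFixture code):
    # Toy fake that records reserve/unreserve calls so functional tests
    # can assert a volume was released when the server is deleted.
    class FakeCinder(object):
        def __init__(self):
            self.reserved_volumes = set()
        def reserve_volume(self, volume_id):
            self.reserved_volumes.add(volume_id)
        def unreserve_volume(self, volume_id):
            self.reserved_volumes.discard(volume_id)
    fake = FakeCinder()
    fake.reserve_volume('vol-1')
    fake.unreserve_volume('vol-1')
    assert 'vol-1' not in fake.reserved_volumes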
Change-Id: I85cc3998fbcde30eefa5429913ca287246d51255
Related-Bug: #1404867
(cherry picked from commit 20edeb3623)
If an instance fails to get scheduled, it gets buried in cell0 but
none of its block device mappings are stored. At the API layer,
Nova reserves and creates attachments for new instances when
it gets a create request so these attachments are orphaned if the
block device mappings are not registered in the database somewhere.
This patch makes sure that if an instance is being buried in cell0,
all of its block device mappings are recorded as well so they can
be later removed when the instance is deleted.
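A simplified sketch of the idea (the helper names are invented; the
real change is in nova/conductor/manager.py):
    # Invented-name sketch: when burying an unschedulable instance in
    # cell0, persist its block device mappings too so a later delete can
    # clean up the attachments/reservations they reference.
    def bury_in_cell0(cell0_db, instance, block_device_mappings):
        cell0_db.create_instance(instance)
        for bdm in block_device_mappings:
            bdm['instance_uuid'] = instance['uuid']
            cell0_db.create_bdm(bdm)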
Conflicts:
nova/conductor/manager.py
Change-Id: I64074923fb741fbf5459f66b8ab1a23c16f3303f
Related-Bug: #1404867
(cherry picked from commit ad9e2a568f)
There is an existing loop which sets the proper value for status
and attach_status right above, so this is doing nothing other than
changing it to the incorrect value.
Conflicts:
nova/tests/fixtures.py
Co-Authored-By: Mohammed Naser <mnaser@vexxhost.com>
Change-Id: Iea0c1ea0a699b9519f66977391202956f17aac66
(cherry picked from commit 8cd64670ea)
If we're doing a lazy-load of a generic attribute on an instance, we
should be using read_deleted=yes. Otherwise we just fail in the load
process, which is confusing and not helpful to a cleanup routine that
needs to handle the deleted instance. This makes us load those things
with read_deleted=yes.
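A hedged, standalone illustration of the difference (contexts here are
plain dicts; the real code uses RequestContext and the Instance
object):
    # Rough sketch: load the missing attribute with a context that can
    # see soft-deleted rows, so cleanup of a deleted instance still works.
    def lazy_load_attr(context, instance_uuid, attr, db_load):
        ctx = dict(context, read_deleted='yes')  # default would be 'no'
        return db_load(ctx, instance_uuid, attr)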
Change-Id: Ide6cc5bb1fce2c9aea9fa3efdf940e8308cd9ed0
Closes-Bug: #1745977
(cherry picked from commit 6ba8a35825)
(cherry picked from commit 619754f5c8)
The bug report says it all:
"There is a mismatch configuration for placement.
In the controller configuration, the guide suggests endpoints creation
pointing to port 8778, however in the default file provided in SLES 12
SP3, the port used is 8780."
Fix documentation to match sample file.
Closes-Bug: #1741329
(cherry picked from commit 0f8cdc606f)
Change-Id: Ib4c881058b9b90ba136ff223064c113e63f98379
A volume swap on a stopped or suspended instance will fail silently.
Remove these states from the allowed instance states for swap_volume:
suspended, stopped, soft_deleted.
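Conceptually, the remaining guard looks like the following standalone
sketch (not the real check_instance_state decorator; the remaining
allowed states are an assumption based on the list above):
    # Standalone sketch of the tightened state check for swap_volume.
    # Assumption: active/paused/resized are the states left after
    # dropping suspended, stopped and soft_deleted.
    ALLOWED_SWAP_STATES = {'active', 'paused', 'resized'}
    def check_swap_allowed(vm_state):
        if vm_state not in ALLOWED_SWAP_STATES:
            raise ValueError('swap_volume not allowed in state %s'
                             % vm_state)
    check_swap_allowed('active')     # ok
    # check_swap_allowed('stopped')  # would now be rejected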
Change-Id: Iff17f7cee7a56037b35d1a361a0b3279d0a885d6
Closes-Bug: #1673090
(cherry picked from commit b40d949b31)
This was broken during the openstack admin guide docs migration
in pike.
Change-Id: Ibb886657ed97f3e6462ceef0002ef3fe1aec767d
Closes-Bug: #1748327
(cherry picked from commit b516c48fdf)
In change Ib0cf5d55750f13d0499a570f14024dca551ed4d4, we stopped waiting
for vif plug events during hard reboot and start, because the behavior
of neutron plug events depended on the vif type and we couldn't rely on
the stale network info cache.
This refines the logic so that we skip waiting for vif plug events only
for the bridge vif type, instead of skipping the wait for all vif
types. We also add a flag to
_create_domain_and_network to indicate that we're in the middle of a
reboot and to expect to wait for plug events for all vif types except
the bridge vif type, regardless of the vif's 'active' status.
We only query network_info from neutron at the beginning of a reboot,
before we've unplugged anything, so the majority of the time, the vifs
will be active=True. The current logic in get_neutron_events will only
expect events for vifs with 'active' status False. This adds a way to
override that if we know we've already unplugged vifs as part of a
reboot.
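A simplified, standalone sketch of the refined event selection (names
mirror the description above, but this is not the driver code):
    # Simplified sketch: during a reboot, expect plug events for every
    # vif except the bridge type, even if the cached network info still
    # says active=True.
    def get_neutron_events(network_info, reboot=False):
        if reboot:
            return [('network-vif-plugged', vif['id'])
                    for vif in network_info if vif.get('type') != 'bridge']
        return [('network-vif-plugged', vif['id'])
                for vif in network_info if not vif.get('active', True)]
    print(get_neutron_events([{'id': 'p1', 'type': 'ovs', 'active': True},
                              {'id': 'p2', 'type': 'bridge',
                               'active': True}],
                             reboot=True))  # -> only the OVS vif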
Conflicts:
nova/virt/libvirt/driver.py
NOTE(lyarwood): Ica323b87fa85a454fca9d46ada3677f18fe50022 and
I13de970c3beed29311d43991115a0c6d28ac14e0 are the source of the above
conflicts in driver.py. The first removed encryptor attach logic from
_create_domain_and_network in Queens while the second altered the
signature of _create_domain_and_network in Queens, removing reboot that
is then reintroduced with this change.
Related-Bug: #1744361
Change-Id: Ib08afad3822f2ca95cfeea18d7f4fc4cb407b4d6
(cherry picked from commit aaf37a26d6)
(cherry picked from commit 5a10047f9d)
The last sentence here, where it links to "Manage Flavors",
uses the wrong link. It goes here:
https://docs.openstack.org/nova/latest/admin/flavors.html which
doesn't talk about NUMA extra specs. It should be pointing at
the "NUMA topology" section of the flavor extra specs page:
https://docs.openstack.org/nova/latest/user/flavors.html#extra-specs-numa-topology
Conflicts:
doc/source/user/flavors.rst
NOTE(mriedem): The conflict is due to a change in Queens
Ia57c93ef1e72ccf134ba6fc7fcb85ab228d68a47 which refactored
where the flavor docs live.
Change-Id: I30f6bc70afc5be00737cdf76e0e47bcb898a3a7f
Closes-Bug: #1747562
(cherry picked from commit 26de90a14d)
During live migration disk devices are updated with the latest
block device mapping information for volumes. Previously this
relied on libvirt to assign addresses in order after the already
assigned devices like the root disk had been accounted for. In
the latest libvirt the unassigned devices are allocated first which
makes the root disk address double allocated causing the migration to
fail. A running instance should never have the hardware addresses
of its disks changed mid-flight. While disk address changes during
live migration produce fatal errors for the operator, they would
likely cause errors inside the instance and unexpected behavior if
the device addresses changed during a cold migration. With this
change, disk addresses are no longer updated from the block device
mapping information, while every other element of the disk
definition for a volume is updated.
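A hedged sketch of the idea using plain XML handling (the actual change
operates on Nova's libvirt guest config objects, not raw strings):
    # Sketch: merge a new disk definition into an existing one while
    # leaving the guest-visible <address> element untouched.
    import xml.etree.ElementTree as ET
    def update_disk_keep_address(old_disk_xml, new_disk_xml):
        old = ET.fromstring(old_disk_xml)
        new = ET.fromstring(new_disk_xml)
        addr = old.find('address')
        if addr is not None:
            for elem in new.findall('address'):
                new.remove(elem)
            # Carry the original address over so the device keeps its
            # hardware address mid-flight.
            new.append(addr)
        return ET.tostring(new, encoding='unicode')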
Closes-Bug: 1715569
Change-Id: I17af9848f4c0edcbcb101b30e45ca4afa93dcdbb
(cherry picked from commit b196857f04)
This should say that the maximum compute API microversion
in *Pike* is 2.53, not Ocata.
This change is only made on the stable/pike branch so
as to not cause issues with it showing up again as a new
release note.
Change-Id: I415504f5ba669cea544f0144ea529b88094be741
We call _validate_bdm during instance creation to validate block device
mapping boot indexes, accessibility, attachability, and so on. We need
to query the service version in order to decide which Cinder APIs to
call, and because we're in the middle of creating the instance, we
don't yet know which cell it's going to land in.
This changes the service version query to check all cells so that
_validate_bdm will use the 'reserve_volume' Cinder API in a multi-cell
environment. Use of the 'reserve_volume' API is based on the service
version check; without targeting any cells, the service version will
be 0 and we'll fall back to the old 'check_attach' API.
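A rough sketch of the cross-cell version check (the cell iteration and
per-cell lookup are placeholders for the real service version query):
    # Placeholder sketch: take the minimum nova-compute service version
    # across all cells; a new-enough minimum means _validate_bdm can use
    # 'reserve_volume' instead of falling back to the old 'check_attach'.
    def minimum_compute_version(cells, get_version_in_cell):
        versions = [get_version_in_cell(cell) for cell in cells]
        return min(versions) if versions else 0
    print(minimum_compute_version(['cell1', 'cell2'],
                                  lambda cell: {'cell1': 22,
                                                'cell2': 24}[cell]))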
Conflicts:
nova/tests/unit/compute/test_compute_api.py
NOTE(mriedem): Conflicts are due to not having change
Ifc01dbf98545104c998ab96f65ff8623a6db0f28 in Pike which added
a test and updated some other tests which we now have to do
in this change.
Closes-Bug: #1746634
Change-Id: I68d5398d2a6d85c833e46ce682672008dbd5c4c1
(cherry picked from commit 0258cecaca)