The QoS minimum bandwidth feature will have a separate doc from the
generic QoS Neutron doc. This patch updates the links in the release
notes and the API version history of the 2.72 microversion.
blueprint: bandwidth-resource-provider
Depends-On: https://review.openstack.org/#/c/640390
Change-Id: Ic753112cf73cb10a6e377bc24c6ee51a057c69f8
Nova will leak minimum bandwidth resources in placement if a user
deletes a bound port from Neutron out-of-band. This adds a note about
how users can work around the issue.
Related-Bug: #1820588
Change-Id: I41f42c1a7595d9e6a73d1261bf1ac1d47ddadcdf
tl;dr: Use 'writeback' instead of 'writethrough' as the cache mode of
the target image for `qemu-img convert`. Two reasons: (a) if the image
conversion completes successfully, then 'writeback' calls fsync() to
safely write data to the physical disk; and (b) 'writeback' makes the
image conversion a _lot_ faster.
Back-of-the-envelope "benchmark" (on an SSD)
--------------------------------------------
(Each test was run three times; version: qemu-img-2.11.0)
With 'writethrough':
$> time (qemu-img convert -t writethrough -f qcow2 -O raw \
Fedora-Cloud-Base-29.qcow2 Fedora-Cloud-Base-29.raw)
real 1m43.470s
user 0m8.310s
sys 0m3.661s
With 'writeback':
$> time (qemu-img convert -t writeback -f qcow2 -O raw \
Fedora-Cloud-Base-29.qcow2 5-Fedora-Cloud-Base-29.raw)
real 0m7.390s
user 0m5.179s
sys 0m1.780s
I.e. ~103 seconds of elapsed wall-clock time for 'writethrough' vs. ~7
seconds for 'writeback' -- IOW, 'writeback' is nearly _15_ times faster!
Details
-------
Nova commit e6ce9557f84cdcdf4ffdd12ce73a008c96c7b94a ("qemu-img do not
use cache=none if no O_DIRECT support") was introduced to make instances
boot on filesystems that don't support 'O_DIRECT' (which bypasses the
host page cache and flushes data directly to the disk), such as 'tmpfs'.
In doing so it introduced the 'writethrough' cache for the target image
for `qemu-img convert`.
This patch proposes to change that to 'writeback'.
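As a rough sketch of the intended change (helper location and exact
flags are illustrative, not the literal diff), the cache mode passed
for the target of the conversion simply switches to 'writeback':

    # e.g. in nova/virt/images.py (illustrative): when O_DIRECT is not
    # supported by the destination filesystem, fall back to 'writeback'
    # instead of 'writethrough' for the target image
    cmd = ('qemu-img', 'convert', '-t', 'writeback',
           '-f', in_format, '-O', out_format, source, dest)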
Let's address the 'safety' concern:
"What about data integrity in the event of a host crash (especially
on shared file systems such as NFS)?"
Answer: If the host crashes mid-way during image conversion, then
neither "data integrity" nor the cache mode in use matters. But if the
image conversion completes _successfully_, then 'writeback' will safely
write the data to the physical disk, just as 'writethrough' does.
So we are as safe as we can be, but with the extra benefit of image
conversion being _much_ faster.
* * *
The `qemu-img convert` command defaults to 'cache=writeback' for the
source image, and to 'cache=unsafe' for the target, because if `qemu-img`
"crashes during the conversion, the user will throw away the broken
output file anyway and start over"[1]. And `qemu-img convert` has
supported[2] fsync() for the target image since QEMU 1.1 (2012).
[1] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=1bd8e175
-- "qemu-img convert: Use cache=unsafe for output image"
[2] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=80ccf93b
-- "qemu-img: let 'qemu-img convert' flush data"
Closes-Bug: #1818847
Change-Id: I574be2b629aaff23556e25f8db0d740105be6f07
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Looks-good-to-me'd-by: Kevin Wolf <kwolf@redhat.com>
Via change [1], ironicclient began to use endpoint_filter in the
version negotiation code path, whereas it was previously unused if a
fully-qualified endpoint had already been determined. Suddenly it was
important that the `interface` part of this endpoint_filter be correct.
Prior to ironicclient change [2], there was no way to pass an
appropriate `interface` value through ironicclient's initialization, so
the ironicclient used from nova would always end up with the default
value, `public`, in the endpoint_filter. This would break in clouds
lacking a public ironic API endpoint (see the referenced bug).
With this change, we pass the value of the (standard, per ksa)
`valid_interfaces` ironic config option into the ironicclient
initialization, where (if and only if the ironicclient fix [2] is also
present) it eventually gets passed through to the ksa Adapter
initialization (which is set up to accept values from exactly that conf
option) to wind up in the endpoint_filter.
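A minimal sketch of that plumbing (kwarg and module names illustrative,
not the exact diff):

    # e.g. in nova/virt/ironic/client_wrapper.py (illustrative)
    kwargs['interface'] = CONF.ironic.valid_interfaces
    # ksa's Adapter picks this up, so the endpoint_filter used during
    # version negotiation matches the one used for regular API calls
    client = ironic_client.get_client(IRONIC_API_VERSION, **kwargs)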
The effect is that nova's ironicclient will actually be using the
interface from nova.conf throughout. (Because `valid_interfaces` is also
used in recommended configuration setups - i.e. those that use the
service catalog to determine API endpoints - to construct the
endpoint_override used to initialize the ironicclient, the value used
during version negotiation should be in sync with that used for regular
API calls.)
[1] I42b66daea1f4397273a3f4eb1638abafb3bb28ce
[2] I610836e5038774621690aca88b2aee25670f0262
Change-Id: I5f78d21c39ed2fd58d2a0f3649116e39883d5a2c
Closes-Bug: #1818295
Implement support for extending RBD attached volumes using the libvirt
network volume driver.
This adds a new parameter "requested_size" to the extend_volume method.
This is necessary because the new volume size can not be detected by
libvirt for network volumes. All other volume types currently
implementing the extend_volume call have a block device on the
hypervisor which needs to be updated and can be polled for its new
size. For network volumes no such block device exists.
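A minimal sketch of the resulting signature (simplified; the location
of the method is illustrative):

    # e.g. in the libvirt network volume driver (illustrative)
    def extend_volume(self, connection_info, instance, requested_size):
        # libvirt cannot report the new size of a network volume, so
        # trust the size requested via Cinder and return it as-is
        return requested_size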
Alternatively this could be implemented without a new parameter by
calling into Ceph using os_brick to get the new size of the volume.
This would make the LibvirtNetVolumeDriver Ceph-specific.
This also extends the logic to get the device_path for extending volumes
in the libvirt driver. This is necessary as network volumes don't have
the device path in the connection_info. The device_path is retrieved by
matching the connection_info serial (= volume UUID) against all guest
disks.
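Roughly, the lookup works like this (a sketch, assuming the guest disk
config objects expose 'serial' and 'target_dev' attributes):

    # illustrative: find the guest device backing the extended volume
    volume_id = connection_info.get('serial')
    for guest_disk in guest.get_all_disks():
        if guest_disk.serial == volume_id:
            device_path = guest_disk.target_dev
            break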
Co-Authored-By: Jose Castro Leon <jose.castro.leon@cern.ch>
Blueprint: extend-in-use-rbd-volumes
Change-Id: I5698e451861828a8b1240d046d1610d8d37ca5a2
Validate the combination of the flavor extra-specs and image properties
as early as possible once they're both known (since you can specify
mutually-incompatible changes in the two places). If validation fails,
an error is returned to the user synchronously. We need to do this anywhere
the flavor or image changes, so basically instance creation, rebuild,
and resize.
- Rename _check_requested_image() to _validate_flavor_image() and add
a call from the resize code path. (It's already called for create
and rebuild.)
- In _validate_flavor_image() add new checks to validate NUMA-related
options from flavor and image including CPU policy, CPU thread
policy, CPU topology, memory topology, hugepages, CPU pinning,
serial ports, realtime mask, etc.
- Add new 400 exceptions to the server API corresponding to the added
validations (a brief sketch follows this list).
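A rough sketch of the resize-path addition (argument list and helper
names illustrative):

    # e.g. in nova/compute/api.py (illustrative): re-run the
    # flavor/image validation against the *new* flavor before starting
    # the resize; failures are translated to HTTP 400 by the API layer
    self._validate_flavor_image(context, image_id, image,
                                new_flavor, root_bdm=None)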
blueprint: flavor-extra-spec-image-property-validation
Change-Id: I06fad233006c7bab14749a51ffa226c3801f951b
Signed-off-by: Jack Ding <jack.ding@windriver.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
A new API microversion, 2.72, is added that enables support for Neutron
ports that have a resource request during server create.
Note that server delete and port detach operations already handle such
ports and will clean up the allocation properly.
Change-Id: I7555914473e16782d8ba4513a09ce45679962c14
blueprint: bandwidth-resource-provider
This reverts commit 525631d8dc058910728e55def616358b0e7f2f69.
This change could potentially leave us with an empty file in the image
cache, which would result in failure to spawn for any subsequent
instance using the same image and image cache. This situation can only
be recovered by manually deleting the file from the image cache. Until
we can determine the root cause we should do something safer, like
ignoring the utime failure.
Change-Id: I55c072937d648d7840d01d31ed781bf93cfd94ab
There is a race condition that occurs over NFS when multiple instances
are being created: utime fails because some other process is modifying
the file path. This patch ensures the path exists and is readable
before attempting to modify it with utime.
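A minimal sketch of the guard (a simplification of the actual change in
the libvirt image backend):

    # illustrative: only touch the file's mtime if it exists and is
    # readable, to narrow the race window seen over NFS
    if os.path.exists(base) and os.access(base, os.R_OK):
        os.utime(base, None)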
Closes-Bug: #1809123
Change-Id: Id68aa27a8ab08d9c00655e5ed6b48d194aa8e6f6
Signed-off-by: Tim Rozet <trozet@redhat.com>
The update aggregate and update aggregate metadata API calls can change
the availability zone name of an aggregate. If the aggregate is not
empty (i.e. it has hosts with instances on them), the update leads to a
discrepancy for objects that store the availability zone as a string
rather than a reference.
In a devstack DB, these are:
- cinder.backups.availability_zone
- cinder.consistencygroups.availability_zone
- cinder.groups.availability_zone
- cinder.services.availability_zone
- cinder.volumes.availability_zone
- neutron.agents.availability_zone
- neutron.networks.availability_zone_hints
- neutron.router_extra_attributes.availability_zone_hints
- nova.dns_domains.availability_zone
- nova.instances.availability_zone
- nova.volume_usage_cache.availability_zone
- nova.shadow_dns_domains.availability_zone
- nova.shadow_instances.availability_zone
- nova.shadow_volume_usage_cache.availability_zone
Why is that bad?
First, the API and Horizon show different values, for example for a
host and its instances. Second, migration of instances whose
availability zone has changed fails with "No valid host found" for the
old AZ.
This change adds an additional check to the update aggregate API calls.
With the check, it is not possible to rename an AZ if any host of the
corresponding aggregate has instances.
PUT /os-aggregates/{aggregate_id} and
POST /os-aggregates/{aggregate_id}/action return HTTP 400 for
availability zone renaming if the hosts of the aggregate have any instances.
This is similar to the existing error for conflicting AZ names.
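A rough sketch of the added guard (object and exception names
illustrative):

    # illustrative: refuse an AZ rename while the aggregate's hosts
    # still have instances on them
    if 'availability_zone' in updates and aggregate.hosts:
        for host in aggregate.hosts:
            if objects.InstanceList.get_by_host(context, host):
                raise exception.InvalidAggregateActionUpdate(
                    aggregate_id=aggregate.id,
                    reason='cannot change AZ of a non-empty aggregate')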
Change-Id: Ic27195e46502067c87ee9c71a811a3ca3f610b73
Closes-Bug: #1378904
We return cached data to sync_power_state to avoid pummeling the ironic
API. However, this can lead to a race condition where an instance is
powered on, but nova thinks it should be off and calls stop(). Check
again without the cache when this happens to make sure we don't
unnecessarily kill an instance.
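Roughly (a sketch; attribute and helper names illustrative):

    # illustrative: the cached node said the instance should be off,
    # but before stopping it, re-read the node state from the ironic API
    node = self._get_node(instance.node)  # bypasses the node cache
    if node.power_state == ironic_states.POWER_ON:
        # cached state was stale; report RUNNING instead of stopping
        return power_state.RUNNING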
Closes-Bug: #1815791
Change-Id: I907b69eb689cf6c169a4869cfc7889308ca419d5
This builds on the ProviderTree work in the compute driver and
resource tracker to take the supported capabilities from a driver and
turn those into standard traits on the compute node resource provider.
This is a simple way to expose in a REST API (Placement in this case)
what a compute node, via its driver, supports.
This is also something easy that we can do in lieu of a full-blown
compute capabilities REST API in nova, which we've talked about for
years but never actually done anything about.
We can later build on this to add a request filter which will mark
certain types of boot-from-volume requests as requiring specific
capabilities, like for volume multiattach and tagged devices.
Any traits provided by the driver will be automatically added during
startup or a periodic update of a compute node:
https://pasteboard.co/I3iqqNm.jpg
Similarly any traits later retracted by the driver will be
automatically removed.
However any traits associated with capabilities which are
inappropriately added to or removed from the resource provider by the
admin via the Placement API will not be reverted until the compute
service's provider cache is reset.
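As an illustration of the mechanism (capability and trait names here
are examples, not the full mapping):

    # illustrative: driver capabilities map to os-traits constants which
    # are then synced onto the compute node resource provider
    capabilities_as_traits = {
        'supports_multiattach': os_traits.COMPUTE_VOLUME_MULTI_ATTACH,
        'supports_device_tagging': os_traits.COMPUTE_DEVICE_TAGGING,
    }
    traits = {trait for capability, trait in capabilities_as_traits.items()
              if driver.capabilities.get(capability, False)}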
The new call graph is shown in this sequence diagram:
https://pasteboard.co/I25qICd.png
Co-Authored-By: Adam Spiers <aspiers@suse.com>
Related to blueprint placement-req-filter
Related to blueprint expose-host-capabilities
Change-Id: I15364d37fb7426f4eec00ca4eaf99bec50e964b6
Add the server group UUID to the responses of 'GET /servers/{id}',
'PUT /servers/{server_id}' and the rebuild API
'POST /servers/{server_id}/action'.
Change-Id: I4a2a584df56ece7beb8b12c0ce9b0e6b30237120
Implements: blueprint show-server-group
Co-authored-by: Gerry Kopec <Gerry.Kopec@windriver.com>
Signed-off-by: Yongli He <yongli.he@intel.com>
This was added to handle gate issues seen with libvirt 1.2.2. We haven't
supported that version of libvirt for some time and we don't enable this
in the gate anymore. Deprecate it and remove unnecessary references to
it from tests and the support FAQ document.
Change-Id: Ie3fa537a42d208a35467f03bd2110c2976927477
This uses ironic’s conductor group feature to limit the subset of nodes
which a nova-compute service will manage. This allows for partitioning
nova-compute services to a particular location (building, aisle, rack,
etc), and provides a way for operators to manage the failure domain of a
given nova-compute service.
This adds two config options to the [ironic] config group (example
below):
* partition_key: the key to match with the conductor_group property on
ironic nodes to limit the subset of nodes which can be managed.
* peer_list: a list of other compute service hostnames which manage the
same partition_key, used when building the hash ring.
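For example (values illustrative), a nova-compute service pinned to a
single conductor group could be configured as:

    [ironic]
    partition_key = rack1
    peer_list = compute-1.example.com,compute-2.example.com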
Change-Id: I1b184ff37948dc403fe38874613cd4d870c644fd
Implements: blueprint ironic-conductor-groups
This implements the reshaper routine for the libvirt driver
to detect and move, if necessary, VGPU inventory and allocations
from the root compute node provider to a child provider of
VGPU resources. The reshape will be performed on first start
of nova-compute with this code.
For a fresh compute node deploy, no reshaping will be necessary
and the VGPU inventory will start on the child provider.
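In rough terms (a simplified sketch; helper names are illustrative, not
the literal code):

    # illustrative: inside update_provider_tree(), detect VGPU inventory
    # still sitting on the root provider and perform the reshape
    root = provider_tree.data(nodename)
    if 'VGPU' in root.inventory:
        if allocations is None:
            # ask the resource tracker to call back with allocations
            raise exception.ReshapeNeeded()
        # move the VGPU inventory (and matching allocations) to a child
        # provider dedicated to VGPU resources
        _move_vgpu_to_child(provider_tree, nodename, allocations)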
Part of blueprint reshape-provider-tree
Part of blueprint vgpu-stein
Co-Authored-By: Sylvain Bauza <sbauza@free.fr>
Change-Id: I511d26dc6487fadfcf22ba747abd385068e975a4
This is *mostly* used for nova-network and therefore can be deprecated.
There is one corner case where we could conceivably use this with
neutron and the libvirt driver, but it's only ever going to work with
Linux bridge, is probably broken, and should just be removed in
favour of neutron variants of this functionality. A note is included
detailing this, just in case people do want to spend the time getting to
the bottom of this, but I wouldn't recommend that.
The help text for this option is improved based on information I found
while researching the option and mostly taken from the commit message
for commit 8f1c54ce.
Change-Id: I33607453b3174192a33d9d56e203227bc9237f31
The 'os_compute_api:flavors' policy has been deprecated
since 16.0.0 Pike.
Remove the 'os_compute_api:flavors' policy.
Change-Id: I771b6f641d25d6b27076cf36dd8552df50b7ccd3
Remove the 'os_compute_api:os-server-groups' policy.
The 'os_compute_api:os-server-groups' policy has been deprecated
since 16.0.0 Pike.
Change-Id: If84e14f0c00db3306e1756553e69b989dcf1373e
This change adds a new microversion to expose virtual
device tags for volumes and ports attached to a server.
Implements blueprint expose-virtual-device-tags-in-rest-api
Change-Id: I09420ff7134874dfe4dc399931c7740e81ecc2d0
This patch adds the documentation for the handling of down cells that
was introduced in microversion 2.69.
Related to blueprint handling-down-cell
Change-Id: I78ed924a802307a992ff90e61ae7ff07c2cc39d1