The stats module is used to decide whether the InstancePCIRequests of
a boot request can fit on a given compute host and which PCI device
pool can fulfill the requests. It is used both during scheduling and
in the PCI claim code.
PCI devices are now modelled in placement and the allocation_candidate
query now requests PCI resources. Therefore each allocation candidate
returned from placement already restricts which PCI devices can be used
in the PciPassthroughFilter, the NumaTopologyFilter, and the PCI claim
code paths. This patch adapts the stats module to consider the PCI
allocation candidate, or the already-made placement PCI allocation,
when filtering the PCI device pools.
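Conceptually, the filtering reduces to an intersection with the
resource providers present in the candidate or allocation. A minimal
sketch with hypothetical names (not the actual stats module API):

    def filter_pools_by_allocation(pools, allowed_rp_uuids):
        # Keep only the pools whose devices belong to a resource
        # provider present in the PCI allocation (candidate).
        return [
            pool for pool in pools
            if any(dev.rp_uuid in allowed_rp_uuids
                   for dev in pool['devices'])
        ]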
blueprint: pci-device-tracking-in-placement
Change-Id: If363981c4aeeb09a96ee94b140070d3d0f6af48f
This patch extends the HostState object with an allocation_candidates
list populated by the scheduler manager. It also changes the generic
scheduler logic to allocate a candidate for the selected host based on
the candidates in the host state.
So after this patch scheduler filters can be extended to filter the
allocation_candidates list of the HostState object while processing a
host, restricting which candidate can be allocated if the host passes
all the filters. Potentially all candidates can be removed by multiple
consecutive filters, making the host a non-viable scheduling target.
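For example, a filter doing such narrowing could look roughly like
this (illustrative sketch; _matches stands in for the real per-filter
logic):

    class ExamplePciFilter:
        def host_passes(self, host_state, spec_obj):
            # Keep only the candidates this filter can accept; if
            # none remain, the host is no longer a viable target.
            host_state.allocation_candidates = [
                candidate
                for candidate in host_state.allocation_candidates
                if self._matches(candidate, spec_obj)
            ]
            return bool(host_state.allocation_candidates)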
blueprint: pci-device-tracking-in-placement
Change-Id: Id0afff271d345a94aa83fc886e9c3231c3ff2570
Fixed various small issues from the already merged (or being merged)
patches.
The logic behind the dropped FIXME and the empty condition is already
handled in the update_allocations() calls of the translator.
The removal of excutils.save_and_reraise_exception from the report
client is safe as self._clear_provider_cache_for_tree does not raise.
blueprint: pci-device-tracking-in-placement
Change-Id: If87dedc6a14f7b116c4238e7534b67574428c01c
* Add an iommu model trait for each viommu model
* Add ``hw_viommu_model`` to the request filter; this extends the
  transform_image_metadata prefilter to select hosts with the correct
  model.
* Provide a new compute ``COMPUTE_VIOMMU_MODEL_*`` capability trait
  for each model the driver supports.
Implements: blueprint libvirt-viommu-device
Depends-On: https://review.opendev.org/c/openstack/os-traits/+/844336
Change-Id: I2caa1a9c473a2b11c287061280b4a78edb96f859
PCI devices which are allocated to instances can be removed from the
[pci]device_spec configuration or can be removed from the hypervisor
directly. The existing PciTracker code handles these cases by keeping
the PciDevice in the nova DB as existing and allocated, and by issuing
a warning in the logs during compute service startup that nova is in
an inconsistent state. Similar behavior is now added to the PCI
placement tracking code, so the PCI inventories and allocations in
placement are kept in such a situation.
There is one case where we cannot simply accept the PCI device
reconfiguration by keeping the existing allocations and applying the
new config: when a PF that is configured and allocated is removed and
VFs from that PF are then configured in the [pci]device_spec, or vice
versa, when VFs are removed and their parent PF is configured. In
these cases, keeping the existing inventory and allocations while
adding the new inventory to placement would result in a placement
model where a single PCI device provides both PF and VF inventories.
This dependent device configuration is not supported as it could lead
to double consumption. In such a situation the compute service will
refuse to start.
blueprint: pci-device-tracking-in-placement
Change-Id: Id130893de650cc2d38953cea7cf9f53af71ced93
During a normal update_available_resources run, if the local provider
tree cache is invalid (e.g. because the scheduler made an allocation,
bumping the generation of the RPs) and the virt driver tries to update
the inventory of an RP based on the cache, Placement will report a
conflict. The report client then invalidates the cache and the retry
decorator on ResourceTracker._update_to_placement re-drives the update
on top of the fresh RP data.
However, the same thing can happen during a reshape, but the retry
mechanism is missing in that code path, so a stale cache can cause
reshape failures.
This patch adds specific error handling in the reshape code path to
implement the same retry mechanism as exists for inventory update.
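In shape, the added handling is equivalent to this hedged sketch (the
exception type and retry count are illustrative, not the exact nova
code):

    def update_to_placement_with_retries(
            rt, context, compute_node, startup, retries=3):
        # rt is the ResourceTracker; PlacementConflict stands in for
        # the real conflict exception raised on a stale cache.
        for attempt in range(retries + 1):
            try:
                return rt._update_to_placement(
                    context, compute_node, startup)
            except PlacementConflict:
                # The report client has already invalidated the stale
                # provider tree cache; retry on fresh RP data.
                if attempt == retries:
                    raise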
blueprint: pci-device-tracking-in-placement
Change-Id: Ieb954a04e6aba827611765f7f401124a1fe298f3
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but it may cause us
to initialize the client fewer times, and it also allows emitting a
common set of error messages about expected failures for better
troubleshooting.
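A minimal sketch of the unified helper, assuming double-checked
locking around nova's SchedulerReportClient (the common error-message
handling is omitted):

    from oslo_concurrency import lockutils

    from nova.scheduler.client import report

    _placement_client = None

    def report_client_singleton():
        """Return a shared SchedulerReportClient, created only once."""
        global _placement_client
        if _placement_client is None:
            with lockutils.lock('placement-report-client-init'):
                if _placement_client is None:
                    _placement_client = report.SchedulerReportClient()
        return _placement_client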
Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
This change removes the return statements from RPC cast method calls.
RPC casts are asynchronous and do not return anything.
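Illustratively, the pattern being fixed (cctxt is an oslo.messaging
RPC client context; the method name is an example):

    # before: implies a meaningful result
    return cctxt.cast(ctxt, 'resize_instance', instance=instance)

    # after: a cast is fire-and-forget and always returns None
    cctxt.cast(ctxt, 'resize_instance', instance=instance)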
Change-Id: I766f64f2c086dd652bc28b338320cc94ccc48f1f
Pick guest arch based on host arch in libvirt driver support
This is split 2 of 3 for the architecture emulation feature.
This implements emulated multi-architecture support through qemu
within OpenStack Nova.
Additional config variable check to pull host architecture into
hw_architecture field for emulation checks to be made.
Adds a custom function that checks whether the hw_emulation_architecture
field is set, allowing core code to function as normal while enabling
emulated architectures to follow the same path as the multi-arch
support already established for physical nodes, but leveraging qemu,
which provides the actual emulation.
Added a check to the domain xml unit test to strip arch from the os
tag, as it is not required for the uefi checks and is only leveraged
for the emulation checks.
Added additional test cases in test_driver validating emulation
functionality by checking hw_emulation_architecture against the
os_arch/hw_architecture field. Added the required os-traits and
settings for the scheduler request_filter.
Added RISCV64 to architecture enum for better support in driver.
Implements: blueprint pick-guest-arch-based-on-host-arch-in-libvirt-driver
Closes-Bug: 1863728
Change-Id: Ia070a29186c6123cf51e1b17373c2dc69676ae7c
Signed-off-by: Jonathan Race <jrace@augusta.edu>
A follow-on patch will use this code to enforce the limits; this patch
provides the integration with oslo.limit and a new internal nova API
that is able to enforce those limits.
The first part is providing a callback for oslo.limit to be able to
count the resources being used. We only count resources grouped by
project_id. For counting servers, we make use of the instance mappings
list in the api database, just as the existing quota code does. While
we do check to ensure the queued-for-delete data migration has been
completed, we simply error out if that is not the case, rather than
attempting to fall back to any other counting system. We hope one day
we can count this in placement using consumer records, or similar.
For counting all other resource usage, the limits must refer to some
resource class being consumed in placement. This is similar to how the
count-with-placement variant of the existing quota code works today.
This is not restricted to RAM and VCPU; it is open to any resource
class that is known to placement.
The second part is the enforcement method, which keeps a similar
signature to the existing enforce_num_instances call that is used to
check quotas using the legacy quota system.
From the flavor we extract the current resource usage. This is
considered the simplest first step that helps us deliver Ironic limits
alongside all the existing RAM and VCPU limits. At a later date, we
would ideally get passed a more complete view of what resources are
being requested from placement.
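Roughly, the two parts fit together as below; count_instance_mappings
and placement_usage are hypothetical stand-ins for nova's counting
internals, and the resource names follow the unified-limits
'servers' / 'class:<RESOURCE_CLASS>' convention:

    from oslo_limit import limit

    def usage_callback(project_id, resource_names):
        # Return current usage per resource for this project.
        usage = {}
        for name in resource_names:
            if name == 'servers':
                usage[name] = count_instance_mappings(project_id)
            else:
                usage[name] = placement_usage(project_id, name)
        return usage

    def enforce_for_flavor(project_id, flavor, delta=1):
        # Resources requested by 'delta' instances of this flavor.
        deltas = {
            'servers': delta,
            'class:VCPU': flavor.vcpus * delta,
            'class:MEMORY_MB': flavor.memory_mb * delta,
        }
        limit.Enforcer(usage_callback).enforce(project_id, deltas)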
NOTE: given the instance object doesn't exist when enforce is called, we
can't just pass the instance into here.
A [workarounds] option is also available for operators who need the
legacy quota usage behavior where VCPU = VCPU + PCPU.
blueprint unified-limits-nova
Change-Id: I272b59b7bc8975bfd602640789f80d2d5f7ee698
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED
ports: hosts that have neither the relevant compute driver capability
(COMPUTE_REMOTE_MANAGED_PORTS) nor PCI device pools with
"remote_managed" devices are filtered out early. The presence of
devices actually available for allocation is checked at a later point
by the PciPassthroughFilter.
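The prefilter reduces to roughly the following sketch
(has_remote_managed_ports is a hypothetical helper; nova request
filters return True when the request may proceed):

    import os_traits

    def remote_managed_ports_filter(context, request_spec):
        # Require the capability trait only when the request actually
        # contains remote-managed ports.
        if has_remote_managed_ports(request_spec):
            request_spec.root_required.add(
                os_traits.COMPUTE_REMOTE_MANAGED_PORTS)
        return True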
Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
autopep8 is a code formatting tool that makes Python code pep8
compliant without changing everything. Unlike black, it will not
radically change all code; the primary change to the existing codebase
is adding a new line after class-level docstrings.
This change adds a new tox autopep8 env so you can manually run it on
your code before you submit a patch. It also adds autopep8 to
pre-commit, so if you use pre-commit it will run for you automatically.
This change runs autopep8 in diff mode with --exit-code in the pep8
tox env, so pep8 will fail if autopep8 would modify your code when run
in in-place mode. This allows us to gate on autopep8 not modifying
submitted patches, which ensures authorship of patches is maintained.
The intent of this change is to automate away the large amount of time
we spend on ensuring style guidelines are followed, to make it simpler
for both new and old contributors to work on nova and to save time and
effort for all involved.
Change-Id: Idd618d634cc70ae8d58fab32f322e75bfabefb9d
When a compute node goes down and all instances on the compute node
are evacuated, allocation records for these instances are still left
against the source compute node until the nova-compute service is
started again on the node. However, if a compute node is completely
broken, it is not possible to start the service again.
In this situation, deleting the nova-compute service for the compute
node doesn't delete its resource provider record, and even if a user
tries to delete the resource provider, the delete request is rejected
because allocations are still left on that node.
This change ensures that remaining allocations left by successful
evacuations are cleared when deleting a nova-compute service, to avoid
any resource provider record being left behind even if a compute node
can't be recovered. Migration records are still left in 'done' status
to trigger clean-up tasks in case the compute node is recovered later.
Closes-Bug: #1829479
Change-Id: I3ce6f6275bfe09d43718c3a491b3991a804027bd
The interface attach and detach logic is now fully adapted to the new
extended resource request format, and supports more than one request
group in a single port.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: I73e6acf5adfffa9203efa3374671ec18f4ea79eb
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).
When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.
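The two paths reduce to roughly the following (a simplified sketch of
the report client method; the real code also handles errors and
placement microversions):

    def delete_allocation_for_instance(self, context, uuid,
                                       force=True):
        if force:
            # One DELETE request; placement drops the consumer's
            # allocations outright, no generation check involved.
            return self.delete('/allocations/%s' % uuid)
        # Otherwise GET the allocations (including the consumer
        # generation) and PUT back an empty allocations dict; any
        # concurrent change in between yields a 409 from placement.
        body = self.get('/allocations/%s' % uuid).json()
        body['allocations'] = {}
        return self.put('/allocations/%s' % uuid, body)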
It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].
Closes-Bug: #1836754
[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html
Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
There's only one driver now, which means there isn't really a driver at
all. Move the code into the manager altogether and avoid a useless layer
of abstraction.
Change-Id: I609df5b707e05ea70c8a738701423ca751682575
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
The new format of the resource_request field of the Neutron port
allows expressing not just request groups but also request-global
parameters for the allocation candidate query. This patch adapts the
neutron client in nova to parse such parameters and transfers this
information to the scheduler to include it in the allocation candidate
request.
It relies on previous patches that already extended the
RequestLevelParams ovo and the allocation candidate query generation.
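The extended format carries the global parameters alongside the
groups; a sketch of the dict shape handled by the parsing (keys per
Neutron's extended resource_request format):

    def parse_port_resource_request(port):
        resource_request = port.get('resource_request') or {}
        # Per-group resource and trait requests.
        request_groups = resource_request.get('request_groups', [])
        # Request-global parameters, e.g. same_subtree, which apply
        # to the whole allocation candidate query, not one group.
        same_subtree = resource_request.get('same_subtree', [])
        return request_groups, same_subtree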
Change-Id: Icb91f6429050a161f577d0ed94d4cd906d3da461
blueprint: qos-minimum-guaranteed-packet-rate
Extend the allocation candidate query generation to include same_subtree
query parameters based on the RequestSpec.
The same_subtree parameter is available since placement microversion
1.36 but we already bumped our minimum placement requirement to 1.36
in a previous patch.
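For illustration, the generated query gains a parameter like this
(group suffixes are examples); same_subtree asks placement to satisfy
the listed request groups from providers within a single subtree:

    # GET /allocation_candidates?...&same_subtree=_port1,_port2
    params['same_subtree'] = ','.join(['_port1', '_port2'])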
Change-Id: I7447bbb49fcee5197b176aa64fd5f2c930f70fc0
blueprint: qos-minimum-guaranteed-packet-rate
To implement the usage of the same_subtree query parameter in the
allocation candidate request, first the minimum required placement
microversion needs to be bumped from 1.35 to 1.36. This patch makes
that bump and updates the related nova upgrade check. Later patches
will modify the query generation to include the same_subtree param in
the request.
Change-Id: I5bfec9b9ec49e60c454d71f6fc645038504ef9ef
blueprint: qos-minimum-guaranteed-packet-rate
There is a race condition in nova-compute with the ironic virt driver
as nodes get rebalanced. It can lead to compute node records being
deleted from the DB and not repopulated. Ultimately this prevents
these nodes from
being scheduled to.
The issue being addressed here is that if a compute node is deleted by a
host which thinks it is an orphan, then the resource provider for that
node might also be deleted. The compute host that owns the node might
not recreate the resource provider if it exists in the provider tree
cache.
This change fixes the issue by clearing resource providers from the
provider tree cache for which a compute node entry does not exist. Then,
when the available resource for the node is updated, the resource
providers are not found in the cache and get recreated in placement.
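The cleanup reduces to roughly the following hedged sketch (helper
names are illustrative of the provider tree cache API; nested child
providers are ignored for brevity):

    def clear_stale_providers(reportclient, provider_tree,
                              compute_nodes):
        # Compute node UUIDs double as root resource provider UUIDs.
        node_uuids = {cn.uuid for cn in compute_nodes}
        for rp_uuid in provider_tree.get_provider_uuids():
            if rp_uuid not in node_uuids:
                # Drop the stale cache entry so the next pass of
                # update_available_resource recreates the provider.
                reportclient.invalidate_resource_provider(rp_uuid)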
Change-Id: Ia53ff43e6964963cdf295604ba0fb7171389606e
Related-Bug: #1853009
Related-Bug: #1841481
There are no longer any custom filters. We don't need the abstract base
class. Merge the code in and give it a more useful 'SchedulerDriver'
name.
Change-Id: Id08dafa72d617ca85e66d50b3c91045e0e8723d0
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This hasn't had any users since we removed the 'ChanceScheduler' in
change I44f9c1cabf9fc64b1a6903236bc88f5ed8619e9e.
Change-Id: I5c009b2cf5d64d28c706795de06c0ea1dedf054b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Follow-up on change I64dc67e2bacd7a6c86153db5ae983dfb54bd40eb by
removing additional code paths that are no longer relevant following the
removal of the 'USES_ALLOCATION_CANDIDATES' option. This is kept
separate from the aforementioned change to help keep both changes
readable.
Change-Id: I1d2b51f5dd2ca75eb565ca5242cfdb938868bff9
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
There is only one scheduler driver now. This variable is no longer
necessary.
We remove a couple of tests that validated behavior for scheduler
drivers that didn't support allocation candidates (there are none) and
update the test for the 'nova-manage placement heal_allocations'
command to drop allocations instead of relying on them not being
created.
Change-Id: I64dc67e2bacd7a6c86153db5ae983dfb54bd40eb
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This change deprecates the AZ filter, which is no longer required.
It also enables the use of placement for AZ enforcement by default and
deprecates the config option for removal.
Change-Id: I92b0386432444fc8bdf852de4bdb6cebb370a8ca
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.
Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
For the upcoming `socket` affinity, PCI stats needs to know the
host's NUMA topology in order to calculate the socket affinity of PCI
devices (based on their 'numa_node'). In this patch, the PCI manager
starts using its compute_node parameter to give a NUMA topology to PCI
stats.
PCI stats needs to track the NUMA topology at the object (not the
class) level, so all the '_filter_*()' classmethods that, in a
subsequent patch, will either use 'self.numa_topology' directly or
call a method that does, are moved to the object level. They were only
ever called via self anyway.
Implements: blueprint pci-socket-affinity
Change-Id: I1d270cc3e88a74097eefe3b887106222ae06fe1c
You've got an 'os:secure_boot' flavor extra spec. Have a trait, you crazy
guy!
At the moment nothing will report this. That's coming soon.
Blueprint: allow-secure-boot-for-qemu-kvm-guests
Change-Id: Ie217063ba721d4b417e67e9a14e938948d050f4a
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Since only Wallaby compute nodes will support the 'socket' PCI NUMA
affinity policy, this patch adds a ResourceRequest translator that adds
a required trait if the value of '(hw_|hw:)pci_numa_affinity_policy' is
'socket'.
The actual trait reporting by the libvirt driver will be added in a
future patch. Until then the 'socket' value remains a hidden no-op.
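The translator step amounts to roughly the following hedged sketch
(attribute access is simplified compared to nova's ResourceRequest
internals):

    import os_traits

    def require_socket_trait(flavor, image_meta, resource_request):
        policy = (
            flavor.extra_specs.get('hw:pci_numa_affinity_policy')
            or image_meta.properties.get(
                'hw_pci_numa_affinity_policy'))
        if policy == 'socket':
            # Only hosts exposing the socket-affinity capability can
            # honor the policy, so make the trait required.
            resource_request.root_required.add(
                os_traits.COMPUTE_SOCKET_PCI_NUMA_AFFINITY)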
Implements: blueprint pci-socket-affinity
Change-Id: I908ff07e1107304ca5926cc04d2fdc8fe0da5ed9
There are already type hints in this file so now we enable checking
those hints. To make mypy happy some fields needed an explicit type
declaration.
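For illustration, the kind of declaration that was needed (names are
examples, not the actual fields):

    from typing import Dict, List, Optional

    class Example:
        def __init__(self) -> None:
            # Without explicit annotations mypy cannot infer the
            # element types of these initially-empty containers.
            self.pools: List[Dict[str, object]] = []
            self.numa_topology: Optional[object] = None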
Change-Id: Id1df976f067777d4115e3cb2c72d54764bb31c6a