Since all remaining SchedulerClient methods were direct passthroughs to
the SchedulerQueryClient, and nothing special is being done anymore to
instantiate that client, the SchedulerClient is no longer necessary. All
references to it are replaced with direct references to the
SchedulerQueryClient.
As a step toward getting rid of the SchedulerClient intermediary, this
patch removes the reportclient member from SchedulerClient, instead
instantiating SchedulerReportClient directly wherever it's needed.
There were a bunch of report client methods around updating inventory in
placement which were only being used in the non-update_provider_tree
code paths of the resource tracker's update routine. Those code paths
had already been retrofitted to produce a placement-shaped inventory,
and update_from_provider_tree gives us another way to flush those inventory
changes to placement.
This patch simply takes the inventory object produced by the
get_inventory() and update_compute_node() code paths and updates the
provider tree object in the same fashion as update_provider_tree does.
So now all three code paths can commonly invoke
update_from_provider_tree, and we can get rid of a ton of redundant code
in the report client.
This includes the former incarnation of set_inventory_for_provider; so
we rename the artist formerly known as _set_inventory_for_provider to
match its brethren, set_traits_for_provider and
set_aggregates_for_provider.
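The consolidation described above can be sketched with toy stand-ins (these are not Nova's actual classes; names and shapes here are simplified assumptions): a placement-shaped inventory dict produced by the legacy code paths is applied to a provider tree, after which all paths can flush through the same update_from_provider_tree-style routine.

```python
# Toy stand-ins (NOT nova.compute.provider_tree) illustrating how the
# get_inventory()/update_compute_node() paths can feed the same
# provider-tree flush as update_provider_tree().

class ProviderTree:
    """Simplified sketch of a provider tree holding per-provider inventory."""
    def __init__(self):
        self._inventories = {}

    def update_inventory(self, provider_uuid, inventory):
        # Return True if the stored inventory actually changed.
        changed = self._inventories.get(provider_uuid) != inventory
        self._inventories[provider_uuid] = inventory
        return changed

    def data(self, provider_uuid):
        return self._inventories[provider_uuid]


def flush_inventory(tree, provider_uuid, inventory):
    """Apply a placement-shaped inventory dict to the tree, mimicking
    what the legacy code paths now do before the common flush."""
    return tree.update_inventory(provider_uuid, inventory)


tree = ProviderTree()
inv = {'VCPU': {'total': 8, 'allocation_ratio': 16.0},
       'MEMORY_MB': {'total': 16384, 'allocation_ratio': 1.5}}
changed = flush_inventory(tree, 'fake-uuid', inv)
```

Applying the same dict twice would return False from update_inventory, which is the property the common flush path relies on to avoid redundant placement calls.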
Conductor RPC calls the scheduler to get hosts during
server create, which in a multi-create request with a
lot of servers and the default rpc_response_timeout can
trigger a MessagingTimeout. Due to the old
retry_select_destinations decorator, conductor will retry
the select_destinations RPC call up to max_attempts times,
so thrice by default. This can clobber the scheduler and
placement while the initial scheduler worker is still
trying to process the beefy request and allocate resources.
This has been recreated in a devstack test patch and
shown to fail with 1000 instances in a single request with
the default rpc_response_timeout of 60 seconds. Changing the
rpc_response_timeout to 300 avoids the MessagingTimeout and
allows the request to complete.
Since Rocky we have the long_rpc_timeout config option which
defaults to 1800 seconds. The RPC client can thus be changed
to heartbeat the scheduler service during the RPC call every
$rpc_response_timeout seconds with a hard timeout of
$long_rpc_timeout. That change is made here.
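In Nova the heartbeating itself is delegated to oslo.messaging; the pure-Python sketch below only illustrates the two-timeout pattern the paragraph describes (a short per-heartbeat wait repeated up to a long hard deadline). The function name and the toy timings are illustrative, not Nova code.

```python
# Sketch of "heartbeat with a hard timeout": wait $rpc_response_timeout
# at a time, up to $long_rpc_timeout total, instead of failing after a
# single short wait. (Toy timings; real values are config options.)
import threading

def call_with_heartbeat(result_event, heartbeat=0.05, hard_timeout=1.0):
    """Wait for a result in short heartbeat-sized intervals, giving up
    only after hard_timeout seconds in total."""
    waited = 0.0
    while waited < hard_timeout:
        if result_event.wait(timeout=heartbeat):
            return 'result'
        # Each expired heartbeat is a chance to notice a dead peer;
        # here we just keep waiting until the hard deadline.
        waited += heartbeat
    raise TimeoutError('no response within hard timeout')

# A call whose result arrives after a couple of heartbeat intervals:
# it would have failed a single 0.05s wait, but succeeds here.
ev = threading.Event()
threading.Timer(0.1, ev.set).start()
outcome = call_with_heartbeat(ev)
```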
As a result, the problematic retry_select_destinations
decorator is also no longer necessary and is removed here. That
decorator was added in I2b891bf6d0a3d8f45fd98ca54a665ae78eab78b3
as a hack for scheduler high availability: a
MessagingTimeout was assumed to be a result of the scheduler
service dying, so retrying the request was reasonable in
hopes of hitting another scheduler worker. That is clearly
not sufficient in the large multi-create case, and
long_rpc_timeout is a better fit for that HA-type scenario
since it heartbeats the scheduler during the call.
Things have changed here on Walton's Mountain since LazyLoad was
introduced. It seems to have been created to avoid a circular
import, but as this patch should attest, that's no longer an issue.
Why change this now, besides removing weird and complicated code?
Because a subsequent patch needs to access a *property* of the report
client from the compute manager. As written, LazyLoad only lets you get
at *methods* (as functools.partial). There are other ways to solve that
while preserving the deferred import, but if this works, it's the better
solution.
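The limitation can be demonstrated with a simplified, hypothetical version of the old LazyLoad (the real class differed in details): a `__getattr__` that forwards everything as a functools.partial can only hand back callables, so a property comes back as something you must call rather than its value.

```python
# Hypothetical, simplified LazyLoad showing why partial-based
# forwarding cannot expose a property's *value*.
import functools

class ReportClient:
    @property
    def provider_tree(self):
        return 'the tree'

class LazyLoad:
    def __init__(self, factory):
        self._factory = factory
        self._instance = None

    def __getattr__(self, name):
        # Defer instantiation until first use, then forward the
        # attribute access as a callable.
        if self._instance is None:
            self._instance = self._factory()
        return functools.partial(getattr, self._instance, name)

lazy = LazyLoad(ReportClient)
via_lazy = lazy.provider_tree        # a partial, not 'the tree'
direct = ReportClient().provider_tree  # the actual property value
```

Direct instantiation (what this patch moves to) makes `provider_tree` behave like the property it is.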
Add the 'X-Openstack-Request-Id' header to DELETE requests.
When deleting resource provider inventories, the header is now added.
Subsequent patches will add the header in the other cases.
This changes select_destinations() on the scheduler side to optionally
return Selection objects and alternates. The RPC signature doesn't
change in this patch, though, so everything on the conductor side
remains unchanged. The next patch in the series will make the actual RPC
change.
SchedulerReportClient.set_inventory_for_provider and its SchedulerClient
wrapper now accept a parent_provider_uuid kwarg, which must be specified
for any provider that isn't a root. If the method winds up creating the
provider and parent_provider_uuid is None (the default), the provider
is created as a root; this is the previous behavior. If
parent_provider_uuid is specified and the method winds up creating the
provider, it is created as a child of the provider indicated.
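A toy in-memory sketch (not the real SchedulerReportClient, which talks to the placement API) of the kwarg's semantics: the parent is only consulted when the provider is created, and None means root.

```python
# In-memory stand-in for the new parent_provider_uuid behavior.
providers = {}  # uuid -> {'parent': uuid-or-None, 'inventory': dict}

def set_inventory_for_provider(uuid, inventory, parent_provider_uuid=None):
    if uuid not in providers:
        # Parent only matters at creation time; an existing provider
        # keeps whatever parent it already has.
        providers[uuid] = {'parent': parent_provider_uuid, 'inventory': {}}
    providers[uuid]['inventory'] = inventory

# Created with the default: a root provider (the previous behavior).
set_inventory_for_provider('root-uuid', {'VCPU': {'total': 4}})
# Created with a parent: a child of the indicated provider.
set_inventory_for_provider('child-uuid', {'VGPU': {'total': 2}},
                           parent_provider_uuid='root-uuid')
```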
When the RequestSpec object was created, it was assumed that when the
request was for more than one instance the scheduler would not need to
know the UUIDs for the individual instances and so it was agreed to
only pass one instance UUID. If, however, we want the scheduler to be
able to claim resources on the selected host, we will need to know the
instance UUID, which will be the consumer_id of the allocation.
This patch adds a new RPC parameter 'instance_uuids' that will be passed
to the scheduler. The next patch in the series adds the logic to the
scheduler to use this new field when selecting hosts.
Co-Authored-By: Sylvain Bauza <firstname.lastname@example.org>
Partially-Implements: blueprint placement-claims
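A hypothetical, simplified signature illustrating the new parameter (the real scheduler RPC interface carries more arguments): one UUID per requested instance, so each allocation's consumer_id can be the instance UUID.

```python
# Simplified sketch: the scheduler receives instance_uuids so the
# resource claim on each selected host can use the instance UUID as
# the allocation's consumer_id. Host names here are made up.
import uuid

def select_destinations(spec, instance_uuids):
    hosts = ['host1', 'host2']
    # One (host, consumer) pairing per instance in the request.
    return [{'host': hosts[i % len(hosts)], 'consumer_id': u}
            for i, u in enumerate(instance_uuids)]

uuids = [str(uuid.uuid4()) for _ in range(3)]
selections = select_destinations({'flavor': 'small'}, uuids)
```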
Adds a new get_inventory() method to the virt driver API for returning a
dict of inventory records in a format that the placement API
understands.
We also move the ComputeNode.save() call out of the scheduler reporting
client and into the resource tracker. The resource tracker's _update()
method now attempts to call the new get_inventory() virt driver method
and falls back on the existing update_resource_stats() (renamed to
update_compute_node() in this patch) method when get_inventory() is not
implemented.
The next patch implements get_inventory() for the Ironic virt driver.
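A sketch of the shape of data a get_inventory() implementation might return: inventory records keyed by resource class, with placement-style fields (total, reserved, min_unit, max_unit, step_size, allocation_ratio). The field names match placement's inventory schema; the node name and values are made up.

```python
# Illustrative get_inventory() returning placement-shaped records.
def get_inventory(nodename):
    # Keyed by resource class; values use placement's inventory fields.
    return {
        'VCPU': {
            'total': 16, 'reserved': 0, 'min_unit': 1, 'max_unit': 16,
            'step_size': 1, 'allocation_ratio': 16.0,
        },
        'MEMORY_MB': {
            'total': 32768, 'reserved': 512, 'min_unit': 1,
            'max_unit': 32768, 'step_size': 1, 'allocation_ratio': 1.5,
        },
    }

inv = get_inventory('node1')
```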
Now we can hydrate the RequestSpec object directly in the conductor and modify
the Scheduler client to primitive it before calling the RPC API.
Later changes will focus on modifying the scheduler.utils methods to play with
the RequestSpec object (and possibly kick build_request_spec) but this can be
done in separate changes for cleaning that up.
NOTE: There is an ugly hack hidden in that change because of a bug
in oslo.messaging which doesn't accept datetime type for values. It will be
removed right in the next patch of the branch. Yeah, I know it's bad, but it's
only temporary.
Partially-Implements: blueprint request-spec-object-mitaka
This patch converts the ResourceTracker compute_node property
to be a ComputeNode object. A number of fields automatically
take care of mapping their values to a db format, so some of the
code creating json strings goes away with this change.
The scheduler client report code is simplified by the change
to use a ComputeNode object.
Note that this change naturally required modification to a
number of tests in test_tracker, test_resource_tracker and
test_client to cater for objects instead of dicts. Some of these
tests were using incorrect values or arbitrary key names that do
not exist as ComputeNode fields, so they had to be corrected to
conform to the type checking of the ComputeNode object.
part of blueprint make-resource-tracker-use-objects
The Scheduler needs to receive updates from compute whenever there are
changes to any instance so that it can update its view of the instances
on each compute node. This adds the required RPC methods for updates
(new and resized instances) and deletes, as well as a method for sending
sync information to verify that the scheduler and compute views of the
instances are the same.
Partially-Implements: blueprint isolate-scheduler-db
Now that we have provided Scheduler RPC API methods for updating and deleting
aggregates, we can add the methods to the Scheduler client too.
Partially-Implements: blueprint isolate-scheduler-db
The oslo team is recommending that everyone switch to the
non-namespaced versions of libraries. This updates the hacking
rule to include a check preventing oslo.* imports from
creeping back in.
This commit includes:
- using oslo_utils instead of oslo.utils
- using oslo_serialization instead of oslo.serialization
- using oslo_db instead of oslo.db
- using oslo_i18n instead of oslo.i18n
- using oslo_middleware instead of oslo.middleware
- using oslo_config instead of oslo.config
- using oslo_messaging instead of "from oslo import messaging"
- using oslo_vmware instead of oslo.vmware
In case the nova-scheduler service dies, try sending the message
once again. The message will be picked up by another server or
the same one when it restarts.
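The retry-once behavior can be sketched as a decorator (a hypothetical, simplified stand-in; the exception class here mimics oslo.messaging's MessagingTimeout but is defined locally):

```python
# Minimal "retry once" sketch: if the first send times out because the
# scheduler died, send again so another worker (or the restarted one)
# can pick the message up.
class MessagingTimeout(Exception):
    pass

def retry_once(send):
    def wrapper(*args, **kwargs):
        try:
            return send(*args, **kwargs)
        except MessagingTimeout:
            # Exactly one retry: the message may land on another worker.
            return send(*args, **kwargs)
    return wrapper

attempts = []

@retry_once
def flaky_send(msg):
    attempts.append(msg)
    if len(attempts) == 1:
        raise MessagingTimeout('scheduler died mid-request')
    return 'delivered'

result = flaky_send('update_instance_info')
```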
The oslo.utils library now provides the functionality previously in
oslo-incubator's excutils, importutils, network_utils, strutils,
timeutils, units, etc. Some modules already moved to oslo.utils
will still be around, since other code in nova/openstack/common/
is using them; they will be removed in a subsequent commit.
It was defined in the spec that the scheduler will provide a clear
interface for all scheduler APIs, so select_destinations() needs
to be added to the client library.
Implements blueprint scheduler-lib
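The client-library addition can be sketched with simplified stand-ins (not Nova's real classes; in Nova the query client issues an RPC call to the scheduler service): the client method is a thin passthrough to the query client.

```python
# Simplified sketch of exposing select_destinations() through the
# scheduler client library as a passthrough to the query client.
class SchedulerQueryClient:
    def select_destinations(self, context, request_spec):
        # In Nova this would be an RPC call to the scheduler service;
        # here we return a canned destination for illustration.
        return [{'host': 'host1', 'nodename': 'node1'}]

class SchedulerClient:
    def __init__(self):
        self.queryclient = SchedulerQueryClient()

    def select_destinations(self, context, request_spec):
        return self.queryclient.select_destinations(context, request_spec)

dests = SchedulerClient().select_destinations('ctxt', {'num_instances': 1})
```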