nova/nova/tests/unit/scheduler/client
Balazs Gibizer 1b661c2669 Reduce gen conflict in COMPUTE_STATUS_DISABLED handling
The COMPUTE_STATUS_DISABLED trait is supposed to be added to the compute
RP when the compute service is disabled, and removed when the service is
enabled again. However, adding and removing traits is prone to
generation conflicts in placement. The original implementation of
blueprint pre-filter-disabled-computes anticipated this and logs a
detailed warning message while letting the API operation succeed. The
conflict can be ignored this way because the periodic
update_available_resource() call will re-sync the traits later.
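
For illustration, a minimal sketch of this conflict-tolerant update;
get_provider_traits() and put_provider_traits() are hypothetical
stand-ins for the real placement client calls:

    import logging

    LOG = logging.getLogger(__name__)

    COMPUTE_STATUS_DISABLED = 'COMPUTE_STATUS_DISABLED'


    def set_compute_disabled_trait(client, rp_uuid, disabled):
        """Add/remove the trait, tolerating a generation conflict."""
        # Hypothetical helpers standing in for the real placement calls.
        traits, generation = client.get_provider_traits(rp_uuid)
        if disabled:
            traits.add(COMPUTE_STATUS_DISABLED)
        else:
            traits.discard(COMPUTE_STATUS_DISABLED)
        resp = client.put_provider_traits(rp_uuid, traits, generation)
        if resp.status_code == 409:
            # Conflict: another writer bumped the RP generation. Log and
            # carry on; the periodic update_available_resource() task
            # will re-sync the traits later.
            LOG.warning('Generation conflict while updating traits on '
                        '%s; the periodic sync will reconcile them.',
                        rp_uuid)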

Still, this leaves a human-noticeable time window during which the trait
and the service state are out of sync.

Disabling the compute service is the smaller problem, as the scheduler
still uses the ComputeFilter, which filters computes based on the
service API. So during the enable->disable race window we only lose
scheduling performance, as the placement pre-filter fails to exclude
the disabled compute.
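
Schematically, the ComputeFilter behaves like this (a simplified
sketch, not the exact upstream code):

    class ComputeFilter(object):
        """Reject hosts whose nova-compute service is disabled."""

        def host_passes(self, host_state, spec_obj):
            service = host_state.service
            if service.get('disabled'):
                # The disabled compute is rejected here even while the
                # COMPUTE_STATUS_DISABLED trait is still missing in
                # placement due to a generation conflict.
                return False
            return True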

The race is more visible when the compute service is set to enabled:
the placement pre-filter keeps filtering out the compute the admin just
enabled until the re-sync happens. If the conflict could only happen
due to high load on the given compute, such a delay could be explained
by the load itself. However, a conflict can happen simply due to a new
instance booting on the compute.
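
The pre-filter works by making the trait forbidden in the placement
query; roughly (a sketch with an illustrative function name):

    def compute_status_filter(required_traits):
        """Forbid the trait in the GET /allocation_candidates query.

        With required=...,!COMPUTE_STATUS_DISABLED in the request, a
        compute RP that still wrongly carries the trait after the
        admin enabled the service stays invisible to the scheduler
        until the periodic re-sync removes it.
        """
        required_traits.add('!COMPUTE_STATUS_DISABLED')
        return required_traits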

Fortunately, the solution is easy and cheap. The service state handling
code path has already queried placement for the existing traits on the
compute RP and therefore receives the current RP generation as well.
But it does not use this information; instead it relies on the
potentially stale provider_tree cache.

This patch uses the much fresher RP generation known by the service state
handling code instead of the potentially stale provider_tree cache.
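
Sketching the fix against the placement REST API (the client object and
its get/put helpers are illustrative; the response fields are the
documented placement ones):

    def sync_disabled_trait(client, rp_uuid, disabled):
        resp = client.get('/resource_providers/%s/traits' % rp_uuid)
        body = resp.json()
        traits = set(body['traits'])
        # Use the generation returned by this very GET instead of the
        # potentially stale provider_tree cache.
        generation = body['resource_provider_generation']
        if disabled:
            traits.add('COMPUTE_STATUS_DISABLED')
        else:
            traits.discard('COMPUTE_STATUS_DISABLED')
        client.put('/resource_providers/%s/traits' % rp_uuid,
                   json={'traits': list(traits),
                         'resource_provider_generation': generation})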

The change in the notification test is also due to the fixed behavior.
The test disables the compute. Until now this caused the
FilterScheduler to detect that there is no valid host. Now the
scheduler manager already detects this based on the empty placement
response. As a result the FilterScheduler is not called, and therefore
the select_destination.start notification is not sent.
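
The new failure path in the scheduler manager looks roughly like this
(a sketch with illustrative names; NoValidHost is the real nova
exception):

    from nova import exception


    def select_destinations(driver, spec_obj, alloc_reqs):
        if not alloc_reqs:
            # Placement returned no allocation candidates, e.g. because
            # the only compute carries COMPUTE_STATUS_DISABLED. Fail
            # fast without calling the driver, so no
            # select_destination.start notification is emitted.
            raise exception.NoValidHost(
                reason='No allocation candidates')
        return driver.select_destinations(spec_obj, alloc_reqs)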

Closes-Bug: #1886418

Change-Id: Ib3c455bf21f33923bb82e3f5c53035f6722480d3
2020-07-10 17:38:13 +02:00
__init__.py rt: isolate report and query sched client tests 2016-08-21 20:43:17 -04:00
test_query.py Use uuidsentinel from oslo.utils 2018-09-05 09:08:54 -05:00
test_report.py Reduce gen conflict in COMPUTE_STATUS_DISABLED handling 2020-07-10 17:38:13 +02:00