This patch extends the HostState object with an allocation_candidates
list populated by the scheduler manager. It also changes the generic
scheduler logic to allocate one of the candidates recorded in the host
state for the selected host.
After this patch, scheduler filters can be extended to filter the
allocation_candidates list of the HostState object while processing a
host and to restrict which candidate can be allocated if the host
passes all the filters. Multiple consecutive filters can potentially
remove every candidate, making the host a non-viable scheduling
target.
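To illustrate the new extension point, a filter could trim the candidate
list roughly as below. This is a minimal sketch: the filter name and the
predicate are hypothetical, only the allocation_candidates attribute on
HostState comes from this patch.

    class RestrictCandidatesFilter:
        """Hypothetical filter trimming a host's allocation candidates."""

        def host_passes(self, host_state, spec_obj):
            kept = [c for c in host_state.allocation_candidates
                    if self._candidate_is_acceptable(c, spec_obj)]
            if not kept:
                # Consecutive filters may empty the list entirely, which
                # makes the host a non-viable scheduling target.
                return False
            host_state.allocation_candidates = kept
            return True

        @staticmethod
        def _candidate_is_acceptable(candidate, spec_obj):
            # Placeholder predicate; a real filter would inspect the
            # candidate's allocation mappings.
            return True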
blueprint: pci-device-tracking-in-placement
Change-Id: Id0afff271d345a94aa83fc886e9c3231c3ff2570
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but it may cause us
to initialize the client fewer times, and it also allows a common set
of error messages about expected failures to be emitted for better
troubleshooting.
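A minimal sketch of the unified singleton pattern, assuming a
double-checked lock; the helper and factory names are illustrative
rather than the exact ones added by this patch:

    import threading

    _report_client = None
    _lock = threading.Lock()

    def report_client_singleton(factory):
        """Build the placement report client once and reuse it afterwards.

        ``factory`` stands in for whatever constructs the real
        SchedulerReportClient; logging of expected startup failures
        would live here as well.
        """
        global _report_client
        if _report_client is None:
            with _lock:
                if _report_client is None:
                    _report_client = factory()
        return _report_client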
Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).
When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.
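The two code paths look roughly like the following sketch. It is not the
actual report client method body; ``placement`` stands in for any thin
HTTP wrapper around the placement API, and the payload fields follow the
consumer-generation-aware allocations format:

    def delete_allocation_for_instance(placement, consumer_uuid, force=True):
        url = '/allocations/%s' % consumer_uuid
        if force:
            # Unconditional removal: no consumer generation is involved,
            # so a racing update cannot produce a 409 conflict.
            return placement.delete(url)
        # Non-forced path: read-modify-write using the consumer generation,
        # which can race with another writer and fail with 409 Conflict.
        current = placement.get(url).json()
        payload = {
            'allocations': {},
            'consumer_generation': current['consumer_generation'],
            'project_id': current['project_id'],
            'user_id': current['user_id'],
        }
        return placement.put(url, json=payload)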
It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].
Closes-Bug: #1836754
[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html
Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
There's only one driver now, which means there isn't really a driver at
all. Move the code into the manager altogether and avoid a useless layer
of abstraction.
Change-Id: I609df5b707e05ea70c8a738701423ca751682575
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
There are no longer any custom drivers. We don't need the abstract base
class. Merge the code in and give it a more useful 'SchedulerDriver'
name.
Change-Id: Id08dafa72d617ca85e66d50b3c91045e0e8723d0
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Follow-up on change I64dc67e2bacd7a6c86153db5ae983dfb54bd40eb by
removing additional code paths that are no longer relevant following the
removal of the 'USES_ALLOCATION_CANDIDATES' option. This is kept
separate from the aforementioned change to help keep both changes
readable.
Change-Id: I1d2b51f5dd2ca75eb565ca5242cfdb938868bff9
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
There is only one scheduler driver now. This variable is no longer
necessary.
We remove a couple of tests that validated behavior for scheduler drivers
that didn't support allocation candidates (there are none) and update
the test for the 'nova-manage placement heal_allocations' command to
drop allocations instead of relying on them not being created.
Change-Id: I64dc67e2bacd7a6c86153db5ae983dfb54bd40eb
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
As the help text for this option stated:
Currently there are no in-tree scheduler driver [sic] that use this
option.
Now that we no longer support out-of-tree drivers, it's time to remove
this. Do so.
Change-Id: Ib40c25db2c16373677ff32e4e95292fbab498751
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
We deprecated this functionality in Ussuri and can now remove it. It's
highly unlikely that there exists a functioning alternative to this
scheduler and it's not something we can really support nowadays.
Change-Id: I546d3d329a69acaad3ada48ccbfddf3a274b6ce2
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Replace six.text_type with str.
A subsequent patch will replace other six.text_type.
Change-Id: I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
We don't need this and it's blocking us from removing pluggable
scheduler drivers. Remove it.
Change-Id: I61c1a47559645c41089747cc81270848b58b68f9
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Map 'hw:cpu_policy' and 'hw:cpu_thread_policy' as follows:
hw:cpu_policy
dedicated -> resources:PCPU=${flavor.vcpus}
shared -> resources:VCPU=${flavor.vcpus}
hw:cpu_thread_policy
isolate -> trait:HW_CPU_HYPERTHREADING:forbidden
require -> trait:HW_CPU_HYPERTHREADING:required
prefer -> (none, handled later during scheduling)
Ditto for the 'hw_cpu_policy' and 'hw_cpu_thread_policy' image metadata
equivalents.
In addition, increment the requested 'resources:PCPU' by 1 if the
'hw:emulator_threads_policy' extra spec is present and set to 'isolate'.
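A rough, standalone sketch of the translation described above (not the
exact nova code, which also folds in the image metadata properties):

    def cpu_policy_to_placement(flavor_vcpus, extra_specs):
        """Map cpu policy extra specs to placement resources and traits."""
        resources = {}
        traits = {}

        if extra_specs.get('hw:cpu_policy') == 'dedicated':
            resources['PCPU'] = flavor_vcpus
        else:
            resources['VCPU'] = flavor_vcpus

        thread_policy = extra_specs.get('hw:cpu_thread_policy')
        if thread_policy == 'isolate':
            traits['HW_CPU_HYPERTHREADING'] = 'forbidden'
        elif thread_policy == 'require':
            traits['HW_CPU_HYPERTHREADING'] = 'required'
        # 'prefer' adds nothing; it is handled later during scheduling.

        if extra_specs.get('hw:emulator_threads_policy') == 'isolate':
            resources['PCPU'] = resources.get('PCPU', 0) + 1

        return resources, traits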
The scheduler will attempt to get PCPU allocations and fall back to
VCPUs if that fails. This is okay because the NUMA fitting code from the
'hardware' module used by both the 'NUMATopology' filter and libvirt
driver protects us. That code doesn't know anything about PCPUs or VCPUs
but rather cares about the 'NUMATopology.pcpuset' field (starting in
change I492803eaacc34c69af073689f9159449557919db), which can be set to
different values depending on whether this is Train with new-style
config, Train with old-style config, or Stein:
- For Train compute nodes with new-style config, 'NUMATopology.pcpuset'
will be explicitly set to the value of '[compute] cpu_dedicated_set'
or, if only '[compute] cpu_shared_set' is configured, 'None' (it's
nullable) by the virt driver so the calls to
'hardware.numa_fit_instance_to_host' in the 'NUMATopologyFilter' or
virt driver will fail if it can't actually fit.
- For Train compute nodes with old-style config, 'NUMATopology.pcpuset'
will be set to the same value as 'NUMATopology.cpuset' by the virt
driver.
- For Stein compute nodes, 'NUMATopology.pcpuset' will be unset and
we'll detect this in 'hardware.numa_fit_instance_to_host' and simply
set it to the same value as 'NUMATopology.cpuset'.
Part of blueprint cpu-resources
Change-Id: Ie38aa625dff543b5980fd437ad2febeba3b50079
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
If the instances per host are not cached in the HostManager, we look
up the HostMapping for each candidate compute node during each
scheduling request to get the CellMapping, so we can target that cell
database to pull the instance uuids on the given host.
For example, if placement returned 20 compute node allocation
candidates and we don't have the instances cached for any of those,
we'll do 20 queries to the API DB to get host mappings.
We can improve this by caching the host-to-cell-uuid mapping after the
first lookup for a given host and, after that, getting the CellMapping
from the cells cache (a dict, keyed by cell uuid, of the CellMapping
for that cell).
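Conceptually the lookup becomes the following; the attribute names are
approximations of the HostManager internals, and lookup_host_mapping
stands in for the API DB query:

    def get_cell_for_host(host_manager, host_name, lookup_host_mapping):
        """Return the CellMapping for a host, caching the host->cell link."""
        cell_uuid = host_manager.host_to_cell_uuid.get(host_name)
        if cell_uuid is not None:
            # Cache hit: no API DB round trip needed.
            return host_manager.cells[cell_uuid]
        # Cache miss: one HostMapping query, then remember the cell uuid.
        mapping = lookup_host_mapping(host_name)
        host_manager.host_to_cell_uuid[host_name] = mapping.cell_mapping.uuid
        return mapping.cell_mapping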
Change-Id: Ic6b1edfad2e384eb32c6942edc522ee301123cbc
Related-Bug: #1737465
When the 'nova-manage cellv2 discover_hosts' command is run in parallel
during a deployment, it results in simultaneous attempts to map the
same compute or service hosts, producing tracebacks like:
"DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\"Duplicate
entry 'compute-0.localdomain' for key 'uniq_host_mappings0host'\")
[SQL: u'INSERT INTO host_mappings (created_at, updated_at, cell_id,
host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s,
%(host)s)'] [parameters: {'host': u'compute-0.localdomain',
'cell_id': 5, 'created_at': datetime.datetime(2019, 4, 10, 15, 20,
50, 527925), 'updated_at': None}]
This adds more information to the command help and adds a warning
message when duplicate host mappings are detected with guidance about
how to run the command. The command will return 2 if a duplicate host
mapping is encountered and the documentation is updated to explain
this.
This also adds a warning to the scheduler periodic task to recommend
enabling the periodic on only one scheduler to prevent collisions.
We choose to warn and stop instead of ignoring DBDuplicateEntry because
there could potentially be a large number of parallel tasks competing
to insert duplicate records where only one can succeed. If we ignore
and continue to the next record, the large number of tasks will
repeatedly collide in a tight loop until all get through the entire
list of compute hosts that are being mapped. So we instead stop the
colliding task and emit a message.
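The behavior described above amounts to something like this sketch; the
exception class is a stand-in for oslo.db's DBDuplicateEntry, and the
real handling is wired through the nova-manage command:

    class DuplicateHostMapping(Exception):
        """Stand-in for oslo_db.exception.DBDuplicateEntry."""

    def map_hosts_or_stop(discover_hosts):
        """Run discovery once; warn and return 2 on a duplicate mapping."""
        try:
            discover_hosts()
        except DuplicateHostMapping:
            print('WARNING: another discover_hosts run already mapped a '
                  'host; run the command from a single node, or enable '
                  'the discovery periodic on only one scheduler.')
            return 2
        return 0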
Closes-Bug: #1824445
Change-Id: Ia7718ce099294e94309103feb9cc2397ff8f5188
This patch retrieves the target host info from the RequestSpec object,
looks up the database to translate it into the compute node uuid,
and passes it to the new `RequestGroup.in_tree` field.
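In effect the change does something along these lines; the attribute
names are approximate, and host_to_compute_node_uuid stands in for the
database lookup:

    def populate_in_tree(request_group, request_spec, host_to_compute_node_uuid):
        """Restrict allocation candidates to the target host's provider tree."""
        destination = getattr(request_spec, 'requested_destination', None)
        if not destination or not destination.host:
            return
        # Translate the host name into its compute node (resource provider)
        # uuid and hand it to placement via the request group.
        request_group.in_tree = host_to_compute_node_uuid(destination.host)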
Change-Id: If7ea02df42d220c5042947efdef4777509492a0b
Blueprint: use-placement-in-tree
A step toward getting rid of the SchedulerClient intermediary, this
patch removes the reportclient member from SchedulerClient, instead
instantiating SchedulerReportClient directly wherever it's needed.
Change-Id: I14d1a648843c6311a962aaf99a47bb1bebf7f5ea
If the placement service returns no allocation candidates, the log
message is emitted at DEBUG level, leaving the operator with no
visibility into why NoValidHost was raised.
If we do get something from the placement service but end up with
0 compute nodes after going through the filters, we already log a
message at INFO. For consistency, this adds an INFO message when we
have no allocation candidates as well.
Closes-Bug: #1794811
Change-Id: Ief2277b3a973dd2f947f354f844ce1c6c4d6026a
Not having enough resources to create the instance is a much more
probable reason for getting no allocation candidates than a compute
node just starting up. This patch changes the log message to state
both reasons.
Change-Id: I5d0cefbf833e797d560e115148beae35b0c48959
Closes-Bug: #1782309
If the instance is rebuilt with a different image on the same
host, we don't need to call placement because there is no change
in resource consumption.
Change-Id: Ie252271ecfd38a0a1c61c26e323cc03869889f0a
Closes-Bug: #1750623
By default this still schedules to all cells, since all cells are
enabled at the time of creation unless specified otherwise.
Since the list of enabled cells is stored as a global cache on the
host_manager, a reset() handler for the SIGHUP signal has also been
added to the scheduler. Hence, after every create-cell/enable-cell/
disable-cell operation the scheduler has to be signaled so that the
cache is refreshed.
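The refresh path is roughly the sketch below; the real reset() handler
lives on the scheduler service and reloads more than just the cells
cache:

    import signal

    def install_cells_cache_reset(host_manager, load_enabled_cells):
        """Refresh the enabled-cells cache when the scheduler gets SIGHUP."""

        def _reset(signum, frame):
            # Re-read cell mappings so newly created/enabled/disabled cells
            # are picked up without restarting the scheduler.
            host_manager.enabled_cells = load_enabled_cells()

        signal.signal(signal.SIGHUP, _reset)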
Co-Authored-By: Dan Smith <dms@danplanet.com>
Implements blueprint cell-disable
Change-Id: I6a9007d172b55238d02da8046311f8dc954703c5
This adds a pre-placement step where we can call a number of modular
"request filters" to modify the request_spec before we move on to
construct the call to placement to get allocation candidates.
No filters are added at this time; this just adds the infrastructure.
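The infrastructure boils down to iterating a list of callables over the
request spec before calling placement, roughly (helper names approximate
the new module):

    # The list is intentionally empty in this patch; later changes append
    # callables that mutate the RequestSpec in place.
    ALL_REQUEST_FILTERS = []

    def process_reqspec(ctxt, request_spec):
        """Run every registered request filter against the request spec."""
        for filter_func in ALL_REQUEST_FILTERS:
            filter_func(ctxt, request_spec)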
Related to blueprint placement-req-filter
Change-Id: I1535158a0dbd4a8527bb987e085e9391e5b0fde4
Add the 'X-Openstack-Request-Id' header
to GET requests made by the SchedulerReportClient.
Change-Id: I306ac6f5c6b67d77d91a7ba24d4d863ab3e1bf5c
Closes-Bug: #1734625
As of placement API version 1.12, the allocations dict format is
consistent across the APIs `PUT /allocations/{consumer_uuid}`,
`GET /allocations/{consumer_uuid}` and `GET /allocation_candidates`.
This patch switches the claim_resources method to this consistent
format.
There is one upgrade scenario which needs care. Since we send alternate
hosts to the compute node as Selection objects, and the Selection
object includes the JSON dict of the allocation object, an instance
build attempted in the middle of an upgrade can send the old-format
allocation JSON dict back to the new-version cell conductor. So the
claim_resources method still needs to support the old-version
allocation JSON format.
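Tolerating both layouts can be sketched as below (simplified from the
real method, which also has to claim against the normalized
allocations):

    def normalize_allocation_request(alloc_req):
        """Return allocations keyed by resource provider uuid.

        Accepts either the pre-1.12 list form
        [{'resource_provider': {'uuid': ...}, 'resources': {...}}, ...]
        or the 1.12+ dict form {rp_uuid: {'resources': {...}}, ...}.
        """
        allocations = alloc_req['allocations']
        if isinstance(allocations, dict):
            return allocations
        return {
            alloc['resource_provider']['uuid']: {'resources': alloc['resources']}
            for alloc in allocations
        }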
Change-Id: I4314ade528c9cb69bbf6c55d31566e1e188fef09
The earlier patches in the series generated alternates and Selection
objects, and modified the RPC calls to send them to the conductor. This
patch has the conductor pass these host_lists to the compute for the
build process, and, if the build fails, has the compute pass the
host_list back to the conductor.
Also fixes a bug in the scheduler manager exposed by this change
when using the CachingScheduler in a reschedule functional test.
Blueprint: return-alternate-hosts
Change-Id: Iae904afb6cb4fcea8bb27741d774ffbe986a5fb4
This changes the RPC call for select_destinations() as made by the
conductor. The previous patch added the logic on the scheduler side;
this patch changes the conductor side to use the two new parameters that
flag the new behaviors for Selection objects and alternate hosts.
Blueprint: return-alternate-hosts
Change-Id: I70b11dd489d222be3d70733355bfe7966df556aa
This changes select_destinations() on the scheduler side to optionally
return Selection objects and alternates. The RPC signature doesn't
change in this patch, though, so everything on the conductor side
remains unchanged. The next patch in the series will make the actual RPC
change.
Blueprint: return-alternate-hosts
Change-Id: I03b95a2106624c2ea24835814ca38e954ec7a997
With the RPC change coming in later patches in the series, it simply
makes more sense to have the method for converting a Selection object to
the older host_state dict format in the object itself.
Blueprint: return-alternate-hosts
Change-Id: I2299c71e034299d63c267bc4cbe6360858445779
This changes the returned value from the scheduler driver's
select_destinations() to a list of lists of Selection objects. It
doesn't actually change the returned value from the scheduler manager to
the conductor; that will be done in the next patch in the series, as
that will require an RPC change.
During review it was noticed that the signature for the abstract
scheduler driver class was not updated when 'alloc_reqs_by_rp_uuid'
parameter was added back in e041fddeb0, so
I've updated it here to make all driver signatures the same.
Blueprint: return-alternate-hosts
Change-Id: I9f864455c69e1355a3cf06d7ba8b98fa3bcf619c
Quota reservations were removed in the Pike release, but there is still a
periodic task in the scheduler manager that runs every minute to expire
reservations, which won't actually do anything.
This patch removes this periodic task and the related code.
Change-Id: Idae069e8cf6ce69e112de08a22c94b6b590f9a69
Closes-bug: #1719048
Placement took over the role of the CoreFilter, RamFilter and DiskFilter
from the FilterScheduler. Therefore if placement returns no allocation
candidates for a request then scheduling should be stopped as this means
there is not enough VCPU, MEMORY_MB, or DISK_GB available in any compute
node for the request.
Change-Id: If20a20e5cce7ab490998643e32556a1016646b07
Closes-Bug: #1708637
For scheduler drivers that use allocation candidates, this creates a dict,
keyed by resource provider UUID, of lists of allocation request JSON
objects that may be used to attempt resource claims for hosts selected
by the driver. This patch prepares the foundation for those resource
claims by simply reworking the calling interface of the driver
select_destinations() method to accept this dict of allocation request
lists.
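Building that dict from the placement response looks roughly like this
sketch, assuming the dict-based allocation request format (with the
older list format the provider uuid would be pulled from each entry
instead):

    import collections

    def group_alloc_reqs_by_rp_uuid(alloc_reqs):
        """Group allocation requests by each resource provider they involve."""
        by_rp = collections.defaultdict(list)
        for alloc_req in alloc_reqs:
            # Every provider named in a request maps back to the full
            # request, so a selected host can later be matched with the
            # candidate allocations that include it.
            for rp_uuid in alloc_req['allocations']:
                by_rp[rp_uuid].append(alloc_req)
        return by_rp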
Change-Id: Icaa5d44bb52894f509b95f4d8e69aab8bd2b31f2
blueprint: placement-claims
The get_allocation_candidates method is decorated with the safe_connect
decorator that handles any failures trying to connect to the Placement
service. If keystoneauth raises an exception, safe_connect will log it
and return None. The select_destinations() method in the SchedulerManager
needs to handle the None case so it doesn't assume a tuple is coming back
which would result in a TypeError.
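The defensive check amounts to the following sketch; the exact return
arity of get_allocation_candidates has changed over time, so treat the
tuple unpacking as illustrative:

    def fetch_allocation_candidates(report_client, resources):
        """Treat a None result (connection failure) as 'no candidates'."""
        res = report_client.get_allocation_candidates(resources)
        if res is None:
            # safe_connect logged the keystoneauth failure and returned
            # None; unpacking it blindly would raise TypeError, so bail
            # out here and let the caller raise NoValidHost.
            return None, None
        alloc_reqs, provider_summaries = res
        return alloc_reqs, provider_summaries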
Change-Id: Iffd72f51f25a9e874eaacf374d80794675236ac1
Closes-Bug: #1705141
This patch adds logging in nova-scheduler to show
whether and when the scheduler begins to handle the scheduling
task for instances. We always want to know the exact
starting point in order to debug, analyse and optimize
the scheduling process.
Change-Id: Ibe5c49813d85f26870dea1d971aeeb0c7ff7a3cd
The SchedulerManager.select_destinations method calls
_host_state_obj_to_dict to convert a list of selected
HostState objects to dicts before converting to primitives
and sending over RPC.
The problem is the NUMATopologyFilter can set the
HostState.limits['numa_topology'] field to an instance of
a NUMATopologyLimits versioned object. That was not being
converted to a primitive before being sent back over RPC.
The jsonutils.to_primitive() method will simply return an
object that it does not know how to convert to a primitive.
As of change If9e8dd5cc2634168910d5f9f8d9302aeefa16097, however,
to_primitive() will now raise a ValueError. This was failing
in nova functional tests in I76a90a57070c5184e893a79406b5d1784fcc969f.
We need to convert the NUMATopologyLimits to a primitive before
calling jsonutils.to_primitive() to avoid the ValueError. This
change explicitly only cares about this one known value and does
not attempt to do this generically for out of tree filters or
scheduler drivers.
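The targeted fix is along these lines (a sketch; the real code operates
on HostState.limits inside _host_state_obj_to_dict):

    def primitive_limits(limits):
        """Return a copy of the limits dict safe for jsonutils.to_primitive."""
        limits = dict(limits)
        numa_limits = limits.get('numa_topology')
        if numa_limits is not None and hasattr(numa_limits, 'obj_to_primitive'):
            # NUMATopologyLimits is a versioned object; convert just this
            # one known value rather than handling arbitrary objects.
            limits['numa_topology'] = numa_limits.obj_to_primitive()
        return limits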
Change-Id: I01e4cc2ca9d193b64023bc7df04ef878e68d46b8
Closes-Bug: #1593641
This patch replaces the scheduler's use of GET
/resource_providers?resources=XXX with GET
/allocation_candidates?resources=XXX.
In doing so, we move the interaction with the placement API out of the
scheduler driver interface and up into the scheduler manager. This
allows us to make fewer changes to the underlying HostManager and
SchedulerDriver interfaces and isolate communication with the placement
API in a single place.
The provider_summaries part of the response from GET
/allocation_candidates is used to generate the UUIDs that winnow the
number of compute nodes retrieved by the filter scheduler during
scheduling. Following patches will add in support for actually doing
the claim from the scheduler against one or more resource providers by
examining the allocation_requests part of the HTTP response and picking
one that contains the host the scheduler picked during its _schedule()
loop.
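Splitting the GET /allocation_candidates payload into the pieces
described above is essentially:

    def split_allocation_candidates(body):
        """Split an allocation candidates payload into its two parts."""
        alloc_reqs = body['allocation_requests']
        provider_summaries = body['provider_summaries']
        # The provider summary keys are resource provider uuids; for these
        # flat requests they correspond to compute node uuids and are used
        # to winnow the hosts the filter scheduler considers.
        compute_uuids = list(provider_summaries)
        return alloc_reqs, provider_summaries, compute_uuids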
Change-Id: I1c0bd2987dcbc38f23b71db2bc8e3267f85168c8
blueprint: placement-allocation-requests
Refactors the scheduler *driver* class select_destinations() method to
return a sorted list of HostState objects instead of a list of dicts.
This allows us to eventually use these HostState objects in our claim
logic in future patches.
Change-Id: I8f173fc36af249150e656f4a611c5ea6f2141ae3
blueprint: placement-allocation-requests
When the RequestSpec object was created, it was assumed that when the
request was for more than one instance the scheduler would not need to
know the UUIDs for the individual instances and so it was agreed to
only pass one instance UUID. If, however, we want the scheduler to be
able to claim resources on the selected host, we will need to know the
instance UUID, which will be the consumer_id of the allocation.
This patch adds a new RPC parameter 'instance_uuids' that will be passed
to the scheduler. The next patch in the series adds the logic to the
scheduler to use this new field when selecting hosts.
Co-Authored-By: Sylvain Bauza <sbauza@redhat.com>
Partially-Implements: blueprint placement-claims
Change-Id: I44ebdb3e29db950bf2ad0e6b1dbfdecd1ca03530
Since cellsv2 has required host mappings to be present, we have lost
our automatic host registration behavior. This is a conscious choice,
given that we do not want to have upcalls from the cells to the API
database. This adds a periodic task to have the scheduler do the
discovery routine, which for small deployments may be an acceptable
amount of background overhead in exchange for the automatic behavior.
There is also probably some amount of performance improvement that
can be added to the host discovery routine to make it less of an
issue. However, just generalizing the existing routine and letting
it run periodically gives us some coverage, given where we are in
the current cycle.
Related to blueprint cells-scheduler-integration
Change-Id: Iab5f172cdef35645bae56b9b384e032f3160e826
Move all scheduler options into one of two groups. Many of the
options are simply renamed to remove their 'scheduler_' prefix, with
some exceptions:
* scheduler_default_filters -> enabled_filters
* scheduler_baremetal_default_filters -> baremetal_enabled_filters
* scheduler_driver_task_period -> periodic_task_interval
* scheduler_tracks_instance_changes -> track_instance_changes
Change-Id: I3f48e52815e80c99612bcd10cb53331a8c995fc3
Co-Authored-By: Stephen Finucane <sfinucan@redhat.com>
Implements: blueprint centralize-config-options-ocata
Support for specifying the scheduler driver via a full classpath was
deprecated in Mitaka, and with this patch is removed in Ocata.
Change-Id: I72e392aafa886ba19c874f1e0a0c95f6d1757ab9
Avoid having to configure the full class path of the scheduler driver;
change to loading it as a stevedore driver plugin using entry points.
Change 'scheduler_driver' to use an entry point with the namespace
'nova.scheduler.driver' in 'setup.cfg'. Meanwhile, compatibility with
class path configuration is maintained until the next major release.
Change all related tests that set the 'scheduler_driver' flag to use
the stevedore entry point.
UpgradeImpact - see the reno file attached.
Implements: blueprint scheduler-driver-use-stevedore
Change-Id: I8c169e12d9bfacdbdb1dadf68b8a1fa98c5ea5bc
Since we now have a RequestSpec object, we can directly provide it
through the RPC API select_destinations(). Since the conductor also
uses the RequestSpec object, we just need an RPC compatibility check to
see whether we can directly send the object or backport it to dicts if
the scheduler is old.
Note: As mox was in use for the RPC API tests, I replaced it with mock
when cleaning up the test helper method.
Implements blueprint request-spec-object-mitaka
Change-Id: Ifd3289bf9eccab7e47cd00055b716abb8e6eb965
In some modules the global LOG variable is no longer used, and neither
is the logging import. This patch removes the unused logging imports
and LOG variables.
Change-Id: I28572c325f8c31ff38161010047bba00c5d5b833
This change moves all of the configuration options previously defined in
nova/scheduler to the new centralized nova/conf/scheduler directory.
A subsequent patch will then improve the help texts.
Blueprint centralize-config-options
Change-Id: I08d50e873f71601a26ce768feac635243d0570f4