Propose Isolate Scheduler DB for Aggregates
Defines a clear way on how to get Aggregate information from the filters Implements: blueprint isolate-scheduler-db Change-Id: I489ab606341ed406024fff0c7e302fc158d20be2
committed by
Joe Gordon
parent
e6fde1e702
commit
f942d221d2
249
specs/kilo/approved/isolate-scheduler-db-aggregates.rst
Normal file
@@ -0,0 +1,249 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================================
Isolate Scheduler Database for Aggregates
=========================================

https://blueprints.launchpad.net/nova/+spec/isolate-scheduler-db

We want to split nova-scheduler out into Gantt. This blueprint is the second
stage of that effort, after the scheduler-lib split. The two blueprints are
nevertheless independent.

In this blueprint, we need to isolate all database accesses made by the
Scheduler and refactor the code (manager, filters, weighers) so that the
Scheduler only accesses scheduler-related tables or resources internally.

Note: this spec only targets changes to the Aggregates-related filters.


Problem description
===================

When making decisions that involve information about an aggregate, the
scheduler accesses the Nova DB's aggregates table either directly or
indirectly via nova.objects.AggregateList. For the split of the scheduler to
be clean, any access by the Nova scheduler to tables that will stay in the
Nova DB (i.e. the aggregates table) must be refactored so that the scheduler
has an API method that allows nova-conductor or other services to update the
scheduler's view of aggregate information.

The filters impacted by this proposal are summarized below:

* AggregateImagePropertiesIsolation
* AggregateInstanceExtraSpecsFilter
* AggregateMultiTenancyIsolation
* AvailabilityZoneFilter
* AggregateCoreFilter (calls nova.objects.aggregate.AggregateList.get_by_host)
* AggregateRamFilter (calls nova.objects.aggregate.AggregateList.get_by_host)
* AggregateTypeAffinityFilter (calls
  nova.objects.aggregate.AggregateList.get_by_host)


Use Cases
----------

N/A, this is a refactoring effort.

Project Priority
-----------------

This blueprint is part of the 'scheduler' refactoring effort identified as a
priority for Kilo.


Proposed change
===============

The strategy consists in updating the scheduler each time an Aggregate
changes (a host is added or removed, or metadata is changed).

The current Scheduler design scales with the number of requests: for each
request, a new HostState object is generated by the get_all_host_states method
of the HostManager module. We can therefore hardly ask the Scheduler to update
a DB each time a new compute node joins an aggregate. That would create a new
paradigm in which the Scheduler scales with the number of computes added to
aggregates, and it could introduce race conditions.

Instead, we propose to create an in-memory view of all the aggregates in the
Scheduler. It would be populated when the scheduler starts by calling the Nova
Aggregates API, and the filters would access these objects instead of
indirectly querying the Nova aggregates DB table themselves.
Updates to the Aggregates made through ``nova.compute.api.AggregateAPI`` will
also call the Scheduler RPC API to ask the Scheduler to update the relevant
view.


Alternatives
------------

The main concern is the duplication of aggregate information and the race
conditions that can result from it. In our opinion, duplicating the
information in the Scheduler's memory is a small price to pay for making sure
that the Scheduler can one day live on its own.

A corollary is that, if duplication is not acceptable, the Scheduler should
fully *own* the Aggregates table. Consequently, all the calls in
nova.compute.api.AggregateAPI would be treated as "external" calls, and once
the Scheduler is split out, the Aggregates would no longer reside in Nova.

Another mid-term approach would be to introduce a second service for the
Scheduler (something like nova-scheduler-updater; we are still very bad at
naming) that would accept RPC API calls and write to the Scheduler DB
separately from the nova-scheduler service, which would then be treated like
a "nova-api"-ish thing. The motivation is that the warm-up period the
Scheduler needs to populate the related HostState information could be
problematic, so we might prefer to persist all these objects in the
Scheduler DB.

Finally, we are definitely against calling the Aggregates API from the
Scheduler each time a filter needs information, because that doesn't scale.


Data model impact
-----------------

None, we only create an in-memory object which won't be persisted.


REST API impact
---------------

None

Security impact
---------------

None


Notifications impact
--------------------

None. The atomicity of the operation (adding/modifying an Aggregate) remains
identical, and we don't want to emit 2 notifications for the same operation.


Other end user impact
---------------------

None

Performance Impact
------------------

Accesses will go to an in-memory object instead of the DB, so we expect
better access times and improved scalability.


Other deployer impact
---------------------

None


Developer impact
----------------

Ideally:

* Filters should no longer call any code other than the Scheduler.
  This will be done by modifying the Scheduler component to proxy conductor
  calls to a Singleton which will refuse anything but scheduler-related
  objects. See footnote [1] for an example. We will still provide a fallback
  mode for the Kilo release in order to remain compatible with the N-1 release.


Implementation
==============


Here, we propose to build the collection of ``nova.objects.Aggregate`` objects
by calling ``nova.objects.AggregateList.get_all()`` during the initialization
of ``nova.scheduler.host_manager.HostManager``, and to store it as an
attribute of HostManager.

In order to access the list of aggregates that a host belongs to, we plan
to add a list of references to the corresponding Aggregate objects as an
extra attribute of ``nova.scheduler.host_manager.HostState`` during that
initialization phase.


The second phase would consist in providing updates to that caching system
by amending the Scheduler RPC API with a new update_aggregate() method, which
nova.scheduler.client would also expose.

The update_aggregate() method would take a single argument, a
``nova.objects.Aggregate`` object, and would update the
``HostManager.aggregates`` attribute so that the ``HostState.aggregates``
references are implicitly updated.
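
The caching scheme described above could be sketched roughly as follows. This
is only an illustration under stated assumptions, not the actual Nova code:
the ``Aggregate``, ``HostState`` and ``HostManager`` classes below are
simplified, hypothetical stand-ins, the ``host_states`` dictionary is invented
for the example, and the relinking of hosts is done explicitly for clarity.

```python
class Aggregate:
    """Hypothetical stand-in for nova.objects.Aggregate."""

    def __init__(self, id, name, hosts=(), metadata=None):
        self.id = id
        self.name = name
        self.hosts = list(hosts)
        self.metadata = dict(metadata or {})


class HostState:
    """Simplified host view; only the proposed extra attribute is modelled."""

    def __init__(self, host):
        self.host = host
        self.aggregates = []  # references to Aggregate objects


class HostManager:
    """Simplified manager holding the in-memory view (assumed layout)."""

    def __init__(self, initial_aggregates):
        self.aggregates = {}   # aggregate id -> Aggregate
        self.host_states = {}  # host name -> HostState (hypothetical)
        # Stands in for the AggregateList.get_all() call made at startup.
        for aggregate in initial_aggregates:
            self.update_aggregate(aggregate)

    def update_aggregate(self, aggregate):
        """Replace the cached aggregate and relink the hosts it contains."""
        previous = self.aggregates.get(aggregate.id)
        if previous is not None:
            # Drop stale references held by hosts of the previous version.
            for host in previous.hosts:
                state = self.host_states[host]
                state.aggregates = [
                    a for a in state.aggregates if a.id != aggregate.id]
        self.aggregates[aggregate.id] = aggregate
        for host in aggregate.hosts:
            state = self.host_states.setdefault(host, HostState(host))
            state.aggregates.append(aggregate)


manager = HostManager(
    [Aggregate(1, "ssd", ["node1", "node2"], {"ssd": "true"})])
# node1 leaves the aggregate and the metadata changes.
manager.update_aggregate(Aggregate(1, "ssd", ["node2"], {"ssd": "false"}))
```

After the second call, ``node1`` no longer references the aggregate and
``node2`` sees the new metadata, without any DB access from the scheduler.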

Every time an Aggregate is updated, we would hook the existing
nova.compute.api.AggregateAPI class, adding to each of its methods another
call to nova.scheduler.client, which would fan out the RPC call to all
nova-scheduler services.
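
To make the fanout idea concrete, here is a minimal sketch. Everything in it
is an assumption made for illustration: ``FakeScheduler`` and
``SchedulerClient`` are hypothetical, the ``AggregateAPI`` class is a heavily
simplified stand-in for ``nova.compute.api.AggregateAPI``, aggregates are
plain dictionaries, and the loop over workers merely simulates what an RPC
fanout would deliver.

```python
import copy


class FakeScheduler:
    """Hypothetical nova-scheduler worker holding its own in-memory view."""

    def __init__(self):
        self.aggregates = {}

    def update_aggregate(self, aggregate):
        self.aggregates[aggregate["id"]] = aggregate


class SchedulerClient:
    """Hypothetical client; the loop below only simulates an RPC fanout."""

    def __init__(self, schedulers):
        self._schedulers = list(schedulers)

    def update_aggregate(self, aggregate):
        for scheduler in self._schedulers:
            # Each worker receives its own copy, as it would over RPC.
            scheduler.update_aggregate(copy.deepcopy(aggregate))


class AggregateAPI:
    """Simplified stand-in for nova.compute.api.AggregateAPI."""

    def __init__(self, scheduler_client):
        self._scheduler_client = scheduler_client
        self._aggregates = {}  # stands in for the Nova DB table

    def create_aggregate(self, aggregate_id, name):
        aggregate = {"id": aggregate_id, "name": name,
                     "hosts": [], "metadata": {}}
        self._aggregates[aggregate_id] = aggregate
        self._scheduler_client.update_aggregate(aggregate)
        return aggregate

    def add_host_to_aggregate(self, aggregate_id, host):
        aggregate = self._aggregates[aggregate_id]
        if host not in aggregate["hosts"]:
            aggregate["hosts"].append(host)
        self._scheduler_client.update_aggregate(aggregate)
        return aggregate

    def update_aggregate_metadata(self, aggregate_id, metadata):
        aggregate = self._aggregates[aggregate_id]
        aggregate["metadata"].update(metadata)
        self._scheduler_client.update_aggregate(aggregate)
        return aggregate


schedulers = [FakeScheduler(), FakeScheduler()]
api = AggregateAPI(SchedulerClient(schedulers))
api.create_aggregate(1, "ssd")
api.add_host_to_aggregate(1, "node1")
api.update_aggregate_metadata(1, {"ssd": "true"})
```

Every mutating method ends with the same client call, so each scheduler
worker converges on the same view regardless of which one later serves a
request.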

Once all of that is done, filters would just have to look into
HostState.aggregates to access all aggregate information (including metadata)
related to the aggregates the host belongs to.
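
As an illustration of what such a filter could look like, the sketch below
reads only from ``host_state.aggregates``. It is loosely modelled on the
tenant-isolation idea and is not the actual filter code: the helper names are
hypothetical, the ``filter_tenant_id`` metadata key is assumed here, and
``SimpleNamespace`` objects stand in for HostState and Aggregate.

```python
from types import SimpleNamespace


def aggregate_metadata_values(host_state, key):
    """Collect the values set for `key` across the host's aggregates."""
    values = set()
    for aggregate in host_state.aggregates:
        if key in aggregate.metadata:
            values.add(aggregate.metadata[key])
    return values


def tenant_isolation_passes(host_state, tenant_id):
    """Pass if no aggregate restricts tenants, or if this tenant is listed."""
    allowed = aggregate_metadata_values(host_state, "filter_tenant_id")
    return not allowed or tenant_id in allowed


# Hypothetical host states: one unrestricted, one reserved for tenant "t1".
free_host = SimpleNamespace(aggregates=[])
reserved_host = SimpleNamespace(aggregates=[
    SimpleNamespace(metadata={"filter_tenant_id": "t1"}),
])
```

The filter never touches the DB or the conductor; everything it needs is
already attached to the host state it is handed.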


Assignee(s)
-----------

Primary assignee:
  sylvain-bauza

Other contributors:
  None


Work Items
----------

* Instantiate HostManager.aggregates and HostState.aggregates
  when the scheduler starts

* Add an update_aggregate() method to the Scheduler RPC API and bump its
  version

* Create a nova.scheduler.client method for update_aggregate()

* Modify the nova.compute.api.AggregateAPI methods to call the scheduler
  client method

* Modify the filters so that they read from HostState

* Modify the scheduler entrypoint to block conductor accesses to Aggregates
  (once development of the Lxxx release opens)


Dependencies
============

None


Testing
=======

Covered by existing tempest tests and CIs.


Documentation Impact
====================

None


References
==========

* https://etherpad.openstack.org/p/icehouse-external-scheduler

* http://eavesdrop.openstack.org/meetings/gantt/2014/gantt.2014-03-18-15.00.html

[1] http://git.openstack.org/cgit/openstack/nova/commit/?id=e5cbbcfc6a5fa31565d21e6c0ea260faca3b253d