Propose Isolate Scheduler DB for Aggregates

Defines a clear way for the filters to get Aggregate information.

Implements: blueprint isolate-scheduler-db

Change-Id: I489ab606341ed406024fff0c7e302fc158d20be2
2014-04-23 18:26:55 +02:00
parent e6fde1e702
commit f942d221d2
1 changed files with 249 additions and 0 deletions
--- a/specs/kilo/approved/isolate-scheduler-db-aggregates.rst
+++ b/specs/kilo/approved/isolate-scheduler-db-aggregates.rst
@@ -0,0 +1,249 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=========================================
+Isolate Scheduler Database for Aggregates
+=========================================
+
+https://blueprints.launchpad.net/nova/+spec/isolate-scheduler-db
+
+We want to split nova-scheduler out into gantt. This blueprint is the second
+stage of that effort, after the scheduler-lib split. The two blueprints are
+independent, however.
+
+In this blueprint, we need to isolate all of the Scheduler's database
+accesses and refactor code (manager, filters, weighers) so that the
+scheduler only accesses scheduler-related tables or resources internally.
+
+Note: this spec only targets changes to the Aggregates-related filters.
+
+
+Problem description
+===================
+
+When making decisions involving information about an aggregate, the scheduler
+accesses the Nova DB's aggregates table either directly or indirectly via
+nova.objects.AggregateList. In order for the split of the scheduler to be
+clean, any access by the Nova scheduler to tables that will stay in the Nova DB
+(i.e. aggregates table) must be refactored so that the scheduler has an API
+method that allows nova-conductor or other services to update the scheduler's
+view of aggregate information.
+
+The filters impacted by this proposal are:
+
+  * AggregateImagePropertiesIsolation
+  * AggregateInstanceExtraSpecsFilter
+  * AggregateMultiTenancyIsolation
+  * AvailabilityZoneFilter
+  * AggregateCoreFilter (calls
+    nova.objects.aggregate.AggregateList.get_by_host)
+  * AggregateRamFilter (calls
+    nova.objects.aggregate.AggregateList.get_by_host)
+  * AggregateTypeAffinityFilter (calls
+    nova.objects.aggregate.AggregateList.get_by_host)
+
+
+Use Cases
+----------
+
+N/A, this is a refactoring effort.
+
+Project Priority
+-----------------
+
+This blueprint is part of the 'scheduler' refactoring effort identified as a
+priority for Kilo.
+
+
+Proposed change
+===============
+
+The strategy consists of updating the scheduler each time an Aggregate
+changes (a host is added or removed, or its metadata changes).
+
+The current Scheduler design scales with the number of requests: for each
+request, a new HostState object is generated by the get_all_host_states
+method of the HostManager module. We therefore can hardly ask the Scheduler
+to update a DB each time a compute node joins an aggregate. That would create
+a new paradigm in which the Scheduler scales with the number of computes
+added to aggregates, and it could introduce race conditions.
+
+Instead, we propose to create an in-memory view of all the aggregates in the
+Scheduler. The view would be populated when the scheduler starts by calling
+the Nova Aggregates API, and the filters would access these objects instead
+of indirectly querying the Nova aggregates DB table themselves.
+Updates to the Aggregates made through
+``nova.compute.api.AggregateAPI`` will also call the Scheduler RPC API to ask
+the Scheduler to update the relevant view.
+
+
+Alternatives
+------------
+
+The main concern is the duplication of aggregate information and the
+potential race conditions that can occur. In our humble opinion, duplicating
+the information in the Scheduler memory is a small price to pay for making sure
+that the Scheduler could one day live on its own.
+
+A corollary would be to consider that if duplication is not good, then the
+Scheduler should fully *own* the Aggregates table. Consequently, all the calls
+in nova.compute.api.AggregateAPI would be treated as "external" calls and,
+once the Scheduler is split out, the Aggregates would no longer reside
+in Nova.
+
+Another mid-term approach would be to introduce a second service for the
+Scheduler (like nova-scheduler-updater - still very bad at naming...) which
+would accept RPC API calls and write to the Scheduler DB separately from the
+nova-scheduler service. The latter would then be treated like a "nova-api"-ish
+thing. The rationale is that the warmup period the Scheduler needs to
+populate the relevant HostState information could be problematic, so we
+could prefer to persist all these objects in the Scheduler DB.
+
+Finally, we are definitely against calling the Aggregates API from the
+Scheduler each time a filter needs information because it doesn't scale.
+
+
+Data model impact
+-----------------
+
+None, we only create an in-memory object which won't be persisted.
+
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+
+None
+
+
+Notifications impact
+--------------------
+
+None. The atomicity of the operation (adding/modifying an Aggregate) remains
+identical; we don't want to add two notifications for the same operation.
+
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+Filters would read an in-memory object instead of accessing the DB, so we
+expect better access times and improved scalability.
+
+
+Other deployer impact
+---------------------
+
+None
+
+
+Developer impact
+----------------
+
+Ideally:
+
+* Filters should no longer place calls to other bits of code except Scheduler.
+  This will be done by modifying the Scheduler component to proxy conductor
+  calls to a Singleton which will refuse anything but scheduler-related
+  objects. See footnote [1] for an example. We will still provide a fallback
+  mode for the Kilo release in order to keep compatibility with the N-1
+  release.
+
+
+
+Implementation
+==============
+
+
+We propose to load the collection of ``nova.objects.Aggregate`` objects
+by calling ``nova.objects.AggregateList.get_all()`` during the initialization
+of ``nova.scheduler.host_manager.HostManager`` and to store it as an attribute
+of HostManager.
+
+In order to access the list of aggregates that a host belongs to, we plan
+to add a list of references to the corresponding Aggregate objects as an
+extra attribute of ``nova.scheduler.host_manager.HostState`` during that
+initialization phase.
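+
+The initialization phase could be sketched as follows. This is only an
+illustration under stated assumptions: ``Aggregate`` and ``HostState`` are
+hypothetical stand-ins for the Nova classes, and the aggregate list is passed
+in directly instead of being fetched with ``AggregateList.get_all()``.
+

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical stand-in for nova.objects.Aggregate."""
    id: int
    name: str
    hosts: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class HostState:
    """Hypothetical stand-in for the scheduler's HostState."""
    host: str
    aggregates: list = field(default_factory=list)


class HostManager:
    def __init__(self, all_aggregates, host_names):
        # In Nova this list would come from AggregateList.get_all();
        # it is passed in here so the sketch has no DB dependency.
        self.aggregates = {agg.id: agg for agg in all_aggregates}
        self.host_states = {name: HostState(name) for name in host_names}
        for agg in self.aggregates.values():
            for name in agg.hosts:
                if name in self.host_states:
                    # Store a reference to the shared object, not a copy.
                    self.host_states[name].aggregates.append(agg)


aggs = [
    Aggregate(1, "ssd", ["node1", "node2"], {"ssd": "true"}),
    Aggregate(2, "az-east", ["node2"], {"availability_zone": "east"}),
]
manager = HostManager(aggs, ["node1", "node2", "node3"])
```

+
+Because each HostState holds references rather than copies, the memory cost
+of the view stays proportional to the number of aggregates.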
+
+
+The second phase would provide updates to that caching system by adding a
+new update_aggregate() method to the Scheduler RPC API, which
+nova.scheduler.client would also expose.
+
+The update_aggregate() method would take a single argument, a
+``nova.objects.Aggregate`` object, and would update the
+``HostManager.aggregates`` attribute so that the ``HostState.aggregates``
+references are implicitly updated.
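+
+One way to obtain that implicit update is to mutate the cached object in
+place, so that every reference held by a HostState sees the new data. The
+sketch below assumes this approach; the classes are hypothetical stand-ins,
+and the explicit relinking step for host membership changes is our own
+addition, not something the spec prescribes.
+

```python
class Aggregate:
    """Hypothetical stand-in for nova.objects.Aggregate."""

    def __init__(self, id, hosts=(), metadata=None):
        self.id = id
        self.hosts = list(hosts)
        self.metadata = dict(metadata or {})


class HostState:
    """Hypothetical stand-in for the scheduler's HostState."""

    def __init__(self, host):
        self.host = host
        self.aggregates = []


class HostManager:
    def __init__(self, host_names):
        self.aggregates = {}  # aggregate id -> cached Aggregate
        self.host_states = {h: HostState(h) for h in host_names}

    def update_aggregate(self, aggregate):
        cached = self.aggregates.setdefault(
            aggregate.id, Aggregate(aggregate.id))
        # Mutate in place so existing references see the new data.
        cached.hosts[:] = aggregate.hosts
        cached.metadata.clear()
        cached.metadata.update(aggregate.metadata)
        # Relink hosts whose membership changed.
        for state in self.host_states.values():
            linked = any(a is cached for a in state.aggregates)
            member = state.host in cached.hosts
            if member and not linked:
                state.aggregates.append(cached)
            elif linked and not member:
                state.aggregates = [
                    a for a in state.aggregates if a is not cached]


manager = HostManager(["node1", "node2"])
manager.update_aggregate(Aggregate(1, ["node1"], {"ssd": "true"}))
held = manager.host_states["node1"].aggregates[0]
manager.update_aggregate(Aggregate(1, ["node2"], {"ssd": "false"}))
```

+
+Here ``held`` was obtained before the second update, yet it reflects the new
+metadata because the cached object was modified rather than replaced.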
+
+Every time an Aggregate is updated, the corresponding method of the existing
+nova.compute.api.AggregateAPI class would make an additional call to
+nova.scheduler.client, which would RPC fanout the call to all
+nova-scheduler services.
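+
+The hook could look roughly like the sketch below. Everything in it is a
+simplification for illustration: the method signatures are not the real
+AggregateAPI ones, aggregates are plain dicts, and the client only records
+what a real client would send as an RPC fanout cast.
+

```python
import copy


class RecordingSchedulerClient:
    """Hypothetical stand-in for nova.scheduler.client.

    A real client would issue an RPC fanout cast; this one only records
    what it was asked to send.
    """

    def __init__(self):
        self.sent = []

    def update_aggregate(self, aggregate):
        self.sent.append(copy.deepcopy(aggregate))


class AggregateAPI:
    """Simplified stand-in for nova.compute.api.AggregateAPI."""

    def __init__(self, scheduler_client):
        self.scheduler_client = scheduler_client
        self.aggregates = {}  # stands in for the Nova DB aggregates table

    def create_aggregate(self, agg_id, name):
        agg = {"id": agg_id, "name": name, "hosts": [], "metadata": {}}
        self.aggregates[agg_id] = agg
        self.scheduler_client.update_aggregate(agg)
        return agg

    def add_host_to_aggregate(self, agg_id, host):
        agg = self.aggregates[agg_id]
        agg["hosts"].append(host)
        self.scheduler_client.update_aggregate(agg)
        return agg

    def update_aggregate_metadata(self, agg_id, metadata):
        agg = self.aggregates[agg_id]
        agg["metadata"].update(metadata)
        self.scheduler_client.update_aggregate(agg)
        return agg


client = RecordingSchedulerClient()
api = AggregateAPI(client)
api.create_aggregate(1, "ssd")
api.add_host_to_aggregate(1, "node1")
api.update_aggregate_metadata(1, {"ssd": "true"})
```

+
+Each mutating method sends the full aggregate after the change, so a
+scheduler that misses one update is corrected by the next one.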
+
+Once all of that is done, filters would just have to look into
+HostState.aggregates to access all aggregate information (incl. metadata)
+related to the aggregates the host belongs to.
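+
+As an illustration, a tenant-isolation check in the spirit of
+AggregateMultiTenancyIsolation could read only from HostState.aggregates.
+This is a simplified sketch, not the real filter: the classes and the
+``host_passes`` function are hypothetical, and treating ``filter_tenant_id``
+as a comma-separated list is an assumption made for the example.
+

```python
class Aggregate:
    """Hypothetical stand-in for nova.objects.Aggregate."""

    def __init__(self, metadata):
        self.metadata = metadata


class HostState:
    """Hypothetical stand-in for the scheduler's HostState."""

    def __init__(self, host, aggregates):
        self.host = host
        self.aggregates = aggregates


def host_passes(host_state, project_id):
    """Return True if the host may run instances of the given project."""
    allowed = set()
    # No DB or conductor call: everything comes from the in-memory view.
    for agg in host_state.aggregates:
        value = agg.metadata.get("filter_tenant_id")
        if value:
            allowed.update(t.strip() for t in value.split(","))
    # Hosts outside any tenant-restricted aggregate accept every project.
    return not allowed or project_id in allowed


restricted = HostState(
    "node1", [Aggregate({"filter_tenant_id": "tenant-a, tenant-b"})])
unrestricted = HostState("node2", [Aggregate({"ssd": "true"})])
```
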
+
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  sylvain-bauza
+
+Other contributors:
+  None
+
+
+Work Items
+----------
+
+* Instantiate HostManager.aggregates and HostState.aggregates
+  when the scheduler starts
+
+* Add update_aggregate() method to the Scheduler RPC API and bump its version
+
+* Create nova.scheduler.client method for update_aggregate()
+
+* Modify nova.compute.api.AggregateAPI methods to call the scheduler client
+  method
+
+* Modify filters so they read aggregate information from
+  HostState.aggregates
+
+* Modify scheduler entrypoint to block conductor accesses to Aggregates
+  (once development of the Lxxx release opens)
+
+
+Dependencies
+============
+
+None
+
+
+Testing
+=======
+
+Covered by existing tempest tests and CIs.
+
+
+Documentation Impact
+====================
+
+None
+
+
+References
+==========
+
+* https://etherpad.openstack.org/p/icehouse-external-scheduler
+
+* http://eavesdrop.openstack.org/meetings/gantt/2014/gantt.2014-03-18-15.00.html
+
+[1] http://git.openstack.org/cgit/openstack/nova/commit/?id=e5cbbcfc6a5fa31565d21e6c0ea260faca3b253d