Servicegroup foundational refactoring for Control Plane

At present, there are various interfaces through which services data
can be manipulated - the admin interface (nova-manage), extensions
(contrib/services.py), etc. Every interface relies on the servicegroup
layer API is_up() to get details about service liveness. The proposal
is to keep service data in the nova database's nova.services table and
fetch the liveness information from the configured servicegroup (SG)
driver. Liveness will be a combination of service liveness and RPC
liveness, where the latter will be computed based on information in
nova.services.

Previously-approved: liberty
Implements: blueprint servicegroup-api-control-plane

Change-Id: I3cae9495373c4334fedb18b64d907f3a6ac5ab92
Vilobh Meshram
2015-09-10 17:45:09 -07:00
parent b492942744
commit 0090754ebf


..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================================================
Servicegroup foundational refactoring for Control Plane
=======================================================

https://blueprints.launchpad.net/nova/+spec/servicegroup-api-control-plane

At present, there are various interfaces through which services data
can be manipulated - the admin interface (nova-manage), extensions
(contrib/services.py), etc. Every interface relies on the servicegroup
layer API is_up() to get details about service liveness. The proposal
is to keep service data in the nova database's nova.services table and
fetch the liveness information from the configured servicegroup (SG)
driver. Liveness will be a combination of service liveness and RPC
liveness, where the latter will be computed based on information in
nova.services.

Problem description
===================

Nova's way of determining service liveness is not truly pluggable. In
the current state of the art, service information is stored in the nova
database in the nova.services table, whereas the service liveness
information is computed by the is_up() call. This is_up() call is
implemented differently depending on which backend servicegroup driver
is chosen. Right now the other SG drivers are not functional and need
to be revamped to allow them to be involved in reporting service
liveness. That will be covered as part of a separate spec.
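
For context, a simplified sketch of the current dispatch, under the
assumption (modeled on nova/servicegroup/api.py) that the servicegroup
layer simply delegates is_up() to whichever driver
CONF.servicegroup_driver selects; the direct driver injection here is
illustrative, not the actual loading code::

  class API(object):
      """Simplified servicegroup layer (illustrative, not actual code)."""

      def __init__(self, driver):
          # In Nova the driver is loaded based on CONF.servicegroup_driver
          # ('db', 'zk' or 'mc'); here it is injected directly.
          self._driver = driver

      def service_is_up(self, member):
          """Check if the given member is up."""
          return self._driver.is_up(member)
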
The scope of this spec is limited to:

1. Making sure the two separate interfaces mentioned above, namely the
   REST API interface and the admin interface, use the servicegroup
   layer API when fetching service liveness.
2. Service.is_up will be an attribute of the Service object, computed
   as a combination of service liveness and RPC liveness. Service
   liveness will be implemented by the respective SG driver, which will
   return a boolean depending on whether the service is up or down,
   whereas to check RPC liveness Nova will still rely on the
   nova.services table stored in the Nova database.

Use Cases
---------

This is a refactoring effort and is applicable to the following use
cases:

1. As an operator, I want to use Zookeeper to achieve quick detection
   of service outages. The Zookeeper servicegroup driver will be used
   to report service liveness, although service data will still reside
   in nova.services.
2. As an operator, I want to be sure that when a Nova service delete
   (REST API) is invoked, the service record is removed from the
   respective backend, be it Zookeeper or Memcache. An SG API to
   leave() the group, which will be added as part of this change, will
   be invoked.

Deployments using the database servicegroup driver to manage service
liveness will remain the same, apart from including the logic to expose
is_up as a Service object attribute, computed as proposed in the
"Proposed change" section.

Project Priority
----------------

None

Proposed change
===============

1. The proposed change is to fetch service data from the DB and verify
   service liveness using the configured SG driver. How each driver
   determines service liveness is up to the implementation details of
   that SG driver; point 3(B), step #2.1 below has details on how the
   Zookeeper SG driver will compute it.

2. Service data will be stored in the database, while the service
   liveness information can be managed by the configured servicegroup
   driver. Also, we have things like the service version, which will
   require some efficient version calculations in order to drive things
   like compute RPC version pinning and object backporting. That can be
   done efficiently if the services data is stored in the database
   (which supports max/min aggregation), as opposed to storing it in
   Zookeeper or Memcache.

3. For the service liveness API, the check behaves in one of two ways,
   depending on the configured driver::

     def service_is_up(self, member):
         """Check if the given member is up."""

   A] For the DB SG driver:

      #1. Check RPC liveness using the updated_at attribute
          in the nova.services table.
      #2. Check service liveness via the database SG driver's
          is_up() method.

   B] For the Zookeeper/Memcache SG drivers:

      The Service object will be the interface by which we determine
      whether a service is up or down. It will necessarily look up the
      updated_at stamp like it does now, and will optionally consult an
      external interface (such as Zookeeper or Memcache) through a
      defined interface. Which external interface is used depends on
      the SG driver configured via CONF.servicegroup_driver. If either
      indication results in a "down" verdict, the service will be
      considered down (see the sketch after this list). Please note
      that both of the steps below are needed to detect service
      liveness.

      #1. Check RPC liveness using the updated_at attribute
          in the nova.services table.
      #2. Check service liveness via the Zookeeper SG driver's or
          Memcache SG driver's is_up() method.

          #2.1: If the znode for the compute host has not joined the
                topic 'compute', then the nova-compute service is not
                running on that compute host. The details of how the
                Zookeeper ephemeral znode maintains the service's
                representational state, and of how to migrate from the
                existing database servicegroup driver to the
                Zookeeper/Memcache SG driver, are covered in
                https://review.openstack.org/#/c/138607.

4. An SG API to leave a group needs to be introduced at the SG layer
   and implemented by the backend drivers. Drivers that don't need the
   leave functionality will not provide any additional logic to free up
   the record associated with the service. For example, the znodes used
   to keep track of services with the Zookeeper SG driver are
   ephemeral, which means they are automatically deleted when the
   service is deleted. But for other backends like Memcache, which are
   key/value stores, the record needs to be cleared explicitly. The API
   at the SG layer might look like::

     def leave(self, group_id, member)

   As mentioned above, depending on the driver used this may already be
   handled; if not, the driver needs to explicitly clean up the service
   entry (see the sketch after this list).

5. Callers will now just consult Service.is_up to check RPC liveness
   and service liveness, whereas at the object layer is_up will be
   computed as a combination of when the service record was last
   updated (which gives the RPC liveness) and a query to the driver
   configured via CONF.servicegroup_driver for service liveness.
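
The following is a minimal, self-contained sketch of the combined is_up
computation described in points 3 and 5. It is illustrative only: the
function name, the dict-shaped service record, and the hard-coded
staleness window standing in for CONF.service_down_time are assumptions,
not the final Nova code::

  import datetime

  SERVICE_DOWN_TIME = 60  # seconds; stands in for CONF.service_down_time

  def is_up(service_record, sg_driver):
      """Combine RPC liveness with driver-reported service liveness."""
      # RPC liveness: the nova.services row must have been touched
      # within the staleness window.
      last_heartbeat = (service_record['updated_at'] or
                        service_record['created_at'])
      elapsed = datetime.datetime.utcnow() - last_heartbeat
      rpc_alive = elapsed.total_seconds() <= SERVICE_DOWN_TIME
      # Service liveness: whatever the configured SG driver reports.
      sg_alive = sg_driver.is_up(service_record)
      # Either indication of "down" means the service is down.
      return rpc_alive and sg_alive

Similarly, a hedged sketch of the per-driver leave() behaviour from
point 4; the class names and the key layout are hypothetical::

  class KeyValueDriver(object):
      """Memcache-like backend: records must be removed explicitly."""

      def __init__(self, client):
          self.client = client

      def leave(self, group_id, member):
          # A key/value store keeps the liveness record until told
          # otherwise, so delete it when the service is deleted.
          self.client.delete('%s:%s' % (group_id, member))

  class EphemeralNodeDriver(object):
      """Zookeeper-like backend: ephemeral znodes vanish with the
      session, so leave() needs no extra cleanup."""

      def leave(self, group_id, member):
          pass
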
Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

Service liveness is fetched from the configured SG driver, whereas
service details will be fetched from the nova database's nova.services
table. RPC liveness will also be computed based on the data in the
nova.services table.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  vilobhmm

Other contributors:
  jaypipes, harlowja

Work Items
----------

- Introduce an additional attribute, is_up, on nova.objects.service.
- Fix the admin interface (nova-manage) so that whether a service is
  up or down depends on the combination of service liveness, reported
  by whichever SG driver is configured, and RPC liveness, computed
  from the information stored in the nova.services table.
- Introduce a leave() API at the SG layer to make sure that, when a
  service is deleted in deployments where service liveness is
  maintained by a backend other than the db, the znode or the
  associated structure for the service is freed up.

Dependencies
============

None

Testing
=======

1. Existing unit tests will be updated to make sure that services data
   is fetched from the nova.services table and service liveness from
   the servicegroup API.
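
The kind of unit test intended here might look roughly as follows,
exercising the illustrative is_up() helper from the sketch in the
"Proposed change" section; the module name `liveness` is hypothetical::

  import datetime
  import unittest

  from liveness import is_up  # the illustrative helper sketched above

  class FakeDriver(object):
      """Stub SG driver whose liveness verdict the test controls."""

      def __init__(self, alive):
          self.alive = alive

      def is_up(self, service_record):
          return self.alive

  class TestCombinedLiveness(unittest.TestCase):
      def test_down_when_driver_reports_down(self):
          # Fresh heartbeat, but the SG driver says the member is gone:
          # the combined verdict must be "down".
          record = {'updated_at': datetime.datetime.utcnow(),
                    'created_at': None}
          self.assertFalse(is_up(record, FakeDriver(alive=False)))

      def test_up_when_both_checks_pass(self):
          record = {'updated_at': datetime.datetime.utcnow(),
                    'created_at': None}
          self.assertTrue(is_up(record, FakeDriver(alive=True)))
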
Documentation Impact
====================

None

References
==========

- http://lists.openstack.org/pipermail/openstack-dev/2015-May/063602.html
- https://review.openstack.org/#/c/138607
- http://lists.openstack.org/pipermail/openstack-dev/2015-September/075267.html
.. _etherpad: https://etherpad.openstack.org/p/servicegroup-refactoring