Servicegroup foundational refactoring for Control Plane

At present, there are various interfaces through which services data
can be manipulated - the admin interface (nova-manage), extensions
(contrib/services.py), etc. Every interface relies on the servicegroup
layer API is_up() to get details about service liveness. The proposal
is to keep service data in the nova database's nova.services table and
fetch the liveness information from the configured servicegroup (SG)
driver. Liveness will be a combination of service liveness and RPC
liveness, where the latter will be computed based on information in
nova.services.

Previously-approved: liberty
Implements: blueprint servicegroup-api-control-plane

Change-Id: I3cae9495373c4334fedb18b64d907f3a6ac5ab92
Vilobh Meshram
2015-09-10 17:45:09 -07:00
parent b492942744
commit 0090754ebf


..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================================================
Servicegroup foundational refactoring for Control Plane
=======================================================

https://blueprints.launchpad.net/nova/+spec/servicegroup-api-control-plane

At present, there are various interfaces through which services data
can be manipulated - the admin interface (nova-manage), extensions
(contrib/services.py), etc. Every interface relies on the servicegroup
layer API is_up() to get details about service liveness. The proposal
is to keep service data in the nova database's nova.services table and
fetch the liveness information from the configured servicegroup (SG)
driver. Liveness will be a combination of service liveness and RPC
liveness, where the latter will be computed based on information in
nova.services.

Problem description
===================

Nova's way of determining service liveness is not truly pluggable. In
the current state of the art, service information is stored in the nova
database in the nova.services table, whereas the service liveness
information is computed by the is_up() call. This is_up() call is
implemented differently depending on which backend servicegroup driver
is chosen. Right now the other SG drivers are not functional and need
to be revamped to allow them to be involved in reporting service
liveness. That will be covered as part of a separate spec.
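
For context, a simplified sketch of the current dispatch, under the
assumption (modeled on nova/servicegroup/api.py) that the servicegroup
layer simply delegates is_up() to whichever driver
CONF.servicegroup_driver selects; the direct driver injection here is
illustrative, not the actual loading code::

  class API(object):
      """Simplified servicegroup layer (illustrative, not actual code)."""

      def __init__(self, driver):
          # In Nova the driver is loaded based on CONF.servicegroup_driver
          # ('db', 'zk' or 'mc'); here it is injected directly.
          self._driver = driver

      def service_is_up(self, member):
          """Check if the given member is up."""
          return self._driver.is_up(member)
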
The scope of this spec is limited to:

1. Making sure the two separate interfaces mentioned above, namely the
   REST API interface and the admin interface, use the servicegroup
   layer API when fetching service liveness.
2. Service.is_up will be an attribute of the Service object, computed
   as a combination of service liveness and RPC liveness. Service
   liveness will be implemented by the respective SG driver, which will
   return a boolean depending on whether the service is up or down,
   whereas to check RPC liveness Nova will still rely on the
   nova.services table stored in the Nova database.

Use Cases
---------

This is a refactoring effort and is applicable to the following use
cases:

1. As an operator, I want to use Zookeeper to achieve quick detection
   of service outages. The Zookeeper servicegroup driver will be used
   to report service liveness, although service data will still reside
   in nova.services.
2. As an operator, I want to be sure that when a Nova service delete
   (REST API) is invoked, the service record is removed from the
   respective backend, be it Zookeeper or Memcache. An SG API to
   leave() the group, which will be added as part of this change, will
   be invoked.

Deployments using the database servicegroup driver to manage service
liveness will remain the same, apart from including the logic to expose
is_up as a Service object attribute, computed as proposed in the
"Proposed change" section.

Project Priority
----------------

None

Proposed change
===============

1. The proposed change is to fetch service data from the DB and verify
   service liveness using the configured SG driver. How each driver
   determines service liveness is up to the implementation details of
   that SG driver; point 3(B), step #2.1 below has details on how the
   Zookeeper SG driver will compute it.

2. Service data will be stored in the database, while the service
   liveness information can be managed by the configured servicegroup
   driver. Also, we have things like the service version, which will
   require some efficient version calculations in order to drive things
   like compute RPC version pinning and object backporting. That can be
   done efficiently if the services data is stored in the database
   (which supports max/min aggregation), as opposed to storing it in
   Zookeeper or Memcache.

3. For the service liveness API, the check behaves in one of two ways,
   depending on the configured driver::

     def service_is_up(self, member):
         """Check if the given member is up."""

   A] For the DB SG driver:

      #1. Check RPC liveness using the updated_at attribute
          in the nova.services table.
      #2. Check service liveness via the database SG driver's
          is_up() method.

   B] For the Zookeeper/Memcache SG drivers:

      The Service object will be the interface by which we determine
      whether a service is up or down. It will necessarily look up the
      updated_at stamp like it does now, and will optionally consult an
      external interface (such as Zookeeper or Memcache) through a
      defined interface. Which external interface is used depends on
      the SG driver configured via CONF.servicegroup_driver. If either
      indication results in a "down" verdict, the service will be
      considered down (see the sketch after this list). Please note
      that both of the steps below are needed to detect service
      liveness.

      #1. Check RPC liveness using the updated_at attribute
          in the nova.services table.
      #2. Check service liveness via the Zookeeper SG driver's or
          Memcache SG driver's is_up() method.

          #2.1: If the znode for the compute host has not joined the
                topic 'compute', then the nova-compute service is not
                running on that compute host. The details of how the
                Zookeeper ephemeral znode maintains the service's
                representational state, and of how to migrate from the
                existing database servicegroup driver to the
                Zookeeper/Memcache SG driver, are covered in
                https://review.openstack.org/#/c/138607.

4. An SG API to leave a group needs to be introduced at the SG layer
   and implemented by the backend drivers. Drivers that don't need the
   leave functionality will not provide any additional logic to free up
   the record associated with the service. For example, the znodes used
   to keep track of services with the Zookeeper SG driver are
   ephemeral, which means they are automatically deleted when the
   service is deleted. But for other backends like Memcache, which are
   key/value stores, the record needs to be cleared explicitly. The API
   at the SG layer might look like::

     def leave(self, group_id, member)

   As mentioned above, depending on the driver used this may already be
   handled; if not, the driver needs to explicitly clean up the service
   entry (see the sketch after this list).

5. Callers will now just consult Service.is_up to check RPC liveness
   and service liveness, whereas at the object layer is_up will be
   computed as a combination of when the service record was last
   updated (which gives the RPC liveness) and a query to the driver
   configured via CONF.servicegroup_driver for service liveness.
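
The following is a minimal, self-contained sketch of the combined is_up
computation described in points 3 and 5. It is illustrative only: the
function name, the dict-shaped service record, and the hard-coded
staleness window standing in for CONF.service_down_time are assumptions,
not the final Nova code::

  import datetime

  SERVICE_DOWN_TIME = 60  # seconds; stands in for CONF.service_down_time

  def is_up(service_record, sg_driver):
      """Combine RPC liveness with driver-reported service liveness."""
      # RPC liveness: the nova.services row must have been touched
      # within the staleness window.
      last_heartbeat = (service_record['updated_at'] or
                        service_record['created_at'])
      elapsed = datetime.datetime.utcnow() - last_heartbeat
      rpc_alive = elapsed.total_seconds() <= SERVICE_DOWN_TIME
      # Service liveness: whatever the configured SG driver reports.
      sg_alive = sg_driver.is_up(service_record)
      # Either indication of "down" means the service is down.
      return rpc_alive and sg_alive

Similarly, a hedged sketch of the per-driver leave() behaviour from
point 4; the class names and the key layout are hypothetical::

  class KeyValueDriver(object):
      """Memcache-like backend: records must be removed explicitly."""

      def __init__(self, client):
          self.client = client

      def leave(self, group_id, member):
          # A key/value store keeps the liveness record until told
          # otherwise, so delete it when the service is deleted.
          self.client.delete('%s:%s' % (group_id, member))

  class EphemeralNodeDriver(object):
      """Zookeeper-like backend: ephemeral znodes vanish with the
      session, so leave() needs no extra cleanup."""

      def leave(self, group_id, member):
          pass
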
Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

Service liveness is fetched from the configured SG driver, whereas
service details will be fetched from the nova database's nova.services
table. RPC liveness will also be computed based on the data in the
nova.services table.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  vilobhmm

Other contributors:
  jaypipes, harlowja

Work Items
----------

- Introduce an additional attribute, is_up, on nova.objects.service.
- Fix the admin interface (nova-manage) so that whether a service is
  up or down depends on the combination of service liveness, reported
  by whichever SG driver is configured, and RPC liveness, computed
  from the information stored in the nova.services table.
- Introduce a leave() API at the SG layer to make sure that, when a
  service is deleted in deployments where service liveness is
  maintained by a backend other than the db, the znode or the
  associated structure for the service is freed up.

Dependencies
============

None

Testing
=======

1. Existing unit tests will be updated to make sure that services data
   is fetched from the nova.services table and service liveness from
   the servicegroup API.
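
The kind of unit test intended here might look roughly as follows,
exercising the illustrative is_up() helper from the sketch in the
"Proposed change" section; the module name `liveness` is hypothetical::

  import datetime
  import unittest

  from liveness import is_up  # the illustrative helper sketched above

  class FakeDriver(object):
      """Stub SG driver whose liveness verdict the test controls."""

      def __init__(self, alive):
          self.alive = alive

      def is_up(self, service_record):
          return self.alive

  class TestCombinedLiveness(unittest.TestCase):
      def test_down_when_driver_reports_down(self):
          # Fresh heartbeat, but the SG driver says the member is gone:
          # the combined verdict must be "down".
          record = {'updated_at': datetime.datetime.utcnow(),
                    'created_at': None}
          self.assertFalse(is_up(record, FakeDriver(alive=False)))

      def test_up_when_both_checks_pass(self):
          record = {'updated_at': datetime.datetime.utcnow(),
                    'created_at': None}
          self.assertTrue(is_up(record, FakeDriver(alive=True)))
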
Documentation Impact
====================

None

References
==========

- http://lists.openstack.org/pipermail/openstack-dev/2015-May/063602.html
- https://review.openstack.org/#/c/138607
- http://lists.openstack.org/pipermail/openstack-dev/2015-September/075267.html
.. _etherpad: https://etherpad.openstack.org/p/servicegroup-refactoring