Servicegroup foundational refactoring for Control Plane
At present, there are various interfaces through which services data can be manipulated: the admin interface (nova-manage), API extensions (contrib/services.py), etc. Every interface relies on the servicegroup layer API is_up() to get details about service liveness. The proposal is to keep service data in the nova.services table of the Nova database and fetch the liveness information from the configured servicegroup (SG) driver. Liveness will be a combination of service liveness and RPC liveness, where the latter will be computed based on information in nova.services.

Previously-approved: liberty
Implements: blueprint servicegroup-api-control-plane
Change-Id: I3cae9495373c4334fedb18b64d907f3a6ac5ab92
specs/mitaka/approved/servicegroup-api-control-plane.rst (new file, 261 lines)
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================================================
Servicegroup foundational refactoring for Control Plane
=======================================================

https://blueprints.launchpad.net/nova/+spec/servicegroup-api-control-plane

At present, there are various interfaces through which services data can
be manipulated: the admin interface (nova-manage), API extensions
(contrib/services.py), etc. Every interface relies on the servicegroup
layer API is_up() to get details about service liveness. The proposal is
to keep service data in the nova.services table of the Nova database and
fetch the liveness information from the configured servicegroup (SG)
driver. Liveness will be a combination of service liveness and RPC
liveness, where the latter will be computed based on information in
nova.services.

Problem description
===================

Nova's way of determining service liveness is not pluggable. In its
current state, service information is stored in the nova.services table
of the Nova database, whereas service liveness is computed by the
is_up() call, whose implementation depends on which backend servicegroup
driver is chosen.

Right now the other SG drivers are not functional and need to be
revamped to allow them to be involved in reporting service liveness.
That will be covered in a separate spec.

The scope of this spec is limited to:

1. Making sure the two separate interfaces mentioned above, namely the
   REST API interface and the admin interface, use the servicegroup
   layer API when fetching service liveness.

2. Service.is_up will be an attribute of the Service object, computed
   as a combination of service liveness and RPC liveness. Service
   liveness will be implemented by the respective SG driver, which will
   return a boolean depending on whether the service is up or down. To
   check RPC liveness, Nova will still rely on the nova.services table
   stored in the Nova database.

Use Cases
---------

This is a refactoring effort and applies to the following use cases:

1. As an operator, I want to use Zookeeper to achieve quick detection
   of service outages. The Zookeeper servicegroup driver will be used
   to report service liveness, although service data will still reside
   in nova.services.

2. As an operator, I want to be sure that when a Nova service delete
   (REST API) is invoked, the service record is removed from the
   respective backend, either Zookeeper or Memcache. An SG API to
   leave() the group, which will be added as part of this change, will
   be invoked.

Deployments using the database servicegroup driver to manage service
liveness will remain the same, apart from including the logic to expose
is_up as a Service object attribute and computing it as proposed in the
"Proposed change" section.

Project Priority
----------------

None

Proposed change
===============

1. The proposed change is to fetch service data from the DB and verify
   service liveness using the configured SG driver. How service
   liveness is determined is up to the implementation of each SG
   driver. Point 3 (option B, step 2.1) has details on how the
   Zookeeper SG driver will compute it.

2. Service data will be stored in the database, but the service
   liveness information can be managed by the respective configured
   servicegroup driver. Also, we have things like the service version,
   which will require some efficient version calculations in order to
   drive things like compute RPC version pinning and object
   backporting. That can be done efficiently if the services data is
   stored in the database (as the database supports max/min
   functionality), as opposed to storing it in Zookeeper or Memcache.
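
   For example, computing the minimum deployed service version for RPC
   pinning is a single aggregate query when the data lives in the
   database. The sketch below uses the Service object API from the
   related service-version work; the helper name wrapping it is
   illustrative only::

       from nova import objects

       def minimum_compute_version(ctxt):
           # Backed by a SELECT MIN(version) over nova.services; doing
           # the same over Zookeeper or Memcache would require scanning
           # every member record.
           return objects.Service.get_minimum_version(ctxt,
                                                      'nova-compute')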

3. For the service liveness API we have two options for implementing::

       def service_is_up(self, member):
           """Check if the given member is up."""

   A] For the DB SG driver:

      1. Check RPC liveness using the updated_at attribute in the
         nova.services table.

      2. Check service liveness using the database SG driver's is_up()
         method.

   B] For the Zookeeper/Memcache SG drivers:

      The Service object will be the interface by which we determine
      whether a service is up or down. It will necessarily look up the
      updated_at stamp like it does now, and will optionally consult an
      external interface (such as Zookeeper or Memcache) through a
      defined interface. The external interface depends on the kind of
      SG driver configured as part of CONF.servicegroup_driver. If
      either indication results in a "down" verdict, the service will
      be considered down. Please note that both of the steps below are
      needed to detect service liveness.

      1. Check RPC liveness using the updated_at attribute in the
         nova.services table.

      2. Check service liveness using the Zookeeper SG driver's or the
         Memcache SG driver's is_up() method.

         2.1. If the znode for the compute host has not joined the
              topic 'compute', then the nova-compute service is not
              running on that compute host. The details of how the
              Zookeeper ephemeral znode will maintain the service
              representational state, and how to migrate from the
              existing database servicegroup driver to the
              Zookeeper/Memcache SG driver, are covered in
              https://review.openstack.org/#/c/138607.
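
   As an illustration only, the znode check in step 2.1 could look
   roughly like the following sketch using the kazoo client library;
   the /servicegroups/<topic>/<host> path layout and the helper name
   are assumptions, not the final driver API::

       from kazoo.client import KazooClient

       def zk_service_is_up(zk_hosts, topic, member_host):
           """Report whether an ephemeral znode exists for the member."""
           zk = KazooClient(hosts=zk_hosts)
           zk.start()
           try:
               # The driver creates an ephemeral znode per live service;
               # if the node is absent, the service is not running on
               # that host.
               path = '/servicegroups/%s/%s' % (topic, member_host)
               return zk.exists(path) is not None
           finally:
               zk.stop()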

4. An SG API to leave a group needs to be introduced at the SG layer
   and implemented by the backend drivers. Drivers that don't need the
   leave functionality will not provide any additional logic to free up
   the record associated with the service. For example, the znodes used
   to keep track of a service when using the Zookeeper SG driver are
   ephemeral, which means they will be automatically deleted when the
   service is deleted. But for other backends like Memcache, which are
   key/value stores, the record needs to be cleared explicitly. The API
   at the SG layer might look like::

       def leave(self, group_id, member):
           """Remove the given member from the group."""

   As mentioned above, depending on the driver used, this may already
   be supported; if not, the driver needs to explicitly clean up the
   service entry, as in the sketch below.
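
   A minimal sketch of what per-driver leave() implementations could
   look like; the driver class names and the memcache key format here
   are illustrative assumptions::

       class MemcacheDriver(object):
           def __init__(self, mc_client):
               self.mc = mc_client

           def leave(self, group_id, member):
               # Memcache is a plain key/value store, so the record for
               # the departing member must be removed explicitly.
               self.mc.delete('%s:%s' % (group_id, member))

       class ZooKeeperDriver(object):
           def leave(self, group_id, member):
               # Ephemeral znodes vanish when the owning session ends,
               # so no extra cleanup is needed here.
               pass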

5. Callers will now just read service.is_up to check RPC liveness and
   service liveness. At the object layer, is_up will be computed as a
   combination of when the service record was last updated, which gives
   details about RPC liveness, and a query to the driver selected by
   CONF.servicegroup_driver for service liveness, as sketched below.
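
   A minimal sketch of that combination; the helper name compute_is_up()
   and the sg_driver handle are illustrative, while the
   service_down_time option and the oslo.utils timeutils calls are
   existing interfaces::

       from oslo_utils import timeutils

       def compute_is_up(service, sg_driver, service_down_time=60):
           """Combine RPC liveness and SG-driver service liveness."""
           # RPC liveness: the service row must have been updated
           # within the configured window.
           last_heartbeat = service.updated_at or service.created_at
           elapsed = timeutils.delta_seconds(last_heartbeat,
                                             timeutils.utcnow())
           rpc_alive = abs(elapsed) <= service_down_time

           # Service liveness: delegate to the configured driver
           # (db, zookeeper, memcache).
           service_alive = sg_driver.is_up(service)

           # A "down" verdict from either side means down.
           return rpc_alive and service_alive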

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

Service liveness is fetched from the configured SG driver, whereas
service details will be fetched from the nova.services table in the
Nova database. RPC liveness will also be computed based on the data in
the nova.services table.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  vilobhmm

Other contributors:
  jaypipes, harlowja

Work Items
----------

- Introduce an additional attribute, is_up, on nova.objects.service.
- Fix the admin interface (nova-manage) so that whether a service is up
  or down depends on the combination of service liveness, according to
  which SG driver is configured, and RPC liveness, computed from the
  information stored in the nova.services table. A call-site sketch
  follows this list.
- Introduce a leave() API at the SG layer to make sure that when a
  service is deleted, in situations where service liveness is
  maintained by backends other than the DB, the znode or the associated
  structure for the service is freed up.
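
A hedged sketch of the nova-manage call site after the refactor; the
ServiceList query is an existing Nova object API, while the status
strings mirror the current service list output::

    from nova import objects

    def list_services(ctxt):
        # The status column is now derived from the object attribute
        # instead of a direct servicegroup-layer call.
        for svc in objects.ServiceList.get_all(ctxt):
            state = ':-)' if svc.is_up else 'XXX'
            print(svc.binary, svc.host, state)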

Dependencies
============

None

Testing
=======

1. Existing unit tests will be updated to make sure that service data
   is fetched from the nova.services table and service liveness via the
   servicegroup API.

Documentation Impact
====================

None

References
==========

- http://lists.openstack.org/pipermail/openstack-dev/2015-May/063602.html
- https://review.openstack.org/#/c/138607
- http://lists.openstack.org/pipermail/openstack-dev/2015-September/075267.html

.. _etherpad: https://etherpad.openstack.org/p/servicegroup-refactoring