diff --git a/specs/etcd-coordination.rst b/specs/etcd-coordination.rst
new file mode 100644
index 0000000..ee0dbc8
--- /dev/null
+++ b/specs/etcd-coordination.rst
@@ -0,0 +1,175 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+========================================
+Incorporate ETCD as service coordination
+========================================
+
+https://storyboard.openstack.org/#!/story/2001842
+
+This spec is part of the ironic-inspector HA work. To further split the
+inspector service, this spec proposes to introduce etcd as the base service
+for coordination between the ironic-inspector API and conductor services.
+
+Problem description
+===================
+
+From the previous work, the single-process ironic-inspector is logically
+split into two services, both running under ``oslo.service``, namely
+``ironic_inspector`` and ``ironic-inspector-conductor``.
+
+To split the two services into two processes, we need to address an
+existing functional test issue before they can get separate executables.
+Currently the functional tests use the fake messaging driver, which only
+works within a single process. We can either add RabbitMQ support to the
+functional test environment or introduce another messaging mechanism like
+``json-rpc``, but the first solution is not desirable.
+
+Even once the services are split, we still face the challenge of service
+coordination. With multiple inspector conductor services, we need a way to
+prevent concurrent operations from racing on the same node, and to choose
+which inspector conductor a request should be delivered to.
+
+
+Proposed change
+===============
+
+As etcd is already a base service for the OpenStack platform, this spec
+proposes to add ``python-etcd3`` and ``tooz`` as project requirements for
+service coordination. ``tooz`` encapsulates several features, including
+group management and locking.
+Group management is only implemented for the etcd v3 API, so ``python-etcd3``
+is required.
+
+All proposed work uses tooz interfaces. Each service creates a coordinator and
+keeps heartbeating. The example workflow for the ironic-inspector API service:
+
+#. Create a coordinator with the hostname.
+#. Create the group "ironic-inspector-service-group"; skip this if the group
+   already exists.
+#. Query the group members on each API request, randomly pick one conductor,
+   generate the topic from its hostname and send the RPC request.
+
+The example workflow for the ironic-inspector conductor service:
+
+#. Create a coordinator with the hostname.
+#. Join the group "ironic-inspector-service-group"; create it first if the
+   group does not exist.
+#. Leave the group explicitly when the service shuts down.
+
+ironic-inspector currently has no distributed locking support. This spec
+introduces an abstract lock layer and implements it on top of tooz.
+
+
+Alternatives
+------------
+
+Though it is entirely workable to use the database as the coordination
+source, just as ironic does, implementing it with tooz is much lighter.
+tooz also supports multiple backends, which brings more possibilities in
+deployment.
+
+Data model impact
+-----------------
+
+None.
+
+HTTP API impact
+---------------
+
+None.
+
+Client (CLI) impact
+-------------------
+
+None.
+
+Ironic python agent impact
+--------------------------
+
+None.
+
+Ironic impact
+-------------
+
+None.
+
+Performance and scalability impact
+----------------------------------
+
+There should be no obvious performance and scalability impact before services
+are actually split.
+
+Security impact
+---------------
+
+None.
+
+Deployer impact
+---------------
+
+A new configuration section ``etcd`` with the options below will be added to
+support etcd operation:
+
+* ``host`` and ``port``: specify the etcd service endpoint.
+* ``ca_cert``, ``cert_key`` and ``cert_cert``: specify the SSL-related
+  authentication settings.
+* ``timeout``: the connection timeout for each request.
+* ``user`` and ``password``: the username and password if etcd authentication
+  is required.
+* ``group_path``: the name of the service group used to coordinate inspector
+  services. It can be a key path, a key prefix or both. The default value
+  is ``/openstack/ironic-inspector/service-group``.
+* ``lock_prefix``: a string prefix for lock names. For example, locking node
+  ``fake-node-uuid`` with prefix ``ironic-inspector`` passes the lock name
+  ``ironic-inspector.fake-node-uuid`` to tooz.
+
+
+Developer impact
+----------------
+
+None.
+
+Upgrades and Backwards Compatibility
+------------------------------------
+
+After this spec is implemented, etcd v3 will be a mandatory requirement for
+the inspector service to work properly.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  kaifeng - kaifeng.w@gmail.com
+
+Other contributors:
+  None
+
+Work Items
+----------
+
+Implement the proposed work.
+
+
+Dependencies
+============
+
+The ``python-etcd3`` and ``tooz`` libraries are required.
+An etcd v3 service should be running in the same cloud.
+
+Testing
+=======
+
+Will be covered by unit tests and bifrost.
+
+References
+==========
+
+https://docs.openstack.org/tooz/latest/user/index.html
+
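Reviewer sketch, not part of the patch: the API-side conductor selection and the ``lock_prefix`` naming rule described in the spec could look roughly like the following. The function names are hypothetical, and the ``<base>.<hostname>`` topic layout is an assumption, since the spec only says the topic is generated according to the hostname. The member list is passed in as plain strings here; in a real service it would presumably come from the tooz group query.

```python
import random


def lock_name(prefix, node_uuid):
    """Build the lock name handed to tooz: '<prefix>.<node uuid>'."""
    return "{}.{}".format(prefix, node_uuid)


def topic_for(hostname, base="ironic-inspector-conductor"):
    """Derive an RPC topic from a conductor hostname.

    Assumption: the '<base>.<hostname>' layout is illustrative only; the
    spec does not define the topic format.
    """
    return "{}.{}".format(base, hostname)


def pick_conductor(members, rng=random):
    """Randomly pick one conductor hostname from the group members."""
    candidates = sorted(members)  # accept any iterable, e.g. a set
    if not candidates:
        raise RuntimeError("no conductor has joined the service group")
    return rng.choice(candidates)


if __name__ == "__main__":
    # Hypothetical group members; real values would be conductor hostnames.
    members = ["conductor-1", "conductor-2"]
    host = pick_conductor(members)
    print(topic_for(host))
    print(lock_name("ironic-inspector", "fake-node-uuid"))
```

Keeping these helpers free of any coordinator object makes the naming and selection rules easy to unit test without a running etcd service.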