diff --git a/specs/bay-drivers.rst b/specs/bay-drivers.rst new file mode 100644 index 0000000000..57bc7fcef8 --- /dev/null +++ b/specs/bay-drivers.rst @@ -0,0 +1,344 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +====================================== +Container Orchestration Engine drivers +====================================== + +Launchpad blueprint: + +https://blueprints.launchpad.net/magnum/+spec/bay-drivers + +Container Orchestration Engines (COEs) are different systems for managing +containerized applications in a clustered environment, each having their own +conventions and ecosystems. Three of the most common, which also happen to be +supported in Magnum, are: Docker Swarm, Kubernetes, and Mesos. In order to +successfully serve developers, Magnum needs to be able to provision and manage +access to the latest COEs through its API in an effective and scalable way. + + +Problem description +=================== + +Magnum currently supports the three most popular COEs, but as more emerge and +existing ones change, it needs an effective and scalable way of managing +them over time. + +One of the problems with the current implementation is that COE-specific logic, +such as Kubernetes replication controllers and services, is situated in the +core Magnum library and made available to users through the main API. Placing +COE-specific logic in a core API introduces tight coupling and forces +operators to work with an inflexible design. + +By formalising a more modular and extensible architecture, Magnum will be +in a much better position to help operators and consumers satisfy custom +use-cases. + +Use cases +--------- + +1. Extensibility. Contributors and maintainers need a suitable architecture to + house current and future COE implementations. Moving to a more extensible + architecture, where core classes delegate to drivers, provides a more + effective and elegant model for handling COE differences without the need + for tightly coupled and monkey-patched logic. + + One of the key use cases is allowing operators to customise their + orchestration logic, such as modifying Heat templates or even using their + own tooling like Ansible. Moreover, operators will often expect to use a + custom distro image with lots of software pre-installed and many special + security requirements that is extremely difficult or impossible to do with + the current upstream templates. COE drivers solves these problems. + +2. Maintainability. Moving to a modular architecture will be easier to manage + in the long-run because the responsibility of maintaining non-standard + implementations is shifted into the operator's domain. Maintaining the + default drivers which are packaged with Magnum will also be easier and + cleaner since logic is now demarcated from core codebase directories. + +3. COE & Distro choice. In the community there has been a lot of discussion + about which distro and COE combination to support with the templates. + Having COE drivers allows for people or organizations to maintain + distro-specific implementations (e.g CentOS+Kubernetes). + +4. Addresses dependency concerns. One of the direct results of + introducing a driver model is the ability to give operators more freedom + about choosing how Magnum integrates with the rest of their OpenStack + platform. For example, drivers would remove the necessity for users to + adopt Barbican for secret management. + +5. Driver versioning. The new driver model allows operators to modify existing + drivers or creating custom ones, release new bay types based on the newer + version, and subsequently launch news bays running the updated + functionality. Existing bays which are based on older driver versions would + be unaffected in this process and would still be able to have lifecycle + operations performed on them. If one were to list their details from the + API, it would reference the old driver version. An operator can see which + driver version a bay type is based on through its ``driver`` value, + which is exposed through the API. + +Proposed change +=============== + +1. The creation of new directory at the project root: ``./magnum/drivers``. + Each driver will house its own logic inside its own directory. Each distro + will house its own logic inside that driver directory. For example, the + Fedora Atomic distro using Swarm will have the following directory + structure: + + :: + + drivers/ + swarm_atomic_v1/ + image/ + ... + templates/ + ... + api.py + driver.py + monitor.py + scale.py + template_def.py + version.py + + + The directory name should be a string which uniquely identifies the driver + and provides a descriptive reference. The driver version number and name are + provided in the manifest file and will be included in the bay metadata at + cluster build time. + + There are two workflows for rolling out driver updates: + + - if the change is relatively minor, they modify the files in the + existing driver directory and update the version number in the manifest + file. + + - if the change is significant, they create a new directory + (either from scratch or by forking). + + Further explanation of the three top-level files: + + - an ``image`` directory is *optional* and should contain documentation + which tells users how to build the image and register it to glance. This + directory can also hold artifacts for building the image, for instance + diskimagebuilder elements, scripts, etc. + + - a ``templates`` directory is *required* and will (for the forseeable + future) store Heat template YAML files. In the future drivers will allow + operators to use their own orchestration tools like Ansible. + + - ``api.py`` is *optional*, and should contain the API controller which + handles custom API operations like Kubernetes RCs or Pods. It will be + this class which accepts HTTP requests and delegates to the Conductor. It + should contain a uniquely named class, such as ``SwarmAtomicXYZ``, which + extends from the core controller class. The COE class would have the + opportunity of overriding base methods if necessary. + + - ``driver.py`` is *required*, and should contain the logic which maps + controller actions to COE interfaces. These include: ``bay_create``, + ``bay_update``, ``bay_delete``, ``bay_rebuild``, ``bay_soft_reboot`` and + ``bay_hard_reboot``. + + - ``version.py`` is *required*, and should contain the version number of + the bay driver. This is defined by a ``version`` attribute and is + represented in the ``1.0.0`` format. It should also include a ``Driver`` + attribute and should be a descriptive name such as ``swarm_atomic``. + + Due to the varying nature of COEs, it is up to the bay + maintainer to implement this in their own way. Since a bay is a + combination of a COE and an image, ``driver.py`` will also contain + information about the ``os_distro`` property which is expected to be + attributed to Glance image. + + - ``monitor.py`` is *optional*, and should contain the logic which monitors + the resource utilization of bays. + + - ``template_def.py`` is *required* and should contain the COE's + implementation of how orchestration templates are loaded and matched to + Magnum objects. It would probably contain multiple classes, such as + ``class SwarmAtomicXYZTemplateDef(BaseTemplateDefinition)``. + + - ``scale.py`` is *optional* per bay specification and should contain the + logic for scaling operations. + +2. Renaming the ``coe`` attribute of BayModel to ``driver``. Because this + value would determine which driver classes and orchestration templates to + load, it would need to correspond to the name of the driver as it is + registered with stevedore_ and setuptools entry points. + + During the lifecycle of an API operation, top-level Magnum classes (such as + a Bay conductor) would then delegate to the driver classes which have been + dynamically loaded. Validation will need to ensure that whichever value + is provided by the user is correct. + + By default, drivers are located under the main project directory and their + namespaces are accessible via ``magnum.drivers.foo``. But a use case that + needs to be looked at and, if possible, provided for is drivers which are + situated outside the project directory, for example in + ``/usr/share/magnum``. This will suit operators who want greater separation + between customised code and Python libraries. + +3. The driver implementations for the 3 current COE and Image combinations: + Docker Swarm Fedora, Kubernetes Fedora, Kubernetes CoreOS, and Mesos + Ubuntu. Any templates would need to be moved from + ``magnum/templates/{coe_name}`` to + ``magnum/drivers/{driver_name}/templates``. + +4. Removal of the following files: + + :: + + magnum/magnum/conductor/handlers/ + docker_conductor.py + k8s_conducter.py + +Design Principles +----------------- + +- Minimal, clean API without a high cognitive burden + +- Ensure Magnum's priority is to do one thing well, but allow extensibility + by external contributors + +- Do not force ineffective abstractions that introduce feature divergence + +- Formalise a modular and loosely coupled driver architecture that removes + COE logic from the core codebase + + +Alternatives +------------ + +This alternative relates to #5 of Proposed Change. Instead of having a +drivers registered using stevedore_ and setuptools entry points, an alternative +is to use the Magnum config instead. + + +Data model impact +----------------- + +Since drivers would be implemented for the existing COEs, there would be +no loss of functionality for end-users. + + +REST API impact +--------------- + +Attribute change when creating and updating a BayModel (``coe`` to +``driver``). This would occur before v1 of the API is frozen. + +COE-specific endpoints would be removed from the core API. + + +Security impact +--------------- + +None + + +Notifications impact +-------------------- + +None + + +Other end user impact +--------------------- + +There will be deployer impacts because deployers will need to select +which drivers they want to activate. + + +Performance Impact +------------------ + +None + + + +Other deployer impact +--------------------- + +In order to utilize new functionality and bay drivers, operators will need +to update their installation and configure bay models to use a driver. + + +Developer impact +---------------- + +Due to the significant impact on the current codebase, a phased implementation +approach will be necessary. This is defined in the Work Items section. + +Code will be contributed for COE-specific functionality in a new way, and will +need to abide by the new architecture. Documentation and a good first +implementation will play an important role in helping developers contribute +new functionality. + + +Implementation +============== + + +Assignee(s) +----------- + +Primary assignee: +murali-allada + +Other contributors: +jamiehannaford +strigazi + + +Work Items +---------- + +1. New ``drivers`` directory + +2. Change ``coe`` attribute to ``driver`` + +3. COE drivers implementation (swarm-fedora, k8s-fedora, k8s-coreos, + mesos-ubuntu). Templates should remain in directory tree until their + accompanying driver has been implemented. + +4. Delete old conductor files + +5. Update client + +6. Add documentation + +7. Improve user experience for operators of forking/creating new + drivers. One way we could do this is by creating new client commands or + scripts. This is orthogonal to this spec, and will be considered after + its core implementation. + +Dependencies +============ + +None + + +Testing +======= + +Each commit will be accompanied with unit tests, and Tempest functional tests. + + +Documentation Impact +==================== + +A set of documentation for this architecture will be required. We should also +provide a developer guide for creating a new bay driver and updating existing +ones. + + +References +========== + +`Using Stevedore in your Application +`_. + +.. _stevedore: http://docs.openstack.org/developer/stevedore/