Deploy cloud hypervisor type
Proposes a method for varying the hypervisor type used by Nova in the deploy cloud to deploy overcloud services from baremetal/ironic to any other hypervisor type supported by Nova. In particular, containers and Docker make this deployment attractive due to their lightweight nature. Change-Id: I7d153b0341e148274590d3d049cf0e2141369e58

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

====================================
TripleO Deploy Cloud Hypervisor Type
====================================

.. TODO: file the actual blueprint...

https://blueprints.launchpad.net/tripleo/+spec/tripleo-deploy-cloud-hypervisor-type

The goal of this spec is to detail how the TripleO deploy cloud type could be
varied from just baremetal to baremetal plus other hypervisors to deploy
Overcloud services.

Linux kernel containers make this approach attractive due to the lightweight
way in which services and processes can be virtualized and isolated, so
libvirt+lxc and Docker are likely targets. However, we should aim to make this
approach as agnostic as possible for deployers who may wish to use any Nova
driver, such as libvirt+kvm.

Problem Description
===================

The overcloud control plane is generally lightly loaded, and allocating entire
baremetal machines to it is wasteful. Also, when the Overcloud services run
entirely on baremetal, they take longer to upgrade and roll back.

Proposed Change
===============

We should support any Nova virtualization type as a target for Overcloud
services, as opposed to using only baremetal nodes to deploy overcloud images.
Containers are particularly attractive because they are lightweight, easy to
upgrade and roll back, and offer isolation and security comparable to full
VMs. For the purpose of this spec, the alternate Nova virtualization target
for the Overcloud will be referred to as alt-hypervisor. alt-hypervisor could
be substituted with libvirt+lxc, Docker, libvirt+kvm, etc.

At a minimum, we should support running each Overcloud service in isolation in
its own alt-hypervisor instance in order to be as flexible as possible to
deployer needs. We should also support combining services.

In order to make alt-hypervisors available as deployment targets for the
Overcloud, we need additional Nova compute nodes/services, configured to use
the alt-hypervisors, registered with the undercloud Nova.

Additionally, the undercloud must still run a Nova compute with the ironic
driver in order to allow for scaling itself out to add additional undercloud
compute nodes.

To accomplish this, we can run two Nova compute processes on each undercloud
node: one configured with Nova+Ironic and one configured with
Nova+alt-hypervisor. For a straight baremetal deployment, where an alternate
hypervisor is not desired, the additional Nova compute process would not be
included. This would be accomplished via the standard inclusion/exclusion of
elements during a diskimage-builder tripleo image build.
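
As a rough sketch, the two processes might be launched as follows. The
per-driver config file paths follow the layout proposed under Work Items
below; the exact service wiring is up to the elements::

    # One nova-compute per driver, each layering a driver-specific config
    # file over the shared nova.conf.
    nova-compute --config-file /etc/nova/nova.conf \
        --config-file /etc/nova/compute/nova-ironic.conf

    nova-compute --config-file /etc/nova/nova.conf \
        --config-file /etc/nova/compute/nova-docker.conf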

It will also be possible to build and deploy just an alt-hypervisor compute
node that is registered with the Undercloud as an additional compute node.

To minimize the changes needed to the elements, we will aim to run a full init
stack, such as systemd, in each alt-hypervisor instance. This will allow all
the services that we need (cloud-init, os-collect-config, etc.) to also be
running in the instance. It will also make troubleshooting similar to the
baremetal process in that you'd be able to ssh to individual instances, read
logs, restart services, turn on debug mode, etc.
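
For the Docker case, booting a full init system inside the container has
historically required extra privileges. A minimal sketch, with the image name
being hypothetical::

    # Run systemd as PID 1; early Docker releases generally need a privileged
    # container and access to the host cgroup hierarchy for this.
    docker run --privileged -d \
        -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
        overcloud-keystone /usr/sbin/init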

To handle Neutron network configuration for the Overcloud, the Overcloud
neutron L2 agent will have to be on a provider network that is shared between
the hypervisors. VLAN provider networks will have to be modeled in Neutron and
connected to alt-hypervisor instances.
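
For illustration, such a shared VLAN provider network could be modeled in the
undercloud Neutron along these lines (the network name, physical network
label, VLAN id, and CIDR are all hypothetical)::

    neutron net-create overcloud-ctl --shared \
        --provider:network_type vlan \
        --provider:physical_network datacentre \
        --provider:segmentation_id 25
    neutron subnet-create overcloud-ctl 192.0.2.0/24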

Overcloud compute nodes themselves would be deployed to baremetal nodes. These
images would be made up of:

* libvirt+kvm (assuming this is the hypervisor choice for the Overcloud)
* nova-compute + libvirt+kvm driver (registered to overcloud control)
* neutron-l2-agent (registered to overcloud control)

An image with those contents is deployed to a baremetal node via nova+ironic
from the undercloud.
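
As a sketch, such an image would be built with diskimage-builder in the usual
way; the element and distro names here are illustrative, not prescribed by
this spec::

    disk-image-create -a amd64 -o overcloud-compute \
        ubuntu hosts baremetal nova-compute nova-kvm neutron-openvswitch-agent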

Alternatives
------------

Deployment from the seed
^^^^^^^^^^^^^^^^^^^^^^^^
An alternative to having the undercloud deploy additional alt-hypervisor
compute nodes would be to register additional baremetal nodes with the seed vm,
and then describe an undercloud stack in a template that is the undercloud
controller and its set of alt-hypervisor compute nodes. When the undercloud is
deployed via the seed, all of the nodes are set up initially.

The drawback with that approach is that the seed is meant to be short-lived.
So, it then becomes difficult to scale out the undercloud later if needed. We
could offer a hybrid of the two models: launch all nodes initially from the
seed, but still have the functionality in the undercloud to deploy more
alt-hypervisor compute nodes if needed.

The init process
^^^^^^^^^^^^^^^^
If running systemd in a container turns out to be problematic, it should be
possible to run a single process in the container that starts just the
OpenStack service that we care about. However, that process would also need to
do things like read Heat metadata; it's possible this process could be
os-collect-config. This alternative would require more changes to the elements
themselves, however, since they currently depend heavily on an init process to
enable and restart services. It may be possible to replace os-svc-* with other
tools that don't use systemd or upstart when building images for containers.

Security Impact
---------------

* We should aim for equivalent security when deploying to alt-hypervisor
  instances as when deploying to baremetal. To the best of our ability, it
  should not be possible to compromise the instance if an individual service
  is compromised.

* Since Overcloud services and Undercloud services would be co-located on the
  same baremetal machine, compromising the hypervisor and gaining access to
  the host is a risk to both the Undercloud and Overcloud. We should mitigate
  this risk to the best of our ability via things like SELinux, and by
  removing all unnecessary software/processes from the alt-hypervisor
  instances.

* Certain hypervisors are inherently more secure than others. libvirt+kvm
  uses virtualization and is much more secure than container-based hypervisors
  such as libvirt+lxc and Docker, which use namespacing.

Other End User Impact
---------------------
None. The impact of this change is limited to Deployers. End users should have
no visibility into the actual infrastructure of the Overcloud.

Performance Impact
------------------
Ideally, deploying an overcloud to containers should result in a faster
deployment than deploying to baremetal. Upgrading and downgrading the
Overcloud should also be faster.

More images will have to be built via diskimage-builder, however, which will
take more time.

Other Deployer Impact
---------------------
The main impact to deployers will be the ability to use alt-hypervisor
instances, such as containers, if they wish. They must also understand how to
use nova-baremetal/ironic on the undercloud to scale out the undercloud and
add additional alt-hypervisor compute nodes if needed.

Additional space in the configured glance backend would also likely be needed
to store the additional images.

Developer Impact
----------------
* Developers working on TripleO will have the option of deploying to
  alt-hypervisor instances. This should make testing and developing some
  aspects of TripleO easier, since fewer VMs are needed.

* More images will have to be built due to the greater potential variety of
  alt-hypervisor instances housing Overcloud services.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  james-slagle

Work Items
----------

tripleo-incubator
^^^^^^^^^^^^^^^^^
* document how to use an alternate hypervisor for the overcloud deployment
** eventually, this could possibly be the default
* document how to troubleshoot this type of deployment
* need a user option or json property to describe whether the devtest
  environment being set up should use an alternate hypervisor for the
  overcloud deployment or not. Consider using HEAT_ENV where appropriate.
* load-image should be updated to add an additional optional argument that
  sets the hypervisor_type property on the loaded images in glance (see the
  sketch after this list). The argument is optional and wouldn't need to be
  specified for some images, such as regular dib images that can run under
  KVM.
* Document commands to setup-neutron for modeling provider VLAN networks.
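
As a sketch of the load-image change, the property could be set at image
upload time and then matched by the undercloud Nova scheduler (the image name
and file are hypothetical, and the disk/container formats shown are
illustrative; in Juno-era Nova, the ImagePropertiesFilter can match a
hypervisor_type image property against a compute node's capabilities)::

    glance image-create --name overcloud-keystone-docker \
        --disk-format raw --container-format docker \
        --property hypervisor_type=docker < overcloud-keystone.tar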

tripleo-image-elements
^^^^^^^^^^^^^^^^^^^^^^
* add new element for the nova docker driver
* add new element for a docker registry (currently required by the nova docker
  driver)
* more hypervisor-specific configuration files for the different nova compute
  driver elements (see the sketch after this list)
** /etc/nova/compute/nova-kvm.conf
** /etc/nova/compute/nova-baremetal.conf
** /etc/nova/compute/nova-ironic.conf
** /etc/nova/compute/nova-docker.conf
* Separate configuration options per compute process for:
** host (undercloud-kvm, undercloud-baremetal, etc.)
** state_path (/var/lib/nova-kvm, /var/lib/nova-baremetal, etc.)
* Maintain backwards compatibility in the elements by consulting both old and
  new heat metadata key namespaces.
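
For illustration, the per-process files might differ only in driver, identity,
and state directory. compute_driver, host, and state_path are standard
nova.conf options; the driver class paths shown are assumptions::

    # /etc/nova/compute/nova-ironic.conf (sketch)
    [DEFAULT]
    compute_driver=nova.virt.ironic.IronicDriver
    host=undercloud-baremetal
    state_path=/var/lib/nova-baremetal

    # /etc/nova/compute/nova-docker.conf (sketch)
    [DEFAULT]
    compute_driver=novadocker.virt.docker.DockerDriver
    host=undercloud-docker
    state_path=/var/lib/nova-docker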

tripleo-heat-templates
^^^^^^^^^^^^^^^^^^^^^^
* Split out heat metadata into separate namespaces for each compute process
  configuration (see the sketch after this list).
* For the vlan case, update templates for any network modeling for
  alt-hypervisor instances so that those instances have correct interfaces
  attached to the vlan network.
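
A sketch of what the split metadata namespaces might look like (the key names
are hypothetical; per the tripleo-image-elements items above, the elements
would consult both old and new namespaces for backwards compatibility)::

    nova-ironic:
      host: undercloud-baremetal
      state_path: /var/lib/nova-baremetal
    nova-docker:
      host: undercloud-docker
      state_path: /var/lib/nova-docker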

diskimage-builder
^^^^^^^^^^^^^^^^^
* add the ability, where needed, to build new image types for alt-hypervisors
  (see the sketch after this list)
** Docker
** libvirt+lxc
* Document how to build images for the new types
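
One plausible path for the Docker type, assuming diskimage-builder's existing
tar output format; the image and element names are hypothetical::

    # Build a tarball of the image contents, then import it as a Docker image.
    disk-image-create -t tar -o overcloud-keystone fedora overcloud-keystone
    docker import - overcloud-keystone < overcloud-keystone.tar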

Dependencies
============
For Docker support, this effort depends on continued development of the nova
Docker driver. We would need to drive any missing features or bug fixes that
were needed in that project.

For other drivers that may not be as well supported as libvirt+kvm, such as
libvirt+lxc, openvz, etc., we will also have to drive missing features if we
want to support them.

This effort also depends on the provider resource templates spec (unwritten)
that will be done for the template backend for Tuskar. That work should be
done in such a way that the provider resource templates are reusable for this
effort as well: you will be able to create templates to match the images that
you intend to create for your Overcloud deployment.

Testing
=======
We would need a separate set of CI jobs configured to deploy an Overcloud to
each alternate hypervisor that TripleO intends to support well.

For Docker support specifically, CI jobs could be considered non-voting since
they'd rely on a stackforge project which isn't officially part of OpenStack.
We could potentially make this job voting if TripleO CI were enabled on the
stackforge/nova-docker repo so that changes there are less likely to break
TripleO deployments.

Documentation Impact
====================
We should update the TripleO-specific docs in tripleo-incubator to document
how to use an alternate hypervisor for an Overcloud deployment.

References
==========
* Juno Design Summit etherpad: https://etherpad.openstack.org/p/juno-summit-tripleo-and-docker
* nova-docker driver: https://git.openstack.org/cgit/stackforge/nova-docker
* Docker: https://www.docker.io/
* Docker github: https://github.com/dotcloud/docker