..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================================
Provision nodes without Nova and Glance
=======================================

https://blueprints.launchpad.net/tripleo/+spec/nova-less-deploy

Currently the TripleO undercloud uses Heat, Nova, Glance, Neutron and Ironic
for provisioning bare metal machines. This blueprint proposes excluding Heat,
Nova and Glance from this flow, removing Nova and Glance completely from the
undercloud.

Problem Description
===================

Making TripleO workflows use Ironic directly to provision nodes has quite a few
benefits:

#. First and foremost, getting rid of the horrible "no valid hosts found"
   exception. The scheduling will be much simpler and the errors will be
   clearer.

   .. note::
      This and many other problems with using Nova in the undercloud come from
      the fact that Nova is cloud-oriented software, while the undercloud is
      more of a traditional installer. In the "pets vs cattle" metaphor, Nova
      handles the "cattle" case, while the undercloud is the "pet" case.

#. Also important for the generic provisioner case, we'll be able to get rid of
   Nova and Glance, reducing the memory footprint.

#. We'll get rid of pre-deploy validations that currently try to guess what
   the Nova scheduler will expect.

#. We'll be able to combine nodes deployed by Ironic with pre-deployed servers.

#. We'll be in charge of building the configdrive, potentially putting more
   useful things there.

#. Hopefully, scale-up will be less error-prone.

Also in the future we may be able to:

#. Integrate things like building RAID on demand much more easily.

#. Use introspection data in scheduling and provisioning decisions.
   In particular, we can automate handling root device hints.

#. Make Neutron optional and use static DHCP and/or *os-net-config*.

Proposed Change
===============

Overview
--------

This blueprint proposes replacing the Heat-Nova-Glance triad with Ironic
driven directly by Mistral. To avoid placing Ironic-specific code into
tripleo-common, a new library metalsmith_ has been developed and accepted into
the Ironic governance.

As part of the implementation, this blueprint proposes completely separating
the bare metal provisioning process from software configuration, including at
the CLI level. This has two benefits:

#. Having a clear separation between two error-prone processes simplifies
   debugging for operators.

#. Reusing the existing *deployed-server* workflow simplifies the
   implementation.

In the distant future, the functionality of metalsmith_ may be moved into the
Ironic API itself. In that case the library will be phased out, while keeping
the same Mistral workflows.

Operator workflow
-----------------

As noted in Overview_, the CLI/GUI workflow will be split into hardware
provisioning and software configuration parts (the former being optional).

#. In addition to the existing Heat templates, a new file
   baremetal_deployment.yaml_ will be populated by an operator with the bare
   metal provisioning information.

#. Bare metal deployment will be conducted by a new CLI command or GUI
   operation using the new `deploy_roles workflow`_::

      openstack overcloud node provision \
          -o baremetal_environment.yaml baremetal_deployment.yaml

   This command will take the input from baremetal_deployment.yaml_, provision
   the requested bare metal machines and output a Heat environment file
   baremetal_environment.yaml_ to use with the *deployed-server* feature.

#. Finally, the regular deployment is done, including the generated file::

      openstack overcloud deploy \
          <other cli arguments> \
          -e baremetal_environment.yaml \
          -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml \
          -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-bootstrap-environment-centos.yaml \
          -r /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-server-roles-data.yaml

For simplicity, the two commands can be combined::

    openstack overcloud deploy \
        <other cli arguments> \
        -b baremetal_deployment.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-bootstrap-environment-centos.yaml \
        -r /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-server-roles-data.yaml

The new argument ``--baremetal-deployment``/``-b`` will accept the
baremetal_deployment.yaml_ file and do both steps automatically.

Breakdown of the changes
------------------------

This section describes the required changes in depth.

Image upload
~~~~~~~~~~~~

As Glance will no longer be used, images will have to be served from other
sources. Ironic supports HTTP and file sources for its images. For the
undercloud case, the file source seems to be the most straightforward, while
the *Edge* case may require using HTTP images.

To make both cases possible, the ``openstack overcloud image upload`` command
will now copy the three overcloud images (``overcloud-full.qcow2``,
``overcloud-full.kernel`` and ``overcloud-full.ramdisk``) to
``/var/lib/ironic/httpboot/overcloud-images``. This will allow referring to
images both via ``file:///var/lib/ironic/httpboot/overcloud-images/...`` and
``http(s)://<UNDERCLOUD HOST>:<IPXE PORT>/overcloud-images/...``.

Finally, a checksum file will be generated from the copied images using::

    cd /var/lib/ironic/httpboot/overcloud-images
    md5sum overcloud-full.* > MD5SUMS

This is required since the checksums will no longer come from Glance.
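The verification performed against such a file can be sketched in Python (an
illustration of the checksum scheme only, not Ironic's actual code; the file
names and contents below are stand-ins for the real images):

```python
import hashlib

def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# Illustrative stand-ins for the copied overcloud images.
images = {
    "overcloud-full.kernel": b"kernel contents",
    "overcloud-full.ramdisk": b"ramdisk contents",
}

# Build MD5SUMS-style lines: "<checksum>  <filename>".
md5sums = "".join("%s  %s\n" % (md5_of(data), name)
                  for name, data in sorted(images.items()))

def verify(md5sums_text: str, files: dict) -> bool:
    """Check every entry, as ``md5sum -c MD5SUMS`` would."""
    for line in md5sums_text.splitlines():
        checksum, name = line.split("  ", 1)
        if md5_of(files[name]) != checksum:
            return False
    return True
```

A deploy source is accepted only if every recorded checksum still matches the
corresponding image.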
baremetal_deployment.yaml
~~~~~~~~~~~~~~~~~~~~~~~~~

This file will describe the bare metal provisioning parameters. It will
provide the information that is currently implicitly deduced from the Heat
templates.

.. note::
   We could continue extracting it from the templates as well. However, a
   separate file avoids a dependency on any Heat-specific logic, potentially
   benefiting standalone installer cases. It also provides the operators with
   more control over the provisioning process.

The format of this file resembles that of the ``roles_data`` file. It
describes the deployment parameters for each role. The file contains a list of
roles, each with a ``name``. Other accepted parameters are:

``count``
   number of machines to deploy for this role. Defaults to 1.
``profile``
   profile (``compute``, ``control``, etc.) to use for this role. Roughly
   corresponds to a flavor name in a Nova based deployment. Defaults to no
   profile (any node can be picked).
``hostname_format``
   a template for generating host names. This is similar to
   ``HostnameFormatDefault`` of a ``roles_data`` file and should use
   ``%index%`` to number the nodes. The default is ``%stackname%-<role name in
   lower case>-%index%``.
``instances``
   list of instances in the format accepted by the `deploy_instances
   workflow`_. This allows tuning parameters per instance.

Examples
^^^^^^^^

Deploy one compute and one control with any profile:

.. code-block:: yaml

    - name: Compute
    - name: Controller

HA deployment with two computes and profile matching:

.. code-block:: yaml

    - name: Compute
      count: 2
      profile: compute
      hostname_format: compute-%index%.example.com
    - name: Controller
      count: 3
      profile: control
      hostname_format: controller-%index%.example.com

Advanced deployment with custom hostnames and parameters set per instance:

.. code-block:: yaml

    - name: Compute
      profile: compute
      instances:
        - hostname: compute-05.us-west.example.com
          nics:
            - network: ctlplane
              fixed_ip: 10.0.2.5
          traits:
            - HW_CPU_X86_VMX
        - hostname: compute-06.us-west.example.com
          nics:
            - network: ctlplane
              fixed_ip: 10.0.2.6
          traits:
            - HW_CPU_X86_VMX
    - name: Controller
      profile: control
      instances:
        - hostname: controller-1.us-west.example.com
          swap_size_mb: 4096
        - hostname: controller-2.us-west.example.com
          swap_size_mb: 4096
        - hostname: controller-3.us-west.example.com
          swap_size_mb: 4096

deploy_roles workflow
~~~~~~~~~~~~~~~~~~~~~

The workflow ``tripleo.baremetal_deploy.v1.deploy_roles`` will accept the
information from baremetal_deployment.yaml_, convert it into the low-level
format accepted by the `deploy_instances workflow`_ and call the
`deploy_instances workflow`_ with it.

It will accept the following mandatory input:

``roles``
   parsed baremetal_deployment.yaml_ file.

It will accept one optional input:

``plan``
   plan/stack name, used for templating. Defaults to ``overcloud``.

It will return the same output as the `deploy_instances workflow`_ plus:

``environment``
   the content of the generated baremetal_environment.yaml_ file.

Examples
^^^^^^^^

The examples from baremetal_deployment.yaml_ will be converted to:

.. code-block:: yaml

    - hostname: overcloud-compute-0
    - hostname: overcloud-controller-0

.. code-block:: yaml

    - hostname: compute-0.example.com
      profile: compute
    - hostname: compute-1.example.com
      profile: compute
    - hostname: controller-0.example.com
      profile: control
    - hostname: controller-1.example.com
      profile: control
    - hostname: controller-2.example.com
      profile: control

.. code-block:: yaml

    - hostname: compute-05.us-west.example.com
      nics:
        - network: ctlplane
          fixed_ip: 10.0.2.5
      profile: compute
      traits:
        - HW_CPU_X86_VMX
    - hostname: compute-06.us-west.example.com
      nics:
        - network: ctlplane
          fixed_ip: 10.0.2.6
      profile: compute
      traits:
        - HW_CPU_X86_VMX
    - hostname: controller-1.us-west.example.com
      profile: control
      swap_size_mb: 4096
    - hostname: controller-2.us-west.example.com
      profile: control
      swap_size_mb: 4096
    - hostname: controller-3.us-west.example.com
      profile: control
      swap_size_mb: 4096
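The conversion this workflow performs can be sketched in Python (a simplified
illustration, not the actual tripleo-common code; only the fields discussed
above are handled):

```python
def roles_to_instances(roles, plan="overcloud"):
    """Expand baremetal_deployment.yaml roles into instance records."""
    instances = []
    for role in roles:
        # Explicit instances are passed through, inheriting the role profile.
        if "instances" in role:
            for instance in role["instances"]:
                record = dict(instance)
                if "profile" in role:
                    record.setdefault("profile", role["profile"])
                instances.append(record)
            continue
        # Otherwise generate ``count`` records from the hostname template.
        template = role.get(
            "hostname_format",
            "%stackname%-" + role["name"].lower() + "-%index%")
        for index in range(role.get("count", 1)):
            record = {"hostname": template.replace("%stackname%", plan)
                                          .replace("%index%", str(index))}
            if "profile" in role:
                record["profile"] = role["profile"]
            instances.append(record)
    return instances
```

For example, ``roles_to_instances([{"name": "Compute"}])`` yields a single
record with hostname ``overcloud-compute-0``, matching the first converted
example above.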
deploy_instances workflow
~~~~~~~~~~~~~~~~~~~~~~~~~

The workflow ``tripleo.baremetal_deploy.v1.deploy_instances`` is a thin
wrapper around the corresponding metalsmith_ calls.

The following inputs are mandatory:

``instances``
   list of requested instances in the format described in `Instance format`_.
``ssh_keys``
   list of SSH public key contents to put on the machines.

The following inputs are optional:

``ssh_user_name``
   SSH user name to create, defaults to ``heat-admin`` for compatibility.
``timeout``
   deployment timeout, defaults to 3600 seconds.
``concurrency``
   deployment concurrency - how many nodes to deploy at the same time.
   Defaults to 20, which matches introspection.

Instance format
^^^^^^^^^^^^^^^

The instance record format closely follows that of the `metalsmith ansible
role`_ with only a few TripleO-specific additions and changes to the defaults.

Either or both of the following fields must be present:

``hostname``
   requested hostname. It is used to identify the deployed instance later on.
   Defaults to ``name``.
``name``
   name of the node to deploy on. If ``hostname`` is not provided, ``name`` is
   also used as the hostname.

The following fields will be supported:

``capabilities``
   requested node capabilities (except for ``profile`` and ``boot_option``).
``conductor_group``
   requested node's conductor group. This is primarily for the *Edge* case,
   when nodes managed by the same Ironic can be physically separated.
``nics``
   list of requested NICs, see the metalsmith_ documentation for details.
   Defaults to ``{"network": "ctlplane"}``, which requests creation of a port
   on the ``ctlplane`` network.
``profile``
   profile to use (e.g. ``compute``, ``control``, etc.).
``resource_class``
   requested node's resource class, defaults to ``baremetal``.
``root_size_gb``
   size of the root partition in GiB, defaults to 49.
``swap_size_mb``
   size of the swap partition in MiB, if needed.
``traits``
   list of requested node traits.
``whole_disk_image``
   boolean, whether to treat the image (``overcloud-full.qcow2`` or provided
   through the ``image`` field) as a whole disk image. Defaults to false.

The following fields will also be supported, but the defaults should work for
all but the most extreme cases:

``image``
   file or HTTP URL of the root partition or whole disk image.
``image_kernel``
   file or HTTP URL of the kernel image (partition images only).
``image_ramdisk``
   file or HTTP URL of the ramdisk image (partition images only).
``image_checksum``
   checksum of the root partition or whole disk image, or a URL of a checksum
   file for it.

Certificate authority configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If TLS is used in the undercloud, we need to make the nodes trust
the Certificate Authority (CA) that signed the TLS certificates.
If ``/etc/pki/ca-trust/source/anchors/cm-local-ca.pem`` exists, it will be
included in the generated configdrive, so that the file is copied into the
same location on target systems.
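A minimal sketch of this step in Python (the function name and the
files-mapping shape are assumptions for illustration, not the final
implementation; only the CA path itself comes from the text above):

```python
import os

# The CA path named in the text; present only when undercloud TLS is used.
CA_FILE = "/etc/pki/ca-trust/source/anchors/cm-local-ca.pem"

def extra_configdrive_files(ca_file=CA_FILE):
    """Return a mapping of target path -> content for the configdrive.

    The CA certificate, when present, is copied to the same location on
    the deployed node so the node trusts the undercloud CA.
    """
    files = {}
    if os.path.exists(ca_file):
        with open(ca_file, "rb") as f:
            files[ca_file] = f.read()
    return files
```

When the file is absent (no TLS), the mapping is simply empty and the
configdrive is built without it.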
Outputs
^^^^^^^

The workflow will provide the following outputs:

``ctlplane_ips``
   mapping of host names to their respective IP addresses on the ``ctlplane``
   network.
``instances``
   mapping of host names to full instance representations with fields:

   ``node``
      Ironic node representation.
   ``ip_addresses``
      mapping of network names to lists of IP addresses on them.
   ``hostname``
      instance hostname.
   ``state``
      `metalsmith instance state`_.
   ``uuid``
      Ironic node uuid.

Also two subdicts of ``instances`` are provided:

``existing_instances``
   only instances that already existed.
``new_instances``
   only instances that were deployed.

.. note::
   Instances are distinguished by their hostnames.
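For illustration, the output for a single deployed node might look roughly
like this (a hypothetical sketch: the IP address and uuid are invented, and
the full ``node`` representation is elided):

```yaml
ctlplane_ips:
  overcloud-compute-0: 192.168.24.10
instances:
  overcloud-compute-0:
    hostname: overcloud-compute-0
    ip_addresses:
      ctlplane:
        - 192.168.24.10
    state: active
    uuid: 00000000-0000-0000-0000-000000000000
```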
baremetal_environment.yaml
~~~~~~~~~~~~~~~~~~~~~~~~~~

This file will serve as an output of the bare metal provisioning process. It
will be fed into the overcloud deployment command. Its goal is to provide
information for the *deployed-server* workflow.

The file will contain the ``HostnameMap`` generated from role names and
hostnames, e.g.

.. code-block:: yaml

    parameter_defaults:
      HostnameMap:
        overcloud-controller-0: controller-1.us-west.example.com
        overcloud-controller-1: controller-2.us-west.example.com
        overcloud-controller-2: controller-3.us-west.example.com
        overcloud-novacompute-0: compute-05.us-west.example.com
        overcloud-novacompute-1: compute-06.us-west.example.com
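Generating this mapping can be sketched as follows (a simplified illustration;
the real workflow derives the Heat-side name prefixes, such as ``novacompute``
for the Compute role, from Heat's role defaults):

```python
def build_hostname_map(roles, stack="overcloud"):
    """Map Heat's default hostnames to the actually deployed hostnames.

    ``roles`` maps a Heat hostname prefix (e.g. "novacompute") to the list
    of deployed hostnames, in index order.
    """
    hostname_map = {}
    for prefix, hostnames in roles.items():
        for index, hostname in enumerate(hostnames):
            hostname_map["%s-%s-%d" % (stack, prefix, index)] = hostname
    return hostname_map

# Inputs matching the YAML example above.
mapping = build_hostname_map({
    "controller": ["controller-%d.us-west.example.com" % i for i in (1, 2, 3)],
    "novacompute": ["compute-05.us-west.example.com",
                    "compute-06.us-west.example.com"],
})
```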
undeploy_instances workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The workflow ``tripleo.baremetal_deploy.v1.undeploy_instances`` will take a
list of hostnames and undeploy the corresponding nodes.

Novajoin replacement
--------------------

The *novajoin* service is currently used to enroll nodes into IPA and provide
them with TLS certificates. Unfortunately, it has hard dependencies on Nova,
Glance and the metadata API, even though the information could be provided via
other means. In fact, the metadata API cannot always be provided with Ironic
(notably, it may not be available when using isolated provisioning networks).

A potential solution is to provide the required information via a configdrive,
and make the nodes register themselves instead.

Alternatives
------------

* Do nothing, continue to rely on Nova and work around cases where it does not
  match our goals well. See `Problem Description`_ for why this is not
  desired.

* Avoid metalsmith_, use OpenStack Ansible modules or Bifrost. They currently
  lack features (such as a VIF attach/detach API) and do not have any notion
  of scheduling. Implementing sophisticated enough scheduling in pure Ansible
  seems a serious undertaking.

* Avoid Mistral, drive metalsmith_ via Ansible. This is a potential future
  direction of this work, but currently it seems much simpler to call the
  metalsmith_ Python API from Mistral actions. We would need Mistral
  (or Ansible Tower) to drive Ansible anyway, because we need some API level.

* Remove Neutron in the same change. This would reduce the footprint even
  further, but some operators may find the presence of an IPAM desirable.
  Also, setting up static DHCP would increase the scope of the implementation
  substantially and complicate the upgrade even further.

* Keep Glance but remove Nova. This does not make much sense, since Glance is
  only a requirement because of Nova. Ironic can deploy from HTTP or local
  file locations just as well.

Security Impact
---------------

* Overcloud images will be exposed to unauthenticated users via HTTP. We need
  to communicate clearly that secrets must not be built into images in plain
  text and should be delivered via the *configdrive* instead. If this proves
  to be a problem, we can limit ourselves to providing images via local files.

  .. note::
     This issue exists today, as images are transferred via an insecure medium
     in all supported deploy methods.

* Removing two services from the undercloud will reduce the potential attack
  surface and simplify audits.

Upgrade Impact
--------------

The initial version of this feature will be enabled for new deployments only.

The upgrade procedure will happen within a release, not between releases.
It will go roughly as follows:

#. Upgrade to a release where an undercloud without Nova and Glance is
   supported.

#. Make a full backup of the undercloud.

#. Run ``openstack overcloud image upload`` to ensure that the
   ``overcloud-full`` images are available via HTTP(s).

The next steps will probably be automated via an Ansible playbook or a Mistral
workflow:

#. Mark deployed nodes *protected* in Ironic to prevent undeploying them
   by mistake.

#. Run a Heat stack update replacing references to Nova servers with
   references to deployed servers. This will require telling Heat not to
   remove the instances.

#. Mark nodes as managed by *metalsmith* (optional, but simplifies
   troubleshooting).

#. Update each node's ``instance_info`` to refer to images over HTTP(s).

   .. note:: This may require temporarily moving nodes to maintenance.

#. Run an undercloud update removing Nova and Glance.

Other End User Impact
---------------------

* The Nova CLI will no longer be available for troubleshooting. This should
  not be a big problem in practice, as most of the problems it is used for are
  caused by using Nova itself.

  metalsmith_ provides a CLI tool for troubleshooting and advanced users. We
  will document using it for tasks like determining the IP addresses of nodes.

* It will no longer be possible to update images via the Glance API, e.g. from
  the GUI. This should not be a big issue, as most users use pre-built images.
  Advanced operators are likely to resort to the CLI anyway.

* The *No valid host found* error will no longer be seen by operators.
  metalsmith_ provides more detailed errors, and is less likely to fail
  because its scheduling approach works better for the undercloud case.

Performance Impact
------------------

* A substantial speed-up is expected for deployments because several layers of
  indirection are removed. The new deployment process will also fail faster if
  the scheduling request cannot be satisfied.

* Providing images via local files will remove the step of downloading them
  from Glance, providing an even greater speed-up for larger images.

* An operator will be able to tune the deployment concurrency via CLI
  arguments or GUI parameters, rather than ``nova.conf``.

Other Deployer Impact
---------------------

None.

Developer Impact
----------------

New features for bare metal provisioning will have to be developed with this
work in mind. This may mean implementing something in metalsmith_ code instead
of relying on Nova servers or flavors, or Glance images.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
    Dmitry Tantsur, IRC: dtantsur, LP: divius

Work Items
----------

Phase 1 (Stein, technical preview):

#. Update ``openstack overcloud image upload`` to copy images into the HTTP
   location and generate checksums.

#. Implement the `deploy_instances workflow`_ and `undeploy_instances
   workflow`_.

#. Update validations to not fail if Nova and/or Glance are not present.

#. Implement the `deploy_roles workflow`_.

#. Provide CLI commands for the created workflows.

#. Provide an experimental OVB CI job exercising the new approach.

Phase 2 (T+, fully supported):

#. Update ``openstack overcloud deploy`` to support the new workflow.

#. Support scaling down.

#. Provide a `Novajoin replacement`_.

#. Provide an upgrade workflow.

#. Consider deprecating provisioning with Nova and Glance.

Dependencies
============

* The metalsmith_ library will be used for easier access to the Ironic and
  Neutron APIs.

Testing
=======

Since testing this feature requires bare metal provisioning, a new OVB job
will be created for it. Initially it will be experimental, and it will move to
the check queue before the feature is considered fully supported.

Documentation Impact
====================

Documentation will have to be reworked to explain the new deployment approach.
Troubleshooting documentation will have to be updated.

References
==========

.. _metalsmith: https://docs.openstack.org/metalsmith/latest/
.. _metalsmith ansible role: https://docs.openstack.org/metalsmith/latest/user/ansible.html#instance
.. _metalsmith instance state: https://docs.openstack.org/metalsmith/latest/reference/api/metalsmith.html#metalsmith.Instance.state