Merge "Reference architecture: common bits"
This commit is contained in:
commit
c13e50de88
doc/source/install
@ -52,6 +52,8 @@ If there are slow or unresponsive BMCs in the environment, the
|
||||
need to be raised. The default is fairly conservative, as setting this timeout
|
||||
too low can cause older BMCs to crash and require a hard-reset.
|
||||
|
||||
.. _ipmi-sensor-data:
|
||||
|
||||
Collecting sensor data
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
@ -13,6 +13,7 @@ It contains the following sections:
|
||||
:maxdepth: 2
|
||||
|
||||
get_started.rst
|
||||
refarch/index
|
||||
install.rst
|
||||
configure-integration.rst
|
||||
deploy-ramdisk.rst
|
||||
|
327
doc/source/install/refarch/common.rst
Normal file
327
doc/source/install/refarch/common.rst
Normal file
@ -0,0 +1,327 @@
|
||||
Common Considerations
|
||||
=====================
|
||||
|
||||
This section covers considerations that are equally important to all described
|
||||
architectures.
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
Components
|
||||
----------
|
||||
|
||||
As explained in :doc:`../get_started`, the Bare Metal service has three
|
||||
components.
|
||||
|
||||
* The Bare Metal API service (``ironic-api``) should be deployed in a similar
|
||||
way as the control plane API services. The exact location will depend on the
|
||||
architecture used.
|
||||
|
||||
* The Bare Metal conductor service (``ironic-conductor``) is where most of the
|
||||
provisioning logic lives. The following considerations are the most
|
||||
important when deciding on the way to deploy it:
|
||||
|
||||
* The conductor manages a certain proportion of nodes, distributed to it
|
||||
via a hash ring. This includes constantly polling these nodes for their
|
||||
current power state and hardware sensor data (if enabled and supported
|
||||
by hardware, see :ref:`ipmi-sensor-data` for an example).
|
||||
|
||||
* The conductor needs access to the `management controller`_ of each node
|
||||
it manages.
|
||||
|
||||
* The conductor co-exists with TFTP (for PXE) and/or HTTP (for iPXE) services
|
||||
that provide the kernel and ramdisk to boot the nodes. The conductor
|
||||
manages them by writing files to their root directories.
|
||||
|
||||
* If serial console is used, the conductor launches console processes
|
||||
locally. If the nova-serialproxy service (part of the Compute service)
|
||||
it used, it has to be able to reach them. Otherwise, they have to be
|
||||
directly accessible by the users.
|
||||
|
||||
* There has to be mutual connectivity between the conductor and the nodes
|
||||
being deployed or cleaned. See Networking_ for details.
|
||||
|
||||
* The provisioning ramdisk which runs the ``ironic-python-agent`` service
|
||||
on start up.
|
||||
|
||||
.. warning::
|
||||
The ``ironic-python-agent`` service is not intended to be used anywhere
|
||||
other than a provisioning/cleaning ramdisk.
|
||||
|
||||
Hardware and drivers
|
||||
--------------------
|
||||
|
||||
The Bare Metal service strives to provide the best support possible for a
|
||||
variety of hardware. However, not all hardware is supported equally well.
|
||||
It depends on both the capabilities of hardware itself and the available
|
||||
drivers. This section covers various considerations related to the hardware
|
||||
interfaces. See :doc:`/install/enabling-drivers` for a detailed introduction
|
||||
into hardware types and interfaces before proceeding.
|
||||
|
||||
Power and management interfaces
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The minimum set of capabilities that the hardware has to provide and the
|
||||
driver has to support is as follows:
|
||||
|
||||
#. getting and setting the power state of the machine
|
||||
#. getting and setting the current boot device
|
||||
#. booting an image provided by the Bare Metal service (in the simplest case,
|
||||
support booting using PXE_ and/or iPXE_)
|
||||
|
||||
.. note::
|
||||
Strictly speaking, it is possible to make the Bare Metal service provision
|
||||
nodes without some of these capabilities via some manual steps. It is not
|
||||
the recommended way of deployment, and thus it is not covered in this
|
||||
guide.
|
||||
|
||||
Once you make sure that the hardware supports these capabilities, you need to
|
||||
find a suitable driver. Most of enterprise-grade hardware has support for
|
||||
IPMI_ and thus can utilize :doc:`/admin/drivers/ipmitool`. Some newer hardware
|
||||
also supports :doc:`/admin/drivers/redfish`. Several vendors
|
||||
provide more specific drivers that usually provide additional capabilities.
|
||||
Check :doc:`/admin/drivers` to find the most suitable one.
|
||||
|
||||
Boot interface
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
The boot interface of a node manages booting of both the deploy ramdisk and
|
||||
the user instances on the bare metal node. The deploy interface orchestrates
|
||||
the deployment and defines how the image gets transferred to the target disk.
|
||||
|
||||
The ``pxe`` boot interface uses PXE_ or iPXE_ to deliver the target
|
||||
kernel/ramdisk pair. PXE uses relatively slow and unreliable TFTP protocol
|
||||
for transfer, while iPXE uses HTTP. The downside of iPXE is that it's less
|
||||
common, and usually requires bootstrapping using PXE first. It is recommended
|
||||
to use iPXE, when it is supported by target hardware, see
|
||||
:doc:`../configure-pxe` for details.
|
||||
|
||||
.. note::
|
||||
Both PXE and iPXE are configured differently, when UEFI boot is used
|
||||
instead of conventional BIOS boot. This is particularly important for CPU
|
||||
architectures that do not have BIOS support at all.
|
||||
|
||||
Alternatively, several vendors provide *virtual media* implementations of the
|
||||
boot interface. They work by pushing an ISO image to the node's `management
|
||||
controller`_, and do not require either PXE or iPXE. If such boot
|
||||
implementation is available for the hardware, it is recommended using it
|
||||
for better scalability and security. Check your driver documentation at
|
||||
:doc:`/admin/drivers` for details.
|
||||
|
||||
Deploy interface
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
There are two deploy interfaces in-tree, ``iscsi`` and ``direct``. See
|
||||
:doc:`../enabling-drivers` for explanation of the difference. With the
|
||||
``iscsi`` deploy method, most of the deployment operations happen on the
|
||||
conductor. If the Object Storage service (swift) or RadosGW is present in
|
||||
the cloud, it is recommended to use the ``direct`` deploy method for better
|
||||
scalability and reliability.
|
||||
|
||||
Hardware specifications
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Bare Metal services does not impose too many restrictions on the
|
||||
characteristics of hardware itself. However, keep in mind that
|
||||
|
||||
* By default, the Bare Metal service will pick the smallest hard drive that
|
||||
is large than 4 GiB for deployment. A smaller hard drive can be used, but
|
||||
requires setting :ref:`root device hints <root-device-hints>`.
|
||||
|
||||
* The machines should have enough RAM to fit the deployment/cleaning ramdisk
|
||||
to run. The minimum varies greatly depending on the way the ramdisk was
|
||||
built. For example, *tinyipa*, the TinyCoreLinux-based ramdisk used in the
|
||||
CI, only needs 400 MiB of RAM, while ramdisks built by *diskimage-builder*
|
||||
may require 3 GiB or more.
|
||||
|
||||
Image types
|
||||
-----------
|
||||
|
||||
The Bare Metal service can deploy two types of images:
|
||||
|
||||
* *Whole-disk* images contain a complete partitioning table with all necessary
|
||||
partitions. Such images are the most universal, but may be harder to build.
|
||||
|
||||
* *Partition images* contain only the root partition. The Bare Metal service
|
||||
will create the necessary partitions and install a boot loader, if needed.
|
||||
|
||||
.. warning::
|
||||
Partition images are only supported with GNU/Linux operating systems,
|
||||
and requires the GRUB2 bootloader to be present on the root image.
|
||||
|
||||
Local vs network boot
|
||||
---------------------
|
||||
|
||||
The Bare Metal service supports booting user instances either using a local
|
||||
bootloader or using the driver's boot interface (e.g. via PXE_ or iPXE_
|
||||
protocol in case of the ``pxe`` interface).
|
||||
|
||||
Network boot cannot be used with certain architectures (for example, when no
|
||||
tenant networks have access to the control plane).
|
||||
|
||||
Additional considerations are related to the ``pxe`` boot interface, and other
|
||||
boot interfaces based on it:
|
||||
|
||||
* Local boot makes node's boot process independent of the Bare Metal conductor
|
||||
managing it. Thus, nodes are able to reboot correctly, even if the Bare Metal
|
||||
TFTP or HTTP service is down.
|
||||
|
||||
* Network boot (and iPXE) must be used when booting nodes from remote volumes.
|
||||
|
||||
The default boot option for the cloud can be changed via the Bare Metal service
|
||||
configuration file, for example:
|
||||
|
||||
.. code-block:: ini
|
||||
|
||||
[deploy]
|
||||
default_boot_option = local
|
||||
|
||||
This default can be overriden by setting the ``boot_option`` capability on a
|
||||
node. See :ref:`local-boot-partition-images` for details.
|
||||
|
||||
.. note::
|
||||
Currently, network boot is used by default. However, we plan on changing it
|
||||
in the future, so it's safer to set the ``default_boot_option`` explicitly.
|
||||
|
||||
Networking
|
||||
----------
|
||||
|
||||
There are several recommended network topologies to be used with the Bare
|
||||
Metal service. They are explained in depth in specific architecture
|
||||
documentation. However, several considerations are common for all of them:
|
||||
|
||||
* There has to be a *provisioning* network, which is used by nodes during
|
||||
the deployment process. If allowed by the architecture, this network should
|
||||
not be accessible by end users, and should not have access to the internet.
|
||||
|
||||
* There has to be a *cleaning* network, which is used by nodes during
|
||||
the cleaning process. In the majority of cases, the same network should
|
||||
be used as cleaning and provisioning for simplicity.
|
||||
|
||||
Unless noted otherwise, everything in these sections apply to both networks.
|
||||
|
||||
* The baremetal nodes have to have access to the Bare Metal API while connected
|
||||
to the provisioning/cleaning network.
|
||||
|
||||
.. note::
|
||||
Actually, only two endpoints need to be exposed there::
|
||||
|
||||
GET /v1/lookup
|
||||
POST /v1/heartbeat/[a-z0-9\-]+
|
||||
|
||||
You may want to limit access from this network to only these endpoints,
|
||||
and make these endpoint not accessible from other networks.
|
||||
|
||||
* If the ``pxe`` boot interface (or any boot interface based on it) is used,
|
||||
then the baremetal nodes should have untagged (access mode) connectivity
|
||||
to the provisioning/cleaning networks. It allows PXE firmware, which does not
|
||||
support VLANs, to communicate with the services required for provisioning.
|
||||
|
||||
.. note::
|
||||
It depends on the *network interface* whether the Bare Metal service will
|
||||
handle it automatically. Check the networking documentation for the
|
||||
specific architecture.
|
||||
|
||||
* The Baremetal nodes need to have access to any services required for
|
||||
provisioning/cleaning, while connected to the provisioning/cleaning network.
|
||||
This may include:
|
||||
|
||||
* a TFTP server for PXE boot and also an HTTP server when iPXE is enabled
|
||||
* either an HTTP server or the Object Storage service in case of the
|
||||
``direct`` deploy interface and some virtual media boot interfaces
|
||||
|
||||
* The Baremetal Conductor needs to have access to the booted baremetal nodes
|
||||
during provisioning/cleaning. The conductor communicates with an internal
|
||||
API, provided by **ironic-python-agent**, to conduct actions on nodes.
|
||||
|
||||
HA and Scalability
|
||||
------------------
|
||||
|
||||
ironic-api
|
||||
~~~~~~~~~~
|
||||
|
||||
The Bare Metal API service is stateless, and thus can be easily scaled
|
||||
horizontally. It is recommended to deploy it as a WSGI application behind e.g.
|
||||
Apache or another WSGI container.
|
||||
|
||||
Note, that this service accesses the ironic database for reading entities
|
||||
(e.g. in response to ``GET /v1/nodes`` request) and in rare cases for writing.
|
||||
|
||||
ironic-conductor
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
High availability
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
The Bare Metal conductor service utilizes the active/active HA model. Every
|
||||
conductor manages a certain subset of nodes. The nodes are organized in a hash
|
||||
ring that tries to keep the load spread more or less uniformly across the
|
||||
conductors. When a conductor is considered offline, its nodes are taken over by
|
||||
other conductors. As a result of this, you need at least 2 conductor hosts
|
||||
for an HA deployment.
|
||||
|
||||
Performance
|
||||
^^^^^^^^^^^
|
||||
|
||||
Conductors can be resource intensive, so it is recommended (but not required)
|
||||
to keep all conductors separate from other services in the cloud. The minimum
|
||||
required number of conductors in a deployment depends on several factors:
|
||||
|
||||
* the performance of the hardware where the conductors will be running,
|
||||
* the speed and reliability of the `management controller`_ of the
|
||||
bare metal nodes (for example, handling slower controllers may require having
|
||||
less nodes per conductor),
|
||||
* the frequency, at which the management controllers are polled by the Bare
|
||||
Metal service (see the ``sync_power_state_interval`` option),
|
||||
* the bare metal driver used for nodes (see `Hardware and drivers`_ above),
|
||||
* the network performance,
|
||||
* the maximum number of bare metal nodes that are provisioned simultaneously
|
||||
(see the ``max_concurrent_builds`` option for the Compute service).
|
||||
|
||||
We recommend a target of **100** bare metal nodes per conductor for maximum
|
||||
reliability and performance. There is some tolerance for a larger number per
|
||||
conductor. However, it was reported [1]_ [2]_ that reliability degrades when
|
||||
handling approximately 300 bare metal nodes per conductor.
|
||||
|
||||
Disk space
|
||||
^^^^^^^^^^
|
||||
|
||||
Each conductor needs enough free disk space to cache images it uses.
|
||||
Depending on the combination of the deploy interface and the boot option,
|
||||
the space requirements are different:
|
||||
|
||||
* The deployment kernel and ramdisk are always cached during the deployment.
|
||||
|
||||
* The ``iscsi`` deploy method requires caching of the whole instance image
|
||||
locally during the deployment. The image has to be converted to the raw
|
||||
format, which may increase the required amount of disk space, as well as
|
||||
the CPU load.
|
||||
|
||||
.. note::
|
||||
This is not a concern for the ``direct`` deploy interface, as in this case
|
||||
the deployment ramdisk downloads the image and either streams it to the
|
||||
disk or caches it in memory.
|
||||
|
||||
* When network boot is used, the instance image kernel and ramdisk are cached
|
||||
locally while the instance is active.
|
||||
|
||||
.. note::
|
||||
All images may be stored for some time after they are no longer needed.
|
||||
This is done to speed up simultaneous deployments of many similar images.
|
||||
The caching can be configured via the ``image_cache_size`` and
|
||||
``image_cache_ttl`` configuration options in the ``pxe`` group.
|
||||
|
||||
.. [1] http://lists.openstack.org/pipermail/openstack-dev/2017-June/118033.html
|
||||
.. [2] http://lists.openstack.org/pipermail/openstack-dev/2017-June/118327.html
|
||||
|
||||
Other services
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
When integrating with other OpenStack services, more considerations may need
|
||||
to be applied. This is covered in other parts of this guide.
|
||||
|
||||
|
||||
.. _PXE: https://en.wikipedia.org/wiki/Preboot_Execution_Environment
|
||||
.. _iPXE: https://en.wikipedia.org/wiki/IPXE
|
||||
.. _IPMI: https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface
|
||||
.. _management controller: https://en.wikipedia.org/wiki/Out-of-band_management
|
12
doc/source/install/refarch/index.rst
Normal file
12
doc/source/install/refarch/index.rst
Normal file
@ -0,0 +1,12 @@
|
||||
Reference Deploy Architectures
|
||||
==============================
|
||||
|
||||
This section covers the way we recommend the Bare Metal service to be deployed
|
||||
and managed. It is assumed that a reader has already gone through
|
||||
:doc:`/user/index`. It may be also useful to try :ref:`deploy_devstack` first
|
||||
to get better familiar with the concepts used in this guide.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
common
|
Loading…
x
Reference in New Issue
Block a user