..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=====================================
Ironic: Multiple compute host support
=====================================

https://blueprints.launchpad.net/nova/+spec/ironic-multiple-compute-hosts

Today, the Ironic virt driver only supports a single nova-compute service.
This is clearly not viable for an environment of any interesting scale:
there's no HA, and everything fails if the compute service goes down. Let's fix
that.


Problem description
===================

Computers are horrible things. They die sometimes. They crash processes at
random. Solar flares can make bad things happen. And so on and so forth.

Running only one instance of nova-compute for an entire Ironic environment
is going to be a bad time. The Ironic virt driver currently assumes that only
one nova-compute process can run at once. It exposes all resources from an
Ironic installation to the resource tracker, without the ability to split
those resources out into many compute services.

Use Cases
----------

This allows operators to avoid having a single nova-compute service for an
Ironic deployment, so that the deployment may continue to function if a
compute service goes down. Note that this assumes a single Ironic cluster
per Nova deployment; this is not unreasonable in most cases, as Ironic should
be able to scale to 10^5 nodes.


Proposed change
===============

In general, a nova-compute service running the Ironic virt driver should only
register as a single row in the compute_nodes table, rather than many rows.

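To make that concrete, here is a minimal sketch of the idea, assuming the
standard virt driver ``get_available_nodes()`` hook; the class name and the
hostname lookup are placeholders, not the actual implementation::

    # Sketch only: in Nova this would live in the Ironic virt driver and
    # would use CONF.host rather than a hostname lookup.
    import socket


    class IronicDriverSketch(object):
        def get_available_nodes(self, refresh=False):
            # Today the driver returns every Ironic node UUID, so the
            # resource tracker creates one compute_nodes row per bare metal
            # node. Proposed: return a single entry for this compute service
            # host, so exactly one row is registered.
            return [socket.gethostname()]
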
Nova's scheduler should schedule only to a nova-compute host; the host will
choose an Ironic node itself, from the nodes that match the request (explained
further below). Once an instance is placed on a given nova-compute service
host, that host will always manage other requests for that instance (delete,
etc.). So the instance count scheduler filter can be used here to distribute
instances equally between compute hosts. This reduces the failure domain to
failing actions for existing instances on a compute host, if that compute host
happens to fail.

The Ironic virt driver should be modified to call an Ironic endpoint with
the request spec for the instance. This endpoint will reserve a node, and
return that node. The virt driver will then deploy the instance to this node.
When the instance is destroyed, the reservation should also be destroyed.

This endpoint will take parameters related to the request spec, and is being
developed on the Ironic side. [0] This has not yet been solidified, but it
will have, at a minimum, all fields in the flavor object. This might look
something like::

    {
        "memory_mb": 1024,
        "vcpus": 8,
        "vcpu_weight": null,
        "root_gb": 20,
        "ephemeral_gb": 10,
        "swap": 2,
        "rxtx_factor": 1.0,
        "extra_specs": {
            "capabilities": "supports_uefi,has_gpu",
        },
        "image": {
            "id": "some-uuid",
            "properties": {...},
        },
    }

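For illustration, the driver-side flow might look roughly like the sketch
below. The ``reserve_node()`` and ``release_node()`` calls are hypothetical
stand-ins for whatever the final Ironic reservation endpoint [0] provides,
and ``deploy`` stands in for the driver's existing deploy path::

    # Hypothetical sketch: reserve_node()/release_node() do not exist yet;
    # they represent calls to the proposed Ironic reservation endpoint [0].
    def build_on_reserved_node(ironic, deploy, flavor, image_meta):
        request = {
            'memory_mb': flavor['memory_mb'],
            'vcpus': flavor['vcpus'],
            'root_gb': flavor['root_gb'],
            'ephemeral_gb': flavor.get('ephemeral_gb', 0),
            'swap': flavor.get('swap', 0),
            'extra_specs': flavor.get('extra_specs', {}),
            'image': {'id': image_meta['id'],
                      'properties': image_meta.get('properties', {})},
        }
        # Ask Ironic to choose and reserve a node matching the request.
        node = ironic.reserve_node(request)
        try:
            deploy(node)
        except Exception:
            # On failure, drop the reservation so the node becomes
            # schedulable again; on instance delete the reservation is
            # removed as well.
            ironic.release_node(node)
            raise
        return node

The important behavior is only that the compute host, not the Nova scheduler,
ends up choosing the Ironic node, and that the reservation's lifetime matches
the instance's.
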
We will report (total ironic capacity) into the resource tracker for each
compute host. This will end up over-reporting total available capacity to
Nova; however, it is the least wrong option here (a rough sketch of this
reporting follows the list below). Other (worse) options are:

* Report (total ironic capacity)/(number of compute hosts) from each compute
  host. This is more "right", but has the possibility of a compute host
  reporting (usage) > (max capacity), and therefore becoming unable to perform
  new build actions.

* Report some arbitrary incorrect number for total capacity, and try to make
  the scheduler ignore it. This reports numbers more incorrectly, and also
  takes more code and has more room for error.

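A minimal sketch of the chosen option, assuming simple per-node resource
dictionaries; the helper and sample data below are illustrative only, while
the real driver assembles these figures from the Ironic API inside its
``get_available_resource()`` path::

    # Sketch of "report total Ironic capacity from every compute host".
    # Ironic node properties typically expose 'cpus'; Nova's compute_nodes
    # record calls the same figure 'vcpus'.
    def total_capacity(nodes):
        """Sum capacity across all Ironic nodes visible to this host."""
        totals = {'vcpus': 0, 'memory_mb': 0, 'local_gb': 0}
        for node in nodes:
            totals['vcpus'] += node.get('cpus', 0)
            totals['memory_mb'] += node.get('memory_mb', 0)
            totals['local_gb'] += node.get('local_gb', 0)
        return totals


    # Example data, not real inventory: two identical bare metal nodes.
    print(total_capacity([
        {'cpus': 24, 'memory_mb': 131072, 'local_gb': 930},
        {'cpus': 24, 'memory_mb': 131072, 'local_gb': 930},
    ]))

Because every compute host reports the same totals, Nova's aggregate view of
capacity is over-reported by roughly a factor of the number of compute hosts,
which is the trade-off accepted above.
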
Alternatives
------------

Do what we do today, with active/passive failover.

Doing active/passive failover well is not an easy task, and doesn't account for
all possible failures. This also does not follow Nova's prescribed model for
compute failure. Furthermore, the resource tracker initialization is slow
with many Ironic nodes, and so a cold failover could take minutes.

Resource providers [1] may be another viable alternative, but we shouldn't
have a hard dependency on that work.

Data model impact
-----------------

None.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

Performance Impact
------------------

This should improve performance a bit. Currently, the resource tracker is
responsible for every node in an Ironic deployment; this change will make that
group smaller and improve the performance of the resource tracker loop.

Other deployer impact
---------------------

A version of Ironic that supports the reservation endpoint must be deployed
before a version of Nova with this change is deployed. If this is not the
case, the previous behavior should be used. We'll need to properly deprecate
the old behavior behind a config option, as deployers will need to configure
different scheduler filters and host managers than the current recommendation
for this to work correctly. We should investigate whether this can be done
gracefully without a new config option; however, I'm not sure it's possible.

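For illustration only, such a deprecation switch might be defined along these
lines; the option name, group, and default are placeholders rather than a
proposal::

    # Hypothetical: the option name, group, and default are placeholders.
    from oslo_config import cfg

    opts = [
        cfg.BoolOpt('single_compute_service',
                    default=False,
                    deprecated_for_removal=True,
                    help='Keep the old behavior of exposing every Ironic '
                         'node through a single nova-compute service.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(opts, group='ironic')
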
Developer impact
----------------

None, though Ironic driver developers should be aware of the situation.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  jim-rollenhagen (jroll)

Other contributors:
  devananda
  jaypipes

Work Items
----------

* Change the Ironic driver to use a 1:1 host:node mapping, i.e. a single
  compute_nodes record per compute host.

* Change the Ironic driver to get reservations from Ironic.


Dependencies
============

This depends on a new endpoint in Ironic. [0]


Testing
=======

This will be tested by making this behavior the default configuration.


Documentation Impact
====================

Deployer documentation will need updates to explain how this works, since it
is different from most drivers.


References
==========

[0] https://review.openstack.org/#/c/204641/

[1] https://review.openstack.org/#/c/225546/


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced but no changes merged.
   * - Newton
     - Re-proposed.