A fresh way of looking at step retrieval
This is an attempt at a fresh, simplistic, current state driven way to obtain possible steps. Change-Id: Iee540569380365f945f7e072c12e0c5739128e42
This commit is contained in:
parent
8722aa582a
commit
b2407ddcff
|
@ -0,0 +1,343 @@
|
||||||
|
..
|
||||||
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||||
|
License.
|
||||||
|
|
||||||
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||||
|
|
||||||
|
===============
|
||||||
|
Obtaining Steps
|
||||||
|
===============
|
||||||
|
|
||||||
|
https://storyboard.openstack.org/#!/story/1719925
|
||||||
|
|
||||||
|
https://storyboard.openstack.org/#!/story/1715419
|
||||||
|
|
||||||
|
In Ironic, we have a concept of steps [1]_ to be executed to achieve a task
|
||||||
|
utilizing a blend of driver code running in the conductor and code operating
|
||||||
|
inside of the
|
||||||
|
`ironic-python-agent <https://git.openstack.org/cgit/openstack/ironic-python-agent>`_.
|
||||||
|
|
||||||
|
In order for this to be useful, we have to be able to raise the visibility of
|
||||||
|
what is available to be performed to the end user of the API. Presently users
|
||||||
|
are only able to rely upon documentation, and the state of the code including
|
||||||
|
modules that could be loaded in.
|
||||||
|
|
||||||
|
This issue is further compounded as the entire list of steps is a union
|
||||||
|
of information identified from the ``ironic-conductor`` process managing the
|
||||||
|
node and the ``ironic-python-agent`` process executing upon the node.
|
||||||
|
|
||||||
|
.. Note::
|
||||||
|
This document is present in the backlog as there are implementation issues
|
||||||
|
to this feature. Please see Gerrit change
|
||||||
|
`606199 <https://review.openstack.org/#/c/606199/4>`_ for more information.
|
||||||
|
|
||||||
|
Problem description
|
||||||
|
===================
|
||||||
|
|
||||||
|
* API users presently have to rely upon documentation of steps to know
|
||||||
|
what is available.
|
||||||
|
|
||||||
|
* Different steps may be available with different hardware managers.
|
||||||
|
|
||||||
|
* With the increasing use of the Deploy Steps [2]_ framework, new steps
|
||||||
|
should be anticipated to be added with new releases of Ironic.
|
||||||
|
|
||||||
|
* The ``ironic-python-agent`` must be running to obtain a complete list
|
||||||
|
of steps.
|
||||||
|
|
||||||
|
Proposed change
|
||||||
|
===============
|
||||||
|
|
||||||
|
In order to keep this solution relatively lightweight, there are four
|
||||||
|
fundamental changes that will be needed in order to facilitate visibility.
|
||||||
|
|
||||||
|
This doesn't seek to solve complete visibility by creating additional
|
||||||
|
processes, but instead seeks to provide tools to collect data,
|
||||||
|
with the limiting factor being we can only return the current available
|
||||||
|
information.
|
||||||
|
|
||||||
|
How to do it?
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Step 1
|
||||||
|
~~~~~~
|
||||||
|
|
||||||
|
The initial step is to provide an API endpoint that returns the current
|
||||||
|
available list of steps visible for a node running in the conductor.
|
||||||
|
This would be an API endpoint, to a RPC method, to a conductor manager
|
||||||
|
method, which would then return the list of steps, while tolerating the
|
||||||
|
absence of ``ironic-python-agent``.
|
||||||
|
|
||||||
|
.. Note::
|
||||||
|
The ironic community consensus is that this feature should cache steps
|
||||||
|
and return those cached steps as available to the user.
|
||||||
|
|
||||||
|
Step 2
|
||||||
|
~~~~~~
|
||||||
|
|
||||||
|
Addition of a ``hold`` provision state verb and ``holding`` state.
|
||||||
|
|
||||||
|
.. Note::
|
||||||
|
During a specific planning and discussion meeting to determine the path
|
||||||
|
for a feature such as this, the ironic community reached a consensus on
|
||||||
|
the call that a holding state would be useful, and could likey be
|
||||||
|
implemented aside from the API functionality proposed in this backlog
|
||||||
|
specification.
|
||||||
|
|
||||||
|
+-----------------+-------------------+---------------------------------+
|
||||||
|
| *Initial State* | *Temporary State* | *Possible next verbs* |
|
||||||
|
+-----------------+-------------------+---------------------------------+
|
||||||
|
| manageable | holding | manage, clean, provide, active, |
|
||||||
|
| | | inspect |
|
||||||
|
+-----------------+-------------------+---------------------------------+
|
||||||
|
| available | holding | active, manage, provide |
|
||||||
|
+-----------------+-------------------+---------------------------------+
|
||||||
|
|
||||||
|
|
||||||
|
With the invocation of the state:
|
||||||
|
|
||||||
|
* The machine is moved to the provisioning network.
|
||||||
|
|
||||||
|
.. Note::
|
||||||
|
There is a slight issue with this transition in that to clean the node
|
||||||
|
would realistically need to be on the cleaning network. Operationally
|
||||||
|
changing the DHCP address is problematic as we have learned with the
|
||||||
|
rescue feature.
|
||||||
|
|
||||||
|
* The deployment ramdisk is booted.
|
||||||
|
* The ``ironic-python-agent`` would then be left in a running
|
||||||
|
state, allowed to heartbeat (or be polled), and the API
|
||||||
|
endpoint added in the prior step would fetch a complete
|
||||||
|
list of steps that can be executed upon.
|
||||||
|
|
||||||
|
Alternatives
|
||||||
|
------------
|
||||||
|
|
||||||
|
An alternative to this solution would be to provide an async API endpoint
|
||||||
|
to perform the steps detailed in step 2, and cache the data which could then
|
||||||
|
be retrieved by the user asynchronously. In this case, the user would have
|
||||||
|
to poll the API to determine if the cached information has been updated.
|
||||||
|
|
||||||
|
The conundrum is that this would have to be constrained by states, which
|
||||||
|
means we would still need to build state machine states around this to
|
||||||
|
represent the current operation to users.
|
||||||
|
|
||||||
|
Data model impact
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
State Machine Impact
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
As noted above, we would add a new hold verb, which would allow transition
|
||||||
|
back to the prior state. This ``hold`` verb would only be accessible from
|
||||||
|
the ``manageable`` and ``available`` states.
|
||||||
|
|
||||||
|
In this holding state, API users would be able to request logical next steps,
|
||||||
|
in-line with the present state, as detailed in the table above.
|
||||||
|
|
||||||
|
REST API impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
The node object returned would expose additional ``provision_state`` states,
|
||||||
|
however this is a known quantity with all state machine impacts.
|
||||||
|
|
||||||
|
An additional provision state target verb of ``hold`` to trigger the state
|
||||||
|
machine change.
|
||||||
|
|
||||||
|
An endpoint will be added on to enable an API user to return the list
|
||||||
|
of known steps via the RPC interface and the conductor, which will be
|
||||||
|
triggered as a GET request.
|
||||||
|
|
||||||
|
.. Note::
|
||||||
|
Community consensus is that we should not be initiating a synchronous call
|
||||||
|
to IPA to collect data, that we should instead return cached data and
|
||||||
|
somehow trigger the cache to be updated.
|
||||||
|
|
||||||
|
Example::
|
||||||
|
|
||||||
|
GET /v1/nodes/{node_ident}/steps[?type=(clean|deploy)]
|
||||||
|
{
|
||||||
|
[{"source": "conductor",
|
||||||
|
"deploy": [
|
||||||
|
{
|
||||||
|
"interface": "deploy",
|
||||||
|
"step": "deploy",
|
||||||
|
"priority": 100,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
"clean": [
|
||||||
|
{
|
||||||
|
"interface": "deploy",
|
||||||
|
"step": "erase_devices",
|
||||||
|
"reboot_requested": False,
|
||||||
|
"priority": 10,
|
||||||
|
"abortable": True,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"interface": "bios",
|
||||||
|
"step": "apply_configuration",
|
||||||
|
"args": {....},
|
||||||
|
"priority": 0,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"interface": "raid",
|
||||||
|
"step": "create_configuration",
|
||||||
|
"args": {....},
|
||||||
|
"priority": 0
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"interface": "raid"
|
||||||
|
"step": "delete_configuration",
|
||||||
|
"args": {....},
|
||||||
|
"priority": 0
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{"source": "agent",
|
||||||
|
...
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
If a specific ``type`` is requested, then the request shall only return the
|
||||||
|
requested type of steps. If no type is defined, both sets will be returned
|
||||||
|
to the caller.
|
||||||
|
|
||||||
|
Normal response code: 200
|
||||||
|
Expected error codes::
|
||||||
|
|
||||||
|
* 400 with malformed request
|
||||||
|
* 503 upon conductor error
|
||||||
|
|
||||||
|
.. NOTE::
|
||||||
|
API micro-version will be incremented in accordance with standard
|
||||||
|
procedure.
|
||||||
|
|
||||||
|
|
||||||
|
Client (CLI) impact
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
"ironic" CLI
|
||||||
|
~~~~~~~~~~~~
|
||||||
|
None
|
||||||
|
|
||||||
|
"openstack baremetal" CLI
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
An ``openstack baremetal node steps`` and ``openstack baremetal node hold``
|
||||||
|
commands will be added to facilitate returning the data exposed by this api.
|
||||||
|
|
||||||
|
RPC API impact
|
||||||
|
--------------
|
||||||
|
|
||||||
|
A new RPC method will need to be added called ``get_steps``
|
||||||
|
that will support a single argument to indicate what class of
|
||||||
|
steps are being requested by the API user.
|
||||||
|
|
||||||
|
Driver API impact
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Nova driver impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
None is required for this feature.
|
||||||
|
|
||||||
|
That being said, there is value to enable a node to be scheduled which is
|
||||||
|
being held for an available deployment. As such, it could be an optional
|
||||||
|
enhancement which could save quite a bit of time in a deployment process.
|
||||||
|
This could be enabled by allowing nova to consider a node in the ``holding``
|
||||||
|
state to be available for deployments by also evaluating the
|
||||||
|
``target_provision_state`` for nodes in ``holding``. It would be
|
||||||
|
fairly tight coupling, but a frequent ask is for faster deployments,
|
||||||
|
and it would be a route that we could take to enable such
|
||||||
|
functionality in terms of "holding for deployment".
|
||||||
|
|
||||||
|
Ramdisk impact
|
||||||
|
--------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Security impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Other end user impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Scalability impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Performance Impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Other deployer impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Developer impact
|
||||||
|
----------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Implementation
|
||||||
|
==============
|
||||||
|
|
||||||
|
Assignee(s)
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Primary assignee:
|
||||||
|
Julia Kreger (TheJulia) <juliaashleykreger@gmail.com>
|
||||||
|
|
||||||
|
Other contributors:
|
||||||
|
?
|
||||||
|
|
||||||
|
Work Items
|
||||||
|
----------
|
||||||
|
|
||||||
|
* Implement API to retrieve a list of states.
|
||||||
|
* Implement State machine changes to allow an idle agent instance to return
|
||||||
|
cleaning step data.
|
||||||
|
* Add API tests to ironic-tempest-plugin.
|
||||||
|
* Update state machine documentation.
|
||||||
|
* Add Admin documentation.
|
||||||
|
* Update CLI documentation.
|
||||||
|
|
||||||
|
Dependencies
|
||||||
|
============
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Testing
|
||||||
|
=======
|
||||||
|
|
||||||
|
Basic API contract and state testing should be sufficient for this feature.
|
||||||
|
|
||||||
|
Upgrades and Backwards Compatibility
|
||||||
|
====================================
|
||||||
|
|
||||||
|
N/A, The existing rolling upgrades and RPC version pinning practice should
|
||||||
|
be more than sufficient to support this feature.
|
||||||
|
|
||||||
|
Documentation Impact
|
||||||
|
====================
|
||||||
|
|
||||||
|
Additional details will need to be added to the Admin guide.
|
||||||
|
State documentation will need to be updated.
|
||||||
|
Update client documentation for new state verb.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
.. [1] Manual cleaning - https://specs.openstack.org/openstack/ironic-specs/specs/5.0/manual-cleaning.html
|
||||||
|
.. [2] Deploy Steps - https://specs.openstack.org/openstack/ironic-specs/specs/11.1/deployment-steps-framework.html
|
Loading…
Reference in New Issue