There are many references to review.openstack.org, and while the redirect should work, we can also go ahead and fix them. Change-Id: Ic82bde84617c461aba1e4a02b3fc562a90c99a6e
9.6 KiB
Obtaining Steps
https://storyboard.openstack.org/#!/story/1719925
https://storyboard.openstack.org/#!/story/1715419
In Ironic, we have a concept of steps1 to be executed to achieve a task utilizing a blend of driver code running in the conductor and code operating inside of the ironic-python-agent.
In order for this to be useful, we have to be able to raise the visibility of what is available to be performed to the end user of the API. Presently users are only able to rely upon documentation, and the state of the code including modules that could be loaded in.
This issue is further compounded as the entire list of steps is a
union of information identified from the ironic-conductor
process managing the node and the ironic-python-agent
process executing upon the node.
Note
This document is present in the backlog as there are implementation issues to this feature. Please see Gerrit change 606199 for more information.
Problem description
- API users presently have to rely upon documentation of steps to know what is available.
- Different steps may be available with different hardware managers.
- With the increasing use of the Deploy Steps2 framework, new steps should be anticipated to be added with new releases of Ironic.
- The
ironic-python-agent
must be running to obtain a complete list of steps.
Proposed change
In order to keep this solution relatively lightweight, there are four fundamental changes that will be needed in order to facilitate visibility.
This doesn't seek to solve complete visibility by creating additional processes, but instead seeks to provide tools to collect data, with the limiting factor being we can only return the current available information.
How to do it?
Step 1
The initial step is to provide an API endpoint that returns the
current available list of steps visible for a node running in the
conductor. This would be an API endpoint, to a RPC method, to a
conductor manager method, which would then return the list of steps,
while tolerating the absence of ironic-python-agent
.
Note
The ironic community consensus is that this feature should cache steps and return those cached steps as available to the user.
Step 2
Addition of a hold
provision state verb and
holding
state.
Note
During a specific planning and discussion meeting to determine the path for a feature such as this, the ironic community reached a consensus on the call that a holding state would be useful, and could likey be implemented aside from the API functionality proposed in this backlog specification.
Initial State | Temporary State | Possible next verbs |
manageable | holding | manage, clean, provide, active, inspect |
available | holding | active, manage, provide |
With the invocation of the state:
- The machine is moved to the provisioning network.
Note
There is a slight issue with this transition in that to clean the node would realistically need to be on the cleaning network. Operationally changing the DHCP address is problematic as we have learned with the rescue feature.
- The deployment ramdisk is booted.
- The
ironic-python-agent
would then be left in a running state, allowed to heartbeat (or be polled), and the API endpoint added in the prior step would fetch a complete list of steps that can be executed upon.
Alternatives
An alternative to this solution would be to provide an async API endpoint to perform the steps detailed in step 2, and cache the data which could then be retrieved by the user asynchronously. In this case, the user would have to poll the API to determine if the cached information has been updated.
The conundrum is that this would have to be constrained by states, which means we would still need to build state machine states around this to represent the current operation to users.
Data model impact
None
State Machine Impact
As noted above, we would add a new hold verb, which would allow
transition back to the prior state. This hold
verb would
only be accessible from the manageable
and
available
states.
In this holding state, API users would be able to request logical next steps, in-line with the present state, as detailed in the table above.
REST API impact
The node object returned would expose additional
provision_state
states, however this is a known quantity
with all state machine impacts.
An additional provision state target verb of hold
to
trigger the state machine change.
An endpoint will be added on to enable an API user to return the list of known steps via the RPC interface and the conductor, which will be triggered as a GET request.
Note
Community consensus is that we should not be initiating a synchronous call to IPA to collect data, that we should instead return cached data and somehow trigger the cache to be updated.
Example:
GET /v1/nodes/{node_ident}/steps[?type=(clean|deploy)]
{
[{"source": "conductor",
"deploy": [
{
"interface": "deploy",
"step": "deploy",
"priority": 100,
},
],
"clean": [
{
"interface": "deploy",
"step": "erase_devices",
"reboot_requested": False,
"priority": 10,
"abortable": True,
},
{
"interface": "bios",
"step": "apply_configuration",
"args": {....},
"priority": 0,
},
{
"interface": "raid",
"step": "create_configuration",
"args": {....},
"priority": 0
},
{
"interface": "raid"
"step": "delete_configuration",
"args": {....},
"priority": 0
}
]
},
{"source": "agent",
...
}
]
}
If a specific type
is requested, then the request shall
only return the requested type of steps. If no type is defined, both
sets will be returned to the caller.
Normal response code: 200 Expected error codes:
* 400 with malformed request
* 503 upon conductor error
Note
API micro-version will be incremented in accordance with standard procedure.
Client (CLI) impact
"ironic" CLI
None
"openstack baremetal" CLI
An openstack baremetal node steps
and
openstack baremetal node hold
commands will be added to
facilitate returning the data exposed by this api.
RPC API impact
A new RPC method will need to be added called get_steps
that will support a single argument to indicate what class of steps are
being requested by the API user.
Driver API impact
None
Nova driver impact
None is required for this feature.
That being said, there is value to enable a node to be scheduled
which is being held for an available deployment. As such, it could be an
optional enhancement which could save quite a bit of time in a
deployment process. This could be enabled by allowing nova to consider a
node in the holding
state to be available for deployments
by also evaluating the target_provision_state
for nodes in
holding
. It would be fairly tight coupling, but a frequent
ask is for faster deployments, and it would be a route that we could
take to enable such functionality in terms of "holding for
deployment".
Ramdisk impact
None
Security impact
None
Other end user impact
None
Scalability impact
None
Performance Impact
None
Other deployer impact
None
Developer impact
None
Implementation
Assignee(s)
- Primary assignee:
-
Julia Kreger (TheJulia) <juliaashleykreger@gmail.com>
- Other contributors:
-
?
Work Items
- Implement API to retrieve a list of states.
- Implement State machine changes to allow an idle agent instance to return cleaning step data.
- Add API tests to ironic-tempest-plugin.
- Update state machine documentation.
- Add Admin documentation.
- Update CLI documentation.
Dependencies
None
Testing
Basic API contract and state testing should be sufficient for this feature.
Upgrades and Backwards Compatibility
N/A, The existing rolling upgrades and RPC version pinning practice should be more than sufficient to support this feature.
Documentation Impact
Additional details will need to be added to the Admin guide. State documentation will need to be updated. Update client documentation for new state verb.