Introduce Pending VM state
https://blueprints.launchpad.net/nova/+spec/introduce-pending-vm-state
This feature adds support for the PENDING server state.
When the scheduler determines there is no capacity available for the
given request, and the instance is therefore about to be routed into cell0,
the server should be set to the PENDING state instead of
ERROR if the operator has enabled this behaviour. This allows an external
service to execute follow-up actions transparently to the end user.
Problem description
Use Cases
As an operator, I want to enable a service external to Nova that is
triggered as soon as a server's build request fails due to
NoValidHost and tries to free up the requested
resources.
If the outcome of the follow-up actions is:
- success, the external service will try to rebuild the instance:
  POST /servers/{server_id}/action { "rebuild": { "description": null, "imageRef": {image_id} } }
  Note: the rebuild API needs to be adapted to handle instances that fail while building and are mapped to cell0. This change is out of scope for this spec and is being addressed by another spec [1].
- failure, the external service will set the state of the instance to
  ERROR (using reset-state): POST /servers/{server_id}/action { "os-resetState": { "state": "error" } }
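A minimal sketch of how the external service might build the two follow-up requests listed above. The helper `build_followup_request` and its parameters are hypothetical; only the path and the two request bodies come from this spec. Sending the request (authentication, microversion headers) is deliberately left out.

```python
import json
from typing import Optional, Tuple


def build_followup_request(server_id: str, resources_freed: bool,
                           image_id: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (method, path, JSON body) for the follow-up server action.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    path = "/servers/{}/action".format(server_id)
    if resources_freed:
        if image_id is None:
            raise ValueError("a rebuild requires an image_id")
        # Success: ask Nova to rebuild the PENDING instance.
        body = {"rebuild": {"description": None, "imageRef": image_id}}
    else:
        # Failure: hand the instance back to the user in the ERROR state.
        body = {"os-resetState": {"state": "error"}}
    return "POST", path, json.dumps(body)
```

Keeping request construction separate from the HTTP client makes the two outcomes easy to test without a running cloud.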
To achieve this transparently to the user, the instance
should not be set to the ERROR state but to the new
PENDING state.
Note that, as with all other VM states, the end user will be able to
delete instances in the PENDING state. Failures of the follow-up
actions caused by the deletion of instances in the new state have to be
handled by the external service.
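One way the external service could tell a deleted instance apart from other failures is by the HTTP status of its follow-up action. The sketch below assumes Nova answers accepted server actions with 202 and returns 404 for a server that no longer exists; the function name and the returned labels are hypothetical and only illustrate the policy.

```python
from http import HTTPStatus


def classify_followup_response(status_code: int) -> str:
    """Map the HTTP status of a follow-up action to the service's next step.

    Hypothetical policy, for illustration only.
    """
    if status_code == HTTPStatus.ACCEPTED:
        # Nova accepted the rebuild or reset-state action.
        return "done"
    if status_code == HTTPStatus.NOT_FOUND:
        # The user deleted the PENDING instance in the meantime: stop here.
        return "instance-deleted"
    # Any other error is left to the service (e.g. retry, alert an operator).
    return "needs-attention"
```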
Proposed change
- Add the PENDING state in the InstanceState object.
- Add the PENDING state in compute vm_states.
- Add the PENDING state in the server ViewBuilder as a new progress status.
- Add a configuration option, defaulting to False, in the DEFAULT group to enable the use of the PENDING vm_state on NoValidHost events: CONF.use_pending_state
- Add the following code in the conductor manager _bury_in_cell0 method to make sure that a VM is set to PENDING only when the operator has chosen so and the failure reported by the scheduler is a NoValidHost:

      # Only a NoValidHost failure can lead to PENDING.
      is_no_valid_host = isinstance(exc, exception.NoValidHost)
      if CONF.use_pending_state and is_no_valid_host:
          vm_state = vm_states.PENDING
      else:
          vm_state = vm_states.ERROR
      updates = {'vm_state': vm_state, 'task_state': None}

- Add a new API microversion and map the PENDING state to ERROR for requests to previous microversions. See REST API impact.
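The decision added to _bury_in_cell0 can be exercised in isolation. In this sketch the exception classes and state constants are local stand-ins (assumptions) for nova.exception.NoValidHost and nova.compute.vm_states, and the option is passed as a plain argument instead of being read from CONF; `bury_updates` is a hypothetical name.

```python
class NoValidHost(Exception):
    """Stand-in for nova.exception.NoValidHost."""


class OtherSchedulerError(Exception):
    """Stand-in for any other failure reported by the scheduler."""


# Stand-ins for the nova.compute.vm_states constants.
ERROR = "error"
PENDING = "pending"


def bury_updates(exc: Exception, use_pending_state: bool) -> dict:
    """Return the instance updates applied when burying a server in cell0."""
    is_no_valid_host = isinstance(exc, NoValidHost)
    # PENDING only when the operator opted in AND the cause is NoValidHost.
    vm_state = PENDING if use_pending_state and is_no_valid_host else ERROR
    return {"vm_state": vm_state, "task_state": None}
```

Both conditions are required, so with the option at its default (False) behaviour is unchanged for every failure type.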
Alternatives
Follow the vendor data example and perform an asynchronous REST API call from the Nova Conductor to the external service when enabled by the operator. However, making an asynchronous REST API call from the conductor could have a performance impact.
Data model impact
None.
REST API impact
A new API microversion is needed for this change. For the older
microversions the PENDING state will be mapped to the
ERROR state.
Example responses for a server set to PENDING would
be:
GET /servers/detail (new microversion)
{
"servers":[
{
...: ...,
"name": "test",
"id":"2dd26c1e-bc6f-45f6-83b3-2cb72ea026eb",
"OS-EXT-STS:vm_state":"pending",
"status":"PENDING",
...: ...
}
]
}
GET /servers/detail (previous microversions)
{
"servers":[
{
...: ...,
"name": "test",
"id":"2dd26c1e-bc6f-45f6-83b3-2cb72ea026eb",
"OS-EXT-STS:vm_state":"error",
"status":"ERROR",
...: ...
}
]
}
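The mapping between the two responses above can be sketched as a small view-layer function. The new microversion number is not known yet, so `PENDING_MIN_VERSION = (2, 999)` is a placeholder, and `render_state` is a hypothetical helper, not the actual ViewBuilder code. Microversions are compared as integer tuples so that "2.10" sorts after "2.9".

```python
from typing import Dict, Tuple

# Placeholder: the real microversion is assigned when the change merges.
PENDING_MIN_VERSION: Tuple[int, int] = (2, 999)

# Only the two states relevant here; Nova's full vm_state-to-status map is larger.
_STATUS = {"pending": "PENDING", "error": "ERROR"}


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a microversion string such as '2.65' into a comparable tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


def render_state(vm_state: str, requested_version: str) -> Dict[str, str]:
    """Return the state fields shown for the requested microversion."""
    if vm_state == "pending" and parse_version(requested_version) < PENDING_MIN_VERSION:
        # Older microversions never see PENDING; it is reported as ERROR.
        vm_state = "error"
    return {"OS-EXT-STS:vm_state": vm_state, "status": _STATUS[vm_state]}
```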
Security impact
None.
Notifications impact
Firstly, the external third-party service has to be notified when a
server is set to the PENDING state. For this, the already
existing versioned notification instance.update can be used [2].
Secondly, a notification is needed in order to inform the
external service about the outcome of a server's build procedure. The plan is
to use this notification to let the external Reaper service
know where the requested resources have to be freed up. The existing
select_destinations versioned notification can be used [3].
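A sketch of how the external service might recognise a move into PENDING in an instance.update notification. It assumes the envelope layout of Nova's versioned notifications (an `event_type`, and a `payload` whose fields sit under `nova_object.data`, including a `state_update` object with `old_state` and `state`); these field names should be checked against the versioned notification reference. The sample message is trimmed and hypothetical.

```python
def entered_pending(message: dict) -> bool:
    """Return True if a versioned instance.update marks a move into PENDING."""
    if message.get("event_type") != "instance.update":
        return False
    data = message.get("payload", {}).get("nova_object.data", {})
    update = data.get("state_update", {}).get("nova_object.data", {})
    # Only the transition into PENDING matters, not repeated updates while in it.
    return update.get("state") == "pending" and update.get("old_state") != "pending"


# Trimmed, hypothetical example message.
sample = {
    "event_type": "instance.update",
    "payload": {
        "nova_object.data": {
            "uuid": "2dd26c1e-bc6f-45f6-83b3-2cb72ea026eb",
            "state_update": {
                "nova_object.data": {"old_state": "building", "state": "pending"}
            },
        }
    },
}
```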
Other end user impact
From the microversion that introduces the new instance state onwards, end users need to account for the possibility of instances going through the PENDING state (which may or may not happen, depending on how the operator chooses to configure the cloud).
Performance Impact
None.
Other deployer impact
There will be a new config option specifying whether the
PENDING state will be used. The most appropriate place for
this option seems to be the DEFAULT section.
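For illustration, the option would appear in the DEFAULT section of nova.conf as shown in the snippet below. Nova itself reads options through oslo.config; this sketch uses the standard-library configparser only to show the INI layout and the documented default of False. The function name is hypothetical.

```python
import configparser

NOVA_CONF_SNIPPET = """
[DEFAULT]
use_pending_state = True
"""


def pending_state_enabled(conf_text: str) -> bool:
    """Read use_pending_state from INI text, falling back to the default."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # A missing option means the documented default, False.
    return parser["DEFAULT"].getboolean("use_pending_state", fallback=False)
```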
Developer impact
None.
Upgrade impact
None.
Implementation
Assignee(s)
- Primary assignee: <ttsiouts>
- Other contributors: <johnthetubaguy> <strigazi> <belmoreira>
Work Items
See Proposed change.
Dependencies
None.
Testing
Updating existing unit and functional tests should be enough to verify the use of the new state. New unit and functional tests have to be added to verify the new notification.
Documentation Impact
- The new configuration option as well as the meaning of the PENDING state should be documented.
- Update the allowed state transitions documentation to include:
  - BUILD to PENDING
  - PENDING to BUILD
  - PENDING to ERROR
- Document that the responsibility of managing the instance's lifecycle is transferred to the external service as soon as the instance is set to the PENDING state.
- Document that, from the new microversion onwards, instances might go through the PENDING state as well, depending on whether the operator chooses to enable this state.
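The three new transitions listed above can be captured in a small lookup, for example for use in documentation tests. This is a sketch covering only the transitions this spec adds, not Nova's full state machine; `check_transition` is a hypothetical helper, and deletion is treated as always allowed because end users can delete PENDING instances like any other.

```python
from typing import FrozenSet, Tuple

# Only the transitions added by this spec; Nova's full state machine has more.
PENDING_TRANSITIONS: FrozenSet[Tuple[str, str]] = frozenset({
    ("BUILD", "PENDING"),   # scheduler reported NoValidHost, option enabled
    ("PENDING", "BUILD"),   # external service freed resources and rebuilt
    ("PENDING", "ERROR"),   # external service gave up and reset the state
})


def check_transition(old: str, new: str) -> None:
    """Raise ValueError for a PENDING-related transition not listed above."""
    if new == "DELETED":
        # Deletion by the end user is allowed from every state.
        return
    if "PENDING" in (old, new) and (old, new) not in PENDING_TRANSITIONS:
        raise ValueError("unexpected transition {} -> {}".format(old, new))
```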
References
As discussed in the Dublin PTG: https://etherpad.openstack.org/p/nova-ptg-rocky L472
History
| Release Name | Description |
|---|---|
| Rocky | Introduced |
| Stein | Re-proposed |
| Train | Re-proposed |