Merge "Add Placement error-handling to user guide"

Zuul 2023-02-27 14:12:05 +00:00 committed by Gerrit Code Review
commit 6827670a82
2 changed files with 248 additions and 0 deletions


@ -114,6 +114,7 @@ Error Handling
etsi_vnf_error_handling
db_sync_error_handling
placement_error_handling
.. TODO(h-asahina): add `Action Driver`
* https://etherpad.opendev.org/p/tacker-wallaby-revise-docs


@ -0,0 +1,247 @@
============================
VDU Placement error-handling
============================
This document describes how to handle VDU placement errors caused by
insufficient resources in an availability zone.
VDU Placement error due to insufficient resources
-------------------------------------------------
The placement constraints are defined in `ETSI NFV-SOL 003 v3.3.1`_, and
the VNFM sends them to the NFVO so that it can make the resource
placement decision.
In VNF Lifecycle Management (LCM), there are error cases in which VDUs
are not deployed because an availability zone has insufficient resources.
When stack create/update fails, the response of the Heat-API `Show stack
details`_ call shows whether the failure is due to insufficient
resources.
The error message that indicates insufficient resources is extracted
from the parameter "stack_status_reason" in the response.
.. note::
In the case of insufficient resources, the error occurs after stack
create/update returns an acceptance response, so the "Show stack
details" response can be used to detect the cause.
The following are examples of an error message stored in
"stack_status_reason" when resources are insufficient.
+ ex1) Set the flavor defined in "OS::Nova::Server" to a large value
that cannot be deployed (not enough storage/not enough vcpu/not enough
memory).
+ Resource CREATE failed: ResourceInError: resources.<VDU-name>: Went
to status ERROR due to “Message: No valid host was found. , Code:
500”
+ ex2) Specify an extra-spec that cannot be assigned for the flavor
defined in "OS::Nova::Server".
+ Resource CREATE failed: ResourceInError: resources.<VDU-name>: Went
to status ERROR due to “Message: Exceeded maximum number of retries.
Exhausted all hosts available for retrying build failures for
instance <server-UUID>., Code: 500”
Availability zone reselection
-----------------------------
If a VDU placement error occurs because an availability zone has
insufficient resources, the availability zone can be reselected as
error handling.
The VNF LCM v2 API (instantiate/heal/scale for VNF) process can change
the availability zone to be used from the one notified by the NFVO if
necessary.
If the availability zone notified by the NFVO has insufficient
resources, the VNF is re-created/updated in a different availability
zone.
The availability zone is reselected and the VNF is re-created/updated
until there are no more candidates.
Settings
~~~~~~~~
The settings required for availability zone reselection are as follows.
+ Use StandardUserData as the UserData class
+ Set `placement_fallback_best_effort = True` in the ``tacker.conf`` file
.. note::
The maximum number of retries for availability zone reselection is
unlimited by default.
To set a retry limit, set `placement_az_select_retry` in ``tacker.conf``
to the desired number.
(The default value ``0`` means an unlimited number of retries.)
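The retry behaviour described above can be sketched as follows. This is a
minimal illustration, not Tacker's implementation: ``deploy_with_reselection``
and ``try_deploy`` are hypothetical names, candidates are tried in list order
rather than at random, and the ``[v2_vnfm]`` section name in the comment is an
assumption that should be checked against your release.

```python
# tacker.conf fragment these settings correspond to (the [v2_vnfm] section
# name is an assumption; check the configuration reference of your release):
#
#   [v2_vnfm]
#   placement_fallback_best_effort = True
#   placement_az_select_retry = 3


def deploy_with_reselection(try_deploy, first_az, other_azs, retry_limit=0):
    """Try first_az, then fall back to other_azs one by one.

    try_deploy is a hypothetical callable: it takes an availability zone
    name and returns True on success, False on insufficient resources.
    retry_limit=0 means "unlimited", mirroring placement_az_select_retry.
    Returns the zone that succeeded, or None if every attempt failed.
    """
    candidates = list(other_azs)
    az = first_az
    retries = 0
    while True:
        if try_deploy(az):
            return az
        if not candidates:
            # No more candidates: give up.
            return None
        if retry_limit and retries >= retry_limit:
            # A non-zero limit caps the number of reselections.
            return None
        az = candidates.pop(0)
        retries += 1
```

With ``retry_limit=0`` the loop stops only when the candidate list is
exhausted; with ``retry_limit=1`` it gives up after a single reselection.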
.. note::
By default, the following regular expression is used to detect an
insufficient resource error.
``Resource CREATE failed: ResourceInError: resources\.(.*)\.(.*): (.*)
| Resource UPDATE failed: resources\.(.*): Resource CREATE failed:
ResourceInError: resources\.(.*): (.*)``
To change the detection condition for insufficient resource errors, set
a regular expression to `placement_az_resource_error` in
``tacker.conf``.
Changing `placement_az_resource_error` from its default is outside
community support, so do so at your own risk.
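To see how the default pattern behaves, the following sketch applies it to
sample "stack_status_reason" strings with Python's ``re`` module. The pattern
is transcribed from the note above with the line-wrap whitespace around ``|``
removed, which is an assumption about the intended value. Anchoring at the
start of the message with ``re.match``, the helper name ``is_resource_error``,
and the resource names in the sample messages are also assumptions made for
illustration.

```python
import re

# Default pattern from the note above, joined into one string.
# Assumption: the whitespace around "|" in the documentation is only
# line wrapping and is not part of the pattern.
DEFAULT_PATTERN = (
    r"Resource CREATE failed: ResourceInError: "
    r"resources\.(.*)\.(.*): (.*)|"
    r"Resource UPDATE failed: resources\.(.*): "
    r"Resource CREATE failed: ResourceInError: "
    r"resources\.(.*): (.*)"
)


def is_resource_error(stack_status_reason, pattern=DEFAULT_PATTERN):
    """Return True if the message matches the insufficient-resource pattern."""
    return re.match(pattern, stack_status_reason) is not None


# Hypothetical sample messages modelled on the examples in this document.
create_msg = (
    "Resource CREATE failed: ResourceInError: resources.VDU1-VG.VDU1-0: "
    'Went to status ERROR due to "Message: No valid host was found. , '
    'Code: 500"'
)
update_msg = (
    "Resource UPDATE failed: resources.VDU1-VG: Resource CREATE failed: "
    "ResourceInError: resources.VDU1-0: Went to status ERROR due to "
    '"Message: No valid host was found. , Code: 500"'
)
other_msg = "Resource CREATE failed: Conflict: resources.VDU1-0: quota exceeded"
```

A custom value for `placement_az_resource_error` can be checked the same way
before it is deployed, by passing it as ``pattern``.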
Policy
~~~~~~
Availability zones in which an error occurred are excluded from the
reselection candidates, and the new availability zone is selected
preferentially from the availability zones that have not been selected yet.
.. note::
Affinity/Anti-Affinity of PlacementConstraint and resource states of
availability zones are not considered during reselection.
The availability zone in error can be identified in the following way.
1. Call Heat-API "Show stack details" after an error occurs in "stack
create/update"
2. Identify the VDU in which the insufficient resource error occurred
from the stack_status_reason in the response to step 1.
3. Identify the availability zone from the VDU identified in step 2.
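The three steps above can be sketched in Python as follows. This is an
illustration only: the response is assumed to have already been fetched and
decoded into a dict with a top-level ``stack`` object, ``VDU_IN_ERROR`` is a
simplified hypothetical pattern rather than the one Tacker uses, and
``vdu_to_az`` stands in for whatever record of VDU placement the caller keeps.

```python
import re

# Hypothetical, simplified pattern: captures the resource name that
# immediately precedes ": Went to status ERROR" in the error message.
VDU_IN_ERROR = re.compile(r"([^.:\s]+): Went to status ERROR")


def find_failed_az(stack_details, vdu_to_az):
    """Return (vdu_name, availability_zone) for the failed VDU, or None.

    stack_details: decoded body of a "Show stack details" response (step 1).
    vdu_to_az: hypothetical mapping of VDU name to the zone it was sent to.
    """
    reason = stack_details.get("stack", {}).get("stack_status_reason") or ""
    found = VDU_IN_ERROR.search(reason)          # step 2: which VDU failed
    if found is None:
        return None
    vdu_name = found.group(1)
    return vdu_name, vdu_to_az.get(vdu_name)     # step 3: which zone it used


# Hypothetical sample data for illustration.
sample_response = {
    "stack": {
        "stack_status": "CREATE_FAILED",
        "stack_status_reason": (
            "Resource CREATE failed: ResourceInError: resources.VDU1-0: "
            'Went to status ERROR due to "Message: No valid host was '
            'found. , Code: 500"'
        ),
    }
}
sample_placement = {"VDU1-0": "AZ-1", "VDU2-0": "AZ-2"}
```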
.. note::
Even if the resource shortage in an availability zone that failed
during a reselection attempt is later resolved, that availability zone
will not be reselected.
In Scale/Heal operations, VDUs that have already been deployed will
not be re-created.
Availability zone reselection for each VNF LCM v2 API
(instantiate/heal/scale for VNF) is as follows.
Precondition: availability zones AZ-1/AZ-2/AZ-3/AZ-4/AZ-5 exist and VNFs
VDU1-0/VDU1-1/VDU2-0/VDU2-1 are deployed
.. note::
VNFs in VDU1 are placed in the same availability zone (Affinity).
VNFs in VDU2 are placed in availability zones different from each other
and from those of VDU1 (Anti-Affinity).

+ Instantiate

  + Before reselection, the following attempts to deploy failed (AZ-1
    and AZ-2 have insufficient resources)

    + VDU1-0: AZ-1
    + VDU1-1: AZ-1
    + VDU2-0: AZ-2
    + VDU2-1: AZ-3

  + VDU1-0/1: Reselect the following (except AZ-1/AZ-2/AZ-3, select AZ-4
    or AZ-5)

    + VDU1-0: AZ-4
    + VDU1-1: AZ-4
    + VDU2-0: AZ-2
    + VDU2-1: AZ-3

  + VDU2-0: Reselect the following (except AZ-2/AZ-3/AZ-4, select AZ-1 or
    AZ-5)

    + VDU1-0: AZ-4
    + VDU1-1: AZ-4
    + VDU2-0: AZ-5
    + VDU2-1: AZ-3

  .. note::

    The above is an example, and the reselection target is randomly
    selected from unselected availability zones.

+ Heal (VDU1-1/VDU2-0)

  + Before reselection, the following attempts to deploy failed (AZ-1
    and AZ-2 have insufficient resources)

    + VDU1-0: AZ-1
    + VDU1-1: AZ-1
    + VDU2-0: AZ-2
    + VDU2-1: AZ-3

  + VDU1-1: Reselect the following (except AZ-1/AZ-2/AZ-3, select AZ-4
    or AZ-5)

    + VDU1-0: AZ-1
    + VDU1-1: AZ-4
    + VDU2-0: AZ-2
    + VDU2-1: AZ-3

  .. note::

    Only the VNFs targeted by Heal are subject to availability zone
    reselection.
    Therefore, Affinity may no longer be satisfied after reselection.

  + VDU2-0: Reselect the following (except AZ-1/AZ-2/AZ-3/AZ-4, select
    AZ-5)

    + VDU1-0: AZ-1
    + VDU1-1: AZ-4
    + VDU2-0: AZ-5
    + VDU2-1: AZ-3

+ Scale out (add VDU1-2/VDU1-3)

  + Before reselection, the deployment of VDU1-3 failed (AZ-1 has
    insufficient resources)

    + VDU1-0: AZ-1
    + VDU1-1: AZ-1
    + VDU1-2: AZ-1
    + VDU1-3: AZ-1
    + VDU2-0: AZ-2
    + VDU2-1: AZ-3

  + VDU1-2/3: Reselect the following (except AZ-1/AZ-2/AZ-3, select AZ-4
    or AZ-5)

    + VDU1-0: AZ-1
    + VDU1-1: AZ-1
    + VDU1-2: AZ-4
    + VDU1-3: AZ-4
    + VDU2-0: AZ-2
    + VDU2-1: AZ-3

  .. note::

    In the case of Affinity, even if VDU1-2 has been successfully
    deployed, the availability zones of both VDU1-2 and VDU1-3 will be
    reselected.
    The existing VDU1-0/VDU1-1 will not be reselected, so the VDUs may not
    all be in the same availability zone even in the Affinity case.

+ Scale out (add VDU2-2/VDU2-3)

  + Before reselection, the deployment of VDU2-3 failed (AZ-5 has
    insufficient resources)

    + VDU1-0: AZ-1
    + VDU1-1: AZ-1
    + VDU2-0: AZ-2
    + VDU2-1: AZ-3
    + VDU2-2: AZ-4
    + VDU2-3: AZ-5

  + VDU2-3: Reselect the following (except AZ-5, select AZ-1 or AZ-2 or
    AZ-3 or AZ-4)

    + VDU1-0: AZ-1
    + VDU1-1: AZ-1
    + VDU2-0: AZ-2
    + VDU2-1: AZ-3
    + VDU2-2: AZ-4
    + VDU2-3: AZ-1

  .. note::

    If there are no unselected availability zones left, a reselection
    target is randomly selected from the already selected availability
    zones.
    In this case, Anti-Affinity cannot be satisfied.
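The selection policy illustrated by the examples above can be summarised in a
short Python sketch. It is a simplified reading of this document, not Tacker's
code: ``reselect_az`` and its parameters are hypothetical names, and the
function handles a single reselection step without considering Affinity or
Anti-Affinity, as stated in the Policy note.

```python
import random


def reselect_az(all_zones, used_zones, failed_zones, rng=random):
    """Pick a new availability zone for a VDU that failed to deploy.

    all_zones: every zone that exists.
    used_zones: zones already selected for some VDU.
    failed_zones: zones in which an insufficient resource error occurred.
    Returns a zone name, or None if every zone has failed.
    """
    excluded = set(used_zones) | set(failed_zones)
    # Prefer zones that have not been selected yet.
    unselected = sorted(z for z in all_zones if z not in excluded)
    if unselected:
        return rng.choice(unselected)
    # Otherwise fall back to already selected zones that have not failed.
    # Anti-Affinity cannot be satisfied in this case.
    fallback = sorted(z for z in used_zones if z not in set(failed_zones))
    return rng.choice(fallback) if fallback else None


ZONES = ["AZ-1", "AZ-2", "AZ-3", "AZ-4", "AZ-5"]
```

For the first Instantiate step above, ``reselect_az(ZONES, {"AZ-1", "AZ-2",
"AZ-3"}, {"AZ-1", "AZ-2"})`` returns AZ-4 or AZ-5. For the last Scale out
example, where all five zones are in use and AZ-5 failed, it returns one of
AZ-1 to AZ-4.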
.. _ETSI NFV-SOL 003 v3.3.1: https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.03.01_60/gs_nfv-sol003v030301p.pdf
.. _Show stack details: https://docs.openstack.org/api-ref/orchestration/v1/index.html?expanded=show-stack-details-detail#show-stack-details