
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================================
ETSI NFVI software modification specification
=============================================

https://storyboard.openstack.org/#!/story/2006557

Implement the interfacing between the VNFM and Fenix that is specified in the
`ETSI FEAT03 related documentation`_. The current changes are limited to
instances and instance groups.

Problem description
===================

This feature adds support for coordinating the NFVI software modification
process with the VNFs hosted on the NFVI, in order to minimize the impact on
service availability.

Use Cases
---------

Guarantee zero impact on VNF service during a Fenix infrastructure
maintenance, upgrade, or scaling workflow. This requires that the VNF and the
VNFM support the ETSI specification and the Fenix interaction.

Proposed change
===============

Implement APIs to set VNF specific instance and instance group variables.

The new APIs are used to change VNF project instance and instance group data
in the Fenix database. These constraints might be set in the VNFD, or the VNF
element manager can change them at any time according to the current load
level of the VNF. Having the constraints makes it possible to optimize the
infrastructure maintenance operation: the VNFs can be scaled down as much as
possible, so that as many compute nodes as possible can be maintained in
parallel. An instance group can consist of the instances belonging to a
certain anti-affinity group, but all instances need to be grouped, so that we
know how many of them are needed at minimum and how many can be exposed to
maintenance at the same time. If nothing else applies, a group means the
instances of a certain flavor.

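The grouping rule above can be sketched as follows. This is only an
illustration, not Fenix code: the function name and the input keys
``server_group`` and ``flavor`` are hypothetical, chosen to show that an
instance without an anti-affinity group still lands in a group keyed by its
flavor.

.. code-block:: python

   from collections import defaultdict


   def group_instances(instances):
       """Map a group key to the ids of the instances in that group."""
       groups = defaultdict(list)
       for inst in instances:
           # Prefer the anti-affinity group; otherwise fall back to the flavor.
           key = inst.get("server_group") or "flavor:" + inst["flavor"]
           groups[key].append(inst["instance_id"])
       return dict(groups)

With such a mapping every instance belongs to exactly one group, which is the
precondition for knowing how many members can be maintained at the same time.
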
Make an example workflow that supports the usage of these APIs. The workflow
should implement one example rolling maintenance use case. The existing Fenix
interaction towards the VNFM will be utilized with small changes.

The variables common to the instance and the instance group can be overridden
in the instance object. Both objects can be updated at any time. An update is
taken into account by any action that is not already ongoing. An existing
timer is not updated. These objects alone are not enough to optimize the
infrastructure workflow. The existing Fenix interaction is also needed to keep
the maintenance window as small as possible. This also allows upgrading the
VNF to use new infrastructure capabilities with no additional impact on VNF
service availability, if it is done at the same time as the infrastructure
upgrade.

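The override rule can be read as a lookup that prefers the instance object and
falls back to its group. The sketch below is a minimal illustration under that
reading; the helper name ``effective_value`` is hypothetical, and
``resource_mitigation`` is used because it is the variable that appears in
both objects in this specification.

.. code-block:: python

   def effective_value(instance, group, key):
       """Return the instance-level value for key, else the group-level one."""
       if instance.get(key) is not None:
           return instance[key]
       return group.get(key)


   group = {"group_id": "g1", "resource_mitigation": "True"}
   overriding = {"instance_id": "i1", "group_id": "g1",
                 "resource_mitigation": "False"}
   inheriting = {"instance_id": "i2", "group_id": "g1"}

   assert effective_value(overriding, group, "resource_mitigation") == "False"
   assert effective_value(inheriting, group, "resource_mitigation") == "True"
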
The following diagram illustrates the existing Fenix workflow, where the
application manager updates the instance and instance group constraints
whenever instances are created or deleted. The constraints can also be updated
at any time if the level of VNF service allows a different number of instances
at that time.

.. seqdiag::

   seqdiag {
      activation = none;
      app-manager --> fenix [label = "Update instance and instance group constraints anytime and when created"]
      === --- ===
      infra-admin -> fenix [label = "Maintenance session \n for hosts", note="Start the maintenance process"];
      fenix -> app-manager [label = "MAINTENANCE"];
      app-manager -> fenix [label = "ACK_MAINTENANCE"];
      fenix --> app-manager [label = "IN_SCALE", note="Optional down scale"];
      app-manager --> fenix [label = "Remove instance related constraints of scaled down instances. Update instance groups constraints to match scaling"]
      app-manager --> fenix [label = "ACK_IN_SCALE"]
      fenix --> app-manager [label = "PREPARE_MAINTENANCE", note="If there is no empty host, Fenix makes one"]
      app-manager --> fenix [label = "ACK_PREPARE_MAINTENANCE"]
      fenix --> app-manager [label = "ADMIN_ACTION_DONE"]
      === Repeated for every compute ===
      fenix -> app-manager [label = "PLANNED_MAINTENANCE", note="If VMs are on the host. Migrate or Live migrate"]
      app-manager -> fenix [label = "ACK_PLANNED_MAINTENANCE"]
      fenix --> app-manager [label = "ADMIN_ACTION_DONE"]
      fenix --> app-manager [label = "IN_MAINTENANCE"]
      ... Actual maintenance happens here ...
      fenix --> app-manager [label = "MAINTENANCE_COMPLETE"]
      === --- ===
      fenix --> app-manager [label = "MAINTENANCE_COMPLETE", note="Maintenance is done"]
      app-manager --> fenix [label = "Add instance constraints of instances possibly added when scaling up when maintenance is completed. Update instance groups constraints to match scaling"]
      app-manager --> fenix [label = "ACK_MAINTENANCE_COMPLETE", note="Up scale"]

   }

Alternatives
------------

N/A

Data model impact
-----------------

The Fenix database will need new tables to support the instance and instance
group objects.

REST API impact
---------------

All APIs return 200 OK on success. Error codes will be defined during
implementation.

API PUT ``/v1/instance/{instance_id}`` is used to update the instance object.
API GET ``/v1/instance/{instance_id}`` is used to get the instance object.
The ``PUT`` API takes this structure as input and the ``GET`` API returns it::

    {
        "instance_id": "instance_UUID string",
        "project_id": "Project UUID string",
        "group_id": "group_UUID string",
        "instance_name": "Name string",
        "max_interruption_time": 120, # seconds
        # How long live migration can take
        "migration_type": "LIVE_MIGRATION",
        # LIVE_MIGRATION, MIGRATION or OWN_ACTION
        # Own action is to create a new instance and delete the old one.
        # Note! The VNF needs to obey resource_mitigation with own action.
        # This affects the order of deleting the old and creating the new
        # instance, so that the resources are not overcommitted.
        "resource_mitigation": "True", # True or False
        # Current instance needs double allocation when being migrated.
        # This is true also if the instance is first scaled out and only then
        # the old instance is removed. It must be True also if the VNF needed
        # to scale down, since we go over that scaled down capacity.
        "lead_time": 60 # seconds
        # How long lead time the VNF needs for the 'migration_type' operation.
        # The VNF needs to report back to Fenix as soon as it is ready, but at
        # least within this time. Reporting as fast as possible is crucial for
        # optimizing infrastructure upgrade/maintenance.
    }

API DELETE ``/v1/instance/{instance_id}`` is used to delete the instance
object.

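As an illustration of how a VNFM or element manager might prepare such an
update, the sketch below builds (but does not send) the ``PUT`` request with
the Python standard library. The function name, the base URL passed by the
caller and the client-side check of ``migration_type`` are assumptions made
for the example; authentication headers are left out, and error handling is
not defined by this specification.

.. code-block:: python

   import json
   import urllib.request

   # Values listed for "migration_type" in the structure above.
   MIGRATION_TYPES = ("LIVE_MIGRATION", "MIGRATION", "OWN_ACTION")


   def build_instance_put(base_url, instance):
       """Build, without sending, a PUT request for one instance object."""
       if instance["migration_type"] not in MIGRATION_TYPES:
           raise ValueError("unknown migration_type: %s"
                            % instance["migration_type"])
       url = "%s/v1/instance/%s" % (base_url.rstrip("/"),
                                    instance["instance_id"])
       return urllib.request.Request(
           url,
           data=json.dumps(instance).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="PUT")

The returned request could then be passed to ``urllib.request.urlopen`` once
the endpoint and credentials of the actual deployment are known.
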
API PUT ``/v1/instance_group/{group_id}`` is used to update the instance group
object::

    {
        "group_id": "group_UUID string",
        "project_id": "Project UUID string",
        "group_name": "Name string",
        "anti_affinity_group": "True", # True or False
        "max_instances_per_host": 2, # 1..N
        # Describes how many instances can be on the same host with
        # anti_affinity_group: True
        # Already exists in OpenStack as 'max_server_per_host', but might not
        # exist in different clouds.
        "max_impacted_members": 2, # 1..N
        # Maximum number of instances that can be impacted
        # Note! This can be dynamic to VNF load
        "recovery_time": 10, # seconds
        # max_impacted_members needs to take into account the members of
        # previous actions until the recovery time has passed
        # Note! This applies regardless of anti_affinity
        "resource_mitigation": "True", # True or False
        # Instances in the group need double allocation when affected.
        # This is true in migrations, but also if the instance is first scaled
        # out and only then the old instance is removed.
        # It must be True also if the VNF needed to scale down, since we go
        # over that scaled down capacity.
    }

API GET ``/v1/instance_group/{group_id}`` is used to get the instance group.
Compared to ``PUT``, this structure also has the ``instance_ids``::

    {
        "group_id": "group_UUID string",
        "project_id": "Project UUID string",
        "group_name": "Name string",
        "anti_affinity_group": "True", # True or False
        "max_instances_per_host": 2, # 1..N
        # Describes how many instances can be on the same host with
        # anti_affinity_group: True
        # Already exists in OpenStack as 'max_server_per_host', but might not
        # exist in different clouds.
        "max_impacted_members": 2, # 1..N
        # Maximum number of instances that can be impacted
        # Note! This can be dynamic to VNF load
        "recovery_time": 10, # seconds
        # max_impacted_members needs to take into account the members of
        # previous actions until the recovery time has passed
        # Note! This applies regardless of anti_affinity
        "resource_mitigation": "True", # True or False
        # Instances in the group need double allocation when affected.
        # This is true in migrations, but also if the instance is first scaled
        # out and only then the old instance is removed.
        # It must be True also if the VNF needed to scale down, since we go
        # over that scaled down capacity.
        "instance_ids": [] # List of instances belonging to this group
    }

API DELETE ``/v1/instance_group/{group_id}`` is used to delete the instance
group object.

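The interplay of ``max_impacted_members`` and ``recovery_time`` can be
sketched as below. This is one possible reading of the comments in the group
structure, not the Fenix implementation: a member whose action finished less
than ``recovery_time`` seconds ago is still counted as impacted. The function
name and its ``ongoing``, ``finished_at`` and ``now`` parameters are
hypothetical.

.. code-block:: python

   def free_impact_slots(group, ongoing, finished_at, now):
       """Return how many more group members may be impacted at time now.

       ongoing: number of members with an action currently in progress.
       finished_at: completion times, in seconds, of earlier member actions.
       """
       # Members that finished recently are still recovering and still count.
       recovering = sum(1 for t in finished_at
                        if now - t < group["recovery_time"])
       used = ongoing + recovering
       return max(0, group["max_impacted_members"] - used)


   group = {"max_impacted_members": 2, "recovery_time": 10}
   # One action ongoing, one finished 5 s ago (recovering), one 20 s ago.
   assert free_impact_slots(group, 1, [95, 80], now=100) == 0
   # Six seconds later the recovery time of the recent member has passed.
   assert free_impact_slots(group, 1, [95, 80], now=106) == 1
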
A new API is needed for the project's instance specific reply.

This API will now be used to reply to the ``state`` ``PREPARE_MAINTENANCE`` and
``PLANNED_MAINTENANCE`` notifications, which will be instance specific.

PUT ``/v1/maintenance/<session_id>/<project_id>/<instance_id>``::

    {
        "instance_action": "MIGRATE",
        "state": "ACK_PLANNED_MAINTENANCE"
    }

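A reply to an instance specific notification could be assembled as in the
following sketch. The function name is hypothetical, and deriving the reply
state by prefixing the notified state with ``ACK_`` is an assumption based on
the state names shown in the workflow diagram.

.. code-block:: python

   import json

   # States that are answered per instance according to this specification.
   INSTANCE_SPECIFIC_STATES = ("PREPARE_MAINTENANCE", "PLANNED_MAINTENANCE")


   def build_instance_reply(session_id, project_id, instance_id,
                            notified_state, instance_action):
       """Return the (path, JSON body) pair of an instance specific reply."""
       if notified_state not in INSTANCE_SPECIFIC_STATES:
           raise ValueError("not an instance specific state: %s"
                            % notified_state)
       path = "/v1/maintenance/%s/%s/%s" % (session_id, project_id,
                                            instance_id)
       body = json.dumps({"instance_action": instance_action,
                          "state": "ACK_" + notified_state})
       return path, body
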
Notifications impact
--------------------

The notification with event type ``maintenance.planned`` needs changes.

A new ``state`` value ``INSTANCE_ACTION_FALLBACK`` should be added to tell
that live migration was not possible and that Fenix will force the migration
to complete. After that, the normal ``INSTANCE_ACTION_DONE`` or
``INSTANCE_ACTION_FAILED`` is expected.

``instance_ids`` is currently limited to either a single ``instance_id`` or
a link to get all affected instances. From now on it should always be a single
instance, except when ``state`` is ``MAINTENANCE`` or ``SCALE_IN``.
``MAINTENANCE`` should always have the link to the Fenix API to get all
instances that may be affected during the maintenance session. ``SCALE_IN``
can mention only one exact instance, as this may be needed to allow another
pinned instance to have a target host with the needed resources. This can
happen in a small edge deployment. An empty string indicates that the VNF can
decide how it scales down. The workflow may then need to send several
``SCALE_IN`` notifications to finally have enough unused resources to proceed.
With ``state`` value ``MAINTENANCE_COMPLETE``, ``instance_ids`` should be an
empty string. In this ``state`` the VNF should scale back to the instances it
had at the beginning of the maintenance session.

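On the receiving side, the rules above could be turned into a small dispatch
like the one sketched here. The function name and the returned labels
(``fetch_all``, ``scale_back``, ``vnf_decides``, ``remove_instance``,
``single_instance``) are hypothetical and only name the cases described in the
paragraph.

.. code-block:: python

   def interpret_instance_ids(state, instance_ids):
       """Classify the instance_ids value of a maintenance notification."""
       if state == "MAINTENANCE":
           # The value is a link to the Fenix API listing all instances.
           return ("fetch_all", instance_ids)
       if state == "MAINTENANCE_COMPLETE":
           # The value is an empty string; the VNF scales back up.
           return ("scale_back", None)
       if state == "SCALE_IN":
           if instance_ids == "":
               # The VNF decides which instances to scale down.
               return ("vnf_decides", None)
           return ("remove_instance", instance_ids)
       # Every other state refers to exactly one instance.
       return ("single_instance", instance_ids)
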
Other end user impact
---------------------

The VNFD and the EM need to support defining and updating instance and
instance group variables.

Other deployer impact
---------------------

The VNFM needs to proxy the updating of instance and instance group variables.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Tomi Juvonen <tomi.juvonen@nokia.com>

Work Items
----------

* APIs to set instance and instance group objects
* Example workflow
* Testing
* Documentation changes

Dependencies
============

Other projects may be enhanced later on. However, the initially needed
functionality can be handled completely inside Fenix.

Testing
=======

There is a huge number of combinations of VNF deployments, and the variables
used can be changed during operations. Fenix will support all these variables
and their changes. A Fenix workflow is always an example and limits what it
can support and is tested against. The main thing to test is that all
variables and their changes are supported and validated. The testing of VNF
deployments might be limited to the example use case supported by the example
workflow.

Documentation Impact
====================

Fenix documentation needs to be updated after the implementation is ready.

References
==========

.. _`ETSI FEAT03 related documentation`: https://nfvwiki.etsi.org/index.php?title=Feature_Tracking#FEAT03:_NFVI_software_modification