Specification to implement ETSI

Specification to start implementing ETSI FEAT03: NFVI software
modification.

story: 2006557
Task: #36646

Change-Id: Iab16f95766e3bb81f072a97ea76921a030fbe3e0
Signed-off-by: Tomi Juvonen <tomi.juvonen@nokia.com>
Tomi Juvonen 2019-09-18 07:40:56 +03:00
parent 0ee8e156ad
commit 33e89ab6d6
3 changed files with 232 additions and 0 deletions


@@ -22,6 +22,7 @@ Contents:
cli/index
user/index
admin/index
specifications/index
reference/index
Indices and tables


@@ -0,0 +1,11 @@
=====================
Fenix specifications
=====================
.. toctree::
:maxdepth: 2
ussuri-etsi-feat03.rst
List of features having more detailed specifications


@@ -0,0 +1,220 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=============================================
ETSI NFVI software modification specification
=============================================
https://storyboard.openstack.org/#!/story/2006557
Implement the needed interfacing between VNFM and Fenix that is specified in
the `ETSI FEAT03 related documentation`_. Limit current changes to instances
and instance groups.
Problem description
===================
This feature addresses the support for the coordination of the NFVI software
modification process with the VNFs hosted on the NFVI in order to minimize
impact on service availability.
Use Cases
---------
Guarantee zero impact on VNF service during Fenix infrastructure maintenance,
upgrade and scaling workflow operations. This implies that the VNF and VNFM
support the ETSI specification and the Fenix interaction.
Proposed change
===============
Implement APIs to set VNF specific instance and instance group variables.
The new APIs change VNF project instance and instance group data in the Fenix
database. These constraints might be set in the VNFD, or the VNF element
manager can change them at any time according to the current VNF load level.
Having the constraints gives the ability to optimize the infrastructure
maintenance operation, as we can scale down the VNFs as much as possible and
therefore maintain as many compute nodes in parallel as possible.
An instance group can consist of instances belonging to a certain
anti-affinity group, but all instances need to be grouped, so we know how many
of them are needed at minimum and how many of them can be exposed to
maintenance at the same time. If nothing else, a group means the instances of
a certain flavor.
Make an example workflow that supports the usage of these APIs. The workflow
should implement one example rolling maintenance use case. The existing Fenix
interaction towards VNFM will be utilized with small changes.
The variables common to instance and instance group can be overridden in the
instance object. Both objects can be updated at any time. An update can be
taken into account in any action that is not currently ongoing. An existing
timer would not be updated. These objects alone are not enough to optimize the
infrastructure workflow. The existing Fenix interaction is also needed to keep
the maintenance window as small as possible. This also allows upgrading the
VNF with new infrastructure capabilities with no additional impact on VNF
service availability, if done at the same time as the infrastructure upgrade.
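The override rule above can be sketched as follows. This is only an illustration under stated assumptions: the helper name ``effective_variables`` is hypothetical, and ``resource_mitigation`` is treated as the only common variable because it is the only key that appears in both payloads of this specification.

```python
from typing import Any, Dict, Optional

# Assumption: "resource_mitigation" is the only variable shared by the
# instance and instance group objects in this specification.
COMMON_VARIABLES = ("resource_mitigation",)


def effective_variables(instance: Dict[str, Any],
                        group: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Resolve common variables, letting the instance override its group."""
    resolved: Dict[str, Any] = {}
    for name in COMMON_VARIABLES:
        if name in instance:
            # A value set on the instance object wins.
            resolved[name] = instance[name]
        elif group is not None and name in group:
            # Otherwise fall back to the value set on the group object.
            resolved[name] = group[name]
    return resolved
```

A workflow could call such a helper each time it starts a new action, so that updates made between actions are picked up without touching an action that is already ongoing.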
Alternatives
------------
N/A
Data model impact
-----------------
Fenix database will need to have new tables to support instance and
instance group objects.
REST API impact
---------------
API PUT ``/v1/instance/{instance_id}`` is used to update instance object::

    {
        "instance_id": "instance_UUID string",
        "project_id": "Project UUID string",
        "name": "Name string",
        "max_interruption_time": 120, # seconds
        # How long live migration can take
        "migration_type": "LIVE_MIGRATION",
        # LIVE_MIGRATION, MIGRATION or OWN_ACTION
        # Own action is create new and delete old instance.
        # Note! VNF needs to obey resource_mitigation with own action.
        # This affects the order of delete old and create new, so that
        # resources are not overcommitted.
        "resource_mitigation": "True", # True or False
        # Current instance needs double allocation when being migrated.
        # This is true also if the instance is first scaled out and only
        # then the old instance is removed. It must be True also if the VNF
        # needed to scale down, since we go over that scaled down capacity.
        "lead_time": 60 # seconds
        # How much time the VNF needs for the 'migration_type' operation.
        # The VNF needs to report back to Fenix as soon as it is ready, but
        # at the latest within this time. Reporting as fast as possible is
        # crucial for optimizing infrastructure upgrade/maintenance.
    }

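As an illustration of how a VNFM might prepare this call, the sketch below validates an instance object against the value ranges given in the comments above and builds (but does not send) the PUT request. The helper ``build_instance_update``, the ``base_url`` argument and the omission of authentication headers are assumptions made for the example, not part of the specification.

```python
import json
import urllib.request

# Allowed values as listed in the payload comments above.
MIGRATION_TYPES = ("LIVE_MIGRATION", "MIGRATION", "OWN_ACTION")
BOOLEAN_STRINGS = ("True", "False")


def build_instance_update(base_url, instance):
    """Validate an instance object and return an unsent PUT request.

    ``base_url`` and this helper are hypothetical; authentication headers
    that a real deployment would need are left out.
    """
    if instance["migration_type"] not in MIGRATION_TYPES:
        raise ValueError("unknown migration_type: %r" % instance["migration_type"])
    if instance["resource_mitigation"] not in BOOLEAN_STRINGS:
        raise ValueError("resource_mitigation must be 'True' or 'False'")
    for key in ("max_interruption_time", "lead_time"):
        value = instance[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("%s must be a non-negative integer of seconds" % key)
    url = "%s/v1/instance/%s" % (base_url.rstrip("/"), instance["instance_id"])
    return urllib.request.Request(
        url,
        data=json.dumps(instance).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to ``urllib.request.urlopen`` would send the update.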
API DELETE ``/v1/instance/{instance_id}`` is used to delete instance object.
API PUT ``/v1/instance_group/{group_id}`` is used to update instance group
object::

    {
        "group_id": "group_UUID string",
        "project_id": "Project UUID string",
        "name": "Name string",
        "anti_affinity_group": "True", # True or False
        "max_instances_per_host": 2, # 1..N
        # Describes how many instances can be on the same host with
        # anti_affinity_group: True
        # Already exists in OpenStack as 'max_server_per_host', but might
        # not exist in different clouds.
        "max_impacted_members": 2, # 1..N
        # Maximum number of instances that can be impacted
        # Note! This can be dynamic to VNF load
        "recovery_time": 10, # seconds
        # max_impacted_members needs to take into account the members of
        # previous actions until their recovery time has passed
        # Note! Regardless of anti_affinity
        "resource_mitigation": "True" # True or False
        # Instances in the group need double allocation when affected.
        # This is true in migrations, but also if an instance is first scaled
        # out and only then the old instance is removed.
        # It must be True also if the VNF needed to scale down, since we go
        # over that scaled down capacity.
    }

API DELETE ``/v1/instance_group/{group_id}`` is used to delete instance
group object.
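One possible reading of how ``max_impacted_members`` and ``recovery_time`` interact is sketched below: members whose action finished less than ``recovery_time`` seconds ago still count against the limit. The function ``impact_budget`` and its arguments are hypothetical, and an actual workflow may count impacted members differently.

```python
def impact_budget(group, action_end_times, ongoing, now):
    """Return how many more group members may be impacted at time ``now``.

    group            -- instance group object as in the PUT payload above
    action_end_times -- completion timestamps (seconds) of earlier actions
                        on members of this group
    ongoing          -- number of members with an action still in progress
    now              -- current timestamp (seconds)
    """
    # Members that finished recently are still recovering and therefore
    # count as impacted until recovery_time has passed.
    recovering = sum(1 for ended in action_end_times
                     if now - ended < group["recovery_time"])
    return max(0, group["max_impacted_members"] - ongoing - recovering)
```

With ``max_impacted_members`` of 2 and ``recovery_time`` of 10, a member that finished 5 seconds ago leaves room for one more action, while a member that finished 20 seconds ago no longer counts.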
Notifications impact
--------------------
The ``maintenance.planned`` event type notification will need changes.
A new ``state`` value ``INSTANCE_ACTION_FALLBACK`` should be added to tell
that live migration was not possible and that Fenix will force the migration
to complete. After that the normal ``INSTANCE_ACTION_DONE`` or
``INSTANCE_ACTION_FAILED`` will be expected.
``instance_ids`` is currently limited to either a single ``instance_id`` or
a link to get all affected instances. From now on it should always be a single
instance, except in the ``state`` values ``MAINTENANCE`` and ``SCALE_IN``.
``MAINTENANCE`` should always have the link to the Fenix API to get all
instances that may be affected during the maintenance session. ``SCALE_IN``
can mention only one exact instance, as this may be needed to allow another
pinned instance to have a target host with the needed resources. This can
happen in a small edge deployment. An empty string indicates that the VNF can
decide how it scales down. The workflow may then need several ``SCALE_IN``
notifications to finally have enough unused resources to continue. ``state``
having the value ``MAINTENANCE_COMPLETE`` should have an empty string as the
``instance_ids`` value. In this ``state`` the VNF should scale back to the
instances it had at the beginning of the maintenance session.
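The ``instance_ids`` rules above can be summarized in a small sketch. The function ``instance_ids_for`` and its parameters are hypothetical; the format of the link to the Fenix API is not defined here, so it is simply passed in by the caller.

```python
def instance_ids_for(state, link=None, instance_id=None):
    """Return the ``instance_ids`` value for a ``maintenance.planned`` state."""
    if state == "MAINTENANCE":
        # Always a link to get all instances that may be affected.
        if not link:
            raise ValueError("MAINTENANCE needs a link to the Fenix API")
        return link
    if state == "MAINTENANCE_COMPLETE":
        # Always an empty string.
        return ""
    if state == "SCALE_IN":
        # One exact instance, or an empty string to let the VNF decide.
        return instance_id or ""
    # Every other state refers to exactly one instance.
    if not instance_id:
        raise ValueError("%s needs a single instance_id" % state)
    return instance_id
```

For example, ``instance_ids_for("SCALE_IN")`` yields an empty string, leaving the choice of what to scale in to the VNF.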
Other end user impact
---------------------
VNFD and EM need to support defining and updating instance and instance group
variables.
Other deployer impact
---------------------
VNFM needs to proxy updates of instance and instance group
variables.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
tojuvone
Work Items
----------
* APIs to set instance and instance group objects
* Example workflow
* Testing
* Documentation changes
Dependencies
============
Other projects may be enhanced later on. However, the initially needed
functionality can be handled completely inside Fenix.
Testing
=======
There is a huge number of combinations of VNF deployments, and the variables
used can be changed during the operations. Fenix will support all these
variables and their changes. A Fenix workflow is always an example, and is
limited to what it can support and is tested against. The main thing to test
is that all variables and their changes are supported and validated. The
testing of VNF deployments might be limited to the example use case supported
by the example workflow.
Documentation Impact
====================
Fenix documentation needs to be updated after the implementation is ready.
References
==========
* `ETSI FEAT03 related documentation`_
.. _`ETSI FEAT03 related documentation`: https://nfvwiki.etsi.org/index.php?title=Feature_Tracking#FEAT03:_NFVI_software_modification