Specification to implement ETSI

Specification to start implementing ETSI FEAT03: NFVI software modification. story: 2006557 Task: #36646 Change-Id: Iab16f95766e3bb81f072a97ea76921a030fbe3e0 Signed-off-by: Tomi Juvonen <tomi.juvonen@nokia.com>
2019-09-18 07:40:56 +03:00 · 2019-09-18 07:40:56 +03:00 · 33e89ab6d6
parent 0ee8e156ad
commit 33e89ab6d6
3 changed files with 232 additions and 0 deletions
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -22,6 +22,7 @@ Contents:
   cli/index
   user/index
   admin/index
+   specifications/index
   reference/index

 Indices and tables
--- a/doc/source/specifications/index.rst
+++ b/doc/source/specifications/index.rst
@@ -0,0 +1,11 @@
+=====================
+Fenix specifications
+=====================
+
+.. toctree::
+   :maxdepth: 2
+
+   ussuri-etsi-feat03.rst
+
+List of features that have more detailed specifications.
+
--- a/doc/source/specifications/ussuri-etsi-feat03.rst
+++ b/doc/source/specifications/ussuri-etsi-feat03.rst
@@ -0,0 +1,220 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=============================================
+ETSI NFVI software modification specification
+=============================================
+
+https://storyboard.openstack.org/#!/story/2006557
+
+Implement the interfacing between the VNFM and Fenix that is specified in the
+`ETSI FEAT03 related documentation`_. Limit the current changes to instances
+and instance groups.
+
+Problem description
+===================
+
+This feature addresses the support for the coordination of the NFVI software
+modification process with the VNFs hosted on the NFVI in order to minimize
+impact on service availability.
+
+Use Cases
+---------
+
+Guarantee zero impact on VNF service during Fenix infrastructure maintenance,
+upgrade and scaling workflow operations. This implies that the VNF and VNFM
+support the ETSI specification and the Fenix interaction.
+
+
+Proposed change
+===============
+
+Implement APIs to set VNF specific instance and instance group variables.
+
+The new APIs change the VNF project's instance and instance group data in the
+Fenix database. These constraints might be set in the VNFD, or the VNF element
+manager can change them at any time according to the current VNF load level.
+Having the constraints makes it possible to optimize the infrastructure
+maintenance operation, as we can scale down the VNFs as much as possible and
+therefore maintain as many compute nodes in parallel as possible.
+An instance group can be the instances belonging to a certain anti-affinity
+group, but all instances need to be grouped, so we know how many of them are
+needed at minimum and how many can be exposed to maintenance at the same
+time. If nothing else, a group means the instances of a certain flavor.
+
+Make an example workflow that supports the usage of these APIs. Workflow should
+implement one example rolling maintenance use case. Existing Fenix interaction
+towards VNFM will be utilized with small changes.
+
+The variables common to instance and instance group can be overridden in the
+instance object. Both objects can be updated at any time. An update is taken
+into account in any action that is not currently ongoing. An existing timer
+is not updated. These objects alone are not enough to optimize the
+infrastructure workflow. The existing Fenix interaction is also needed to keep
+the maintenance window as small as possible. It also allows upgrading the VNF
+with new infrastructure capabilities with no additional impact on VNF
+service availability if done at the same time as the infrastructure upgrade.
+
+
+Alternatives
+------------
+
+N/A
+
+Data model impact
+-----------------
+
+Fenix database will need to have new tables to support instance and
+instance group objects.
+
+REST API impact
+---------------
+
+API PUT ``/v1/instance/{instance_id}`` is used to update instance object::
+
+    {
+        "instance_id": "instance_UUID string",
+        "project_id": "Project UUID string",
+        "name": "Name string",
+        "max_interruption_time": 120, # seconds
+        # How long live migration can take
+        "migration_type": "LIVE_MIGRATION",
+        # LIVE_MIGRATION, MIGRATION or OWN_ACTION
+        # Own action means creating a new instance and deleting the old one.
+        # Note! The VNF needs to obey resource_mitigation with own action.
+        # This affects the order of deleting the old and creating the new
+        # instance so that resources are not overcommitted.
+        "resource_mitigation": "True", # True or False
+        # Current instance needs double allocation when being migrated.
+        # This is also true if the instance is first scaled out and only then
+        # the old instance is removed. It must be True also if the VNF needed
+        # to scale down, since we go over that scaled down capacity.
+        "lead_time": 60 # seconds
+        # How much time the VNF needs for the 'migration_type' operation. The
+        # VNF needs to report back to Fenix as soon as it is ready, but at the
+        # latest within this time. Reporting as fast as possible is crucial
+        # for optimizing infrastructure upgrade/maintenance.
+    }
+
+API DELETE ``/v1/instance/{instance_id}`` is used to delete instance object.
+
+API PUT ``/v1/instance_group/{group_id}`` is used to update instance group
+object::
+
+    {
+        "group_id": "group_UUID string",
+        "project_id": "Project UUID string",
+        "name": "Name string",
+        "anti_affinity_group": "True", # True or False
+        "max_instances_per_host": 2, # 1..N
+        # Describes how many instances can be on the same host with
+        # anti_affinity_group: True
+        # Already exists in OpenStack as 'max_server_per_host', but might not
+        # exist in other clouds.
+        "max_impacted_members": 2, # 1..N
+        # Maximum number of instances that can be impacted
+        # Note! This can be dynamic according to VNF load
+        "recovery_time": 10, # seconds
+        # Members of a previous action count towards max_impacted_members
+        # until the recovery time has passed
+        # Note! This applies regardless of anti_affinity_group
+        "resource_mitigation": "True", # True or False
+        # Instances in the group need double allocation when affected.
+        # This is true in migrations, but also if an instance is first scaled
+        # out and only then the old instance is removed.
+        # It must be True also if the VNF needed to scale down, since we go
+        # over that scaled down capacity.
+    }
+
+API DELETE ``/v1/instance_group/{group_id}`` is used to delete instance
+group object.
+
+
+Notifications impact
+--------------------
+
+Event type ``maintenance.planned`` notification will need changes.
+
+A new ``state`` value ``INSTANCE_ACTION_FALLBACK`` should be added to tell
+that live migration was not possible and Fenix will force the migration to
+complete. After that the normal ``INSTANCE_ACTION_DONE`` or
+``INSTANCE_ACTION_FAILED`` will be expected.
+
+``instance_ids`` is currently limited to either a single ``instance_id`` or
+a link to get all affected instances. Now this should always be a single
+instance, except in ``state`` value ``MAINTENANCE`` or ``SCALE_IN``.
+``MAINTENANCE`` should always have the link to the Fenix API to get all
+instances that may be affected during the maintenance session. ``SCALE_IN``
+can mention only one exact instance, as it may be needed to allow another
+pinned instance to have a target host with the needed resources. This can
+happen in a small edge deployment. An empty string indicates that the VNF can
+decide how it scales down. The workflow may then need several ``SCALE_IN``
+notifications to finally have enough unused resources to execute the workflow
+further. ``state`` having value ``MAINTENANCE_COMPLETE`` should have an empty
+string as ``instance_ids`` value. In this ``state`` the VNF should scale back
+to the instances it had in the beginning of the maintenance session.
+
+Other end user impact
+---------------------
+
+VNFD and EM need to support defining and updating instance and instance group
+variables.
+
+Other deployer impact
+---------------------
+
+VNFM needs to proxy updating instance and instance group
+variables.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  tojuvone
+
+Work Items
+----------
+
+* APIs to set instance and instance group objects
+* Example workflow
+* Testing
+* Documentation changes
+
+
+Dependencies
+============
+
+There can be enhancements later on to other projects. However, the initially
+needed functionality can be handled completely inside Fenix.
+
+
+Testing
+=======
+
+There is a huge number of combinations of VNF deployments, and the variables
+used can be changed during operations. Fenix will support all these variables
+and their changes. A Fenix workflow is always an example, limited in what it
+can support and is tested against. The main thing to test is that all
+variables and their changes are supported and validated. Testing of VNF
+deployment might be limited to the use case supported by the example workflow.
+
+
+Documentation Impact
+====================
+
+Fenix documentation needs to be updated after the implementation is ready.
+
+
+References
+==========
+
+* `ETSI FEAT03 related documentation`_
+
+.. _`ETSI FEAT03 related documentation`: https://nfvwiki.etsi.org/index.php?title=Feature_Tracking#FEAT03:_NFVI_software_modification
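
The request bodies under "REST API impact" in ussuri-etsi-feat03.rst could be checked on the VNFM side before they are sent to Fenix. The following is a minimal Python sketch, not Fenix code: the field names and allowed values come from the specification, while the helper names `validate_instance` and `members_available_for_action` and the exact checks are assumptions made for illustration. The second helper shows one possible reading of how `recovery_time` and `max_impacted_members` interact.

```python
# Field names and allowed values follow the spec; the helpers are hypothetical.
MIGRATION_TYPES = ("LIVE_MIGRATION", "MIGRATION", "OWN_ACTION")

EXAMPLE_INSTANCE = {
    "instance_id": "instance_UUID string",
    "project_id": "Project UUID string",
    "name": "Name string",
    "max_interruption_time": 120,  # seconds
    "migration_type": "LIVE_MIGRATION",
    "resource_mitigation": "True",  # the spec passes booleans as strings
    "lead_time": 60,  # seconds
}

EXAMPLE_GROUP = {
    "group_id": "group_UUID string",
    "project_id": "Project UUID string",
    "name": "Name string",
    "anti_affinity_group": "True",
    "max_instances_per_host": 2,
    "max_impacted_members": 2,
    "recovery_time": 10,  # seconds
    "resource_mitigation": "True",
}


def validate_instance(body):
    """Return a list of problems found in an instance PUT body (empty if none)."""
    errors = []
    for key in ("instance_id", "project_id", "name"):
        if not isinstance(body.get(key), str) or not body[key]:
            errors.append(key + " must be a non-empty string")
    if body.get("migration_type") not in MIGRATION_TYPES:
        errors.append("migration_type must be one of " + ", ".join(MIGRATION_TYPES))
    for key in ("max_interruption_time", "lead_time"):
        value = body.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            errors.append(key + " must be a non-negative number of seconds")
    if body.get("resource_mitigation") not in ("True", "False"):
        errors.append('resource_mitigation must be "True" or "False"')
    return errors


def members_available_for_action(group, seconds_since_previous_actions):
    """Return how many more group members may be impacted right now.

    Assumed interpretation: a member whose action finished less than
    recovery_time seconds ago still counts towards max_impacted_members.
    """
    still_recovering = sum(
        1 for age in seconds_since_previous_actions if age < group["recovery_time"]
    )
    return max(0, group["max_impacted_members"] - still_recovering)
```

For example, `validate_instance(EXAMPLE_INSTANCE)` returns an empty list, and `members_available_for_action(EXAMPLE_GROUP, [3])` returns 1, because one member acted 3 seconds ago and is still inside the 10 second recovery time.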