From 97f09dd483a5573be49eba2b9a97af525403bde5 Mon Sep 17 00:00:00 2001
From: Tomi Juvonen
Date: Wed, 18 Sep 2019 07:40:56 +0300
Subject: [PATCH] Spesification to implement ETSI

Specification to start implementing ETSI FEAT03:
NFVI software modification.

story: 2006557
Task: #36646

Change-Id: Iab16f95766e3bb81f072a97ea76921a030fbe3e0
Signed-off-by: Tomi Juvonen
---
 doc/source/index.rst                          |   1 +
 doc/source/specifications/index.rst           |  11 +
 .../specifications/ussuri-etsi-feat03.rst     | 253 ++++++++++++++++++
 3 files changed, 265 insertions(+)
 create mode 100644 doc/source/specifications/index.rst
 create mode 100644 doc/source/specifications/ussuri-etsi-feat03.rst

diff --git a/doc/source/index.rst b/doc/source/index.rst
index 8ab782a..0dd44ef 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -22,6 +22,7 @@ Contents:
    cli/index
    user/index
    admin/index
+   specifications/index
    reference/index
 
 Indices and tables
diff --git a/doc/source/specifications/index.rst b/doc/source/specifications/index.rst
new file mode 100644
index 0000000..4a820fa
--- /dev/null
+++ b/doc/source/specifications/index.rst
@@ -0,0 +1,11 @@
+=====================
+Fenix specifications
+=====================
+
+.. toctree::
+   :maxdepth: 2
+
+   ussuri-etsi-feat03.rst
+
+List of features having more detailed specifications
+
diff --git a/doc/source/specifications/ussuri-etsi-feat03.rst b/doc/source/specifications/ussuri-etsi-feat03.rst
new file mode 100644
index 0000000..667453b
--- /dev/null
+++ b/doc/source/specifications/ussuri-etsi-feat03.rst
@@ -0,0 +1,253 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=============================================
+ETSI NFVI software modification specification
+=============================================
+
+https://storyboard.openstack.org/#!/story/2006557
+
+Implement the needed interfacing between VNFM and Fenix that is specified in
+`ETSI FEAT03 related documentation`_. Limit current changes to instances
+and instance groups.
+
+Problem description
+===================
+
+This feature addresses the support for the coordination of the NFVI software
+modification process with the VNFs hosted on the NFVI in order to minimize
+impact on service availability.
+
+Use Cases
+---------
+
+Guarantee zero impact on VNF service during Fenix infrastructure maintenance,
+upgrade and scaling workflow operation. This requires that the VNF and VNFM support
+the ETSI specification and Fenix interaction.
+
+
+Proposed change
+===============
+
+Implement APIs to set VNF specific instance and instance group variables.
+
+The new APIs let a VNF project change its instance and instance group data in
+the Fenix database. These constraints can be set in the VNFD, or the VNF element
+manager can change them at any time according to the current VNF load level.
+Having the constraints gives the ability to optimize the infrastructure
+maintenance operation, as we can scale down the VNFs as much as possible and
+therefore maintain as many compute nodes in parallel as possible.
+An instance group can be the instances belonging to a certain anti-affinity group,
+but all instances need to be grouped, so we know how many of them are at
+least needed and how many of them can be exposed to maintenance at the same
+time. If nothing else, a group means the instances of a certain flavor.
+
+Make an example workflow that supports the usage of these APIs. The workflow should
+implement one example rolling maintenance use case. The existing Fenix interaction
+towards VNFM will be utilized with small changes.
+
+The variables common to instance and instance group can be overridden in the
+instance object. Both objects can be updated at any time. An update is
+taken into account in any action that is not currently ongoing. An existing timer
+is not updated. These objects alone are not enough to optimize the infrastructure
+workflow. The existing Fenix interaction is also needed to keep the
+maintenance window as small as possible. This also allows upgrading the VNF
+with new infrastructure capabilities and with no additional impact on VNF
+service availability if done at the same time as the infrastructure upgrade.
+
+This diagram illustrates the existing Fenix workflow, where the application
+manager updates instance and instance group constraints whenever
+instances are created or deleted. Constraints can also be updated at any time
+if the VNF service level allows a different number of instances at
+that time.
+
+.. seqdiag::
+
+    seqdiag {
+      activation = none;
+      app-manager --> fenix [label = "Update instance and instance group constraints anytime and when created"]
+      === --- ===
+      infra-admin -> fenix [label = "Maintenance session \n for hosts", note="Start the maintenance process"];
+      fenix -> app-manager [label = "MAINTENANCE"];
+      app-manager -> fenix [label = "ACK_MAINTENANCE"];
+      fenix --> app-manager [label = "IN_SCALE", note="Optional down scale"];
+      app-manager --> fenix [label = "Remove instance related constraints of scaled down instance"]
+      app-manager --> fenix [label = "ACK_IN_SCALE"]
+      fenix --> app-manager [label = "PREPARE_MAINTENANCE", note="If there is no empty host, Fenix makes one"]
+      app-manager --> fenix [label = "ACK_PREPARE_MAINTENANCE"]
+      fenix --> app-manager [label = "ADMIN_ACTION_DONE"]
+      === Repeated for every compute ===
+      fenix -> app-manager [label = "PLANNED_MAINTENANCE", note="If VMs are on the host. Migrate or Live migrate"]
+      app-manager -> fenix [label = "ACK_PLANNED_MAINTENANCE"]
+      fenix --> app-manager [label = "ADMIN_ACTION_DONE"]
+      fenix --> app-manager [label = "IN_MAINTENANCE"]
+      ... Actual maintenance happens here ...
+      fenix --> app-manager [label = "MAINTENANCE_COMPLETE"]
+      === --- ===
+      fenix --> app-manager [label = "MAINTENANCE_COMPLETE", note="Maintenance is done"]
+      app-manager --> fenix [label = "Add instance constraints of instances possibly added when scaling up when maintenance is completed"]
+      app-manager --> fenix [label = "ACK_MAINTENANCE_COMPLETE", note="Up scale"]
+
+    }
+
+
+Alternatives
+------------
+
+N/A
+
+Data model impact
+-----------------
+
+Fenix database will need to have new tables to support instance and
+instance group objects.
+
+REST API impact
+---------------
+
+API PUT ``/v1/instance/{instance_id}`` is used to update instance object::
+
+    {
+        "instance_id": "instance_UUID string",
+        "project_id": "Project UUID string",
+        "name": "Name string",
+        "max_interruption_time": 120, # seconds
+        # How long live migration can take
+        "migration_type": "LIVE_MIGRATION",
+        # LIVE_MIGRATION, MIGRATION or OWN_ACTION
+        # Own action means creating a new instance and deleting the old one.
+        # Note! The VNF needs to obey resource_mitigation with own action.
+        # This affects the order of deleting old and creating new, so as not to
+        # overcommit the resources.
+        "resource_mitigation": "True", # True or False
+        # The instance needs double allocation while being migrated.
+        # This is also true if the VNF first scales out and only then the old
+        # instance is removed. It must also be True if the VNF needed to scale
+        # down, since we go over that scaled down capacity.
+        "lead_time": 60 # seconds
+        # How much time the VNF needs for the 'migration_type' operation. The VNF
+        # needs to report back to Fenix as soon as it is ready, but at least within
+        # this time. Reporting as fast as it can is crucial for optimizing
+        # infrastructure upgrade/maintenance.
+
+    }
+
+API DELETE ``/v1/instance/{instance_id}`` is used to delete instance object.
+
+API PUT ``/v1/instance_group/{group_id}`` is used to update instance group
+object::
+
+    {
+        "group_id": "group_UUID string",
+        "project_id": "Project UUID string",
+        "name": "Name string",
+        "anti_affinity_group": "True", # True or False
+        "max_instances_per_host": 2, # 1..N
+        # Describes how many instances can be on the same host with
+        # anti_affinity_group: True
+        # Already exists in OpenStack as 'max_server_per_host', but might not
+        # exist in different clouds.
+        "max_impacted_members": 2, # 1..N
+        # Maximum number of instances that can be impacted
+        # Note! This can change dynamically with VNF load
+        "recovery_time": 10, # seconds
+        # Members of a previous action still count against
+        # max_impacted_members until the recovery time has passed
+        # Note! This applies regardless of anti_affinity_group
+        "resource_mitigation": "True" # True or False
+        # Instances in the group need double allocation when affected.
+        # This is true in migrations, but also if the VNF first scales out and
+        # only then the old instance is removed.
+        # It must also be True if the VNF needed to scale down, since we go over
+        # that scaled down capacity.
+    }
+
+API DELETE ``/v1/instance_group/{group_id}`` is used to delete instance
+group object.
+
+
+Notifications impact
+--------------------
+
+Event type ``maintenance.planned`` notification will need changes.
+
+New ``state`` value ``INSTANCE_ACTION_FALLBACK`` should be added to tell that live
+migration was not possible and Fenix will force the migration to complete.
+After that the normal ``INSTANCE_ACTION_DONE`` or ``INSTANCE_ACTION_FAILED``
+will be expected.
+
+``instance_ids`` is currently limited to either a single ``instance_id`` or
+a link to get all affected instances. It should now always be a single
+instance, except when ``state`` is ``MAINTENANCE`` or ``SCALE_IN``.
+``MAINTENANCE`` should always have the link to Fenix API to get all instances
+that may be affected during the maintenance session. ``SCALE_IN`` can mention
+only one exact instance, as this may be needed to allow another pinned instance
+to have a target host with the needed resources. This can happen in a small edge
+deployment. An empty string indicates the VNF can decide how it scales down. The workflow
+may then need several ``SCALE_IN`` notifications to finally have enough
+unused resources to execute the workflow further. ``state`` having value
+``MAINTENANCE_COMPLETE`` should have an empty string as ``instance_ids`` value. In
+this ``state`` the VNF should scale back to the instances it had in the beginning of
+the maintenance session.
+
+Other end user impact
+---------------------
+
+VNFD and EM need to support defining and updating instance and instance group
+variables.
+
+Other deployer impact
+---------------------
+
+VNFM needs to proxy updating instance and instance group
+variables.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Tomi Juvonen
+
+Work Items
+----------
+
+* APIs to set instance and instance group objects
+* Example workflow
+* Testing
+* Documentation changes
+
+
+Dependencies
+============
+
+There can be enhancements later on to other projects. However, the initially needed
+functionality can be handled completely inside Fenix.
+
+
+Testing
+=======
+
+There is a huge number of combinations of VNF deployments, and the variables used can
+be changed during the operations. Fenix will support all these variables and
+their changes. A Fenix workflow is always an example and is limited to what it can
+support and is tested against. The main thing to test is that all variables and
+their changes are supported and validated. The testing of VNF deployment might
+be limited to the example use case supported by the example workflow.
+
+
+Documentation Impact
+====================
+
+Fenix documentation needs to be updated after the implementation is ready.
+
+
+References
+==========
+
+.. _`ETSI FEAT03 related documentation`: https://nfvwiki.etsi.org/index.php?title=Feature_Tracking#FEAT03:_NFVI_software_modification
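
Illustration only, not part of the patch above. The instance constraints described under "REST API impact" could be sent from the VNFM side roughly as sketched below. The path ``/v1/instance/{instance_id}`` and the field names come from the specification; the base URL, the function name ``build_instance_update`` and the omission of authentication are assumptions made for this sketch. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: the spec only defines the path, not the host.
FENIX_URL = "http://fenix.example.com"

# Values listed for "migration_type" in the spec.
MIGRATION_TYPES = {"LIVE_MIGRATION", "MIGRATION", "OWN_ACTION"}


def build_instance_update(instance_id, project_id, name, *,
                          max_interruption_time, migration_type,
                          resource_mitigation, lead_time,
                          base_url=FENIX_URL):
    """Return an unsent Request that PUTs the instance constraints."""
    if migration_type not in MIGRATION_TYPES:
        raise ValueError("unknown migration_type: %s" % migration_type)
    body = {
        "instance_id": instance_id,
        "project_id": project_id,
        "name": name,
        "max_interruption_time": max_interruption_time,  # seconds
        "migration_type": migration_type,
        # The spec example carries this flag as the string "True"/"False".
        "resource_mitigation": str(bool(resource_mitigation)),
        "lead_time": lead_time,  # seconds
    }
    return urllib.request.Request(
        url="%s/v1/instance/%s" % (base_url, instance_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to ``urllib.request.urlopen`` would send it; a real deployment would also need whatever authentication the Fenix endpoint requires.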