
Documentation changes for VNF workflow implementation

- New VNF workflow
- ETSI FEAT03 changes

story: 2006838
Task: #37843

Change-Id: I2cdcbbb3f68a71004e59427c6c1a48e38d4ae2cb
Signed-off-by: Tomi Juvonen <tomi.juvonen@nokia.com>
changes/29/699929/4
Tomi Juvonen 2 years ago
parent
commit
237e4ed0c9
  1. 110
      doc/source/api-ref/v1/parameters.yaml
  2. 225
      doc/source/api-ref/v1/project.inc
  3. 4
      doc/source/api-ref/v1/samples/input-from-project-instance-to-maintenance-session-put.json
  4. 10
      doc/source/api-ref/v1/samples/instance-constraints.json
  5. 10
      doc/source/api-ref/v1/samples/instance-group-constraints.json
  6. 3
      doc/source/install/index.rst
  7. 30
      doc/source/notification/notifications.rst
  8. 59
      doc/source/specifications/ussuri-etsi-feat03.rst
  9. 97
      doc/source/user/advanced_workflow.rst
  10. 37
      doc/source/user/architecture.rst
  11. 1
      doc/source/user/index.rst

110
doc/source/api-ref/v1/parameters.yaml

@ -6,6 +6,13 @@
# Variables in path #
#############################################################################
group-uuid-path:
description: |
Instance group uuid. Should match with OpenStack server group if one exists.
in: path
required: true
type: string
session_id:
description: |
Session ID
@ -64,6 +71,20 @@ action-plugins:
required: true
type: list of dictionaries
boolean:
description: |
Boolean
in: body
required: true
type: boolean
group-uuid:
description: |
Instance group uuid. Should match with OpenStack server group if one exists.
in: body
required: true
type: string
hosts:
description: |
Hosts to be maintained. An empty list can indicate hosts are to be
@ -72,6 +93,13 @@ hosts:
required: true
type: list of strings
instance-action:
description: |
Action string
in: body
required: true
type: string
instance-actions:
description: |
instance ID : action string
@ -79,6 +107,13 @@ instance-actions:
required: true
type: dictionary
instance-group:
description: |
Instance group name. Should match with OpenStack server group if one exists.
in: body
required: true
type: string
instance-ids:
description: |
List of instance IDs.
@ -86,6 +121,23 @@ instance-ids:
required: true
type: list of strings
instance-name:
description: |
Instance name.
in: body
required: true
type: string
lead-time:
description: |
Lead time in seconds that the VNF needs for the 'migration_type' operation.
The VNF needs to report back to Fenix as soon as it is ready, and at the
latest within this time. Reporting as fast as possible is crucial for
optimizing infrastructure upgrade/maintenance.
in: body
required: true
type: integer
maintenance-workflow-start-time:
description: |
Maintenance workflow start time.
@ -93,6 +145,33 @@ maintenance-workflow-start-time:
required: true
type: string
max-impacted-members:
description: |
Maximum number of instances that can be impacted.
Note! This can change dynamically with VNF load. It determines how many
instances can be scaled down while still keeping this value above zero, so
that VMs can be moved between nodes.
in: body
required: true
type: integer
max-instances-per-host:
description: |
Describes how many instances can be on the same host if
anti_affinity_group: True
Already exists in OpenStack as 'max_server_per_host', but might not
exist in different clouds.
in: body
required: true
type: integer
max-interruption-time:
description: |
How long live migration can take, in seconds.
in: body
required: true
type: integer
metadata:
description: |
Metadata; like hints to projects
@ -100,6 +179,37 @@ metadata:
required: true
type: dictionary
migration-type:
description: |
LIVE_MIGRATION, MIGRATION or OWN_ACTION
Own action means creating a new instance and deleting the old one.
Note! The VNF needs to obey resource_mitigation with own action.
This affects the order of deleting the old and creating the new instance,
so that resources are not overcommitted.
in: body
required: true
type: string
recovery-time:
description: |
VNF recovery time after an operation on an instance. The workflow needs to
take into account recovery_time of the previously moved instance and only
then start moving the next one, obeying max_impacted_members.
Note! This applies regardless of whether the group is an anti_affinity group or not.
in: body
required: true
type: integer
resource-mitigation:
description: |
The instance needs double allocation when being migrated.
This is also true if the instance is first scaled out and only then the old
instance is removed. It must be True also if the VNF needed to scale
down, since we go over that scaled down capacity.
in: body
required: true
type: boolean
upgrade-list:
description: |
List of needed SW upgrade packages:

225
doc/source/api-ref/v1/project.inc

@ -69,3 +69,228 @@ Response codes
.. rest_status_code:: success status.yaml
- 200
============================
Project with NFV constraints
============================
These APIs are for VNFs, VNFM and EM that support the ETSI defined
standard VIM interface for sophisticated interaction to optimize rolling
maintenance, upgrade, scaling and lifecycle management. These interface
enhancements guarantee zero impact to the VNF service during these operations
and allow defining real time constraints for optimal operation performance.
Input from project instance to maintenance session
==================================================
.. rest_method:: PUT /v1/maintenance/{session_id}/{project_id}/{instance_id}
When using a workflow utilizing ETSI constraints, the 'state' 'PREPARE_MAINTENANCE'
and 'PLANNED_MAINTENANCE' notifications will be instance specific. This means
the reply also needs to be instance specific instead of project specific as above.
Request
-------
.. rest_parameters:: parameters.yaml
- session_id: uuid-path
- project_id: uuid-path
- instance_id: uuid-path
- instance_action: instance-action
- state: workflow-state-reply
.. literalinclude:: samples/input-from-project-instance-to-maintenance-session-put.json
:language: javascript
Response codes
--------------
.. rest_status_code:: success status.yaml
- 200
Get instance constraints saved in Fenix DB
==========================================
.. rest_method:: GET /v1/instance/{instance_id}
Get instance constraints saved in Fenix DB. Initially this information
is coming from VNF(M) and needs to be synchronized to Fenix.
Request
-------
.. rest_parameters:: parameters.yaml
- project_id: uuid
- instance_id: uuid-path
- instance_id: uuid
- group_id: group-uuid
- name: instance-name
- migration_type: migration-type
- max_interruption_time: max-interruption-time
- resource_mitigation: resource-mitigation
- lead_time: lead-time
.. literalinclude:: samples/instance-constraints.json
:language: javascript
Response codes
--------------
.. rest_status_code:: success status.yaml
- 200
Update instance constraints saved to Fenix DB
=============================================
.. rest_method:: PUT /v1/instance/{instance_id}
Update instance constraints to Fenix DB. Initially this information
is coming from VNF(M) and needs to be synchronized to Fenix.
Request
-------
.. rest_parameters:: parameters.yaml
- project_id: uuid
- instance_id: uuid-path
- instance_id: uuid
- group_id: group-uuid
- name: instance-name
- migration_type: migration-type
- max_interruption_time: max-interruption-time
- resource_mitigation: resource-mitigation
- lead_time: lead-time
.. literalinclude:: samples/instance-constraints.json
:language: javascript
Response codes
--------------
.. rest_status_code:: success status.yaml
- 200
Delete instance constraints from Fenix DB
=========================================
.. rest_method:: DELETE /v1/instance/{instance_id}
When an instance is deleted, the constraints should also be deleted from
the Fenix DB. As Fenix is aware of existing instances, this could later be
enhanced so that Fenix housekeeping could take care of removing deleted
instances.
Request
-------
.. rest_parameters:: parameters.yaml
- instance_id: uuid-path
Response codes
--------------
.. rest_status_code:: success status.yaml
- 200
Get instance group constraints saved in Fenix DB
================================================
.. rest_method:: GET /v1/instance_group/{group_id}
Get instance group constraints saved in Fenix DB. Initially this information
is coming from VNF(M) and needs to be synchronized to Fenix.
Request
-------
.. rest_parameters:: parameters.yaml
- group_id: group-uuid-path
- group_id: group-uuid
- project_id: uuid
- instance_id: uuid-path
- instance_id: uuid
- name: instance-name
- migration_type: migration-type
- max_interruption_time: max-interruption-time
- resource_mitigation: resource-mitigation
- lead_time: lead-time
.. literalinclude:: samples/instance-group-constraints.json
:language: javascript
Response codes
--------------
.. rest_status_code:: success status.yaml
- 200
Update instance group constraints saved to Fenix DB
===================================================
.. rest_method:: PUT /v1/instance_group/{group_id}
Update instance group constraints to Fenix DB. Initially this information
is coming from VNF(M) and needs to be synchronized to Fenix.
Request
-------
.. rest_parameters:: parameters.yaml
- group_id: group-uuid-path
- group_id: group-uuid
- project_id: uuid
- name: instance-group
- anti_affinity_group: boolean
- max_instances_per_host: max-instances-per-host
- max_impacted_members: max-impacted-members
- recovery_time: recovery-time
- resource_mitigation: resource-mitigation
.. literalinclude:: samples/instance-group-constraints.json
:language: javascript
Response codes
--------------
.. rest_status_code:: success status.yaml
- 200
Delete instance group constraints from Fenix DB
===============================================
.. rest_method:: DELETE /v1/instance_group/{group_id}
When an instance group is deleted, the constraints should also be deleted from
the Fenix DB. As Fenix is aware of existing instances, this could later be
enhanced so that Fenix housekeeping could take care of removing deleted
instances.
Request
-------
.. rest_parameters:: parameters.yaml
- group_id: group-uuid-path
Response codes
--------------
.. rest_status_code:: success status.yaml
- 200

4
doc/source/api-ref/v1/samples/input-from-project-instance-to-maintenance-session-put.json

@ -0,0 +1,4 @@
{
"instance_action": "MIGRATE",
"state": "ACK_PLANNED_MAINTENANCE"
}
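The instance specific reply described for `PUT /v1/maintenance/{session_id}/{project_id}/{instance_id}` can be sketched from the VNFM side as below. This is a minimal sketch, not Fenix code: `build_instance_reply` and the `base_url` argument are hypothetical names, and only the URL layout and the two body fields come from the documentation in this change. Sending the request (authentication, HTTP client) is left out.

```python
import json


def build_instance_reply(base_url, session_id, project_id, instance_id,
                         instance_action, state):
    """Return (url, body) for an instance specific reply.

    Hypothetical helper: only the path layout and the body keys are taken
    from the API reference; everything else is an assumption.
    """
    url = "%s/v1/maintenance/%s/%s/%s" % (
        base_url.rstrip("/"), session_id, project_id, instance_id)
    body = json.dumps(
        {"instance_action": instance_action, "state": state},
        sort_keys=True)
    return url, body


url, body = build_instance_reply(
    "http://0.0.0.0:12347",
    "76e55df8-1c51-11e8-9928-0242ac110002",
    "ead0dbcaf3564cbbb04842e3e54960e3",
    "28d226f3-8d06-444f-a3f1-c586d2e7cb39",
    "MIGRATE",
    "ACK_PLANNED_MAINTENANCE",
)
print(url)
print(body)
```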

10
doc/source/api-ref/v1/samples/instance-constraints.json

@ -0,0 +1,10 @@
{
"instance_id": "28d226f3-8d06-444f-a3f1-c586d2e7cb39",
"project_id": "1ad1154137ac41799cefd5caebae379b",
"group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
"instance_name": "VM1",
"max_interruption_time": 120,
"migration_type": "LIVE_MIGRATION",
"resource_mitigation": true,
"lead_time": 40
}
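Before sending instance constraints to Fenix, a VNFM could check the payload locally. The following is a rough sketch under stated assumptions: `validate_instance_constraints` is a hypothetical helper, the field names and types follow the sample and `parameters.yaml` in this change, and the real service may validate differently. Note that JSON booleans are written lowercase (`true`), which `json.loads` maps to Python `True`.

```python
import json

# Values listed for 'migration_type' in the parameter description.
MIGRATION_TYPES = {"LIVE_MIGRATION", "MIGRATION", "OWN_ACTION"}

# Expected Python type per field (assumed from the sample payload).
EXPECTED_TYPES = {
    "instance_id": str,
    "project_id": str,
    "group_id": str,
    "instance_name": str,
    "max_interruption_time": int,
    "migration_type": str,
    "resource_mitigation": bool,
    "lead_time": int,
}


def validate_instance_constraints(data):
    """Return a list of problems found; an empty list means it looks valid."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            errors.append("missing: %s" % key)
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it for integers.
        if expected is int and isinstance(value, bool):
            errors.append("wrong type: %s" % key)
        elif not isinstance(value, expected):
            errors.append("wrong type: %s" % key)
    migration_type = data.get("migration_type")
    if isinstance(migration_type, str) and migration_type not in MIGRATION_TYPES:
        errors.append("unknown migration_type: %s" % migration_type)
    return errors


sample = json.loads("""
{
    "instance_id": "28d226f3-8d06-444f-a3f1-c586d2e7cb39",
    "project_id": "1ad1154137ac41799cefd5caebae379b",
    "group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
    "instance_name": "VM1",
    "max_interruption_time": 120,
    "migration_type": "LIVE_MIGRATION",
    "resource_mitigation": true,
    "lead_time": 40
}
""")
print(validate_instance_constraints(sample))
```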

10
doc/source/api-ref/v1/samples/instance-group-constraints.json

@ -0,0 +1,10 @@
{
"project_id": "1ad1154137ac41799cefd5caebae379b",
"group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
"name": "vm_ha_group",
"anti_affinity_group": "True",
"max_instances_per_host": 1,
"max_impacted_members": 1,
"recovery_time": 15,
"resource_mitigation": true
}
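How `anti_affinity_group` and `max_instances_per_host` could limit placement can be illustrated with a small check. This is only a sketch of the idea, not how Fenix or the OpenStack scheduler implements it: `can_place_on_host` is a hypothetical function, and it assumes `anti_affinity_group` has already been converted to a real boolean (the string `"True"` and the string `"False"` are both truthy in Python).

```python
def can_place_on_host(group, members_on_host):
    """Return True if one more member of the group fits on the target host.

    group: dict shaped like the instance group constraints sample.
    members_on_host: how many instances of this group already run there.
    """
    if not group["anti_affinity_group"]:
        # Without anti-affinity this constraint does not limit placement.
        return True
    return members_on_host < group["max_instances_per_host"]


group = {
    "group_id": "a01d192c-328e-4708-9b3c-9d716cd24a92",
    "anti_affinity_group": True,
    "max_instances_per_host": 1,
}
print(can_place_on_host(group, 0))  # True
print(can_place_on_host(group, 1))  # False
```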

3
doc/source/install/index.rst

@ -10,8 +10,7 @@ fenix service installation guide
verify.rst
next-steps.rst
The fenix service provides in-service host maintenance for distributed
applications via communication with the VNFM during the maintenance procedure.
The fenix service (fenix) provides...
This chapter assumes a working setup of OpenStack following the
`OpenStack Installation Tutorial

30
doc/source/notification/notifications.rst

@ -119,12 +119,15 @@ payload
+-----------------+------------+------------------------------------------------------------------------+
| instance_ids | string | Link to Fenix maintenance session and project specific API to get |
| | | instance IDs related to current maintenance workflow 'state'. |
| | | A special case is with the 'state' 'INSTANCE_ACTION_DONE' that is for |
| | | a single instance only. In this case the single instance ID is |
| | | provided directly here. |
| | | A special case is with the 'state' 'INSTANCE_ACTION_DONE' where the |
| | | value is a single instance_id only. When using Telco workflow with |
| | | ETSI defined constraints value is also just a single instance_id in |
| | | the 'state' 'PREPARE_MAINTENANCE' and 'PLANNED_MAINTENANCE'. |
+-----------------+------------+------------------------------------------------------------------------+
| reply_url | string | Link to Fenix maintenance session and project specific API to send the |
| | | reply corresponding to this notification |
| | | reply corresponding to this notification. When using Telco workflow |
| | | with ETSI defined constraints reply URL is instance specific in the |
| | | 'state' 'PREPARE_MAINTENANCE' and 'PLANNED_MAINTENANCE'. |
+-----------------+------------+------------------------------------------------------------------------+
| state | string | Maintenance workflow session state. Can have different values: |
| | | - 'MAINTENANCE' to tell project about a created maintenance session. |
@ -156,7 +159,7 @@ payload
| | | of hardware into use. |
+-----------------+------------+------------------------------------------------------------------------+
Example:
Example of notification for many instances:
.. code-block:: json
@ -173,6 +176,23 @@ Example:
"metadata": {"openstack_release": "Queens"}
}
Example of notification for a single instance. Note the instance specific
'reply_url':
.. code-block:: json
{
"service": "fenix",
"allowed_actions": ["MIGRATE", "LIVE_MIGRATE", "OWN_ACTION"],
"instance_ids": ["28d226f3-8d06-444f-a3f1-c586d2e7cb39"],
"reply_url": "http://0.0.0.0:12347/v1/maintenance/76e55df8-1c51-11e8-9928-0242ac110002/ead0dbcaf3564cbbb04842e3e54960e3/28d226f3-8d06-444f-a3f1-c586d2e7cb39",
"state": "PREPARE_MAINTENANCE",
"session_id": "76e55df8-1c51-11e8-9928-0242ac110002",
"reply_at": "2018-02-28T06:40:16",
"actions_at": "2018-02-29T00:00:00",
"project_id": "ead0dbcaf3564cbbb04842e3e54960e3",
"metadata": {"openstack_release": "Queens"}
}
.. [1] http://docs.openstack.org/developer/oslo.messaging/notifier.html
.. [2] https://docs.openstack.org/aodh/latest/admin/telemetry-alarms.html#event-based-alarm
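A notification receiver may want to tell an instance specific 'reply_url' from a project level one. The sketch below splits the URL path with the standard library. `parse_reply_url` is a hypothetical helper, and it assumes the project level reply URL has the same shape without the trailing instance ID.

```python
from urllib.parse import urlparse


def parse_reply_url(reply_url):
    """Split a Fenix reply URL into its IDs.

    Assumed path layout:
    /v1/maintenance/<session_id>/<project_id>[/<instance_id>]
    """
    parts = urlparse(reply_url).path.strip("/").split("/")
    if len(parts) not in (4, 5) or parts[:2] != ["v1", "maintenance"]:
        raise ValueError("unexpected reply_url: %s" % reply_url)
    return {
        "session_id": parts[2],
        "project_id": parts[3],
        # None means the reply is project specific, not instance specific.
        "instance_id": parts[4] if len(parts) == 5 else None,
    }


ids = parse_reply_url(
    "http://0.0.0.0:12347/v1/maintenance/"
    "76e55df8-1c51-11e8-9928-0242ac110002/"
    "ead0dbcaf3564cbbb04842e3e54960e3/"
    "28d226f3-8d06-444f-a3f1-c586d2e7cb39"
)
print(ids["instance_id"])
```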

59
doc/source/specifications/ussuri-etsi-feat03.rst

@ -74,7 +74,7 @@ that time.
fenix -> app-manager [label = "MAINTENANCE"];
app-manager -> fenix [label = "ACK_MAINTENANCE"];
fenix --> app-manager [label = "IN_SCALE", note="Optional down scale"];
app-manager --> fenix [label = "Remove instance related constraints of scaled down instance"]
app-manager --> fenix [label = "Remove instance related constraints of scaled down instances. Update instance groups constraints to match scaling"]
app-manager --> fenix [label = "ACK_IN_SCALE"]
fenix --> app-manager [label = "PREPARE_MAINTENANCE", note="If there is not empty host Fenix makes one"]
app-manager --> fenix [label = "ACK_PREPARE_MAINTENANCE"]
@ -88,7 +88,7 @@ that time.
fenix --> app-manager [label = "MAINTENANCE_COMPLETE"]
=== --- ===
fenix --> app-manager [label = "MAINTENANCE_COMPLETE", note="Maintenance is done"]
app-manager --> fenix [label = "Add instance constraints of instances possibly added when scaling up when maintenance is completed"]
app-manager --> fenix [label = "Add instance constraints of instances possibly added when scaling up when maintenance is completed. Update instance groups constraints to match scaling"]
app-manager --> fenix [label = "ACK_MAINTENANCE_COMPLETE", note="Up scale"]
}
@ -108,12 +108,17 @@ instance group objects.
REST API impact
---------------
API PUT ``/v1/instance/{instance_id}`` is used to update instance object::
All APIs return 200 OK on success. Error codes will be defined during implementation.
API PUT ``/v1/instance/{instance_id}`` is used to update instance object.
API GET ``/v1/instance/{instance_id}`` is used to get instance object.
``PUT`` API should have this structure as input and ``GET`` API as return::
{
"instance_id": "instance_UUID string",
"project_id": "Project UUID string",
"name": "Name string",
"group_id": "group_UUID string",
"instance_name": "Name string",
"max_interruption_time": 120, # seconds
# How long live migration can take
"migration_type": "LIVE_MIGRATION",
@ -128,7 +133,7 @@ API PUT ``/v1/instance/{instance_id}`` is used to update instance object::
# instance is removed. It must be True also if VNF needed to scale
# down, since we go over that scaled down capacity.
"lead_time": 60 # seconds
# How long time VNF needs for 'migration_type' operation. VNF needs to
# How long lead time VNF needs for 'migration_type' operation. VNF needs to
# report back to Fenix as soon as it is ready, but at least within this
# time. Reporting as fast as can is crucial for optimizing
# infrastructure upgrade/maintenance.
@ -142,7 +147,35 @@ object::
{
"group_id": "group_UUID string",
"project_id": "Project UUID string",
"name": "Name string",
"group_name": "Name string",
"anti_affinity_group": "True", # True or False
"max_instances_per_host": 2, # 1..N
# Describes how many instances can be on the same host with
# anti_affinity_group: True
# Already exist in OpenStack as 'max_server_per_host', but might not
# exist in different clouds.
"max_impacted_members": 2, # 1..N
# Maximum amount of instances that can be impacted
# Note! This can be dynamic to VNF load
"recovery_time": 10, # seconds
# max_impacted_members needs to take into account counting previous
# action members before the recovery time passes
# Note! regardless anti_affinity
"resource_mitigation": "True", # True or False
# Instances in group needs double allocation when affected.
# This is true in migrations, but also if instance first scaled out and
# only then the old instance removed.
# It must be True also if VNF needed to scale down, since we go over
# that scaled down capacity.
}
API GET ``/v1/instance_group/{group_id}`` is used to get instance group.
Compared to ``PUT``, this structure also has the ``instance_ids``::
{
"group_id": "group_UUID string",
"project_id": "Project UUID string",
"group_name": "Name string",
"anti_affinity_group": "True", # True or False
"max_instances_per_host": 2, # 1..N
# Describes how many instances can be on the same host with
@ -162,11 +195,25 @@ object::
# only then the old instance removed.
# It must be True also if VNF needed to scale down, since we go over
# that scaled down capacity.
"instance_ids": [] # List of instances belonging to this group
}
API DELETE ``/v1/instance_group/{group_id}`` is used to delete instance
group object.
New API is needed for project instance specific reply:
This API will now be used to reply to 'state' 'PREPARE_MAINTENANCE' and
'PLANNED_MAINTENANCE' notifications that will be instance specific.
PUT ``/v1/maintenance/<session_id>/<project_id>/<instance_id>``::
{
"instance_action": "MIGRATE",
"state": "ACK_PLANNED_MAINTENANCE"
}
Notifications impact
--------------------

97
doc/source/user/advanced_workflow.rst

@ -0,0 +1,97 @@
.. _advanced_workflow:
=======================
Fenix Advanced Workflow
=======================
An example advanced workflow is implemented in 'fenix/workflow/workflows/vnf.py'.
This workflow utilizes the ETSI defined instance and instance group constraints.
Later there also needs to be a workflow with NUMA and CPU pinning. That will
be very similar, but it will need more specific placement decisions, which
means scaling has to target exact instances and move operations have to be
calculated so that the same pinning is obeyed.
Workflow states are similar to the 'default' workflow, but there are some
differences addressed below.
The major difference is that the VNFM is supposed to update VNF instance and
instance group constraints dynamically so that they always match the current
VNF state. Constraints can be seen in the API documentation, as the APIs are
used to update the constraints to the Fenix DB. Constraints let the Fenix
workflow run as fast as it can, since it knows how many instances can be
affected, and together with the other constraints they also make sure there is
zero impact to the VNF service.
States
======
MAINTENANCE
-----------
The difference to the default workflow here is that by the time the maintenance
is called and we enter this first state, all affected VNFs need to have their
instance and instance group constraints updated to Fenix. A perfect VNFM side
implementation should always make sure the changes in the VNF are reflected here.
SCALE_IN
--------
As Fenix is now aware of all the constraints, it can optimize many things. One
is to scale exact instances. As we know max_impacted_members for each instance
group, we can optimize how much we scale down to get an optimal number of empty
compute nodes while still leaving an optimal number of instances as
max_impacted_members. Another case is when using NUMA and CPU pinning.
We definitely need to dictate the scaled down instances, as we need
exact NUMA and CPUs free to be able to have an empty compute host. Then, when
making move operations on pinned instances, we know they will always succeed.
A special need might also be in an edge cloud system, where there are very few
compute hosts available.
After the Fenix workflow has done its math, it may suggest the instances to be
scaled. If the VNFM rejects this, a retry can let the VNFM decide how it scales
down, although the result might not be optimal.
The VNFM needs to update instance and instance group constraints after scaling.
PREPARE_MAINTENANCE
-------------------
After state 'SCALE_IN' the empty compute capacity can be scattered. Now the
workflow needs to calculate how to get empty compute nodes in the best possible way.
As we have all the constraints, we can do operations in parallel for different
compute nodes, VNFs and their instances in different instance groups.
Compared to the default workflow, the 'maintenance.planned' notification is
always for a single instance only.
START_MAINTENANCE
-----------------
The biggest enhancement here is that hosts can be handled in parallel if feasible.
PLANNED_MAINTENANCE
-------------------
As we have all the constraints, we can do operations in parallel for different
compute nodes, VNFs and their instances in different instance groups.
Compared to the default workflow, the 'maintenance.planned' notification is
always for a single instance only.
MAINTENANCE_COMPLETE
--------------------
This is the same as in the default workflow, but the VNFM needs to update
instance and instance group constraints after scaling.
MAINTENANCE_DONE
----------------
This will now make the maintenance session idle until the infrastructure admin
deletes it.
MAINTENANCE_FAILED
------------------
This will now make the maintenance session idle until the infrastructure admin
fixes and continues the session or deletes it.
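The states above move instances in parallel while obeying the group constraints max_impacted_members and recovery_time. A simplified way to picture this is the scheduling sketch below. It is not the algorithm in 'fenix/workflow/workflows/vnf.py': `schedule_moves` is a hypothetical function, it handles one instance group only, and it assumes every move takes the same fixed `action_time` in seconds. An instance counts as impacted from the start of its move until its recovery time has passed.

```python
import heapq


def schedule_moves(instance_ids, max_impacted_members, action_time,
                   recovery_time):
    """Return {instance_id: start_second} for moving one group's instances.

    At most max_impacted_members instances are impacted at any moment.
    A slot becomes free again only after action_time + recovery_time.
    """
    if max_impacted_members < 1:
        raise ValueError("max_impacted_members must be at least 1")
    # Each heap entry is the second at which one impact slot is free again.
    free_at = [0] * max_impacted_members
    heapq.heapify(free_at)
    plan = {}
    for instance_id in instance_ids:
        start = heapq.heappop(free_at)
        plan[instance_id] = start
        heapq.heappush(free_at, start + action_time + recovery_time)
    return plan


plan = schedule_moves(["vm1", "vm2", "vm3", "vm4"],
                      max_impacted_members=2,
                      action_time=40,
                      recovery_time=15)
print(plan)  # {'vm1': 0, 'vm2': 0, 'vm3': 55, 'vm4': 55}
```

With max_impacted_members set to 1 the same instances would be moved strictly one after another, which shows why an up-to-date, larger value lets the workflow finish sooner.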

37
doc/source/user/architecture.rst

@ -102,6 +102,9 @@ removed.
High level sequence diagram
===========================
This is the original design diagram, not utilizing the ETSI defined
instance and instance group constraints.
.. seqdiag::
seqdiag {
@ -126,3 +129,37 @@ High level sequence diagram
app-manager --> fenix [label = "ACK_MAINTENANCE_COMPLETE", note="Up scale"]
}
This advanced diagram utilizes the ETSI defined instance and instance group
constraints.
.. seqdiag::
seqdiag {
activation = none;
app-manager --> fenix [label = "Update instance and instance group constraints anytime and when created"]
=== --- ===
infra-admin -> fenix [label = "Maintenance session \n for hosts", note="Start the maintenance process"];
fenix -> app-manager [label = "MAINTENANCE"];
app-manager -> fenix [label = "ACK_MAINTENANCE"];
fenix --> app-manager [label = "IN_SCALE", note="Optional down scale"];
app-manager --> fenix [label = "Remove instance related constraints of scaled down instances. Update instance groups constraints to match scaling"]
app-manager --> fenix [label = "ACK_IN_SCALE"]
fenix --> app-manager [label = "PREPARE_MAINTENANCE", note="If there is not empty host Fenix makes one"]
app-manager --> fenix [label = "ACK_PREPARE_MAINTENANCE"]
fenix --> app-manager [label = "ADMIN_ACTION_DONE"]
=== Repeated for every compute ===
fenix -> app-manager [label = "PLANNED_MAINTENANCE", note="If VM-s are on the host. Migrate or Live migrate"]
app-manager -> fenix [label = "ACK_PLANNED_MAINTENANCE"]
fenix --> app-manager [label = "ADMIN_ACTION_DONE"]
fenix --> app-manager [label = "IN_MAINTENANCE"]
... Actual maintenance happens here ...
fenix --> app-manager [label = "MAINTENANCE_COMPLETE"]
=== --- ===
fenix --> app-manager [label = "MAINTENANCE_COMPLETE", note="Maintenance is done"]
app-manager --> fenix [label = "Add instance constraints of instances possibly added when scaling up when maintenance is completed. Update instance groups constraints to match scaling"]
app-manager --> fenix [label = "ACK_MAINTENANCE_COMPLETE", note="Up scale"]
}

1
doc/source/user/index.rst

@ -7,3 +7,4 @@ Users guide
architecture
baseworkflow
advanced_workflow