ETSI NFVI software modification specification
https://storyboard.openstack.org/#!/story/2006557
Implement the interfacing between the VNFM and Fenix that is specified in the ETSI FEAT03-related documentation. Limit the current changes to instances and instance groups.
Problem description
This feature adds support for coordinating the NFVI software modification process with the VNFs hosted on the NFVI, in order to minimize the impact on service availability.
Use Cases
Guarantee zero impact on VNF service during Fenix infrastructure maintenance, upgrade and scaling workflow operations. This requires that the VNF and VNFM support the ETSI specification and the Fenix interaction.
Proposed change
Implement APIs to set VNF specific instance and instance group variables.
New APIs are needed to change the VNF project instance and instance group data in the Fenix database. These constraints may be set in the VNFD, or the VNF element manager (EM) can change them at any time according to the current VNF load level. The constraints make it possible to optimize the infrastructure maintenance operation: VNFs can be scaled down as much as possible, so that as many compute nodes as possible can be maintained in parallel. An instance group can consist of the instances belonging to a certain anti-affinity group, but every instance needs to belong to a group, so that Fenix knows how many instances are needed at minimum and how many can be exposed to maintenance at the same time. If nothing else, a group means the instances of a certain flavor.
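A minimal sketch of the rule that no instance may be left ungrouped, using the flavor fallback from the paragraph above. The `flavor` key and the `flavor-<name>` group naming are hypothetical: neither is part of the instance object defined in this spec.

```python
from collections import defaultdict
from typing import Dict, List


def group_instances(instances: List[dict]) -> Dict[str, List[str]]:
    """Map group IDs to instance IDs so that no instance is left ungrouped.

    Instances with an explicit group_id keep it. The others fall back to a
    hypothetical per-flavor group, following the "a group means instances
    of a certain flavor" default.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for inst in instances:
        group_id = inst.get("group_id") or f"flavor-{inst['flavor']}"
        groups[group_id].append(inst["instance_id"])
    return dict(groups)
```

With every instance grouped, the group constraints (for example `max_impacted_members`) give Fenix a bound for every instance it may touch.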
Create an example workflow that uses these APIs. The workflow should implement one example rolling maintenance use case. The existing Fenix interaction towards the VNFM will be reused with small changes.
The variables common to the instance and instance group objects can be overridden in the instance object. Both objects can be updated at any time. An update takes effect for any action that is not already ongoing; a timer that is already running is not updated. These objects alone are not enough to optimize the infrastructure workflow: the existing Fenix interaction is also needed to keep the maintenance window as small as possible. The interaction also allows the VNF to be upgraded to use new infrastructure capabilities with no additional impact on VNF service availability, if this is done at the same time as the infrastructure upgrade.
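Apart from the identifiers, the only variable defined in both objects below is `resource_mitigation`. The following is a minimal sketch of the override rule. It assumes that a common key missing from the instance object, or set to None, is inherited from the group. The helper name is hypothetical.

```python
from typing import Dict

# Constraint variables defined in both objects (identifiers excluded).
COMMON_KEYS = ("resource_mitigation",)


def effective_constraints(instance: Dict, group: Dict) -> Dict:
    """Resolve instance constraints, letting the instance override its group.

    Assumption: a common key that is missing from the instance object, or
    set to None, is inherited from the group.
    """
    if instance.get("group_id") not in (None, group.get("group_id")):
        raise ValueError("instance does not belong to this group")
    result = dict(instance)
    for key in COMMON_KEYS:
        if result.get(key) is None and key in group:
            result[key] = group[key]
    return result
```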
The diagram below illustrates the existing Fenix workflow, in which the application manager updates the instance and instance group constraints whenever instances are created or deleted. The constraints can also be updated at any time if the VNF service level allows a different number of instances at that time.
seqdiag {
    activation = none;
    app-manager --> fenix [label = "Update instance and instance group constraints anytime and when created"];
    === --- ===
    infra-admin -> fenix [label = "Maintenance session \n for hosts", note = "Start the maintenance process"];
    fenix -> app-manager [label = "MAINTENANCE"];
    app-manager -> fenix [label = "ACK_MAINTENANCE"];
    fenix --> app-manager [label = "IN_SCALE", note = "Optional down scale"];
    app-manager --> fenix [label = "Remove instance related constraints of scaled down instances. Update instance group constraints to match scaling"];
    app-manager --> fenix [label = "ACK_IN_SCALE"];
    fenix --> app-manager [label = "PREPARE_MAINTENANCE", note = "If there is no empty host, Fenix makes one"];
    app-manager --> fenix [label = "ACK_PREPARE_MAINTENANCE"];
    fenix --> app-manager [label = "ADMIN_ACTION_DONE"];
    === Repeated for every compute ===
    fenix -> app-manager [label = "PLANNED_MAINTENANCE", note = "If VMs are on the host: migrate or live migrate"];
    app-manager -> fenix [label = "ACK_PLANNED_MAINTENANCE"];
    fenix --> app-manager [label = "ADMIN_ACTION_DONE"];
    fenix --> app-manager [label = "IN_MAINTENANCE"];
    ... Actual maintenance happens here ...
    fenix --> app-manager [label = "MAINTENANCE_COMPLETE"];
    === --- ===
    fenix --> app-manager [label = "MAINTENANCE_COMPLETE", note = "Maintenance is done"];
    app-manager --> fenix [label = "Add instance constraints of instances possibly added when scaling up after maintenance is completed. Update instance group constraints to match scaling"];
    app-manager --> fenix [label = "ACK_MAINTENANCE_COMPLETE", note = "Up scale"];
}
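The diagram fixes which Fenix notifications expect an acknowledgement from the application manager. The following is a minimal lookup sketch of that side of the exchange. It assumes one reply per notification and treats MAINTENANCE_COMPLETE as the session-level notification, because the per-compute MAINTENANCE_COMPLETE in the repeated section is drawn without a reply. The constraint updates that precede ACK_IN_SCALE and ACK_MAINTENANCE_COMPLETE are not modelled. The names are hypothetical.

```python
from typing import Optional

# Reply the application manager sends for each Fenix state, as drawn.
REPLIES = {
    "MAINTENANCE": "ACK_MAINTENANCE",
    "IN_SCALE": "ACK_IN_SCALE",
    "PREPARE_MAINTENANCE": "ACK_PREPARE_MAINTENANCE",
    "PLANNED_MAINTENANCE": "ACK_PLANNED_MAINTENANCE",
    "MAINTENANCE_COMPLETE": "ACK_MAINTENANCE_COMPLETE",
}

# States the diagram shows without a reply.
NO_REPLY = {"ADMIN_ACTION_DONE", "IN_MAINTENANCE"}


def reply_state(state: str) -> Optional[str]:
    """Return the ACK state to send back, or None if no reply is expected."""
    if state in REPLIES:
        return REPLIES[state]
    if state in NO_REPLY:
        return None
    raise ValueError(f"unexpected maintenance state: {state}")
```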
Alternatives
N/A
Data model impact
The Fenix database will need new tables to store the instance and instance group objects.
REST API impact
All APIs return 200 OK on success. Error codes will be defined during implementation.
API PUT /v1/instance/{instance_id} is used to update an instance object, and API GET /v1/instance/{instance_id} is used to get an instance object. The PUT API takes the following structure as input, and the GET API returns it:
{
    "instance_id": "instance_UUID string",
    "project_id": "Project UUID string",
    "group_id": "group_UUID string",
    "instance_name": "Name string",
    "max_interruption_time": 120, # seconds
    # How long live migration can take
    "migration_type": "LIVE_MIGRATION",
    # LIVE_MIGRATION, MIGRATION or OWN_ACTION
    # OWN_ACTION means creating a new instance and deleting the old one.
    # Note! The VNF needs to obey resource_mitigation with OWN_ACTION.
    # This affects the order of deleting the old and creating the new
    # instance, so that resources are not overcommitted.
    "resource_mitigation": "True", # True or False
    # The current instance needs double allocation when being migrated.
    # This is also true if the instance is first scaled out and only then
    # the old instance is removed. It must also be True if the VNF needed
    # to scale down, since we go over that scaled down capacity.
    "lead_time": 60 # seconds
    # How much lead time the VNF needs for the 'migration_type' operation.
    # The VNF needs to report back to Fenix as soon as it is ready, and at
    # the latest within this time. Reporting as fast as possible is crucial
    # for optimizing infrastructure upgrade/maintenance.
}
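The following is a minimal sketch of how a PUT body of this shape could be validated. It accepts `resource_mitigation` either as the "True"/"False" strings used above or as a JSON boolean; that tolerance is an assumption, not part of the spec. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict

MIGRATION_TYPES = {"LIVE_MIGRATION", "MIGRATION", "OWN_ACTION"}


def parse_flag(value: Any) -> bool:
    """Accept the "True"/"False" strings used in the example, or real booleans."""
    if isinstance(value, bool):
        return value
    if value in ("True", "False"):
        return value == "True"
    raise ValueError(f"expected True or False, got {value!r}")


@dataclass
class Instance:
    instance_id: str
    project_id: str
    group_id: str
    instance_name: str
    max_interruption_time: int  # seconds
    migration_type: str
    resource_mitigation: bool
    lead_time: int  # seconds

    @classmethod
    def from_json(cls, body: Dict[str, Any]) -> "Instance":
        """Validate a PUT body (KeyError if a field is missing)."""
        if body["migration_type"] not in MIGRATION_TYPES:
            raise ValueError(f"unknown migration_type {body['migration_type']!r}")
        for key in ("max_interruption_time", "lead_time"):
            value = body[key]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"{key} must be a non-negative integer")
        return cls(
            instance_id=body["instance_id"],
            project_id=body["project_id"],
            group_id=body["group_id"],
            instance_name=body["instance_name"],
            max_interruption_time=body["max_interruption_time"],
            migration_type=body["migration_type"],
            resource_mitigation=parse_flag(body["resource_mitigation"]),
            lead_time=body["lead_time"],
        )
```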
API DELETE /v1/instance/{instance_id} is used to delete an instance object.
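The following is a minimal client-side sketch that builds (but does not send) requests for the three instance operations using only the Python standard library. The base URL is a hypothetical placeholder. Sending an `X-Auth-Token` header assumes the usual Keystone token authentication of OpenStack services; this spec does not state it.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint; the real address depends on the deployment.
FENIX_URL = "http://fenix.example.com"


def instance_request(method: str, instance_id: str,
                     body: Optional[dict] = None,
                     token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request for /v1/instance/{instance_id}."""
    if method not in ("PUT", "GET", "DELETE"):
        raise ValueError(f"unsupported method {method}")
    # Only PUT carries the instance object as its body.
    if (body is not None) != (method == "PUT"):
        raise ValueError("a body is required for PUT and not allowed otherwise")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: Keystone token authentication.
        headers["X-Auth-Token"] = token
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(f"{FENIX_URL}/v1/instance/{instance_id}",
                                  data=data, headers=headers, method=method)
```

The request can then be sent with `urllib.request.urlopen(req)`. Success is 200 OK, and `urlopen` raises `HTTPError` for error statuses.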
API PUT /v1/instance_group/{group_id} is used to update an instance group object. It takes the following structure as input:
{
    "group_id": "group_UUID string",
    "project_id": "Project UUID string",
    "group_name": "Name string",
    "anti_affinity_group": "True", # True or False
    "max_instances_per_host": 2, # 1..N
    # Describes how many instances can be on the same host with
    # anti_affinity_group: True
    # Already exists in OpenStack as 'max_server_per_host', but might not
    # exist in other clouds.
    "max_impacted_members": 2, # 1..N
    # Maximum number of instances that can be impacted at the same time
    # Note! This can change dynamically with the VNF load
    "recovery_time": 10, # seconds
    # When counting against max_impacted_members, members impacted by a
    # previous action still count until recovery_time has passed
    # Note! This applies regardless of anti_affinity_group
    "resource_mitigation": "True" # True or False
    # Instances in the group need double allocation when affected.
    # This is true in migrations, but also if an instance is first scaled
    # out and only then the old instance is removed.
    # It must also be True if the VNF needed to scale down, since we go
    # over that scaled down capacity.
}
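One reading of the `recovery_time` comment is a sliding window: a member counts against `max_impacted_members` while its action runs and for `recovery_time` seconds after the action finished. The following is a minimal sketch of that reading. Timestamps (in seconds) are passed explicitly so the logic can be tested without a clock. The class and method names are hypothetical.

```python
from typing import Dict, Set


class ImpactBudget:
    """Decide whether one more group member may be impacted right now.

    A member counts against max_impacted_members while its action runs and
    for recovery_time seconds after the action finished, regardless of
    anti_affinity_group.
    """

    def __init__(self, max_impacted_members: int, recovery_time: float):
        self.max_impacted_members = max_impacted_members
        self.recovery_time = recovery_time
        self._active: Set[str] = set()           # members under an action now
        self._recovering: Dict[str, float] = {}  # member -> action finish time

    def in_use(self, now: float) -> int:
        # Forget members whose recovery window has fully passed.
        self._recovering = {m: t for m, t in self._recovering.items()
                            if now - t < self.recovery_time}
        return len(self._active | set(self._recovering))

    def try_start(self, member: str, now: float) -> bool:
        in_use = self.in_use(now)
        already_counted = member in self._active or member in self._recovering
        if not already_counted and in_use >= self.max_impacted_members:
            return False
        self._active.add(member)
        return True

    def finish(self, member: str, now: float) -> None:
        self._active.discard(member)
        self._recovering[member] = now
```

Because `max_impacted_members` may change with the VNF load, it is a plain attribute that can be updated between calls.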
API GET /v1/instance_group/{group_id} is used to get an instance group object. Compared to the PUT structure, it also contains instance_ids:
{
    "group_id": "group_UUID string",
    "project_id": "Project UUID string",
    "group_name": "Name string",
    "anti_affinity_group": "True", # True or False
    "max_instances_per_host": 2, # 1..N
    # Describes how many instances can be on the same host with
    # anti_affinity_group: True
    # Already exists in OpenStack as 'max_server_per_host', but might not
    # exist in other clouds.
    "max_impacted_members": 2, # 1..N
    # Maximum number of instances that can be impacted at the same time
    # Note! This can change dynamically with the VNF load
    "recovery_time": 10, # seconds
    # When counting against max_impacted_members, members impacted by a
    # previous action still count until recovery_time has passed
    # Note! This applies regardless of anti_affinity_group
    "resource_mitigation": "True", # True or False
    # Instances in the group need double allocation when affected.
    # This is true in migrations, but also if an instance is first scaled
    # out and only then the old instance is removed.
    # It must also be True if the VNF needed to scale down, since we go
    # over that scaled down capacity.
    "instance_ids": [] # List of instances belonging to this group
}
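With `instance_ids` available from GET, a workflow can check `max_instances_per_host` before choosing a migration target. The following is a minimal sketch. The `host_of` mapping from instance ID to current host is a hypothetical input (for example gathered from Nova) and is not part of the group object.

```python
from typing import Dict


def can_place(group: dict, host_of: Dict[str, str],
              instance_id: str, target_host: str) -> bool:
    """Check max_instances_per_host before moving instance_id to target_host.

    host_of maps instance IDs to their current hosts; it is a hypothetical
    input, not part of the instance group object.
    """
    if group["anti_affinity_group"] not in ("True", True):
        return True  # the per-host limit applies only to anti-affinity groups
    others_on_target = sum(
        1 for other in group["instance_ids"]
        if other != instance_id and host_of.get(other) == target_host
    )
    return others_on_target < group["max_instances_per_host"]
```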
API DELETE /v1/instance_group/{group_id} is used to delete an instance group object.
A new API is needed for project instance-specific replies. This API will now be used to reply to the 'state' values 'PREPARE_MAINTENANCE' and 'PLANNED_MAINTENANCE', whose notifications will be instance-specific.
PUT /v1/maintenance/<session_id>/<project_id>/<instance_id>:
{
"instance_action": "MIGRATE",
"state": "ACK_PLANNED_MAINTENANCE"
}
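The following is a minimal sketch of assembling this instance-specific reply, the path plus the JSON body, before handing it to any HTTP client. The function name is hypothetical. The values in the usage test mirror the example above and are not an exhaustive list of allowed values.

```python
import json
from typing import Tuple


def instance_reply(session_id: str, project_id: str, instance_id: str,
                   instance_action: str, state: str) -> Tuple[str, str]:
    """Return the PUT path and JSON body for an instance-specific reply."""
    for name, value in (("session_id", session_id),
                        ("project_id", project_id),
                        ("instance_id", instance_id)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    path = f"/v1/maintenance/{session_id}/{project_id}/{instance_id}"
    body = json.dumps({"instance_action": instance_action, "state": state})
    return path, body
```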
Notifications impact
The notification with event type maintenance.planned will need changes.
A new state value, INSTANCE_ACTION_FALLBACK, should be added to tell the VNF that live migration was not possible and that Fenix will force the migration to complete. After that, the normal INSTANCE_ACTION_DONE or INSTANCE_ACTION_FAILED will be expected.
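The following is a VNF-side sketch of the expected sequence for one instance: an optional INSTANCE_ACTION_FALLBACK, followed by exactly one of the normal final states. The class is hypothetical and only tracks the states named in the paragraph above.

```python
from typing import Optional

FINAL_STATES = {"INSTANCE_ACTION_DONE", "INSTANCE_ACTION_FAILED"}


class MigrationTracker:
    """Track the outcome of one instance's live migration."""

    def __init__(self) -> None:
        self.fallback = False              # Fenix had to force the migration
        self.result: Optional[str] = None  # final state once received

    def on_state(self, state: str) -> None:
        if self.result is not None:
            raise ValueError(f"{state} received after final state {self.result}")
        if state == "INSTANCE_ACTION_FALLBACK":
            # Live migration was not possible; a final state still follows.
            self.fallback = True
        elif state in FINAL_STATES:
            self.result = state
        else:
            raise ValueError(f"unexpected state {state}")
```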
instance_ids is currently limited to either a single instance_id or a link to get all affected instances. From now on it should always be a single instance, except when the state value is MAINTENANCE or SCALE_IN.
MAINTENANCE should always carry the link to the Fenix API for getting all instances that may be affected during the maintenance session.
SCALE_IN can name one exact instance, because removing that particular instance may be needed to give another pinned instance a target host with the needed resources. This can happen in small edge deployments. An empty string indicates that the VNF can decide how it scales down. The workflow may then need to send several SCALE_IN notifications before there are enough unused resources to continue.
When state has the value MAINTENANCE_COMPLETE, instance_ids should be an empty string. In this state the VNF should scale back to the instances it had at the beginning of the maintenance session.
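The following is a minimal sketch of how a VNF could interpret `instance_ids` according to these rules. It assumes the field is a string and that the MAINTENANCE link is an absolute http(s) URL. The state name follows this paragraph (SCALE_IN); because the workflow diagram labels the same step IN_SCALE, the sketch accepts both names. The function name and the returned kind labels are hypothetical.

```python
from typing import Optional, Tuple


def interpret_instance_ids(state: str, instance_ids: str) -> Tuple[str, Optional[str]]:
    """Classify the instance_ids value of a maintenance.planned notification.

    Returns (kind, value) where kind is "link", "instance", "vnf_decides"
    or "scale_back".
    """
    if state == "MAINTENANCE":
        # Link to the Fenix API listing every instance that may be affected.
        if not instance_ids.startswith(("http://", "https://")):
            raise ValueError("MAINTENANCE expects a link to the Fenix API")
        return "link", instance_ids
    if state in ("SCALE_IN", "IN_SCALE"):
        # Empty string: the VNF chooses which instances to remove.
        if instance_ids == "":
            return "vnf_decides", None
        return "instance", instance_ids
    if state == "MAINTENANCE_COMPLETE":
        # Empty string: scale back to the instances held at session start.
        if instance_ids != "":
            raise ValueError("MAINTENANCE_COMPLETE expects an empty string")
        return "scale_back", None
    # Every other state names exactly one instance.
    if not instance_ids:
        raise ValueError(f"{state} expects a single instance_id")
    return "instance", instance_ids
```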
Other end user impact
The VNFD and EM need to support defining and updating the instance and instance group variables.
Other deployer impact
The VNFM needs to proxy updates of the instance and instance group variables.
Implementation
Assignee(s)
- Primary assignee: Tomi Juvonen <tomi.juvonen@nokia.com>
Work Items
- APIs to set instance and instance group objects
- Example workflow
- Testing
- Documentation changes
Dependencies
Other projects may be enhanced later on. However, the initially needed functionality can be handled completely inside Fenix.
Testing
There is a huge number of possible combinations of VNF deployments, and the variables used can change during the operations. Fenix will support all these variables and their changes. A Fenix workflow is always an example, and it limits what can be supported and tested against. The main thing to test is that all variables and their changes are supported and validated. Testing of VNF deployments may be limited to the example use case supported by the example workflow.
Documentation Impact
Fenix documentation needs to be updated after the implementation is ready.