Prometheus monitoring and AutoHeal support for Kubernetes Cluster VNF via FM Interface
https://blueprints.launchpad.net/tacker/+spec/prometheus-monitoring
Problem description
A Master node or Worker node that is a component of a Kubernetes Cluster deployed as a VNF instance can be healed with the VNF Lifecycle Management interface defined in ETSI NFV-SOL 003 v2.6.1 [1], as described in the spec "MgmtDriver with Kubernetes cluster management OSS" [2].
The Fault Management Interface is specified in ETSI NFV-SOL 003 v3.3.1 [3]. In its architecture, the NFVO takes the initiative in deciding whether to perform operations such as heal based on failure information.
On the other hand, the existing spec "Alert server for Prometheus with Kubernetes cluster VNF_sample" [4] aims to have pods or a Kubernetes Cluster deployed as a VNF automatically scaled or healed by an autonomous monitoring mechanism using Prometheus.
This spec provides a sample implementation of Fault Management Interface, including data model conversion related to Prometheus monitoring data, based on the realization of AutoHeal of Master/Worker node in Kubernetes Cluster.
The Fault Management Interface supports two monitoring methods: Polling mode and Notify mode. In Polling mode, the NFVO periodically queries the VNFM for monitoring information; in Notify mode, the VNFM notifies the NFVO using the Subscribe/Notify model. In either case, monitoring information conforming to the data model specified in SOL003 [5] must be managed in the Tacker DB.
Note
- The Kubernetes Cluster VNF in this spec is based on the spec [6]. It is a single-Master cluster, so heal operations on the Master node are currently not supported. Because they may be supported once HA is supported in the future, this spec includes both Master and Worker nodes in Prometheus' monitoring targets in preparation for HA support.
- Unless otherwise stated, the Client in this spec refers to the NFVO, and the VNFM refers to Tacker.
Proposed change
The following changes are needed:
Add support for the Fault Management interface specified in SOL003 [7]
- Add VNF Fault Management interface:
- Query Alarm(s)
- GET /vnffm/v1/alarms to query all alarms.
- GET /vnffm/v1/alarms/{alarmId} to query the specified alarm.
- Acknowledge Alarm
- PATCH /vnffm/v1/alarms/{alarmId} to change the ackState of the target alarm to ACKNOWLEDGED.
- Subscribe
- POST /vnffm/v1/subscriptions to create a new subscription.
- Query Subscription(s)
- GET /vnffm/v1/subscriptions to query all subscriptions.
- GET /vnffm/v1/subscriptions/{subscriptionId} to query the specified subscription.
- Delete Subscription
- DELETE /vnffm/v1/subscriptions/{subscriptionId} to delete the specified subscription.
- Send request to Client
- POST <Client URI from subscriptions> to notify Client that Tacker received an alarm.
- GET <Client URI from subscriptions> to confirm that the URI of Client is correct.
Modify the RESTful API for communication between Tacker and Alertmanager
- Modify POST /alert/vnf_instances/{vnf_instance_id} to receive the alert sent from Alertmanager.
Create new DB tables
- Convert the Prometheus-specific data model to a SOL003 [8] compliant data model.
- Store FM subscriptions in the DB.
MgmtDriver changes
- instantiate_end
- Prometheus is included in Kubernetes Cluster VNF
- Deploy Prometheus.
- Set up Prometheus.
- Send monitor/alert rule files to Prometheus.
- Add the hostname corresponding to each VM in the InstantiatedVnfInfo.vnfcResourceInfo.metadata field.
- Prometheus is not included in Kubernetes Cluster VNF
- Set up Prometheus.
- Send monitor/alert rule files to Prometheus.
- Add the hostname corresponding to each VM in the InstantiatedVnfInfo.vnfcResourceInfo.metadata field.
- terminate_end
- Prometheus is not included in Kubernetes Cluster VNF
- Delete monitor/alert rules in Prometheus.
- scale_start
- The following are added for scale-in operation:
- Remove monitoring target of Worker nodes.
- Delete the hostname corresponding to the VM in the InstantiatedVnfInfo.vnfcResourceInfo.metadata field.
- scale_end
- The following are added for scale-out operation:
- Set up Prometheus in new Worker nodes.
- Add new monitoring target of Worker nodes.
- Add the hostname corresponding to the VM in the InstantiatedVnfInfo.vnfcResourceInfo.metadata field.
- heal_start
- Heal of the entire VNF
- Same change as terminate_end.
- heal_end
- Heal of the entire VNF
- Same change as instantiate_end.
Note
When heal is performed on a specified VNFC, the IP address and hostname of the VNFC do not change, so neither the Prometheus settings nor the hostname stored in the DB need to be updated. Therefore, heal_start and heal_end require no changes in this case.
Tacker Alert Server
The Prometheus Plugin is an optional feature. Tacker decides whether to enable it based on the content of the configuration file. The details of the configuration file are described in [9].
Instantiate Kubernetes cluster VNF
This section describes how to instantiate a Kubernetes cluster with Prometheus server for AutoHeal.
The process is almost the same as the "Instantiate Kubernetes cluster VNF" chapter of the spec [10].
There are three differences:
- First, for InstantiatedVnfInfo.vnfcResourceInfo:
  - In instantiate_end, add the hostname corresponding to each VM in the metadata field. Example: "metadata": { "hostname": "worker193" }
- Second, this monitoring does not require deploying metrics-server.
- Third, a new Alert rule is added.
  - The following two examples show the additional Alert rule; nothing else changes.
Sample individual Alert rule config:
alert: KubeNodeNotReady
expr: kube_node_status_condition{job="prometheus",condition="Ready",status="true"} == 0
for: 15m
labels:
  receiver_type: tacker
  function_type: vnffm
  vnf_instance_id: c61314d0-f583-4ab3-a457-46426bce02d3
  perceived_severity: WARNING
  event_type: EQUIPMENT_ALARM
annotations:
  probable_cause: The server cannot be connected.
(Optional) Sample individual Alert rule ConfigMap:
kind: ConfigMap
apiVersion: v1
metadata:
  name: tacker-samplevnf-rules
data:
  tacker-samplevnf-rules.yaml: |
    groups:
    - name: kubernetes-apps
      rules:
      - alert: KubeNodeNotReady
        annotations:
          probable_cause: The server cannot be connected.
        expr: |
          kube_node_status_condition{
          job="prometheus",condition="Ready",status="true"} == 0
        for: 15m
        labels:
          receiver_type: tacker
          function_type: vnffm
          vnf_instance_id: c61314d0-f583-4ab3-a457-46426bce02d3
          perceived_severity: WARNING
          event_type: EQUIPMENT_ALARM
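Since the Prometheus Plugin routes and converts alerts based on these labels, a rule can be checked before it is deployed. The following Python sketch is illustrative only: it assumes the rule has already been loaded into a dict (for example from the YAML above), check_autoheal_rule is a hypothetical helper rather than part of Tacker, and the allowed severity and event-type values reflect our reading of the SOL003 PerceivedSeverityType and EventType enumerations.

```python
# Labels that the alert rules in this spec attach for Tacker AutoHeal.
REQUIRED_LABELS = ("receiver_type", "function_type", "vnf_instance_id",
                   "perceived_severity", "event_type")
# Fixed values that route an alert to the AutoHeal (VNF FM) path.
EXPECTED_VALUES = {"receiver_type": "tacker", "function_type": "vnffm"}
# SOL003 PerceivedSeverityType / EventType values (as understood; verify against the spec).
SEVERITIES = {"CRITICAL", "MAJOR", "MINOR", "WARNING", "INDETERMINATE", "CLEARED"}
EVENT_TYPES = {"COMMUNICATIONS_ALARM", "PROCESSING_ERROR_ALARM",
               "ENVIRONMENTAL_ALARM", "QOS_ALARM", "EQUIPMENT_ALARM"}


def check_autoheal_rule(rule: dict) -> list:
    """Return a list of problems in one alert rule dict; an empty list means OK."""
    problems = []
    labels = rule.get("labels", {})
    for name in REQUIRED_LABELS:
        if name not in labels:
            problems.append(f"missing label: {name}")
    # Only compare values for labels that are actually present.
    for name, expected in EXPECTED_VALUES.items():
        if name in labels and labels[name] != expected:
            problems.append(f"{name} should be {expected!r}, got {labels[name]!r}")
    if "perceived_severity" in labels and labels["perceived_severity"] not in SEVERITIES:
        problems.append(f"unknown perceived_severity: {labels['perceived_severity']}")
    if "event_type" in labels and labels["event_type"] not in EVENT_TYPES:
        problems.append(f"unknown event_type: {labels['event_type']}")
    if "probable_cause" not in rule.get("annotations", {}):
        problems.append("missing annotation: probable_cause")
    return problems
```

For the sample rule above the function returns an empty list; a rule with function_type other than vnffm would be reported, because the plugin would route it to the AutoScale flow instead.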
AutoHeal Worker nodes on alert trigger
When the Prometheus Server detects that a Worker node of the Kubernetes Cluster has failed, an alert is sent to Tacker via Alertmanager. After receiving the alert, Tacker converts it to an alarm and stores it in the DB. Following chapter 7 "VNF Fault Management interface" of ETSI NFV-SOL 003 v3.3.1 [11], two AutoHeal modes are supported: Polling and Notify.
Design of heal operation in Polling mode
The following is a schematic diagram of heal in Polling mode:
+------------------------+
| |
| Client (NFVO etc.) |
| |
+--+--------^+-----------+
6. Heal | || 5. Query Alarms and return result
+------|--------||------------------------------------------------------------+
| | || VNFM |
| +---|--------||---------+ +---------------------------------+ |
| | | || Tacker | | Tacker | |
| | | || Server | | Conductor | |
| | | +----+v------+ | | +------------+ | +--------+ |
| | | | VnfFm +------------------> VnfFm +----------> Tacker | |
| | | | Controller | | | +----------> Driver +----------> DB | |
| | | +------------+ | | | +------------+ | +--------+ |
+----------------+ | | | | | | 4. Convert alert to alarm | |
| | 3. POST | | | | | | | |
| Alertmanager | alert | | | +------------+ | | | +------------+ | |
| +---------------------> Prometheus +-------+ +-----> VnfLcm | | |
+--+-------------+ | | | | Plugin | | | | | Driver +--+ | |
^ | | | +------------+ | | | +---------+--+ | | |
| 2. Trigger alert | | | | | | | | | |
| | | | | | | | | | |
| | | | +------------+ | | | +---------v--+ | | |
| | | +---> Vnflcm +------------+ | MgmtDriver | | | |
| | | | Controller | | | | | | | |
| | | +------------+ | | +-+----------+ | | |
+--+-------------+ | | | | | | | |
| | | | | | | +----------v-+ | |
| Prometheus | | | | | | | OpenStack | | |
| Server | | | | | | | Infra | | |
| | | | | | | | Driver | | |
+--+-------------+ | | | | | +----+-------+ | |
| | | | | | | | |
| | +-----------------------+ +---------------|-------|---------+ |
| 1. Pull metrics +--------------------------------------------|-------|------------------------+
| | |
| +--------------------------------------------|-------|------------+
| | Kubernetes Cluster VNF | | |
| | +---------------+---------+-------+ |
| | 7. Delete failed | | 8. Create new |
| | Worker node | | Worker node |
| | +--------v----+ +------v------+ +-------------+ |
| | | +--------+ | | +--------+ | | +--------+ | |
+----------------------------------> | Worker | | | | Worker | | | | Master | | |
| | +--------+ | | +--------+ | | +--------+ | |
| | VM | | VM | | VM | |
| +-------------+ +-------------+ +-------------+ |
+-----------------------------------------------------------------+
+-----------------------------------------------------------------+
| Hardware Resources |
+-----------------------------------------------------------------+
1. Prometheus Server collects metrics from the Kubernetes Cluster.
2. Alertmanager receives the alert sent from Prometheus Server.
3. Alertmanager sends a POST request to Tacker at the specified URL.
4. Tacker receives the alert, converts it to an alarm, and saves it to the DB.
5. The Client sends a request at regular intervals to query alarms in Tacker. Tacker searches the DB with the query conditions specified by the Client and returns the matching alarms to the Client.
6. The Client recognizes the failure of the Worker node from the alarm and sends a heal request for the failed node to Tacker.
7. The heal operation is triggered, and the failed Worker node is deleted.
8. A new Worker node is created.
Note
The URL to which Alertmanager sends alerts for Tacker's Prometheus Plugin follows the spec [12].
Request parameters for operation in Polling mode
The details of the API are described in REST API impact.
Sequence for operation in Polling mode
The following describes the processing flow of the Tacker after the Alertmanager sends the alert.
seqdiag {
  node_width = 100;
  edge_length = 150;

  "Alertmanager"
  "Prometheus-Plugin"
  "VnfFmDriver"
  "Tacker DB"

  "Alertmanager" -> "Prometheus-Plugin"
    [label = "1. Send alert to the specified URL"];
  "Prometheus-Plugin" -> "Prometheus-Plugin"
    [label = "2. Check items of prometheus_plugin from cfg.CONF.tacker",
     note = "If prometheus_plugin is False, asynchronous task is over"];
  "Prometheus-Plugin" -> "Prometheus-Plugin"
    [label = "3. Determine whether the alert is AutoHeal or AutoScale",
     note = "If it is scale, refer to the processing flow of AutoScale"];
  "Prometheus-Plugin" -> "Tacker DB"
    [label = "4. Find the corresponding ComputeResource from the DB"];
  "Prometheus-Plugin" <-- "Tacker DB"
    [label = "InstantiatedVnfInfo.vnfcResourceInfo.computeResource"];
  "Prometheus-Plugin" -> "Prometheus-Plugin"
    [label = "5. Convert received alert to alarm"];
  "Prometheus-Plugin" -> "VnfFmDriver"
    [label = "6. Execute VnfFmDriver"];
  "VnfFmDriver" -> "Tacker DB"
    [label = "7. Save alarm to DB"];
  "VnfFmDriver" <-- "Tacker DB";
  "Prometheus-Plugin" <-- "VnfFmDriver";
}
1. Alertmanager receives the alert sent from Prometheus Server and forwards it to the specified URL (Tacker Server).

   Note
   Alert rules in the Prometheus format are defined in Prometheus Server.

2. Prometheus Plugin reads cfg.CONF.tacker.prometheus_plugin to determine whether this function is enabled.
3. Prometheus Plugin decides which processing to perform according to the function_type field of the labels in the alert.
   - When the function_type field of labels is vnffm, AutoHeal processing is performed.
   - When the function_type field of labels is not vnffm, AutoScale processing is performed.
4. Prometheus Plugin finds the corresponding computeResource in InstantiatedVnfInfo.vnfcResourceInfo according to the value of the node label in the alert.
5. Prometheus Plugin converts the alert to an alarm.
6. Prometheus Plugin calls VnfFmDriver and passes the alarm to it.
7. VnfFmDriver saves the alarm in the DB.
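Steps 2 to 5 can be sketched in Python as follows. This is a minimal sketch, not Tacker's Prometheus Plugin: it assumes the Alertmanager webhook body (a JSON object whose alerts list carries status, labels, annotations and startsAt), assumes the vnfcResourceInfo entries carry the metadata.hostname added in instantiate_end, and all function names are hypothetical. Skipping non-firing alerts, the COMPUTE resource type and the objectInstance link format are assumptions as well; storing the alarm (steps 6 and 7) is left out.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def find_compute_resource(vnfc_resource_info: list, node: str) -> Optional[dict]:
    """Step 4: find the computeResource whose VNFC metadata.hostname equals the node label."""
    for vnfc in vnfc_resource_info:
        if vnfc.get("metadata", {}).get("hostname") == node:
            return vnfc["computeResource"]
    return None


def alert_to_alarm(alert: dict, vnfc_resource_info: list) -> Optional[dict]:
    """Steps 3-5: convert one Alertmanager alert into a SOL003-style Alarm dict."""
    labels = alert.get("labels", {})
    if labels.get("function_type") != "vnffm":
        return None  # Not AutoHeal: handled by the AutoScale flow instead.
    compute = find_compute_resource(vnfc_resource_info, labels.get("node"))
    if compute is None:
        return None  # No VNFC with this hostname in vnfcResourceInfo.
    alarm_id = str(uuid.uuid4())
    vnf_instance_id = labels["vnf_instance_id"]
    return {
        "id": alarm_id,
        "managedObjectId": vnf_instance_id,
        "rootCauseFaultyResource": {
            "faultyResource": compute,
            "faultyResourceType": "COMPUTE",  # assumed: node alerts map to COMPUTE
        },
        "alarmRaisedTime": datetime.now(timezone.utc).isoformat(),
        "ackState": "UNACKNOWLEDGED",
        "perceivedSeverity": labels["perceived_severity"],
        "eventTime": alert["startsAt"],
        "eventType": labels["event_type"],
        "probableCause": alert.get("annotations", {}).get("probable_cause", ""),
        "isRootCause": False,
        "_links": {
            "self": f"/vnffm/v1/alarms/{alarm_id}",
            # assumed link format for the VNF instance resource
            "objectInstance": f"/vnflcm/v1/vnf_instances/{vnf_instance_id}",
        },
    }


def alerts_to_alarms(payload: dict, vnfc_resource_info: list,
                     plugin_enabled: bool = True) -> list:
    """Steps 2-5 for one Alertmanager webhook body; only firing alerts are converted."""
    if not plugin_enabled:  # step 2: prometheus_plugin disabled in the configuration
        return []
    alarms = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue
        alarm = alert_to_alarm(alert, vnfc_resource_info)
        if alarm is not None:
            alarms.append(alarm)
    return alarms
```

The hostname lookup is why instantiate_end and the scale hooks keep metadata.hostname up to date: without it, the node label in the alert cannot be tied to a computeResource.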
The following describes the Client's processing flow for Tacker using Polling mode to AutoHeal.
seqdiag {
  node_width = 100;
  edge_length = 150;

  "Client"
  "Tacker-server"
  "Tacker-conductor"
  "VnfFmDriver"
  "Tacker DB"

  "Client" -> "Tacker-server"
    [label = "8. Query alarms"];
  "Tacker-server" -> "Tacker-conductor"
    [label = "9. Trigger synchronization task"];
  "Tacker-conductor" -> "VnfFmDriver"
    [label = "10. Query alarms"];
  "VnfFmDriver" -> "Tacker DB"
    [label = "11. Get alarms from DB according to conditions"];
  "VnfFmDriver" <-- "Tacker DB"
    [label = "Alarms"];
  "Tacker-conductor" <-- "VnfFmDriver"
    [label = "Alarms"];
  "Tacker-server" <-- "Tacker-conductor"
    [label = "Alarms"];
  "Client" <-- "Tacker-server"
    [label = "Alarms"];
  "Client" -> "Client"
    [label = "12. Get VNFC information from alarm",
     note = "If no alarm is returned, the processing is over"];
  "Client" -> "Tacker-server"
    [label = "13. Heal specified vnfc"];
  "Tacker-server" -> "Tacker-conductor"
    [label = "14. Trigger asynchronous task",
     note = "The same with the default heal operation"];
  "Client" <-- "Tacker-server"
    [label = "Response 202 Accepted"];
}
8. The Client sends a request to Tacker to query the alarms that match the specified conditions.
9. Alarm queries are processed synchronously.
10. Tacker-conductor calls VnfFmDriver to query the alarms.
11. VnfFmDriver filters the alarms that meet the conditions in the request and returns the result.
12-13. After the Client obtains the VNFC information from the alarm, it sends a request to Tacker to heal the VNFC.
14. From this step on, the processing is the same as the default heal operation.
    - The only difference is that, for InstantiatedVnfInfo.vnfcResourceInfo, the hostname of the VM in the metadata field is updated in heal_end.
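The Client side of steps 8 to 13 could look roughly like the following Python sketch. It is illustrative only: the function names are hypothetical, the HTTP calls are injected as callables so that no particular HTTP library is assumed, mapping faultyResource.resourceId back to a vnfcResourceInfo id is an assumption, and the heal request body with vnfcInstanceId is assumed rather than defined by this spec.

```python
from urllib.parse import urlencode


def build_alarm_query_url(base_url: str, severity: str) -> str:
    """Step 8: GET /vnffm/v1/alarms with an attribute-based filter."""
    query = urlencode({"filter": f"(eq,perceivedSeverity,{severity})"})
    return f"{base_url}/vnffm/v1/alarms?{query}"


def vnfcs_to_heal(alarms: list, vnfc_resource_info: list) -> list:
    """Step 12: map unacknowledged alarms to vnfcResourceInfo ids via resourceId."""
    failed = {a["rootCauseFaultyResource"]["faultyResource"]["resourceId"]
              for a in alarms if a.get("ackState") == "UNACKNOWLEDGED"}
    return [v["id"] for v in vnfc_resource_info
            if v["computeResource"]["resourceId"] in failed]


def poll_once(fetch_alarms, get_vnfc_resource_info, send_heal,
              base_url: str, severity: str = "WARNING") -> dict:
    """One polling cycle; the three callables wrap the Client's HTTP calls."""
    alarms = fetch_alarms(build_alarm_query_url(base_url, severity))
    # Group alarms per VNF instance, since heal is requested per instance.
    by_instance = {}
    for alarm in alarms:
        by_instance.setdefault(alarm["managedObjectId"], []).append(alarm)
    sent = {}
    for vnf_instance_id, inst_alarms in by_instance.items():
        vnfc_ids = vnfcs_to_heal(inst_alarms, get_vnfc_resource_info(vnf_instance_id))
        if not vnfc_ids:
            continue
        # Hypothetical heal body: assumes the heal API accepts vnfcInstanceId.
        body = {"vnfcInstanceId": vnfc_ids, "cause": "AutoHeal from FM alarm"}
        send_heal(vnf_instance_id, body)  # step 13
        sent[vnf_instance_id] = body
    return sent
```

A Client would run poll_once repeatedly, sleeping for its polling interval between cycles. Because an alarm stays in the DB after the heal request, one option is for the Client to acknowledge handled alarms with PATCH /vnffm/v1/alarms/{alarmId}, so that the ackState check above skips them in later cycles.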
Design of heal operation in Notify mode
The following is a schematic diagram of heal in Notify mode:
+------------------------+
| | 6. Send notify alarm
| Client (NFVO etc.) <-------------------+
| | |
+--+---------+-----------+ |
7. Heal | | 1. Create subscription |
+------|---------|-------------------------------|----------------------------+
| | | | VNFM |
| +---|---------|---------+ +-------------------|-------------+ |
| | | | Tacker | | | Tacker | |
| | | | Server | | | Conductor | |
| | | +-----v------+ | | +-----+------+ | +--------+ |
| | | | VnfFm +------------------> VnfFm +----------> Tacker | |
| | | | Controller | | | +----------> Driver +----------> DB | |
| | | +------------+ | | | +------------+ | +--------+ |
+----------------+ | | | | | | 5. Convert alert to alarm | |
| | 4. POST | | | | | | | |
| Alertmanager | alert | | | +------------+ | | | +------------+ | |
| +---------------------> Prometheus +-------+ +-----> VnfLcm | | |
+--+-------------+ | | | | Plugin | | | | | Driver +--+ | |
^ | | | +------------+ | | | +---------+--+ | | |
| 3. Trigger alert | | | | | | | | | |
| | | | | | | | | | |
| | | | +------------+ | | | +---------v--+ | | |
| | | +---> Vnflcm +------------+ | MgmtDriver | | | |
| | | | Controller | | | | | | | |
| | | +------------+ | | +-+----------+ | | |
+--+-------------+ | | | | | | | |
| | | | | | | +----------v-+ | |
| Prometheus | | | | | | | OpenStack | | |
| Server | | | | | | | Infra | | |
| | | | | | | | Driver | | |
+--+-------------+ | | | | | +----+-------+ | |
| | | | | | | | |
| | +-----------------------+ +---------------|-------|---------+ |
| 2. Pull metrics +--------------------------------------------|-------|------------------------+
| | |
| +--------------------------------------------|-------|------------+
| | Kubernetes Cluster VNF | | |
| | +---------------+---------+-------+ |
| | 8. Delete failed | | 9. Create new |
| | Worker node | | Worker node |
| | +--------v----+ +------v------+ +-------------+ |
| | | +--------+ | | +--------+ | | +--------+ | |
+----------------------------------> | Worker | | | | Worker | | | | Master | | |
| | +--------+ | | +--------+ | | +--------+ | |
| | VM | | VM | | VM | |
| +-------------+ +-------------+ +-------------+ |
+-----------------------------------------------------------------+
+-----------------------------------------------------------------+
| Hardware Resources |
+-----------------------------------------------------------------+
1. The Client sends a request to Tacker to create a subscription.

   Note
   When a subscription is created, Tacker sends a test notification request to the Client's callback URI. The Client is responsible for processing this request; by default, it is expected to return HTTP 204 No Content.

2-5. The processing is the same as steps 1-4 of the Polling mode.
6. VnfFmDriver finds all subscriptions in the DB and matches the alarm against them. If a subscription matches, the alarm is sent to the Client at the callback URI specified in that subscription. If no subscription matches, the processing ends.
7-9. The processing is the same as steps 6-8 of the Polling mode.
Request parameters for operation in Notify mode
The details of the API are described in REST API impact.
Sequence for operation in Notify mode
The following describes the Client's processing flow for Tacker using Notify mode to AutoHeal.
seqdiag {
  node_width = 90;
  edge_length = 100;

  "Client"
  "Alertmanager"
  "Prometheus-Plugin"
  "Tacker-server"
  "Tacker-conductor"
  "VnfFmDriver"
  "Tacker DB"

  "Client" -> "Tacker-server"
    [label = "1. Create subscription"];
  "Tacker-server" -> "Tacker-conductor"
    [label = "Trigger synchronization task"];
  "Tacker-conductor" -> "VnfFmDriver"
    [label = "execute VnfFmDriver"];
  "VnfFmDriver" -> "VnfFmDriver"
    [label = "Get the callback_uri in the subscription"];
  "VnfFmDriver" -> "Client"
    [label = "Send a GET request to the callback_uri in the Client."];
  "VnfFmDriver" <-- "Client"
    [label = "Response 204 No Content"];
  "VnfFmDriver" -> "Tacker DB"
    [label = "Save subscription to DB"];
  "VnfFmDriver" <-- "Tacker DB";
  "Tacker-conductor" <-- "VnfFmDriver";
  "Tacker-server" <-- "Tacker-conductor";
  "Client" <-- "Tacker-server"
    [label = "Response 201 Created"];
  "Alertmanager" -> "Prometheus-Plugin"
    [label = "2. Send alert to the specified URL"];
  "Prometheus-Plugin" -> "Prometheus-Plugin"
    [label = "3. Check items of prometheus_plugin from cfg.CONF.tacker",
     note = "If prometheus_plugin is False, asynchronous task is over"];
  "Prometheus-Plugin" -> "Prometheus-Plugin"
    [label = "4. Determine whether the alert is AutoHeal or AutoScale",
     note = "If it is scale, refer to the processing flow of AutoScale"];
  "Prometheus-Plugin" -> "Tacker DB"
    [label = "5. Find the corresponding ComputeResource from the DB"];
  "Prometheus-Plugin" <-- "Tacker DB"
    [label = "InstantiatedVnfInfo.vnfcResourceInfo.computeResource"];
  "Prometheus-Plugin" -> "Prometheus-Plugin"
    [label = "6. Convert received alert to alarm"];
  "Prometheus-Plugin" -> "VnfFmDriver"
    [label = "7. execute VnfFmDriver"];
  "VnfFmDriver" -> "Tacker DB"
    [label = "8. Save alarm to DB"];
  "VnfFmDriver" <-- "Tacker DB";
  "VnfFmDriver" -> "Tacker DB"
    [label = "9. Get subscriptions from DB"];
  "VnfFmDriver" <-- "Tacker DB";
  "VnfFmDriver" -> "VnfFmDriver"
    [label = "10. Determine whether the alarm matches the subscriptions.",
     note = "If it does not match, the processing ends"];
  "VnfFmDriver" -> "Client"
    [label = "11. Send a Notify Alarm request to the Client"];
  "VnfFmDriver" <-- "Client"
    [label = "Response 204 No Content"];
  "Prometheus-Plugin" <-- "VnfFmDriver";
  "Client" -> "Client"
    [label = "12. Get VNFC information from alarm",
     note = "If no alarm is returned, the processing is over"];
  "Client" -> "Tacker-server"
    [label = "13. Heal specified vnfc"];
  "Tacker-server" -> "Tacker-conductor"
    [label = "14. Trigger asynchronous task",
     note = "The same with the default heal operation"];
  "Client" <-- "Tacker-server"
    [label = "Response 202 Accepted"];
}
1. The Client sends a request to Tacker to create a subscription. After receiving the request, Tacker gets the callback_uri from it. To verify that the callback_uri is correct, VnfFmDriver sends a request to the Client's callback_uri. After receiving the normal response HTTP 204 No Content from the Client, Tacker saves the subscription to the DB.
2-8. The processing is the same as steps 1-7 of the Polling mode.
9. VnfFmDriver gets all subscriptions from the DB.
10. VnfFmDriver determines whether the alarm matches any subscription; if it does not match, the processing ends.
11. If the match is successful, VnfFmDriver sends a Notify Alarm request to the Client's callback_uri. After the Client receives and processes the request, it returns HTTP 204 No Content by default.
12-14. The processing is the same as steps 12-14 of the Polling mode.
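Step 10 (matching an alarm against subscriptions) can be illustrated with the following Python sketch. It assumes the usual SOL003 filter semantics as we understand them: every attribute present in the filter must match, the values inside one list are alternatives, and an absent or empty list matches anything. vnfProductsFromProviders is not handled, and the function names are hypothetical.

```python
def _allowed(values, actual) -> bool:
    """An absent or empty filter list matches anything; otherwise actual must be listed."""
    return not values or actual in values


def alarm_matches(subscription: dict, alarm: dict, vnf_instance: dict) -> bool:
    """True if the alarm satisfies every attribute of the subscription filter.

    vnf_instance must be the VNF instance referenced by alarm["managedObjectId"].
    """
    f = subscription.get("filter") or {}
    inst = f.get("vnfInstanceSubscriptionFilter") or {}
    return all([
        _allowed(inst.get("vnfInstanceIds"), vnf_instance["id"]),
        _allowed(inst.get("vnfdIds"), vnf_instance.get("vnfdId")),
        _allowed(inst.get("vnfInstanceNames"), vnf_instance.get("vnfInstanceName")),
        _allowed(f.get("notificationTypes"), "AlarmNotification"),
        _allowed(f.get("faultyResourceTypes"),
                 alarm["rootCauseFaultyResource"]["faultyResourceType"]),
        _allowed(f.get("perceivedSeverities"), alarm["perceivedSeverity"]),
        _allowed(f.get("eventTypes"), alarm["eventType"]),
        _allowed(f.get("probableCauses"), alarm["probableCause"]),
    ])


def callbacks_to_notify(subscriptions: list, alarm: dict, vnf_instance: dict) -> list:
    """Step 10: collect the callbackUri of every matching subscription."""
    return [s["callbackUri"] for s in subscriptions
            if alarm_matches(s, alarm, vnf_instance)]
```

Each returned callbackUri would then receive the Notify Alarm request of step 11.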
Scale For Kubernetes cluster
The process is almost the same as the "Scale-out/in Worker nodes of Kubernetes cluster VNF" chapter of the spec [13].
There is only one difference:
- For the processing of InstantiatedVnfInfo.vnfcResourceInfo:
  - When scaling in, delete the hostname corresponding to the VM from the metadata field in scale_start.
  - When scaling out, add the hostname corresponding to the VM to the metadata field in scale_end.
Terminate For Kubernetes cluster
The process is almost the same as the "Terminate For Kubernetes cluster" chapter of the spec [14].
Alternatives
None
Data model impact
Add the following new DB tables to the 'Tacker' database.
vnf_fm_alarms
`id` varchar(36) Pri
`managed_object_id` varchar(36) NOT NULL
`faulty_resource` JSON NULL
`faulty_resource_type` varchar(32) NULL
`alarm_raised_time` datetime NOT NULL
`alarm_changed_time` datetime NULL
`alarm_cleared_time` datetime NULL
`alarm_acknowledged_time` datetime NULL
`ack_state` varchar(32) NOT NULL
`perceived_severity` varchar(32) NOT NULL
`event_time` datetime NOT NULL
`event_type` varchar(32) NOT NULL
`fault_type` varchar(255) NULL
`probable_cause` varchar(255) NOT NULL
`is_root_cause` boolean NOT NULL
`correlated_alarm_ids` text NULL
`fault_details` text NULL
`_links` JSON NOT NULL
This table will have `id` as primary key. `managed_object_id` will be a foreign key referencing `vnf_instances`.`id`.
vnf_fm_subscriptions
`id` varchar(36) Pri
`fm_notifications_filter` JSON NULL
`callback_uri` varchar(255) NOT NULL
`_links` JSON NOT NULL
`subscription_authentication` JSON NULL
This table will have `id` as primary key.
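To make the two tables concrete, the following Python sketch creates them in an in-memory SQLite database with the standard sqlite3 module. It is an approximation for illustration only: the column types above follow MySQL conventions and SQLite merely stores them with type affinity, the real tables would be created by Tacker's own DB migrations rather than by this code, the vnf_instances table here is a stub holding only the foreign-key target, and insert_row is a hypothetical helper.

```python
import sqlite3

# SQLite approximation of the tables above; nullable columns omit NOT NULL.
DDL = """
CREATE TABLE vnf_instances (
    id VARCHAR(36) PRIMARY KEY  -- stub: only the FK target column
);
CREATE TABLE vnf_fm_alarms (
    id VARCHAR(36) PRIMARY KEY,
    managed_object_id VARCHAR(36) NOT NULL REFERENCES vnf_instances(id),
    faulty_resource JSON,
    faulty_resource_type VARCHAR(32),
    alarm_raised_time DATETIME NOT NULL,
    alarm_changed_time DATETIME,
    alarm_cleared_time DATETIME,
    alarm_acknowledged_time DATETIME,
    ack_state VARCHAR(32) NOT NULL,
    perceived_severity VARCHAR(32) NOT NULL,
    event_time DATETIME NOT NULL,
    event_type VARCHAR(32) NOT NULL,
    fault_type VARCHAR(255),
    probable_cause VARCHAR(255) NOT NULL,
    is_root_cause BOOLEAN NOT NULL,
    correlated_alarm_ids TEXT,
    fault_details TEXT,
    "_links" JSON NOT NULL
);
CREATE TABLE vnf_fm_subscriptions (
    id VARCHAR(36) PRIMARY KEY,
    fm_notifications_filter JSON,
    callback_uri VARCHAR(255) NOT NULL,
    "_links" JSON NOT NULL,
    subscription_authentication JSON
);
"""


def create_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


def insert_row(conn: sqlite3.Connection, table: str, row: dict) -> None:
    """Insert a dict as one row; table and keys are trusted, values are bound."""
    cols = ", ".join(f'"{c}"' for c in row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES ({marks})', tuple(row.values()))
```

With foreign keys enabled, inserting an alarm whose managed_object_id does not exist in vnf_instances raises sqlite3.IntegrityError, which mirrors the intended foreign-key relation.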
REST API impact
The following RESTful APIs comply with chapter 7 "VNF Fault Management interface" of ETSI NFV-SOL 003 v3.3.1 [15].
Add new API - GET /vnffm/v1/alarms
Query all alarms in the Tacker.
Users can filter alarms by the following attributes in the filter query parameter of the request: id, managedObjectId, rootCauseFaultyResource/faultyResourceType, eventType, perceivedSeverity, probableCause.
For example, the following URL query parameter matches alarms with perceivedSeverity=WARNING:
GET /vnffm/v1/alarms?filter=(eq,perceivedSeverity,WARNING)
Method type: GET
Normal http response code : 200 OK
Expected error http response codes
- 400 Bad Request: Shall be returned when the attribute-based filter expression is invalid, or when too many alarms match the query and the response would be too large.
- 401 Unauthorized: Shall be returned when authentication fails.
- 403 Forbidden: Shall be returned when the user is not authorized to call this REST API.
- 404 Not Found: Shall be returned when the URL is wrong.
- 405 Method Not Allowed: Shall be returned when the HTTP method is not supported.
- 500 Internal Server Error: Shall be returned when an internal error occurs.
Request - There is no request body.
Response - Return Alarms parameter as shown below:
[
  {
    "id": "78a39661-60a8-4824-b989-88c1b0c3534a",
    "managedObjectId": "c61314d0-f583-4ab3-a457-46426bce02d3",
    "rootCauseFaultyResource": {
      "faultyResource": {
        "vimConnectionId": "0d57e928-86a4-4445-a4bd-1634edae73f3",
        "resourceId": "4e6ccbe1-38ec-4b1b-a278-64de09ba01b3",
        "vimLevelResourceType": "OS::Nova::Server"
      },
      "faultyResourceType": "COMPUTE"
    },
    "alarmRaisedTime": "2021-09-03 10:21:03",
    "alarmChangedTime": "",
    "alarmClearedTime": "",
    "alarmAcknowledgedTime": "",
    "ackState": "UNACKNOWLEDGED",
    "perceivedSeverity": "WARNING",
    "eventTime": "2021-09-03 10:06:03",
    "eventType": "EQUIPMENT_ALARM",
    "faultType": "",
    "probableCause": "The server cannot be connected.",
    "isRootCause": false,
    "correlatedAlarmIds": [],
    "faultDetails": [],
    "_links": {
      "self": "/vnffm/v1/alarms/78a39661-60a8-4824-b989-88c1b0c3534a",
      "objectInstance": ""
    }
  }
]
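Server-side handling of the filter query parameter could be sketched in Python as below. This covers only a small subset of the SOL013 attribute-based filtering syntax as we understand it (eq and neq, several values as alternatives, several terms joined by ";"), restricts attributes to the list above, and raises ValueError for anything else, which an API layer would turn into 400 Bad Request. Escaping of commas inside values is not handled, and the function names are hypothetical.

```python
# Attributes the GET /vnffm/v1/alarms filter accepts in this spec.
ALLOWED_ATTRS = {"id", "managedObjectId", "rootCauseFaultyResource/faultyResourceType",
                 "eventType", "perceivedSeverity", "probableCause"}


def parse_filter(expr: str) -> list:
    """Parse '(op,attr,v1[,v2...]);(...)' into (op, attr, values) tuples."""
    terms = []
    for part in expr.split(";"):
        part = part.strip()
        if not (part.startswith("(") and part.endswith(")")):
            raise ValueError(f"invalid filter term: {part!r}")
        fields = part[1:-1].split(",")
        if len(fields) < 3:
            raise ValueError(f"invalid filter term: {part!r}")
        op, attr, values = fields[0], fields[1], fields[2:]
        if op not in ("eq", "neq"):
            raise ValueError(f"unsupported operator: {op}")
        if attr not in ALLOWED_ATTRS:
            raise ValueError(f"unsupported attribute: {attr}")
        terms.append((op, attr, values))
    return terms


def _resolve(obj, path: str) -> list:
    """Follow a '/'-separated attribute path and return the values found."""
    current = [obj]
    for key in path.split("/"):
        current = [item[key] for item in current
                   if isinstance(item, dict) and key in item]
    return current


def filter_alarms(alarms: list, expr: str) -> list:
    """Return the alarms that satisfy every term of the filter expression."""
    terms = parse_filter(expr)

    def matches(alarm: dict) -> bool:
        for op, attr, values in terms:
            hit = any(str(v) in values for v in _resolve(alarm, attr))
            # eq needs a hit, neq needs no hit.
            if (op == "eq") != hit:
                return False
        return True

    return [a for a in alarms if matches(a)]
```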
Add new API - GET /vnffm/v1/alarms/{alarmId}
Query the alarm specified in the Tacker.
Method type: GET
Normal http response code : 200 OK
Expected error http response codes
- 401 Unauthorized: Shall be returned when authentication fails.
- 403 Forbidden: Shall be returned when the user is not authorized to call this REST API.
- 404 Not Found: Shall be returned when the alarmId does not exist.
- 405 Method Not Allowed: Shall be returned when the HTTP method is not supported.
- 500 Internal Server Error: Shall be returned when an internal error occurs.
Request - There is no request body.
Response - Return Alarm parameter as shown below:
{
  "id": "78a39661-60a8-4824-b989-88c1b0c3534a",
  "managedObjectId": "c61314d0-f583-4ab3-a457-46426bce02d3",
  "rootCauseFaultyResource": {
    "faultyResource": {
      "vimConnectionId": "0d57e928-86a4-4445-a4bd-1634edae73f3",
      "resourceId": "4e6ccbe1-38ec-4b1b-a278-64de09ba01b3",
      "vimLevelResourceType": "OS::Nova::Server"
    },
    "faultyResourceType": "COMPUTE"
  },
  "alarmRaisedTime": "2021-09-03 10:21:03",
  "alarmChangedTime": "",
  "alarmClearedTime": "",
  "alarmAcknowledgedTime": "",
  "ackState": "UNACKNOWLEDGED",
  "perceivedSeverity": "WARNING",
  "eventTime": "2021-09-03 10:06:03",
  "eventType": "EQUIPMENT_ALARM",
  "faultType": "",
  "probableCause": "The server cannot be connected.",
  "isRootCause": false,
  "correlatedAlarmIds": [],
  "faultDetails": [],
  "_links": {
    "self": "/vnffm/v1/alarms/78a39661-60a8-4824-b989-88c1b0c3534a",
    "objectInstance": ""
  }
}
Add new API - PATCH /vnffm/v1/alarms/{alarmId}
Modify the confirmation status of the alarm specified in the Tacker.
Method type: PATCH
Content-Type: application/merge-patch+json
Normal http response code : 200 OK
Expected error http response codes
- 401 Unauthorized: Shall be returned when authentication fails.
- 403 Forbidden: Shall be returned when the user is not authorized to call this REST API.
- 404 Not Found: Shall be returned when the alarmId does not exist.
- 405 Method Not Allowed: Shall be returned when the HTTP method is not supported.
- 409 Conflict: Shall be returned when the ackState of the alarm already has the value to be set.
- 412 Precondition Failed: Shall be returned when the ETag in the HTTP header does not match.
- 500 Internal Server Error: Shall be returned when an internal error occurs.
Request - Receive AlarmModifications parameter as shown below:
{
  "ackState": "ACKNOWLEDGED"
}
Response - Return AlarmModifications parameter as shown below:
{
  "ackState": "ACKNOWLEDGED"
}
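The error handling listed above for PATCH can be sketched as follows. This is a hypothetical Python sketch, not Tacker code: the ETag is computed as a SHA-256 hash of the alarm's JSON purely for illustration, both ACKNOWLEDGED and UNACKNOWLEDGED are assumed to be permitted values, and the 400 response for other values is an assumption that the error list above does not cover.

```python
import hashlib
import json
from datetime import datetime, timezone

# Assumed permitted values for AlarmModifications.ackState.
PERMITTED_ACK_STATES = {"ACKNOWLEDGED", "UNACKNOWLEDGED"}


class PatchError(Exception):
    """Carries the HTTP status code the API layer should return."""

    def __init__(self, status: int, detail: str):
        super().__init__(detail)
        self.status = status


def etag_of(alarm: dict) -> str:
    """Hypothetical ETag: SHA-256 of the alarm serialized with sorted keys."""
    return hashlib.sha256(json.dumps(alarm, sort_keys=True).encode()).hexdigest()


def acknowledge(alarm: dict, modifications: dict, if_match=None) -> dict:
    """Apply AlarmModifications to an alarm and return the modifications on success."""
    if if_match is not None and if_match != etag_of(alarm):
        raise PatchError(412, "ETag does not match the current alarm")
    new_state = modifications.get("ackState")
    if new_state not in PERMITTED_ACK_STATES:
        raise PatchError(400, f"invalid ackState: {new_state!r}")  # assumed
    if alarm["ackState"] == new_state:
        raise PatchError(409, f"ackState is already {new_state}")
    alarm["ackState"] = new_state
    if new_state == "ACKNOWLEDGED":
        alarm["alarmAcknowledgedTime"] = datetime.now(timezone.utc).isoformat()
    return {"ackState": new_state}
```

Checking the ETag before anything else means a Client holding a stale copy of the alarm gets 412 instead of silently overwriting a concurrent change.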
Add new API - POST /vnffm/v1/subscriptions
Create a new subscription in the Tacker.
Method type: POST
Normal http response code : 201 Created
Expected error http response codes
- 303 See Other: Shall be returned when a subscription with the same callback URI and the same filter already exists.
- 401 Unauthorized: Shall be returned when authentication fails.
- 403 Forbidden: Shall be returned when the user is not authorized to call this REST API.
- 404 Not Found: Shall be returned when the URL is wrong.
- 405 Method Not Allowed: Shall be returned when the HTTP method is not supported.
- 500 Internal Server Error: Shall be returned when an internal error occurs.
Request - Receive FmSubscriptionRequest parameter as shown below:
{
  "filter": {
    "vnfInstanceSubscriptionFilter": {
      "vnfdIds": ["b1db0ce7-ebca-1fb7-95ed-4840d70a9923"],
      "vnfProductsFromProviders": [],
      "vnfInstanceIds": ["02e46e91-2722-4f2d-af91-313f5981a199"],
      "vnfInstanceNames": []
    },
    "notificationTypes": ["AlarmNotification"],
    "faultyResourceTypes": ["COMPUTE"],
    "perceivedSeverities": ["WARNING"],
    "eventTypes": ["EQUIPMENT_ALARM"],
    "probableCauses": ["The server cannot be connected."]
  },
  "callbackUri": "/nfvo/notify/alarm",
  "authentication": {
    "authType": ["BASIC"],
    "paramsBasic": {
      "userName": "nfvo",
      "password": "nfvopwd"
    },
    "paramsOauth2ClientCredentials": {}
  }
}
Response - Return FmSubscription parameter as shown below:
{
  "id": "fb782658-af96-47e7-9faa-90ba8416e426",
  "filter": {
    "vnfInstanceSubscriptionFilter": {
      "vnfdIds": ["b1db0ce7-ebca-1fb7-95ed-4840d70a9923"],
      "vnfProductsFromProviders": [],
      "vnfInstanceIds": ["02e46e91-2722-4f2d-af91-313f5981a199"],
      "vnfInstanceNames": []
    },
    "notificationTypes": ["AlarmNotification"],
    "faultyResourceTypes": ["COMPUTE"],
    "perceivedSeverities": ["WARNING"],
    "eventTypes": ["EQUIPMENT_ALARM"],
    "probableCauses": ["The server cannot be connected."]
  },
  "callbackUri": "/nfvo/notify/alarm",
  "_links": {
    "self": "/vnffm/v1/subscriptions/fb782658-af96-47e7-9faa-90ba8416e426"
  }
}
Note
When Tacker sends a request to the Client's callbackUri, the Client processes it and returns HTTP 204 No Content by default.
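The creation flow, including the 303 See Other check and the test notification to the callbackUri, might look like the Python sketch below. The in-memory dict stands in for the vnf_fm_subscriptions table, the GET to the Client is injected as a callable that returns the HTTP status code, and the exception raised when the Client does not answer 204 is hypothetical, because this spec does not define that error; the function names are likewise illustrative.

```python
import uuid


class CallbackCheckFailed(Exception):
    """Hypothetical error; the spec does not define the status code for this case."""


def create_subscription(store: dict, request: dict, send_test_notification):
    """Return (status, body): 303 for a duplicate, 201 with FmSubscription otherwise.

    send_test_notification(uri) stands in for the GET to the Client's callbackUri
    and returns the HTTP status code the Client answered with.
    """
    for existing in store.values():
        if (existing["callbackUri"] == request["callbackUri"]
                and existing["filter"] == request.get("filter", {})):
            return 303, {"Location": existing["_links"]["self"]}
    status = send_test_notification(request["callbackUri"])
    if status != 204:
        raise CallbackCheckFailed(f"callbackUri answered {status}, expected 204")
    sub_id = str(uuid.uuid4())
    subscription = {
        "id": sub_id,
        "filter": request.get("filter", {}),
        "callbackUri": request["callbackUri"],
        "_links": {"self": f"/vnffm/v1/subscriptions/{sub_id}"},
    }
    # Credentials are stored (subscription_authentication) but not returned.
    store[sub_id] = dict(subscription, authentication=request.get("authentication"))
    return 201, subscription
```

Returning the FmSubscription without the authentication block matches the response example above, which omits the credentials sent in the request.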
Add new API - GET /vnffm/v1/subscriptions
Query all subscriptions in the Tacker.
Users can filter subscriptions by FmSubscription attributes in the filter query parameter of the request. For example, the following URL query parameter matches subscriptions with perceivedSeverities=WARNING:
GET /vnffm/v1/subscriptions?filter=(eq,filter/perceivedSeverities,WARNING)
Method type: GET
Normal http response code : 200 OK
Expected error http response codes
- 400 Bad Request: Shall be returned when the attribute-based filter expression is invalid, or when too many subscriptions match the query and the response would be too large.
- 401 Unauthorized: Shall be returned when authentication fails.
- 403 Forbidden: Shall be returned when the user is not authorized to call this REST API.
- 404 Not Found: Shall be returned when the URL is wrong.
- 405 Method Not Allowed: Shall be returned when the HTTP method is not supported.
- 500 Internal Server Error: Shall be returned when an internal error occurs.
Request - There is no request body.
Response - Return FmSubscription parameter as shown below:
[
  {
    "id": "fb782658-af96-47e7-9faa-90ba8416e426",
    "filter": {
      "vnfInstanceSubscriptionFilter": {
        "vnfdIds": ["b1db0ce7-ebca-1fb7-95ed-4840d70a9923"],
        "vnfProductsFromProviders": [],
        "vnfInstanceIds": ["02e46e91-2722-4f2d-af91-313f5981a199"],
        "vnfInstanceNames": []
      },
      "notificationTypes": ["AlarmNotification"],
      "faultyResourceTypes": ["COMPUTE"],
      "perceivedSeverities": ["WARNING"],
      "eventTypes": ["EQUIPMENT_ALARM"],
      "probableCauses": ["The server cannot be connected."]
    },
    "callbackUri": "/nfvo/notify/alarm",
    "_links": {
      "self": "/vnffm/v1/subscriptions/fb782658-af96-47e7-9faa-90ba8416e426"
    }
  }
]
Add new API - GET /vnffm/v1/subscriptions/{subscriptionId}
Query the subscription specified in the Tacker.
Method type: GET
Normal http response code : 200 OK
Expected error http response codes
- 401 Unauthorized: Shall be returned when authentication fails.
- 403 Forbidden: Shall be returned when the user is not authorized to call this REST API.
- 404 Not Found: Shall be returned when the subscriptionId does not exist.
- 405 Method Not Allowed: Shall be returned when the HTTP method is not supported.
- 500 Internal Server Error: Shall be returned when an internal error occurs.
Request - There is no request body.
Response - Return FmSubscription parameter as shown below:

{
    "id": "fb782658-af96-47e7-9faa-90ba8416e426",
    "filter": {
        "vnfInstanceSubscriptionFilter": {
            "vnfdIds": ["b1db0ce7-ebca-1fb7-95ed-4840d70a9923"],
            "vnfProductsFromProviders": [],
            "vnfInstanceIds": ["02e46e91-2722-4f2d-af91-313f5981a199"],
            "vnfInstanceNames": []
        },
        "notificationTypes": ["AlarmNotification"],
        "faultyResourceTypes": ["COMPUTE"],
        "perceivedSeverities": ["WARNING"],
        "eventTypes": ["EQUIPMENT_ALARM"],
        "probableCauses": ["The server cannot be connected."]
    },
    "callbackUri": "/nfvo/notify/alarm",
    "_links": {
        "self": {
            "href": "/vnffm/v1/subscriptions/fb782658-af96-47e7-9faa-90ba8416e426"
        }
    }
}
Add new API - DELETE /vnffm/v1/subscriptions/{subscriptionId}
Delete the specified subscription from Tacker.
Method type: DELETE
Normal http response code : 204 No Content
Expected error http response codes
- 401 Unauthorized: Shall be returned when authentication fails.
- 403 Forbidden: Shall be returned when the user is not authorized to call this REST API.
- 404 Not Found: Shall be returned when the subscriptionId does not exist.
- 405 Method Not Allowed: Shall be returned when the HTTP method is not supported.
- 500 Internal Server Error: Shall be returned when an internal error occurs.
Request - There is no request body.
Response - There is no response body.
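Both per-subscription operations come down to a lookup or removal by subscriptionId, with 404 Not Found when the ID is unknown. Below is a Python sketch that uses a plain dict as a stand-in for the Tacker DB. The names get_subscription and delete_subscription are hypothetical, and the error body is only modelled on ProblemDetails.

```python
from http import HTTPStatus


def _not_found(subscription_id):
    # ProblemDetails-style error body (status and detail only).
    return {"status": 404, "detail": f"Subscription {subscription_id} not found."}


def get_subscription(store, subscription_id):
    """GET /vnffm/v1/subscriptions/{subscriptionId} -> (status, body)."""
    subscription = store.get(subscription_id)
    if subscription is None:
        return HTTPStatus.NOT_FOUND, _not_found(subscription_id)
    return HTTPStatus.OK, subscription


def delete_subscription(store, subscription_id):
    """DELETE /vnffm/v1/subscriptions/{subscriptionId} -> (status, body)."""
    if store.pop(subscription_id, None) is None:
        return HTTPStatus.NOT_FOUND, _not_found(subscription_id)
    # 204 No Content: the response carries no body.
    return HTTPStatus.NO_CONTENT, None
```

After a successful DELETE, a second DELETE or a GET on the same ID returns 404 Not Found, matching the error codes listed above.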
Add new request POST <Client URI from subscriptions>
Notify Client that Tacker received an alarm.
Method type: POST
Normal http response code : 204 No Content
The error http response codes are determined by the Client.
Request - There are two types of request bodies.
AlarmNotification, as shown below:

{
    "id": "87bea1ed-6ced-403e-8640-2c631eb55d08",
    "notificationType": "AlarmNotification",
    "subscriptionId": "fb782658-af96-47e7-9faa-90ba8416e426",
    "timeStamp": "2021-09-03 10:21:03",
    "alarm": {
        "id": "78a39661-60a8-4824-b989-88c1b0c3534a",
        "managedObjectId": "c61314d0-f583-4ab3-a457-46426bce02d3",
        "rootCauseFaultyResource": {
            "faultyResource": {
                "vimConnectionId": "0d57e928-86a4-4445-a4bd-1634edae73f3",
                "resourceId": "4e6ccbe1-38ec-4b1b-a278-64de09ba01b3",
                "vimLevelResourceType": "OS::Nova::Server"
            },
            "faultyResourceType": "COMPUTE"
        },
        "alarmRaisedTime": "2021-09-03 10:21:03",
        "alarmChangedTime": "",
        "alarmClearedTime": "",
        "alarmAcknowledgedTime": "",
        "ackState": "UNACKNOWLEDGED",
        "perceivedSeverity": "WARNING",
        "eventTime": "2021-09-03 10:06:03",
        "eventType": "EQUIPMENT_ALARM",
        "faultType": "",
        "probableCause": "The server cannot be connected.",
        "isRootCause": false,
        "correlatedAlarmIds": [],
        "faultDetails": [],
        "_links": {
            "self": {
                "href": "/vnffm/v1/alarms/78a39661-60a8-4824-b989-88c1b0c3534a"
            },
            "objectInstance": {
                "href": "/vnflcm/v1/vnf_instances/c61314d0-f583-4ab3-a457-46426bce02d3"
            }
        }
    },
    "_links": {
        "subscription": {
            "href": "/vnffm/v1/subscriptions/fb782658-af96-47e7-9faa-90ba8416e426"
        }
    }
}
AlarmClearedNotification, as shown below:

{
    "id": "87bea1ed-6ced-403e-8640-2c631eb55d08",
    "notificationType": "AlarmClearedNotification",
    "subscriptionId": "fb782658-af96-47e7-9faa-90ba8416e426",
    "timeStamp": "2021-09-03 10:21:03",
    "alarmId": "78a39661-60a8-4824-b989-88c1b0c3534a",
    "alarmClearedTime": "2021-09-03 10:36:03",
    "_links": {
        "subscription": {
            "href": "/vnffm/v1/subscriptions/fb782658-af96-47e7-9faa-90ba8416e426"
        },
        "alarm": {
            "href": "/vnffm/v1/alarms/78a39661-60a8-4824-b989-88c1b0c3534a"
        }
    }
}
Response - There is no response body.
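Tacker has to assemble these notification bodies from its stored Alarm records. Below is a minimal Python sketch for the AlarmClearedNotification case. The helper build_alarm_cleared_notification is hypothetical, and the link paths follow the sample above. For simplicity it sets both timeStamp and alarmClearedTime to the clearing time and formats them in the same "YYYY-MM-DD hh:mm:ss" layout as the samples.

```python
import uuid
from datetime import datetime, timezone


def build_alarm_cleared_notification(alarm, subscription_id, cleared_at=None):
    """Build an AlarmClearedNotification body for one subscription.

    Hypothetical helper: timeStamp and alarmClearedTime are both set to the
    clearing time, formatted like the sample notifications.
    """
    cleared = cleared_at or datetime.now(timezone.utc)
    stamp = cleared.strftime("%Y-%m-%d %H:%M:%S")
    return {
        "id": str(uuid.uuid4()),  # each notification gets its own id
        "notificationType": "AlarmClearedNotification",
        "subscriptionId": subscription_id,
        "timeStamp": stamp,
        "alarmId": alarm["id"],
        "alarmClearedTime": stamp,
        "_links": {
            "subscription": {"href": f"/vnffm/v1/subscriptions/{subscription_id}"},
            "alarm": {"href": f"/vnffm/v1/alarms/{alarm['id']}"},
        },
    }
```

Tacker would build one such body for each subscription whose filter matches the cleared alarm, then POST it to that subscription's callbackUri.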
Add new request GET <Client URI from subscriptions>
Confirm that the URI of Client is correct.
Method type: GET
Normal http response code : 204 No Content
The error http response codes are determined by the Client.
Request - There is no request body.
Response - There is no response body.
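On the Client side, both requests only require an HTTP endpoint at the callbackUri that answers 204 No Content. Below is a minimal sketch built on Python's standard http.server module. The handler class, the received list and the serve() helper are hypothetical. A real NFVO would also authenticate Tacker and act on each notification instead of only storing it.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

received = []  # notification bodies collected for later processing


class NotificationHandler(BaseHTTPRequestHandler):
    """Hypothetical Client endpoint serving the callbackUri (e.g. /nfvo/notify/alarm)."""

    def do_GET(self):
        # Tacker's GET only confirms that the callback URI is reachable.
        self.send_response(204)
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        # body["notificationType"] is AlarmNotification or AlarmClearedNotification.
        received.append(body)
        self.send_response(204)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this example


def serve(host="0.0.0.0", port=8080):
    """Run the endpoint until interrupted; host and port are placeholders."""
    ThreadingHTTPServer((host, port), NotificationHandler).serve_forever()
```

The default 204 response matches the Note above. If the Client needs other behaviour, such as rejecting malformed bodies, it chooses its own error codes.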
Modify POST /alert/vnf_instances/{vnf_instance_id}
Receive the alert sent from Alertmanager.
Additional processing determines whether to perform AutoScale or AutoHeal based on the information in the alert.
When the function_type field of labels is vnffm, AutoHeal processing is performed. When the function_type field of labels is not vnffm, AutoScale processing is performed.
A sample alert in Prometheus format is shown below:
{
    "receiver": "example",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "receiver_type": "tacker",
                "function_type": "vnffm",
                "vnf_instance_id": "c61314d0-f583-4ab3-a457-46426bce02d3",
                "node": "worker193",
                "perceived_severity": "WARNING",
                "event_type": "EQUIPMENT_ALARM"
            },
            "annotations": {
                "probable_cause": "The server cannot be connected."
            },
            "startsAt": "0000-00-00T00:00:00.000000000Z",
            "endsAt": "0000-00-00T00:00:00.000000000Z",
            "generatorURL": ""
        }
    ],
    "groupLabels": {},
    "commonLabels": {
        "alertname": "KubeNodeNotReady",
        "instance": "example_instance",
        "job": "example_job",
        "severity": "example"
    },
    "commonAnnotations": {
        "description": "description of alert",
        "summary": "summary content"
    },
    "externalURL": "",
    "version": "",
    "groupKey": "{}:{}"
}
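The routing rule and the conversion of alert labels into Alarm fields can be sketched in Python as follows. This is a sketch under stated assumptions and not Tacker's implementation. The names route_alert and alert_to_alarm are hypothetical. The node label is matched against a hypothetical "hostname" key in vnfcResourceInfo.metadata, because the Work Items only say that the hostname is stored in that field. The eventTime value is taken from the alert's startsAt.

```python
import uuid
from datetime import datetime, timezone


def route_alert(payload):
    """Split an Alertmanager payload into AutoHeal and AutoScale alerts."""
    heal, scale = [], []
    for alert in payload.get("alerts", []):
        # function_type == "vnffm" selects AutoHeal; anything else goes to AutoScale.
        if alert.get("labels", {}).get("function_type") == "vnffm":
            heal.append(alert)
        else:
            scale.append(alert)
    return heal, scale


def alert_to_alarm(alert, vnfc_resources):
    """Convert one vnffm alert into an Alarm dict.

    vnfc_resources: vnfcResourceInfo entries of the VNF instance; the
    "hostname" metadata key is a hypothetical name.
    """
    labels = alert["labels"]
    vnfc = next(
        (v for v in vnfc_resources
         if v.get("metadata", {}).get("hostname") == labels["node"]),
        None,
    )
    if vnfc is None:
        raise LookupError(f"no VNFC found for node {labels['node']!r}")
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    alarm_id = str(uuid.uuid4())
    return {
        "id": alarm_id,
        "managedObjectId": labels["vnf_instance_id"],
        "rootCauseFaultyResource": {
            # computeResource holds vimConnectionId, resourceId, vimLevelResourceType.
            "faultyResource": vnfc["computeResource"],
            "faultyResourceType": "COMPUTE",
        },
        "alarmRaisedTime": now,
        "ackState": "UNACKNOWLEDGED",
        "perceivedSeverity": labels["perceived_severity"],
        "eventTime": alert.get("startsAt"),
        "eventType": labels["event_type"],
        "probableCause": alert.get("annotations", {}).get("probable_cause", ""),
        "isRootCause": False,
        "_links": {"self": {"href": f"/vnffm/v1/alarms/{alarm_id}"}},
    }
```

When applied to the sample alert, route_alert places the single alert in the AutoHeal list. alert_to_alarm then produces an Alarm whose managedObjectId is the VNF instance ID and whose faultyResource is the compute resource of the VNFC running on worker193.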
Security impact
None
Notifications impact
None
Other end user impact
None
Performance Impact
None
Other deployer impact
None
Developer impact
None
Implementation
Assignee(s)
- Primary assignee:
  - Masaki Ueno <masaki.ueno.up@hco.ntt.co.jp>
  - Hirofumi Noguchi <hirofumi.noguchi.rs@hco.ntt.co.jp>
- Other contributors:
  - Yi Feng <fengyi@fujitsu.com>
  - Yoshiyuki Katada <katada.yoshiyuk@fujitsu.com>
  - Ayumu Ueha <ueha.ayumu@fujitsu.com>
Work Items
- Implement Tacker server to support:
  - Add new REST API GET /vnffm/v1/alarms to query all alarms.
  - Add new REST API GET /vnffm/v1/alarms/{alarmId} to query the specified alarm.
  - Add new REST API PATCH /vnffm/v1/alarms/{alarmId} to acknowledge the target alarm.
  - Add new REST API POST /vnffm/v1/subscriptions to create a new subscription.
  - Add new REST API GET /vnffm/v1/subscriptions to query all subscriptions.
  - Add new REST API GET /vnffm/v1/subscriptions/{subscriptionId} to query the specified subscription.
  - Add new REST API DELETE /vnffm/v1/subscriptions/{subscriptionId} to delete the specified subscription.
  - Add new request POST <Client URI from subscriptions> to notify the Client that Tacker received an alarm.
  - Add new request GET <Client URI from subscriptions> to confirm that the Client's URI is correct.
  - Modify REST API POST /alert/vnf_instances/{vnf_instance_id} to receive the alert sent from Alertmanager.
- Provide a sample script to perform the following tasks:
  - Add a new alert rule to the Prometheus configuration.
  - Add the hostname corresponding to each VM to the InstantiatedVnfInfo.vnfcResourceInfo.metadata field.
- Add new unit and functional tests.
Dependencies
LCM operations for the Kubernetes cluster depend on the following specifications:
Instantiate/Scale/Heal/Terminate operations for the Kubernetes cluster
Depends on spec "MgmtDriver with Kubernetes cluster management OSS" [16].
Testing
Unit and functional tests will be added to cover cases required in the spec.
Documentation Impact
- A complete user guide will be added to explain how to monitor with Prometheus.
- Update API documentation on the API additions mentioned in REST API impact.
References
[1] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/02.06.01_60/gs_nfv-sol003v020601p.pdf
[2] https://specs.openstack.org/openstack/tacker-specs/specs/xena/k8s-mgmtdriver-kubespray.html
[3] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.03.01_60/gs_nfv-sol003v030301p.pdf
[4] https://review.opendev.org/c/openstack/tacker-specs/+/786573/1/specs/xena/prometheus-monitoring.rst
[5] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.03.01_60/gs_nfv-sol003v030301p.pdf
[6] https://specs.openstack.org/openstack/tacker-specs/specs/xena/k8s-mgmtdriver-kubespray.html
[7] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.03.01_60/gs_nfv-sol003v030301p.pdf
[8] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.03.01_60/gs_nfv-sol003v030301p.pdf
[9] https://review.opendev.org/c/openstack/tacker-specs/+/786573/1/specs/xena/prometheus-monitoring.rst
[10] https://review.opendev.org/c/openstack/tacker-specs/+/786573/1/specs/xena/prometheus-monitoring.rst
[11] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.03.01_60/gs_nfv-sol003v030301p.pdf
[12] https://review.opendev.org/c/openstack/tacker-specs/+/786573/1/specs/xena/prometheus-monitoring.rst
[13] https://specs.openstack.org/openstack/tacker-specs/specs/xena/k8s-mgmtdriver-kubespray.html
[14] https://review.opendev.org/c/openstack/tacker-specs/+/786573/1/specs/xena/prometheus-monitoring.rst
[15] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.03.01_60/gs_nfv-sol003v030301p.pdf
[16] https://specs.openstack.org/openstack/tacker-specs/specs/xena/k8s-mgmtdriver-kubespray.html