tacker-specs/specs/newton/alarm-based-monitoring-driver.rst
melissaml 19dc6f44ed Rename review.openstack.org to review.opendev.org
There are many references to review.openstack.org, and while the
redirect should work, we can also go ahead and fix them.

Change-Id: Ic20bd71e56ba31bb814003d5d9693c61d414cab5
2019-05-13 01:08:35 +08:00

12 KiB

Add alarm-based monitoring driver to Tacker

https://blueprints.launchpad.net/tacker/+spec/alarm-based-monitoring-driver

This spec describes an alarm-based monitoring driver in Tacker

Problem description

ETSI MANO architecture describes to monitor the VNF to take appropriate action such as fault management, performance management. Monitoring became an important aspect in MANO architecture. Currently, Tacker provides a very minimal support for checking the liveliness of VNF elements by means of ping or curl which helps to recover the element in case it is unreachable. But Tacker does not support monitoring of the CPU/memory usage of VNF elements. Further, it is necessary for Tacker to monitor all VNF resources as well. The reason is that the failure of VNFs happen too diversely.

Proposed change

The scope of this spec focused on:

  • designing a generic monitoring framework. Whereby, an alarm-based monitoring driver in Tacker is designed to collect alarms/events triggered by the low-level designs (Ceilometer, Monasca, custom driver). In this spec, the alarm-based monitoring driver can completely monitor any resources in OpenStack that Ceilometer can support. In real implementation, this spec aims to leverage Ceilometer to monitor CPU/memory usage inside VNF.
  • defining Monitoring Policy using the TOSCA Policy format. The monitoring policy can apply to a single VDU or multiple VDUs.
  • adding support for inserting Ceilometer Alarms into the HOT template to allow Ceilometer to trigger scaling in Heat resource groups.
The alarm-based monitoring framework:
        +-----------------------------------+
        |                                   |
        |                                   |
        |      +-----------------+          |
        |      | VNFM / TOSCA    |          |
        |      |                 |          |
        |      +--------+--------+          |
        |               |                   |
        |      +--------v--------+          |
        |      |                 |          |
        |      | alarm-framework <-----+    |
        |      |                 +---+ |    |
        |      +-+-^-------+-^---+   | |    |
        |        | |       | |       | |    |
        | +------v-++  +---v-+-+  +--v-+-+  |
        | |         |  |       |  |      |  |
        | |         |  |       |  |      |  |
        | |Ceilometer  |Monasca|  |Custom|  |
        | |         |  |       |  |      |  |
        | |         |  |       |  |      |  |
        | |         |  |       |  |      |  |
        | +---------+  +-------+  +------+  |
        +-----------------------------------+

The TOSCA scheme could be defined as the following:

tosca.policies.tacker.Monitoring

tosca.policies.tacker.Monitoring:
  derived_from: tosca.policies.Monitoring
  targets:
    type: list
    entry_schema:
      type: string
      required: true
    description: List of monitored VDUs
  triggers:
    resize_compute:
      event_type:
        type: map
        entry_schema:
          type: string
        required: true
      metrics:
        type: string
        required: true
      condition:
        type: map
        entry_schema:
          type: string
        required: false
      action:
        type: map
        entry_schema:
          type: string
        required: true

TOSCA template referred to1 could be modeled with the below details in term of auto-scaling:

tosca_definitions_version: tosca_simple_profile_for_nfv_1_0_0
description: Demo example

metadata:
template_name: sample-tosca-vnfd

topology_template:
node_templates:
   vdu1:
     type: tosca.nodes.nfv.VDU.Tacker
     capabilities:
       nfv_compute:
         properties:
           disk_size: 1 GB
           mem_size: 512 MB
           num_cpus: 2
     properties:
       image: cirros-0.3.4-x86_64-uec
       mgmt_driver: noop
       availability_zone: nova

   vdu1_cpu_usage_monitoring_policy:
       type: tosca.policies.tacker.Monitoring
       targets: [vdu1]
       triggers:
           resize_compute:
               event_type:
                   type: tosca.events.resource.utilization
                   implementation: Ceilometer
               metrics: cpu_util
               condition: utilization greater_than 70%
                   threshold: 70
                   period: 60
                   evaluations: 1
                   method: average
                   comparison: gt
               action:
                   resize: vdu1_scaling_policy

In the above template, event type is described in2 and used in3.

alarm_url will be created by webhook in Tacker as the following:

v1.0/vnfs/<vnf-uuid>/<monitoring-policy-name>/<action-name>/<params>

Where: monitoring-policy is the name of monitoring policy which is described in VNFD.

action-name is the name of action which is described in VNFD as well. Multiple actions could be supported in monitoring policy. By changing action-name, the appropriate action will be invoked and then the alarm-based monitoring driver will process this action. In above example, action-name is 'vdu1_scaling_policy'. Whereby, when the monitoring driver receives triggers from Ceilometer, it will invoke scaling action and trigger scaling automatically. The detailed scaling mechanism using the monitoring driver is defined by the scaling spec4.

params contains the information related to alarm-actions. For example, it can be used for user authentication. Whereby, Webhook handler will generate randomly a key. This helps to make sure that we have a unique url for each alarm. Alarm url will be stored in Tacker db and only these unique callbacks will be used. The expression showm below is an example of alarm url which contains user authentication

v1.0/vnfs/<vnf-uuid>/<monitoring-policy-name>/<action-name>/2w3r40-34c2d2

Here, monitoring-policy-name is the name of monitoring policy and threshold is a value which user wants to update.

Based on the different types of callbacks, we have the appropriate actions as following:

#1. if action is "Log", the monitoring driver will restore alarms into database. We have two options to display these information:

  • Use CLI. The status of alarm could be defined in the existing CLI as the following:

    tacker vnf-show [vnf-id]

  • Modify Tacker-Horizon. Add "Alarms" tab to tacker-horizon where user can know what is happening with VNF. This tab need to have some information like: [VDU-ID]-----[Alarms (CPU, MEMORY, PORT,...)]--- [Status (HIGH, LOW, DELETED,..)].

#2. If action is "Scaling", we can call API to trigger scaling. The detailed scaling

mechanism could be found in scaling spec5.

#3. If action is "respawn", this action is the same in case of ping driver.

In order to translate the monitoring policy into HOT template, we can use heat ceilometer resource type. In this approach, Tacker will create OS::Ceilometer::Alarm resource by making use of either the same template used for scale-group or separate template.

create a ceilometer resource as below with required alarm criteria:

vdu_scale_up_alarm:

    type: OS::Ceilometer::Alarm
    properties:

      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 50
      comparison_operator: gt
      action:
        - {get_attr: tacker_alarm_url}
vdu_scale_down_alarm:

    type: OS::Ceilometer::Alarm
    properties:

      meter_name: cpu_util
      statistic: avg
      period: 600
      evaluation_periods: 1
      threshold: 15
      comparison_operator: lt
      action:
         - {get_attr: tacker_alarm_url}

Future considerations:

1. Indeed, it is necessary so that the monitoring driver could monitor beyond VDU resources. CP resources should be monitored as well. Especially, it is necessary when we have SFC in the future. The reason is that each CP will need to assign to a Neutron port. SFC is created based on the connection of Neutron ports, therefore port monitoring is necessary for high availability in SFC. The below example show port monitoring which could be done by the alarm-based monitoring driver:

tosca_definitions_version: tosca_simple_profile_for_nfv_1_0_0

description: Demo example

metadata:
template_name: sample-tosca-vnfd

topology_template:
node_templates:
   VDU1:
     type: tosca.nodes.nfv.VDU.Tacker
     properties:
       image: cirros-0.3.4-x86_64-uec
       flavor: m1.tiny
       availability_zone: nova
       mgmt_driver: noop
       config: |
         param0: key1
         param1: key2

   CP1:
    type: tosca.nodes.nfv.CP.Tacker
    properties:
       management: true
       anti_spoofing_protection: false
    requirements:
       - virtualLink:
          node: VL1
       - virtualBinding:
          node: VDU1
   CP_monitoring_policy:
     type: tosca.policies.tacker.Monitoring
     targets: [CP1]
      triggers:
        port_monitoring:
          event:
            type: tosca.events.resource.utilization
            implementation: Ceilometer
            metrics: port_bandwidth
            condition: load greater_than 80%
            period: 60
            evaluations: 1
            statistics: average
            action:
             trigger: vnffg1-ha-policy

2. In the future, Tacker users could want to update monitoring parameters like threshold. The problem is when VNF instances sustain heavy load and CPU usage reaches to the pre-defined threshold value. Alarms will be triggered to Tacker, but actually it not really necessary because the VNF instances still have the ability to work well. Tacker users now want to increase the threshold value. This could be done as the following:

tacker vnf-update --vnf-id <vnf-id> --monitoring-policy-name <monitoring policy>
                                    --threshold [threshold-value]

NOTE: The threshold need to be be parameterized in the template.

Alternatives

None

Data model impact

None

REST API impact

POST on /v1.0/vnfs/<vnf-uuid>/<monitoring-policy>/<action-name>/<params>

Security

Need security between OpenStack Ceilometer and Tacker6.

Notifications impact

Ceilometer triggers alarms to the alarm-based monitoring driver in Tacker.

Other end user impact

None

Performance impact

None

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

Tung Doan <tungdoan@dcn.ssu.ac.kr>

Kanagaraj Manickam <mkr1481@gmail.com>

Work Items

  1. Tosca monitoring elment model to Heat ceilometer monitoring element translation
  2. Enable the new convention in vnfd for mentioning to the alarm based monitoring parameters
  3. create a sample TOSCA template
  4. Create a new monitoring driver for alarm based monitoring with configurable parameter to use either of the approach mentioned above.
  5. Enable to log Ceilometer alarms and report to users.
  6. Enhance the horizon to show the live monitoring parameters.

Dependencies

In case we use heat ceilometer to describe the monitoring policy, make sure that monitoring strategy is supported by Ceilometer. Testing ========

  1. Monitoring in case of high CPU usage
  • Create vnfd from the alarm-based VNFD template
  • Create vnf from the vnfd
  • Stress VM which VNF is running on. The purpose is to make CPU usage reach threshold.
  • Use CLI/Horizon to show alarms/events related to VNF VM.

Reference


  1. http://docs.oasis-open.org/tosca/tosca-nfv/v1.0/tosca-nfv-v1.0.pdf↩︎

  2. https://www.oasis-open.org/committees/download.php/56812/2015-10-27%20OpenStack%20Tokyo%20-%20Senlin-TOSCA%20vBrownBag-final.pdf↩︎

  3. https://github.com/openstack/tosca-parser/blob/master/toscaparser/tests/data/policies/tosca_policy_template.yaml#L60↩︎

  4. https://review.opendev.org/#/c/318577/↩︎

  5. https://review.opendev.org/#/c/318577/↩︎

  6. https://github.com/openstack/ceilometer/blob/stable/liberty/ceilometer/alarm/notifier/rest.py#L84↩︎