Add router max bandwidth metrics to ovs plugin

This patch helps operators determine whether a node that
hosts routers has enough active routers to potentially exceed
the network capacity of the host. The metrics can also be used to
decide whether routers should be moved or rebalanced.

Three new metrics will be published if the 'publish_router_capacity'
flag is set to true in the ovs.yaml file (off by default):

  vrouter.max_bw_kb (customer metric)
  ovs.vrouter.max_bw_kb (ops metric)
  ovs.vrouter.host_max_bw_kb (ops metric)
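
As a rough illustration of how the three metrics relate, the sketch below
reports each router's capacity twice (once for the customer project, once
for the operations project) and sums the per-router values into the host
metric. It is a simplified sketch, not the plugin code: the function name
`build_capacity_metrics` and its input mapping are hypothetical, and
dimensions and tenant delegation are left out.

```python
def build_capacity_metrics(router_max_bw_kb):
    """Return (metric_name, router_id, value) tuples for one host.

    router_max_bw_kb is a hypothetical mapping of router id -> max bandwidth.
    Routers with no known capacity (0) are skipped, mirroring the patch.
    """
    metrics = []
    host_total = 0
    for router_id, max_bw in sorted(router_max_bw_kb.items()):
        if max_bw <= 0:
            continue
        # Customer-facing metric and its operations-side counterpart.
        metrics.append(("vrouter.max_bw_kb", router_id, max_bw))
        metrics.append(("ovs.vrouter.max_bw_kb", router_id, max_bw))
        host_total += max_bw
    # The host-level metric is only emitted when at least one router counts.
    if host_total > 0:
        metrics.append(("ovs.vrouter.host_max_bw_kb", None, host_total))
    return metrics
```

Comparing the host-level value against the known capacity of the node's
network links is then left to the operator's alarm definitions.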

This functionality depends on nova flavor bandwidth quotas
being configured:

https://wiki.openstack.org/wiki/InstanceResourceQuota#Bandwidth_limits
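
A minimal sketch of how a per-flavor maximum could be derived from those
quota keys, following the approach taken in `_get_max_flavor_bw` in this
patch: inbound and outbound values are summed per kind (average, peak,
burst) and the largest sum is used. The function name `max_flavor_bw` and
the sample values are illustrative only, and the regular expression here is
slightly stricter than the one in the patch.

```python
import re

# Matches flavor extra-spec keys such as quota:vif_inbound_average
# or quota:vif_outbound_peak.
QUOTA_RE = re.compile(r"quota:vif_(?:in|out)bound_(average|peak|burst)$")


def max_flavor_bw(extra_specs):
    """Return the largest of the summed average, peak and burst quotas."""
    totals = {"average": 0, "peak": 0, "burst": 0}
    for key, value in extra_specs.items():
        match = QUOTA_RE.match(key)
        if match:
            # Inbound and outbound are added together for each kind.
            totals[match.group(1)] += int(value)
    return max(totals.values())


# Hypothetical flavor extra specs; unrelated keys are ignored.
example_specs = {
    "quota:vif_inbound_average": "1000",
    "quota:vif_outbound_average": "2000",
    "quota:vif_inbound_peak": "2500",
    "hw:cpu_policy": "dedicated",
}
print(max_flavor_bw(example_specs))
```

A flavor with no bandwidth quota keys yields 0, which is why routers whose
instances use such flavors contribute nothing to the published metrics.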

At some point in the future, when neutron supports router QoS,
this functionality should be extended to honor that as well.

Change-Id: Ie00260d99bdd1f761dca28d2a5779b906a196564
Brad Klein 2017-01-09 16:14:33 -07:00
parent 8615610f1c
commit 962888a9ff
3 changed files with 177 additions and 22 deletions


@ -9,6 +9,7 @@
- [Per-Router Metrics](#per-router-metrics)
- [Per-DHCP port Metrics](#per-dhcp-port-metrics)
- [Per-DHCP Rate Metrics](#per-dhcp-rate-metrics)
- [Per-Host Metrics](#per-host-metrics)
- [Mapping Metrics to Configuration Parameters](#mapping-metrics-to-configuration-parameters)
- [Router Metric Dimensions](#router-metric-dimensions)
- [OVS Port Metric Dimensions](#ovs-port-metric-dimensions)
@ -101,18 +102,19 @@ instances:
## Per-Router Metrics
| Name | Description |
| -------------------------|-----------------------------------------------------------------|
| vrouter.in_bytes |Inbound bytes for the router (if `network_use_bits` is false) |
| vrouter.out_bytes | Outgoing bytes for the router (if `network_use_bits` is false) |
| vrouter.in_bits | Inbound bits for the router (if `network_use_bits` is true) |
| vrouter.out_bits | Outgoing bits for the router (if `network_use_bits` is true) |
| vrouter.in_packets | Incoming packets for the router |
| vrouter.out_packets | Outgoing packets for the router |
| vrouter.in_dropped | Incoming dropped packets for the router |
| vrouter.out_dropped | Outgoing dropped packets for the router |
| vrouter.in_errors | Number of incoming errors for the router |
| vrouter.out_errors | Number of outgoing errors for the router |
| Name | Description |
| -------------------------|--------------------------------------------------------------------------------------------------------------------------|
| vrouter.in_bytes | Inbound bytes for the router (if `network_use_bits` is false) |
| vrouter.out_bytes | Outgoing bytes for the router (if `network_use_bits` is false) |
| vrouter.in_bits | Inbound bits for the router (if `network_use_bits` is true) |
| vrouter.out_bits | Outgoing bits for the router (if `network_use_bits` is true) |
| vrouter.in_packets | Incoming packets for the router |
| vrouter.out_packets | Outgoing packets for the router |
| vrouter.in_dropped | Incoming dropped packets for the router |
| vrouter.out_dropped | Outgoing dropped packets for the router |
| vrouter.in_errors | Number of incoming errors for the router |
| vrouter.out_errors | Number of outgoing errors for the router |
| vrouter.max_bw_kb | Maximum bandwidth possible for the router based on the instances using the router (if `publish_router_capacity` is true) |
## Per-DHCP port Metrics
@ -144,8 +146,14 @@ instances:
| vswitch.out_error_sec | Outgoing errors per second for the DHCP port |
| vswitch.in_error_sec | Incoming errors per second for the DHCP port |
## Per-Host Metrics
| Name | Description |
| ---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ovs.vrouter.host_max_bw_kb | Maximum bandwidth possible for routers on the host based on the instances using those routers (if `publish_router_capacity` is true). Only published to the operations tenant. |
## Mapping Metrics to Configuration Parameters
Configuration parameters can be used to control which metrics are reported by ovs plugin. There are 3 parameters currently in ovs config file: use_rate_metrics, use_absolute_metrics and use_health_metrics.
Configuration parameters can be used to control which metrics are reported by the ovs plugin. There are currently 4 parameters in the ovs config file: use_rate_metrics, use_absolute_metrics, use_health_metrics and publish_router_capacity.
| Tuning Knob | Admin Metric Name | Tenant Metric Name |
@ -166,6 +174,8 @@ Configuration parameters can be used to control which metrics are reported by ov
| | ovs.vrouter.in_errors | vrouter.in_errors |
| | ovs.vrouter.out_dropped | vrouter.out_dropped |
| | ovs.vrouter.out_errors | vrouter.out_errors |
| publish_router_capacity (default: False) | ovs.vrouter.max_bw_kb | vrouter.max_bw_kb |
| | ovs.vrouter.host_max_bw_kb | N/A |
NOTE:


@ -17,6 +17,7 @@ import time
from copy import deepcopy
from monasca_agent.collector.checks import AgentCheck
from neutronclient.v2_0 import client as neutron_client
from novaclient import client as nova_client
OVS_CMD = """\
%s --columns=name,external_ids,statistics,options \
@ -48,6 +49,7 @@ class OvsCheck(AgentCheck):
self.use_absolute_metrics = self.init_config.get('use_absolute_metrics')
self.use_rate_metrics = self.init_config.get('use_rate_metrics')
self.use_health_metrics = self.init_config.get('use_health_metrics')
self.publish_router_capacity = self.init_config.get('publish_router_capacity')
if include_re is None:
include_re = 'qg.*'
else:
@ -132,6 +134,7 @@ class OvsCheck(AgentCheck):
# let's publish.
#
tried_one_update = False
host_router_max_bw = 0
for ifx, value in ifx_deltas.iteritems():
port_uuid = value['port_uuid']
@ -141,7 +144,7 @@ class OvsCheck(AgentCheck):
# file for a missing port uuid once per wakeup.
#
tried_one_update = True
log_msg = "port_uuid {0} not in port cache -- updating."
log_msg = "port_uuid {0} not in port cache -- updating."
self.log.info(log_msg.format(port_uuid))
port_cache = self._update_port_cache()
if not port_cache:
@ -175,6 +178,13 @@ class OvsCheck(AgentCheck):
this_dimensions = dims_base.copy()
this_dimensions.update(ifx_dimensions)
customer_dimensions = this_dimensions.copy()
del customer_dimensions['hostname']
ops_dimensions = this_dimensions.copy()
ops_dimensions.update({'tenant_id': tenant_id})
if tenant_name:
ops_dimensions.update({'tenant_name': tenant_name})
for metric_name, idx in self._get_metrics_map(measure).iteritems():
# POST to customer project
interface_stats_key = self._get_interface_stats_key(idx, metric_name, measure, ifx)
@ -185,8 +195,6 @@ class OvsCheck(AgentCheck):
# a value to publish for that metric this round.
#
continue
customer_dimensions = this_dimensions.copy()
del customer_dimensions['hostname']
if is_router_port:
metric_name_rate = "vrouter.{0}_sec".format(metric_name)
metric_name_abs = "vrouter.{0}".format(metric_name)
@ -195,10 +203,6 @@ class OvsCheck(AgentCheck):
metric_name_abs = "vswitch.{0}".format(metric_name)
if not self.use_health_metrics and interface_stats_key in HEALTH_METRICS:
continue
ops_dimensions = this_dimensions.copy()
ops_dimensions.update({'tenant_id': tenant_id})
if tenant_name:
ops_dimensions.update({'tenant_name': tenant_name})
if self.use_rate_metrics:
self.gauge(metric_name_rate, value[interface_stats_key],
dimensions=customer_dimensions,
@ -220,6 +224,15 @@ class OvsCheck(AgentCheck):
# POST to operations project
self.gauge("ovs.{0}".format(metric_name_abs), abs_value,
dimensions=ops_dimensions)
self._publish_max_bw_metrics(port_info, customer_dimensions,
ops_dimensions)
host_router_max_bw += self._get_port_cache_max_bw(port_info)
if host_router_max_bw > 0:
self.gauge('ovs.vrouter.host_max_bw_kb', host_router_max_bw,
dimensions=dims_base)
self._update_counter_cache(ctr_cache,
math.ceil(time.time() - time_start), measure)
@ -275,6 +288,24 @@ class OvsCheck(AgentCheck):
return data
return None
def _get_nova_client(self):
username = self.init_config.get('admin_user')
password = self.init_config.get('admin_password')
tenant_name = self.init_config.get('admin_tenant_name')
auth_url = self.init_config.get('identity_uri')
region_name = self.init_config.get('region_name')
nc = nova_client.Client(2, username,
password,
tenant_name,
auth_url,
endpoint_type='internalURL',
service_type="compute",
region_name=region_name)
return nc
def _get_neutron_client(self):
username = self.init_config.get('admin_user')
@ -429,6 +460,8 @@ class OvsCheck(AgentCheck):
if tenant_name:
port_cache[port_uuid]['tenant_name'] = tenant_name
port_cache = self._add_max_bw_to_port_cache(port_cache,
all_ports_data)
port_cache['last_update'] = int(time.time())
# Write the updated cache
@ -442,6 +475,115 @@ class OvsCheck(AgentCheck):
format(self.port_cache_file, e))
return port_cache
def _get_port_cache_max_bw(self, port_info):
if port_info['is_router_port'] and 'max_bw_kb' in port_info:
return port_info['max_bw_kb']
else:
return 0
def _publish_max_bw_metrics(self, port_info, cust_dims, ops_dims):
max_bw_kb = self._get_port_cache_max_bw(port_info)
if not self.publish_router_capacity or max_bw_kb == 0:
return
metric_name = 'vrouter.max_bw_kb'
self.gauge(metric_name, max_bw_kb, dimensions=cust_dims,
delegated_tenant=ops_dims['tenant_id'],
hostname='SUPPRESS')
self.gauge("ovs.{0}".format(metric_name), max_bw_kb,
dimensions=ops_dims)
def _get_max_flavor_bw(self, flavor_keys):
avg_bw = 0
peak_bw = 0
burst_bw = 0
#
# we'll sum inbound and outbound for the max possible throughput
#
avg_re = re.compile('quota:vif_.*bound_average')
peak_re = re.compile('quota:vif_.*bound_peak')
burst_re = re.compile('quota:vif_.*bound_burst')
for key in flavor_keys:
if re.match(avg_re, key):
avg_bw += int(flavor_keys[key])
elif re.match(peak_re, key):
peak_bw += int(flavor_keys[key])
elif re.match(burst_re, key):
burst_bw += int(flavor_keys[key])
return max(avg_bw, peak_bw, burst_bw)
def _add_max_bw_to_port_cache(self, port_cache, all_ports_data):
if not self.publish_router_capacity:
return port_cache
tmp_port_cache = deepcopy(port_cache)
#
# No need to do a flavor get multiple times
# for the same flavor when rebuilding the cache.
#
flavor_cache = {}
try:
if not hasattr(self, 'nova_client'):
self.nova_client = self._get_nova_client()
for uuid in port_cache:
router_max_bw_kb = 0
port = port_cache[uuid]
if not port['is_router_port']:
continue
inst_ids = self._get_instance_ids(all_ports_data, port['device_uuid'])
for instance_id in inst_ids:
instance = self.nova_client.servers.get(instance_id)
flavor_id = instance.flavor['id']
if flavor_id not in flavor_cache:
flavor = self.nova_client.flavors.get(instance.flavor['id'])
flavor_cache[flavor_id] = flavor.get_keys()
router_max_bw_kb += self._get_max_flavor_bw(flavor_cache[flavor_id])
if router_max_bw_kb > 0:
tmp_port_cache[uuid]['max_bw_kb'] = router_max_bw_kb
except Exception as e:
msg = "Unable to get the nova instance and flavor info: {0}"
self.log.error(msg.format(e))
return tmp_port_cache
def _get_instance_ids(self, ports, router_uuid):
subnet_ids = self._get_port_ids(ports,
'network:router_interface',
[router_uuid],
'device_id',
'network_id')
instance_ids = self._get_port_ids(ports,
'compute:',
subnet_ids,
'network_id',
'device_id')
return instance_ids
def _get_port_ids(self, ports, owner, uuids, match_field, return_field):
return_uuids = []
if len(uuids) == 0:
return return_uuids
for port in ports:
if port[match_field] in uuids and owner in port['device_owner']:
return_uuids.append(port[return_field])
return return_uuids
def _load_port_cache(self):
"""Load the cache map of router/dhcp port uuids to router uuid, name,
and tenant name.


@ -29,12 +29,14 @@ use_absolute_metrics = True
use_rate_metrics = True
# If set, will submit the health metrics
use_health_metrics = True
# If set, router max capacity metrics will be published
publish_router_capacity = False
# Acceptable arguments
acceptable_args = ['admin_user', 'admin_password', 'admin_tenant_name',
'identity_uri', 'cache_dir', 'neutron_refresh', 'ovs_cmd',
'network_use_bits', 'check_router_ha', 'region_name',
'included_interface_re', 'conf_file_path', 'use_absolute_metrics',
'use_rate_metrics', 'use_health_metrics']
'use_rate_metrics', 'use_health_metrics', 'publish_router_capacity']
# Arguments which must be ignored if provided
ignorable_args = ['admin_user', 'admin_password', 'admin_tenant_name',
'identity_uri', 'region_name', 'conf_file_path']
@ -128,7 +130,8 @@ class Ovs(monasca_setup.detection.Plugin):
'included_interface_re': included_interface_re,
'use_absolute_metrics': use_absolute_metrics,
'use_rate_metrics': use_rate_metrics,
'use_health_metrics': use_health_metrics}
'use_health_metrics': use_health_metrics,
'publish_router_capacity': publish_router_capacity}
for option in cfg_needed:
init_config[option] = neutron_cfg.get(cfg_section, cfg_needed[option])