Report host memory b/w as a metric in Nova
https://blueprints.launchpad.net/nova/+spec/memory-bw
This spec proposes to introduce host memory b/w as a host metric. Memory b/w can be an essential piece in determining VM performance bottlenecks and can further be used for better NUMA-based placement.
Using a Linux platform interface such as the Linux perf APIs, nova-compute should be able to expose the host's memory bandwidth utilization on every NUMA node. This memory b/w can be leveraged in OpenStack by exposing it as a monitor.
This will follow an approach similar to the existing monitor for CPU (cpu_monitor.py).
Problem description
Optimizing CPU- and memory-intensive workloads, such as those running Redis or Hadoop, can be challenging. Host memory b/w utilization is a key indicator of memory bus overload and can be exposed via the Linux perf APIs. This metric can then be leveraged for better placement and optimization of such workloads.
Use Cases
- Get memory b/w stats as metric data by adding a new subclass of BaseResourceMonitor.
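As a rough illustration of this use case, the sketch below shows what such a subclass could look like. The `BaseResourceMonitor` defined here is a stand-in written for this example, not Nova's actual base class, and the metric name prefix, the method names, and the `read_utilization` callable are all hypothetical.

```python
import abc


class BaseResourceMonitor(abc.ABC):
    """Stand-in base class for this sketch; NOT Nova's real class."""

    @abc.abstractmethod
    def get_metric_names(self):
        """Return the set of metric names this monitor reports."""

    @abc.abstractmethod
    def get_metrics(self):
        """Return a list of (metric_name, value) tuples."""


class MemoryBandwidthMonitor(BaseResourceMonitor):
    """Reports one memory b/w utilization metric per NUMA node."""

    # Hypothetical metric name prefix, chosen for illustration only.
    METRIC_PREFIX = "numa.membw.utilization"

    def __init__(self, read_utilization):
        # read_utilization: callable returning {numa_node_id: percent}.
        # In the proposal this data would come from the virt driver.
        self._read_utilization = read_utilization

    def _name(self, node):
        return f"{self.METRIC_PREFIX}.node{node}"

    def get_metric_names(self):
        return {self._name(node) for node in self._read_utilization()}

    def get_metrics(self):
        # Sort by node id so the output order is stable.
        samples = sorted(self._read_utilization().items())
        return [(self._name(node), pct) for node, pct in samples]
```

A caller would construct the monitor with whatever data source is available, for example `MemoryBandwidthMonitor(lambda: {0: 12.5, 1: 40.0})`, and poll `get_metrics()` periodically.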
Project Priority
None
Proposed change
Performance Co-Pilot (PCP) is a system performance and analysis framework available with most of the popular distros. The Linux perf APIs are called via PCP. The PCP collector daemon (pmcd) can be used to fetch values of the Nest/Uncore memory PMU counters on each NUMA node.
PCP provides Python bindings, which the monitor code in nova would call to obtain the desired values for memory bandwidth utilization.
Estimated changes are going to be in the following places:
Extend the resource monitor framework to implement an optional monitor for memory b/w utilization, much in line with the CPU monitor.
Define two methods in the virt driver parent class and implement them in the libvirt driver:
- `get_max_memory_bw`: Returns the maximum memory bandwidth for each NUMA node.
- `get_memory_bw_counter_agg`: Returns the value of the aggregated counter values associated with memory bandwidth for each NUMA node.
Nova shall calculate the difference between the aggregated counter values of two successive calls and derive the rate from it. This rate will be compared against the maximum b/w value to obtain the utilization. `get_max_memory_bw` shall be called only once, during the initialization of the monitor.
The unit of representation of the rate will be made consistent with the value obtained from the counters.
Introduce a nova object model representation of the data.
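The rate and utilization calculation described above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the function name and argument layout are hypothetical, the counters are assumed to be in the same unit as the maximum bandwidth multiplied by seconds, and skipping a node whose counter went backwards is an assumption of this example rather than something the spec states.

```python
def memory_bw_utilization(prev, curr, elapsed_seconds, max_bw):
    """Return {numa_node: percent utilization} from two counter samples.

    prev, curr: {numa_node: aggregated counter value}, e.g. two
        successive results of get_memory_bw_counter_agg.
    elapsed_seconds: time between the two samples.
    max_bw: {numa_node: maximum bandwidth per second}, e.g. the result
        of get_max_memory_bw, in the same unit as the counters.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")

    utilization = {}
    for node, max_rate in max_bw.items():
        delta = curr[node] - prev[node]
        if delta < 0:
            # Assumption: a negative delta means the counter was reset,
            # so no value is reported for this node in this interval.
            continue
        rate = delta / elapsed_seconds
        utilization[node] = 100.0 * rate / max_rate
    return utilization
```

For example, with counters moving from 1000 to 4000 over 60 seconds on a node whose maximum is 100 units per second, the rate is 50 units per second and the utilization is 50%.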
Alternatives
The alternative is to call the perf APIs directly, but that introduces platform-specific dependencies. PMU counter names and the math to derive memory bandwidth vary across platforms and types of hardware. PCP bridges this gap.
Data model impact
None
REST API impact
None
Security impact
None.
Notifications impact
None
Other end user impact
None
Performance Impact
The performance impact is negligible since the data is aggregated by the hardware and accessed via PCP. OpenStack will call this API once a minute, with an option to increase the interval.
Other deployer impact
The following packages should be added to the system:
- pcp
- python-pcp
Developer impact
None
Implementation
Assignee(s)
- Primary assignee: Sudipta Biswas (sbiswas7)
- Other assignee: Pradipta Banerjee (bpradipt)
Work Items
- Use the PCP Python bindings to obtain the memory b/w utilization.
- Perform data sampling in the monitoring code.
- Create metrics plugin to sample the memory b/w data.
Dependencies
None
Testing
The changes will be exercised through unit tests. A functional test shall be added and skipped if PCP is not installed.
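One possible way to skip the functional test when PCP is absent is sketched below, using only the standard `unittest` and `importlib` modules. The test class and test method names are hypothetical, and the check assumes the bindings are importable as the `pcp` module.

```python
import importlib.util
import unittest

# True only when the PCP Python bindings can be imported on this host.
PCP_AVAILABLE = importlib.util.find_spec("pcp") is not None


@unittest.skipUnless(PCP_AVAILABLE, "pcp python bindings are not installed")
class MemoryBandwidthFunctionalTest(unittest.TestCase):
    """Hypothetical functional test; skipped entirely without PCP."""

    def test_bindings_importable(self):
        import pcp  # noqa: F401
```

On a host without the bindings, the test runner reports the whole class as skipped rather than failed.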
Documentation Impact
None