Merge "outlet Temperature based migration strategy spec"

Jenkins 2016-01-11 09:55:24 +00:00 committed by Gerrit Code Review
commit b00ca96e03
1 changed files with 178 additions and 0 deletions


@ -0,0 +1,178 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Outlet Temperature Based Strategy
==========================================
https://blueprints.launchpad.net/watcher/+spec/outlet-temperature-based-strategy
Outlet (exhaust air) temperature is a new thermal telemetry metric that can be
used to measure a server's thermal and workload status.

This spec proposes a new Watcher migration strategy based on the outlet
temperature of servers. When the outlet temperature of a source server reaches
a configurable threshold, the strategy migrates workloads to the servers in the
best thermal condition (lowest outlet temperature).

Note: "server" in this document means "hypervisor".
Problem description
===================
In current data center infrastructure, the cooling air supplied to servers can
differ from one server to another. When a server is overloaded or its supply
air is too hot, outlet temperature telemetry can be used to detect the problem.
To keep the server in a reliable thermal condition, some of its workloads
should be migrated to other servers with safer thermal conditions.
Use Cases
----------
As an administrator, I want to be able to trigger an audit that controls the
temperature and performs workload balancing, in order to:

* Reduce the total power consumption spent on cooling.
* Increase the lifespan of the data center, because cooling effectiveness is a
  first-order factor.
Project Priority
-----------------
Not relevant because Watcher is not in the big tent so far.
Proposed change
===============
Watcher already has its own decision framework, so this strategy should be a
new class that extends the base strategy class.

* Set the threshold in two steps: hard-coded first, then through the template.
* Create a new Python class that extends the "BaseStrategy" class.
* Use the Ceilometer client to get outlet temperature metrics of hypervisors.
* Use the Nova objects framework to get the free CPU/memory/disk of hypervisors.
* Implement an algorithm that detects whether the outlet temperature threshold
  has been reached and chooses the migration target server. It filters the
  viable targets according to the free resource information of hypervisors
  obtained in the previous step.
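
The detection and target selection steps listed above could be sketched roughly
as follows. This is only an illustration under stated assumptions, not the
strategy's actual implementation: the ``Hypervisor`` and ``Workload`` classes
and both function names are hypothetical, and the temperature and free resource
values are assumed to have already been collected from Ceilometer and Nova.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypervisor:
    """Hypothetical snapshot of one hypervisor (not a Watcher or Nova class)."""
    name: str
    outlet_temp: float   # degrees Celsius, assumed to come from Ceilometer
    free_vcpus: int      # free resources, assumed to come from Nova
    free_memory_mb: int
    free_disk_gb: int


@dataclass
class Workload:
    """Hypothetical resource demand of the instance to migrate."""
    vcpus: int
    memory_mb: int
    disk_gb: int


def hosts_over_threshold(hosts: List[Hypervisor],
                         threshold: float) -> List[Hypervisor]:
    """Return the hypervisors whose outlet temperature reached the threshold."""
    return [h for h in hosts if h.outlet_temp >= threshold]


def choose_target(hosts: List[Hypervisor], workload: Workload,
                  threshold: float) -> Optional[Hypervisor]:
    """Pick the coolest hypervisor below the threshold that fits the workload.

    Returns None when no hypervisor has enough free resources.
    """
    candidates = [
        h for h in hosts
        if h.outlet_temp < threshold
        and h.free_vcpus >= workload.vcpus
        and h.free_memory_mb >= workload.memory_mb
        and h.free_disk_gb >= workload.disk_gb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h.outlet_temp)
```

Filtering by free resources before taking the minimum means that the coolest
hypervisor is skipped when it cannot host the workload, which matches the
filtering step described in the list.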
Alternatives
------------
No alternative
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
There used to be some performance issues when querying metrics from the
Ceilometer database. This is one of the reasons why it was rarely used in
production environments. These issues may now be solved thanks to an
abstraction layer which enables anybody to change the underlying metrics
storage backend easily.

There is also a performance issue when querying the Nova DB to get CPU
usage metrics.
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
  <junjie-huang>
Work Items
----------
1. Write a function that uses the Ceilometer client to get the outlet
   temperature of hypervisors.
2. Write a function that filters servers by basic Nova metrics (free
   CPU/memory/disk).
3. Rewrite the execute function to add the algorithm that detects whether the
   outlet temperature threshold has been reached, chooses the target
   hypervisor, and generates the action plan.
Dependencies
============
* https://wiki.openstack.org/wiki/Ceilometer/blueprints/APIv2
* https://blueprints.launchpad.net/ceilometer/+spec/api-v2-improvement
* http://docs.openstack.org/admin-guide-cloud/telemetry-measurements.html
* http://docs.openstack.org/developer/python-novaclient/api.html
Testing
=======
Unit tests and functional tests. The functional tests will run against a fake
metrics set.
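
A fake metrics set of this kind might look like the sketch below. The class
name, its method, the host names and the sample values are all hypothetical
and chosen only for illustration. The real tests would stand in for whatever
interface the strategy uses to read outlet temperatures from Ceilometer.

```python
from typing import Dict, List


class FakeOutletTemperatureSource:
    """Hypothetical stand-in for the Ceilometer-backed metrics lookup."""

    def __init__(self, samples: Dict[str, List[float]]):
        # Maps a hypervisor name to its canned outlet temperature samples.
        self._samples = samples

    def average_outlet_temperature(self, host: str) -> float:
        """Return the mean of the canned samples for the given host."""
        values = self._samples.get(host)
        if not values:
            raise KeyError(f"no outlet temperature samples for {host!r}")
        return sum(values) / len(values)


# Illustrative values only (degrees Celsius); not measured data.
FAKE_METRICS = {
    "hv-1": [44.0, 46.0],
    "hv-2": [28.0, 30.0],
}
```

Because the canned values are fixed, a functional test can assert exactly which
hypervisor crosses the threshold without needing a running telemetry service.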
Documentation Impact
====================
Documentation explaining how to use this new optimization strategy.
References
==========
http://www.intel.com/content/www/us/en/servers/ipmi/ipmi-home.html
History
=======
None