nova-specs/specs/juno/implemented/on-demand-compute-update.rst
Michael Still f0b8204072 Re-organize juno specs
As discussed at our nova meetings, reorganize the juno specs into
three directories:

 - proposed: things proposed which weren't approved
 - approved: things we approved but didn't implement
 - implemented: things approved and implemented

The first I suspect is the most controversial. I've done this
because I worry about the case where a future developer wants to
pick up something dropped by a previous developer, but has trouble
finding previous proposed specifications on the topic. Note that
the actual proposed specs for Juno are adding in a later commit.

Change-Id: Idcf55ca37a83d7098dcb7c2971240c4e8fd23dc8
2014-10-07 07:49:59 +11:00

185 lines
5.8 KiB
ReStructuredText

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=================================================
Change compute updates from periodic to on demand
=================================================
Include the URL of your launchpad blueprint:
https://blueprints.launchpad.net/nova/+spec/on-demand-compute-update
Currently, all compute nodes update status info in the DB on a periodic
basis (the period is currently 60 seconds). Given that the status of
the node only changes at specific points (mainly image
creation/destruction) this leads to significant DB overhead on a large
system. This BP changes the update mechanism to only update the DB when
a node state changes, specifically at node startup, instance creation
and instance destruction.
Problem description
===================
The status information about compute nodes is updated into the DB on
a periodic basis. This means that every compute node in the system
updates a row in the DB once every 60 seconds (the default period for
this update). This is unnecessary and a scalability problem given that
the status info is mostly static and doesn't change very often, mainly
it changes when an instance is created or destroyed.
Proposed change
===============
Compute node only sends an update if its status changes. On a periodic
interval (using the current default period of 60 seconds) the compute
node will compare its status with the status saved from the last update.
Only if the state or claims have changed will the DB be updated.
The advantage to this method is that it should significantly cut down
on the number of DB updates while changing almost nothing about the way
the system currently works, compute node changes will still take 60
seconds before they are updated but any change, for any reason, will
ultimately be reported.
One potential issue with this is a possible sensitivity concern, updates
shouldn't be constantly sent if, for example, steady state system activity
causes something like RAM usage to change slightly. Some experiments will
have to be run to decide if adding in a sensitivity control for certain
status metrics is needed. This can be a follow on optimization, if it's
necessary, since this design is no worse than the current mechanism.
Alternatives
------------
Another idea would be to only update the DB at certain well defined events.
The update code would only be called at system start up, instance creation
and instance deletion. This would reduce the latency for status updates
(the DB is modified as soon as the system state changes) but it suffers
from some disadvantages:
1) Finding all of the appropriate events to record a status change. Are
statup/creation/destruction the only places where the system state changes,
maybe there are other events that should be tracked.
2) Future changes could add new events that change status and recognizing
that an update is needed is an easy mistake to miss.
3) Status could change unknown to the nova code, imagine something like a
hot plug add of memory.
Data model impact
-----------------
There is no change to the data that is being stored in the DB, all of the
current fields are stored exactly as before, this BP is just changing the
frequency at which those fields are updated.
REST API impact
---------------
None.
Security impact
---------------
None.
Notifications impact
--------------------
None.
Other end user impact
---------------------
None.
Performance Impact
------------------
As measured by DB updates this change will clearly cause no more DB updates
than the current technique and, assuming instances are created on a node
at a rate of less then one every 60 seconds, should cause much fewer DB
updates.
Other deployer impact
---------------------
None.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Don Dugger <donald.d.dugger@intel.com>
Other contributors:
Work Items
----------
Change should be fairly simple, change the current update code to only
update the DB if a 'state_modified' routine returns true. The
'state_modified' routine returns false if the current state matches the
last recorded state. Otherwise, it saves the current state as the last
recorded state and returns true.
The 'state_modified' routine maintains an in memory copy of the current
status of the compute node. This copy of the state is compared with the
current state and the update to the DB only happens if the copy and the
current state differ. Note that the in memory copy is initialized to
zero values on node startup so that the first periodic update call will
find a miss match between the two states and the DB will be updated.
Based upon experiments it has been determined that a simple comparison
of the entire current vs. the saved state is sufficient. If, in the
future, more rapidly changing data that shouldn't be stored in the DB
is added to the compute node state then 'state_modified' can be easily
changed to ignore such data that isn't needed in the DB.
Dependencies
============
None.
Testing
=======
A unit test will be created to make sure that the DB is updated when the
compute node status changes and is not updated when the status doesn't
change.
Documentation Impact
====================
Section 5 of the Associate Training Guide
http://docs.openstack.org/training-guides/content/associate-computer-node.html
is slightly incorrect and should be fixed. It currently says "All compute
nodes (also known as hosts in terms of OpenStack) periodically publish their
status, resources available and hardware capabilities to nova-scheduler
through the queue." This should be modified to reflect that the status is
updated in the DB which is then queried by the scheduler. (Note this is a
generic fix that is really unrelated to this blueprint.)
References
==========
None.