cleanup event-alarm timeout spec
Change-Id: I81051249e5759f23b76515f36dc21e9071c4c86f
parent 1df158ea36
commit 0caae9fda4
@@ -8,19 +8,20 @@
 Event Alarm Timeout
 ===================
 
-https://blueprints.launchpad.net/ceilometer/+spec/event-alarm-timeout
+https://blueprints.launchpad.net/aodh/+spec/event-alarm-timeout
 
-This BP adds timeout mechanism for event-alarm. End user can specify a timeout,
-0 (no timeout) by default, for each event-alarm, and alarm status becomes
-'TIMEOUT' after timeout without receiving desired event.
+This BP adds timeout mechanism for event-alarm. End users can specify a
+timeout, 0 (no timeout) by default, for each event-alarm. The alarm status
+becomes 'TIMEOUT' after timeout reached without receiving desired event.
 
 
 Problem description
 ===================
 
-After event-alarm introduced in Liberty, end user or operator could set alarm
-for desired event and get alarmed when receive them. But in some circumstances,
-operator want otherwise: know when not receive desired event.
+After event-alarm were introduced in Liberty, end users or operators could set
+alarm for desired event and get alarmed when it receive them. But in some
+circumstances, operator want to know otherwise: when desired event is not
+received.
 
 For example, "compute.instance.create.end" is the final event sent to message
 bus to indicate success of instance creation. Not receiving it after a long
@@ -31,9 +32,9 @@ Unfortunately, current event-alarm doesn't support it.
 Proposed change
 ===============
 
-When creating event-alarm, adds a new parameter 'timeout' to define a time
-length, so that alarm only gets fired when receiving desired event in such
-length. Otherwise, alarm status becomes 'TIMEOUT'.
+When creating event-alarm, a new parameter 'timeout' is proposed to define a
+expiry time length, so that alarm gets fired when desired event is not received
+in expected time. Otherwise, alarm status becomes 'TIMEOUT'.
 
 Currently, 3 states are supported in alarm: 'UNKNOWN', 'ALARM' and 'OK', so a
 new state 'TIMEOUT' will be added to reflect timeout situation.
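A minimal Python sketch of the proposed 'timeout' parameter and the new 'TIMEOUT' state, for illustration only. The names AlarmState, EventAlarmRule and deadline() are hypothetical, the timeout unit is assumed to be seconds, and the state strings mirror the names used in this spec rather than the values Aodh actually stores.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class AlarmState(str, enum.Enum):
    # The three existing states named in the spec, plus the proposed TIMEOUT.
    UNKNOWN = "UNKNOWN"
    ALARM = "ALARM"
    OK = "OK"
    TIMEOUT = "TIMEOUT"


@dataclass
class EventAlarmRule:
    """Hypothetical event-alarm rule carrying the proposed 'timeout' field."""

    event_type: str
    # Assumption: unit is seconds. 0 (the default) means "no timeout".
    timeout: int = 0

    def __post_init__(self) -> None:
        if self.timeout < 0:
            raise ValueError("timeout must be >= 0")

    def deadline(self, created_at: float) -> Optional[float]:
        """Return the absolute expiry time, or None when no timeout is set."""
        if self.timeout == 0:
            return None
        return created_at + self.timeout


# Example: an alarm on instance creation that expires after 5 minutes.
rule = EventAlarmRule("compute.instance.create.end", timeout=300)
print(rule.deadline(created_at=1000.0))  # 1300.0
```

Keeping 0 as "no timeout" means existing event-alarms, which never set the field, keep their current behaviour.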
@@ -44,8 +45,8 @@ evaluator asks its timeout thread/process to handle timeout request. In this
 way, avoid new process in AODH api and make all alarm handling jobs inside
 evaluator.
 
-Synchronization handling is critical in evaluator, as both evaluator original
-process and timeout process need change status for same alarm. To avoid
+Synchronization handling is critical in evaluator, as both evaluators original
+process and timeout process can change status for same alarm. To avoid
 complicated lock, timeout process just send a 'alarm.timeout.end' event with
 related alarm/project id to 'alarm.all' topic, where evaluator original process
 handle it along with desired event.
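The lock-free synchronization described in this hunk can be sketched as a single consumer loop. In this sketch a queue.Queue stands in for the 'alarm.all' topic, and both desired events and 'alarm.timeout.end' go through the same handler in arrival order, so only one thread ever writes alarm state. The "first event wins" rule follows the spec's paragraph on event order; treating a late 'alarm.timeout.end' after 'ALARM' as ignored is assumed. Event, Evaluator, watch() and drain() are hypothetical names, not Aodh APIs.

```python
import queue
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

TIMEOUT_EVENT = "alarm.timeout.end"


@dataclass(frozen=True)
class Event:
    event_type: str
    project_id: str
    alarm_id: Optional[str] = None  # only set on 'alarm.timeout.end'


class Evaluator:
    """Hypothetical evaluator; only its consumer loop mutates alarm state."""

    def __init__(self) -> None:
        # alarm_id -> (project_id, desired event type)
        self.alarms: Dict[str, Tuple[str, str]] = {}
        self.states: Dict[str, str] = {}

    def watch(self, alarm_id: str, project_id: str, event_type: str) -> None:
        self.alarms[alarm_id] = (project_id, event_type)
        self.states[alarm_id] = "UNKNOWN"

    def handle(self, ev: Event) -> None:
        if ev.event_type == TIMEOUT_EVENT:
            self._resolve(ev.alarm_id, "TIMEOUT")
            return
        # A desired event matches every alarm watching it in that project.
        for alarm_id, (project_id, event_type) in self.alarms.items():
            if project_id == ev.project_id and event_type == ev.event_type:
                self._resolve(alarm_id, "ALARM")

    def _resolve(self, alarm_id: Optional[str], new_state: str) -> None:
        # First event wins: an already resolved alarm ignores later events.
        if self.states.get(alarm_id) == "UNKNOWN":
            self.states[alarm_id] = new_state


def drain(topic: "queue.Queue[Event]", evaluator: Evaluator) -> None:
    """Process every queued event in FIFO order on the calling thread."""
    while True:
        try:
            ev = topic.get_nowait()
        except queue.Empty:
            return
        evaluator.handle(ev)


topic: "queue.Queue[Event]" = queue.Queue()
ev = Evaluator()
ev.watch("a1", "p1", "compute.instance.create.end")
topic.put(Event(TIMEOUT_EVENT, "p1", "a1"))                  # timeout arrives first
topic.put(Event("compute.instance.create.end", "p1"))        # ignored
drain(topic, ev)
print(ev.states["a1"])  # TIMEOUT
```

Because the timeout path only enqueues an event, the ordering question reduces to queue order, which is why no lock around the alarm state is needed.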
@@ -62,10 +63,6 @@ things:
 * sends out 'alarm.timeout.end' event
 * pick up nearest timeout request and start sleeping for it
 
-In future, we need timeout thread disaster-recovery capability, that is, no loss
-of timeout info when evaluator crash. Need store pending timeout requests in
-DB, and feed evaluator when restarting.
-
 The final alarm status depends on the order of events. If 'timeout.end' event
 comes first, alarm status becomes 'TIMEOUT' and following desired event is
 ignored. Otherwise, alarm status becomes 'ALARM' and following 'timeout.end'
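The timeout thread steps listed in this hunk (send 'alarm.timeout.end', then pick up the nearest pending request and sleep for it) map naturally onto a min-heap keyed by deadline plus a condition variable, so that a newly added, earlier request can wake the sleeper. This is a sketch under assumptions: TimeoutThread, add_request() and the publish callback are hypothetical names, and sending to the 'alarm.all' topic is abstracted behind that callback. Pending requests live only in memory, so they are lost if the evaluator crashes, which is the gap the disaster-recovery note in this spec describes.

```python
import heapq
import itertools
import threading
import time
from typing import Callable, List, Tuple

TIMEOUT_EVENT = "alarm.timeout.end"


class TimeoutThread(threading.Thread):
    """Hypothetical timeout worker running inside the evaluator."""

    def __init__(self, publish: Callable[[str, str, str], None]) -> None:
        super().__init__(daemon=True)
        # publish(event_type, alarm_id, project_id), e.g. a send to 'alarm.all'.
        self._publish = publish
        self._heap: List[Tuple[float, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker for equal deadlines
        self._cond = threading.Condition()
        self._stopped = False

    def add_request(self, alarm_id: str, project_id: str, timeout_s: float) -> None:
        with self._cond:
            deadline = time.monotonic() + timeout_s
            heapq.heappush(self._heap, (deadline, next(self._seq), alarm_id, project_id))
            self._cond.notify()  # the new request may now be the nearest one

    def stop(self) -> None:
        with self._cond:
            self._stopped = True
            self._cond.notify()

    def run(self) -> None:
        while True:
            with self._cond:
                if self._stopped:
                    return
                if not self._heap:
                    self._cond.wait()  # nothing pending: sleep until notified
                    continue
                deadline, _, alarm_id, project_id = self._heap[0]
                remaining = deadline - time.monotonic()
                if remaining > 0:
                    # Sleep for the nearest request; re-check after waking.
                    self._cond.wait(remaining)
                    continue
                heapq.heappop(self._heap)
            # Publish outside the lock so a slow send cannot block add_request().
            self._publish(TIMEOUT_EVENT, alarm_id, project_id)
```

An evaluator would start one such thread and call add_request() whenever an event-alarm with a non-zero timeout is created. No cancellation is needed when the desired event arrives first, since the evaluator ignores the later 'alarm.timeout.end'.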
@@ -231,6 +228,10 @@ Future lifecycle
 
 To be maintained by edwin-zhai for bug fixing and enhancement.
 
+In future, we need timeout thread disaster-recovery capability, that is, no loss
+of timeout info when evaluator crash. Need store pending timeout requests in
+DB, and feed evaluator when restarting.
+
 
 Dependencies
 ============