Convert README to reStructuredText

* Add PyPI validation check for README.rst [1]
* Add docutils to test-requirements.txt
* Add lower bound for jira

[1] https://docs.openstack.org/project-team-guide/project-setup/python.html#running-the-style-checks

Change-Id: I5d90ccb1b919c4bab66b468a8ddb714ffc5f1635
Story: 2001980
Task: 20013
parent 086428009c
commit 20d6557744
README.md (deleted, 109 lines)

@@ -1,109 +0,0 @@
Team and repository tags
========================

[![Team and repository tags](https://governance.openstack.org/tc/badges/monasca-notification.svg)](https://governance.openstack.org/tc/reference/tags/index.html)

<!-- Change things from this point on -->

# Notification Engine

This engine reads alarms from Kafka and then notifies the customer using their configured notification method.
Multiple notification and retry engines can run in parallel, up to one per available Kafka partition. Zookeeper
is used to negotiate access to the Kafka partitions whenever a new process joins or leaves the working set.

# Architecture

The notification engine generates notifications using the following steps:

1. Reads alarms from Kafka, with no auto commit. - KafkaConsumer class
2. Determine the notification type for an alarm. Done by reading from mysql. - AlarmProcessor class
3. Send notification. - NotificationProcessor class
4. Successful notifications are added to a sent notification topic. - NotificationEngine class
5. Failed notifications are added to a retry topic. - NotificationEngine class
6. Commit offset to Kafka. - KafkaConsumer class

The notification engine uses three Kafka topics:

1. alarm_topic: Alarms inbound to the notification engine.
2. notification_topic: Successfully sent notifications.
3. notification_retry_topic: Unsuccessful notifications.

A retry engine runs in parallel with the notification engine and gives any
failed notification a configurable number of extra chances at success.

The retry engine generates notifications using the following steps:

1. Reads notification json data from Kafka, with no auto commit. - KafkaConsumer class
2. Rebuild the notification that failed. - RetryEngine class
3. Send notification. - NotificationProcessor class
4. Successful notifications are added to a sent notification topic. - RetryEngine class
5. Failed notifications that have not hit the retry limit are added back to the retry topic. - RetryEngine class
6. Failed notifications that have hit the retry limit are discarded. - RetryEngine class
7. Commit offset to Kafka. - KafkaConsumer class

The retry engine uses two Kafka topics:

1. notification_retry_topic: Notifications that need to be retried.
2. notification_topic: Successfully sent notifications.

## Fault Tolerance

When reading from the alarm topic, no committing is done; the commit happens only after processing. This allows
the processing to continue even though some notifications can be slow. In the event of a catastrophic failure, some
notifications could be sent but the alarms not yet acknowledged. This is an acceptable failure mode: it is better to send a
notification twice than not at all.

The general process when a major error is encountered is to exit the daemon, which should allow the other processes to
renegotiate access to the Kafka partitions. It is also assumed the notification engine will be run by a process
supervisor which will restart it in case of a failure. This way, any errors which are not easy to recover from are
automatically handled by the service restarting and the active daemon switching to another instance.

Though this should cover all errors, there is a risk that an alarm or set of alarms can be processed and notifications
sent out multiple times. To minimize this risk, a number of techniques are used:

- Timeouts are implemented for all notification types.
- An alarm TTL is utilized. Any alarm older than the TTL is not processed.

# Operation

The yaml config file is in '/etc/monasca/notification.yaml' by default; a sample is in this project.

## Monitoring

statsd is incorporated into the daemon and will send all stats to the statsd server launched by monasca-agent.
The default host and port point at **localhost:8125**.

- Counters
    - ConsumedFromKafka
    - AlarmsFailedParse
    - AlarmsNoNotification
    - NotificationsCreated
    - NotificationsSentSMTP
    - NotificationsSentWebhook
    - NotificationsSentPagerduty
    - NotificationsSentFailed
    - NotificationsInvalidType
    - AlarmsFinished
    - PublishedToKafka
- Timers
    - ConfigDBTime
    - SendNotificationTime

# Future Considerations

- More extensive load testing is needed.
- How fast is the mysql db? How much load do we put on it? Initially I think it makes most sense to read notification
  details for each alarm, but eventually I may want to cache that info.
- How expensive are commits to Kafka for every message we read? Should we commit every N messages?
- How efficient is the default Kafka consumer batch size?
- Currently we can get ~200 notifications per second per NotificationEngine instance using webhooks to a local
  http server. Is that fast enough?
- Are we putting too much load on Kafka at ~200 commits per second?

# License

Copyright (c) 2014 Hewlett-Packard Development Company, L.P.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
See the License for the specific language governing permissions and
limitations under the License.
README.rst (new file, 139 lines)

@@ -0,0 +1,139 @@
Team and repository tags
========================

|Team and repository tags|

.. raw:: html

   <!-- Change things from this point on -->

Notification Engine
===================

This engine reads alarms from Kafka and then notifies the customer using
the configured notification method. Multiple notification and retry
engines can run in parallel, up to one per available Kafka partition.
Zookeeper is used to negotiate access to the Kafka partitions whenever a
new process joins or leaves the working set.

Architecture
============

The notification engine generates notifications using the following
steps:

1. Read alarms from Kafka, with no auto commit. -
   monasca\_common.kafka.KafkaConsumer class
2. Determine the notification type for an alarm. Done by reading from
   mysql. - AlarmProcessor class
3. Send the notification. - NotificationProcessor class
4. Add successful notifications to a sent notification topic. -
   NotificationEngine class
5. Add failed notifications to a retry topic. - NotificationEngine class
6. Commit the offset to Kafka. - KafkaConsumer class

The notification engine uses three Kafka topics:

1. alarm\_topic: Alarms inbound to the notification engine.
2. notification\_topic: Successfully sent notifications.
3. notification\_retry\_topic: Failed notifications.
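The read-process-commit flow described above can be sketched as follows. This is an illustrative simulation only: Kafka, MySQL, and the engine classes are replaced by in-memory stand-ins, and all names below are hypothetical.

```python
# Illustrative sketch of the read -> process -> commit flow described above.
# Kafka and the real engine classes are replaced by in-memory stand-ins.

def process_alarms(messages, send):
    """Consume alarm messages, route results, and commit only after processing."""
    topics = {"notification_topic": [], "notification_retry_topic": []}
    committed_offset = None
    for offset, alarm in messages:           # step 1: read, no auto commit
        notification = {"alarm": alarm}      # step 2: build the notification
        if send(notification):               # step 3: attempt delivery
            topics["notification_topic"].append(notification)        # step 4
        else:
            topics["notification_retry_topic"].append(notification)  # step 5
        committed_offset = offset            # step 6: commit after processing
    return topics, committed_offset

# Simulate two alarms where delivery fails for the second one.
messages = [(0, "cpu_high"), (1, "disk_full")]
topics, offset = process_alarms(messages, send=lambda n: n["alarm"] != "disk_full")
```

Because the offset is only advanced after a message is fully handled, a crash mid-batch causes reprocessing rather than loss, matching the at-least-once semantics described in the Fault Tolerance section.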
A retry engine runs in parallel with the notification engine and gives
any failed notification a configurable number of extra chances at
success.

The retry engine generates notifications using the following steps:

1. Read notification json data from Kafka, with no auto commit. -
   KafkaConsumer class
2. Rebuild the notification that failed. - RetryEngine class
3. Send the notification. - NotificationProcessor class
4. Add successful notifications to a sent notification topic. -
   RetryEngine class
5. Add failed notifications that have not hit the retry limit back to
   the retry topic. - RetryEngine class
6. Discard failed notifications that have hit the retry limit. -
   RetryEngine class
7. Commit the offset to Kafka. - KafkaConsumer class
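Steps 4-6 amount to a per-notification routing decision. A minimal sketch, assuming a retry count travels with each notification (the field name, function, and default limit are hypothetical, not the project's actual API):

```python
def route_retry(notification, sent, retry_limit=5):
    """Decide where a retried notification goes, per steps 4-6 above.

    Returns the destination topic name, or None when the notification
    has exhausted its retries and is discarded.
    """
    if sent:
        return "notification_topic"            # step 4: delivery succeeded
    notification["retry_count"] += 1
    if notification["retry_count"] < retry_limit:
        return "notification_retry_topic"      # step 5: try again later
    return None                                # step 6: limit hit, discard

# A notification on its last allowed attempt is dropped on failure.
dest = route_retry({"retry_count": 4}, sent=False)
```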
The retry engine uses two Kafka topics:

1. notification\_retry\_topic: Notifications that need to be retried.
2. notification\_topic: Successfully sent notifications.

Fault Tolerance
---------------

When reading from the alarm topic, no committing is done; the commit
happens only after processing. This allows the processing to continue
even though some notifications can be slow. In the event of a
catastrophic failure, some notifications could be sent but the alarms
not yet acknowledged. This is an acceptable failure mode: it is better
to send a notification twice than not at all.

The general process when a major error is encountered is to exit the
daemon, which should allow the other processes to renegotiate access to
the Kafka partitions. It is also assumed that the notification engine
will be run by a process supervisor which will restart it in case of a
failure. In this way, any errors which are not easy to recover from are
automatically handled by the service restarting and the active daemon
switching to another instance.

Though this should cover all errors, there is a risk that an alarm or
a set of alarms can be processed and notifications sent out multiple
times. To minimize this risk, a number of techniques are used:

- Timeouts are implemented for all notification types.
- An alarm TTL is utilized. Any alarm older than the TTL is not
  processed.
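The TTL check is a simple timestamp comparison. A sketch under assumed names (the field name, helper, and default TTL are illustrative, not the project's actual configuration):

```python
import time

def is_expired(alarm, ttl_seconds=14400, now=None):
    """Return True when an alarm is older than the TTL and must be skipped."""
    now = time.time() if now is None else now
    return now - alarm["timestamp"] > ttl_seconds

# An alarm created 1000s ago is skipped under a 100s TTL;
# one created 50s ago is still processed.
stale = is_expired({"timestamp": 0.0}, ttl_seconds=100, now=1000.0)
fresh = is_expired({"timestamp": 1000.0}, ttl_seconds=100, now=1050.0)
```

Skipping expired alarms bounds how far back a restarted daemon will re-send notifications after a failure.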
Operation
=========

``oslo.config`` is used for handling configuration options. A sample
configuration file ``etc/monasca/notification.conf.sample`` can be
generated by running:

::

    tox -e genconfig
Monitoring
----------

StatsD is incorporated into the daemon and will send all stats to the
StatsD server launched by monasca-agent. The default host and port point
to **localhost:8125**.

-  Counters

   -  ConsumedFromKafka
   -  AlarmsFailedParse
   -  AlarmsNoNotification
   -  NotificationsCreated
   -  NotificationsSentSMTP
   -  NotificationsSentWebhook
   -  NotificationsSentPagerduty
   -  NotificationsSentFailed
   -  NotificationsInvalidType
   -  AlarmsFinished
   -  PublishedToKafka

-  Timers

   -  ConfigDBTime
   -  SendNotificationTime
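The counters and timers above travel over the plain-text StatsD UDP protocol. A minimal sketch of the wire format (illustrative only; the daemon itself uses a StatsD client library rather than raw sockets):

```python
import socket

def statsd_packet(name, value, metric_type):
    """Format a metric in the plain-text StatsD wire format: <name>:<value>|<type>."""
    return "{}:{}|{}".format(name, value, metric_type).encode()

def send_metric(packet, host="localhost", port=8125):
    """Fire-and-forget a metric datagram; UDP needs no listener to succeed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))

counter = statsd_packet("ConsumedFromKafka", 1, "c")      # counter increment
timer = statsd_packet("SendNotificationTime", 42, "ms")   # timer in ms
send_metric(counter)
```

Because delivery is fire-and-forget UDP, emitting metrics never blocks or fails the notification path even when no StatsD server is listening.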
Future Considerations
=====================

-  More extensive load testing is needed:

   -  How fast is the mysql db? How much load do we put on it? Initially
      I think it makes most sense to read notification details for each
      alarm, but eventually I may want to cache that info.
   -  How expensive are commits to Kafka for every message we read?
      Should we commit every N messages?
   -  How efficient is the default Kafka consumer batch size?
   -  Currently we can get ~200 notifications per second per
      NotificationEngine instance using webhooks to a local http server.
      Is that fast enough?
   -  Are we putting too much load on Kafka at ~200 commits per second?

.. |Team and repository tags| image:: https://governance.openstack.org/tc/badges/monasca-notification.svg
   :target: https://governance.openstack.org/tc/reference/tags/index.html
@@ -4,6 +4,7 @@ bandit==1.4.0
 configparser==3.5.0
 coverage==4.0
 debtcollector==1.2.0
+docutils==0.11
 extras==1.0.0
 fixtures==3.0.0
 flake8==2.5.5
setup.cfg

@@ -8,7 +8,7 @@ classifier=
     License :: OSI Approved :: Apache Software License
     Topic :: System :: Monitoring
 keywords = openstack monitoring email
-description-file = README.md
+description-file = README.rst
 home-page = https://github.com/stackforge/monasca-notification
 license = Apache
@@ -35,5 +35,5 @@ universal = 1

 [extras]
 jira_plugin =
-    jira
+    jira>=1.0.3
     Jinja2>=2.10 # BSD License (3 clause)
test-requirements.txt

@@ -15,3 +15,4 @@ testrepository>=0.0.18 # Apache-2.0/BSD
 SQLAlchemy!=1.1.5,!=1.1.6,!=1.1.7,!=1.1.8,>=1.0.10 # MIT
 PyMySQL>=0.7.6 # MIT License
 psycopg2>=2.6.2 # LGPL/ZPL
+docutils>=0.11 # OSI-Approved Open Source, Public Domain
tox.ini (1 line added)

@@ -43,6 +43,7 @@ basepython = python3
 commands =
   {[testenv:flake8]commands}
   {[testenv:bandit]commands}
+  python setup.py check --restructuredtext --strict

 [testenv:venv]
 basepython = python3