Convert README to reStructuredText
* Add PyPI validation check for README.rst [1]
* Add docutils to test-requirements.txt
* Add lower bound for jira

[1] https://docs.openstack.org/project-team-guide/project-setup/python.html#running-the-style-checks

Change-Id: I5d90ccb1b919c4bab66b468a8ddb714ffc5f1635
Story: 2001980
Task: 20013
This commit is contained in:
parent 086428009c
commit 20d6557744

109 README.md
@ -1,109 +0,0 @@
Team and repository tags
========================

[![Team and repository tags](https://governance.openstack.org/tc/badges/monasca-notification.svg)](https://governance.openstack.org/tc/reference/tags/index.html)

<!-- Change things from this point on -->

# Notification Engine

This engine reads alarms from Kafka and then notifies the customer using their configured notification method.
Multiple notification and retry engines can run in parallel, up to one per available Kafka partition. Zookeeper
is used to negotiate access to the Kafka partitions whenever a new process joins or leaves the working set.

# Architecture
The notification engine generates notifications using the following steps:
1. Read alarms from Kafka, with no auto commit. - KafkaConsumer class
2. Determine notification type for an alarm. Done by reading from mysql. - AlarmProcessor class
3. Send notification. - NotificationProcessor class
4. Successful notifications are added to a sent notification topic. - NotificationEngine class
5. Failed notifications are added to a retry topic. - NotificationEngine class
6. Commit offset to Kafka. - KafkaConsumer class

The notification engine uses three Kafka topics:
1. alarm_topic: Alarms inbound to the notification engine.
2. notification_topic: Successfully sent notifications.
3. notification_retry_topic: Unsuccessful notifications.

A retry engine runs in parallel with the notification engine and gives any
failed notification a configurable number of extra chances at success.

The retry engine generates notifications using the following steps:
1. Read notification json data from Kafka, with no auto commit. - KafkaConsumer class
2. Rebuild the notification that failed. - RetryEngine class
3. Send notification. - NotificationProcessor class
4. Successful notifications are added to a sent notification topic. - RetryEngine class
5. Failed notifications that have not hit the retry limit are added back to the retry topic. - RetryEngine class
6. Failed notifications that have hit the retry limit are discarded. - RetryEngine class
7. Commit offset to Kafka. - KafkaConsumer class

The retry engine uses two Kafka topics:
1. notification_retry_topic: Notifications that need to be retried.
2. notification_topic: Successfully sent notifications.

## Fault Tolerance
When reading from the alarm topic no committing is done. The committing is done only after processing. This allows
the processing to continue even though some notifications can be slow. In the event of a catastrophic failure some
notifications could be sent but the alarms not yet acknowledged. This is an acceptable failure mode, better to send a
notification twice than not at all.

The general process when a major error is encountered is to exit the daemon, which should allow the other processes to
renegotiate access to the Kafka partitions. It is also assumed the notification engine will be run by a process
supervisor which will restart it in case of a failure. This way any errors which are not easy to recover from are
automatically handled by the service restarting and the active daemon switching to another instance.

Though this should cover all errors, there is a risk that an alarm or set of alarms can be processed and notifications
sent out multiple times. To minimize this risk a number of techniques are used:

- Timeouts are implemented with all notification types.
- An alarm TTL is utilized. Any alarm older than the TTL is not processed.

# Operation
The yaml config file is by default in '/etc/monasca/notification.yaml'; a sample is in this project.

## Monitoring
statsd is incorporated into the daemon and will send all stats to the statsd server launched by monasca-agent.
Default host and port point at **localhost:8125**.

- Counters
    - ConsumedFromKafka
    - AlarmsFailedParse
    - AlarmsNoNotification
    - NotificationsCreated
    - NotificationsSentSMTP
    - NotificationsSentWebhook
    - NotificationsSentPagerduty
    - NotificationsSentFailed
    - NotificationsInvalidType
    - AlarmsFinished
    - PublishedToKafka
- Timers
    - ConfigDBTime
    - SendNotificationTime

# Future Considerations
- More extensive load testing is needed
- How fast is the mysql db? How much load do we put on it? Initially I think it makes most sense to read notification
  details for each alarm but eventually I may want to cache that info.
- How expensive are commits to Kafka for every message we read? Should we commit every N messages?
- How efficient is the default Kafka consumer batch size?
- Currently we can get ~200 notifications per second per NotificationEngine instance using webhooks to a local
  http server. Is that fast enough?
- Are we putting too much load on Kafka at ~200 commits per second?

# License

Copyright (c) 2014 Hewlett-Packard Development Company, L.P.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
See the License for the specific language governing permissions and
limitations under the License.
139 README.rst Normal file
@ -0,0 +1,139 @@
Team and repository tags
========================

|Team and repository tags|

.. raw:: html

   <!-- Change things from this point on -->

Notification Engine
===================
This engine reads alarms from Kafka and then notifies the customer using
the configured notification method. Multiple notification and retry
engines can run in parallel, up to one per available Kafka partition.
Zookeeper is used to negotiate access to the Kafka partitions whenever a
new process joins or leaves the working set.

Architecture
============

The notification engine generates notifications using the following
steps:
1. Read alarms from Kafka, with no auto commit. -
   monasca\_common.kafka.KafkaConsumer class
2. Determine notification type for an alarm. Done by reading from mysql. - AlarmProcessor class
3. Send notification. - NotificationProcessor class
4. Add successful notifications to a sent notification topic. - NotificationEngine class
5. Add failed notifications to a retry topic. - NotificationEngine class
6. Commit offset to Kafka. - KafkaConsumer class
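The steps above amount to a single consume-process-commit loop. The sketch below is illustrative only: `FakeConsumer`, the `configured_types` dict, and the topic lists are simplified stand-ins for the real KafkaConsumer, mysql lookup, and Kafka topics, not the project's actual classes.

```python
# Illustrative sketch of the consume-process-commit loop; all names here
# are simplified stand-ins, not the real monasca-notification classes.

class FakeConsumer:
    """Stands in for KafkaConsumer: yields alarms, commits explicitly."""
    def __init__(self, alarms):
        self.alarms = alarms
        self.committed = 0          # last offset committed

    def __iter__(self):
        return iter(self.alarms)

    def commit(self, offset):
        self.committed = offset     # step 6: explicit commit, no auto commit


def process_alarms(consumer, configured_types, sent_topic, retry_topic):
    for offset, alarm in enumerate(consumer, start=1):
        # Step 2: look up the configured notification type (a dict here,
        # mysql in the real engine).
        method = configured_types.get(alarm["name"])
        if method is None:
            continue                # no notification configured for this alarm
        # Step 3: "send" the notification (only "email" succeeds in this toy).
        ok = method == "email"
        # Steps 4/5: route to the sent topic or the retry topic.
        (sent_topic if ok else retry_topic).append(alarm["name"])
        # Step 6: commit only after the alarm is fully handled.
        consumer.commit(offset)


sent, retry = [], []
consumer = FakeConsumer([{"name": "cpu_high"}, {"name": "disk_full"}])
process_alarms(consumer, {"cpu_high": "email", "disk_full": "webhook"},
               sent, retry)
print(sent, retry, consumer.committed)   # ['cpu_high'] ['disk_full'] 2
```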

The notification engine uses three Kafka topics:

1. alarm\_topic: Alarms inbound to the notification engine.
2. notification\_topic: Successfully sent notifications.
3. notification\_retry\_topic: Failed notifications.

A retry engine runs in parallel with the notification engine and gives
any failed notification a configurable number of extra chances at
success.

The retry engine generates notifications using the following steps:

1. Read notification json data from Kafka, with no auto commit. - KafkaConsumer class
2. Rebuild the notification that failed. - RetryEngine class
3. Send notification. - NotificationProcessor class
4. Add successful notifications to a sent notification topic. - RetryEngine class
5. Add failed notifications that have not hit the retry limit back to
   the retry topic. - RetryEngine class
6. Discard failed notifications that have hit the retry limit. - RetryEngine class
7. Commit offset to Kafka. - KafkaConsumer class
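The per-notification decision in steps 4-6 can be sketched as a small routing function. This is a hypothetical illustration, not the real RetryEngine: `MAX_RETRIES`, `route_retry`, and the `retry_count` field are invented names for the sketch.

```python
# Illustrative retry-routing decision (hypothetical helper, not the real
# RetryEngine class): succeed, requeue under the limit, or discard at it.

MAX_RETRIES = 5  # hypothetical value for the configurable retry limit

def route_retry(notification, sent_topic, retry_topic, send_ok):
    """Return where a retried notification ends up: 'sent', 'retry', or 'dropped'."""
    if send_ok:
        sent_topic.append(notification["name"])      # step 4: sent topic
        return "sent"
    if notification["retry_count"] + 1 < MAX_RETRIES:
        notification["retry_count"] += 1             # step 5: back to retry topic
        retry_topic.append(notification["name"])
        return "retry"
    return "dropped"                                 # step 6: limit hit, discard

sent, retry = [], []
fresh = {"name": "n1", "retry_count": 0}
exhausted = {"name": "n2", "retry_count": 4}
print(route_retry(fresh, sent, retry, send_ok=False))      # retry
print(route_retry(exhausted, sent, retry, send_ok=False))  # dropped
print(route_retry(fresh, sent, retry, send_ok=True))       # sent
```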

The retry engine uses two Kafka topics:

1. notification\_retry\_topic: Notifications that need to be retried.
2. notification\_topic: Successfully sent notifications.

Fault Tolerance
---------------

When reading from the alarm topic, no committing is done; the commit
happens only after processing. This allows the processing to continue
even though some notifications can be slow. In the event of a
catastrophic failure, some notifications could be sent but the alarms
not yet acknowledged. This is an acceptable failure mode: it is better
to send a notification twice than not at all.
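Committing only after processing is what gives at-least-once delivery. A minimal simulation (hypothetical, not project code) of a crash between sending and committing shows the duplicate-send failure mode described above:

```python
# Simulates commit-after-processing: a crash after sending but before
# committing means the alarm is redelivered on restart and sent twice,
# never lost -- the acceptable at-least-once failure mode.

def run_engine(alarms, start_offset, sends, crash_before_commit_at=None):
    """Process alarms from start_offset; return the last committed offset."""
    committed = start_offset
    for offset in range(start_offset, len(alarms)):
        sends.append(alarms[offset])            # send the notification
        if offset == crash_before_commit_at:
            return committed                    # crash: offset never committed
        committed = offset + 1                  # commit only after processing
    return committed

sends = []
# First run crashes after sending alarm 1 but before committing it.
committed = run_engine(["a0", "a1", "a2"], 0, sends, crash_before_commit_at=1)
# The restarted daemon resumes from the last committed offset,
# so "a1" is sent a second time.
committed = run_engine(["a0", "a1", "a2"], committed, sends)
print(sends)   # ['a0', 'a1', 'a1', 'a2'] -- duplicated, not lost
```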

The general process when a major error is encountered is to exit the
daemon, which should allow the other processes to renegotiate access to
the Kafka partitions. It is also assumed that the notification engine
will be run by a process supervisor which will restart it in case of a
failure. In this way, any errors which are not easy to recover from are
automatically handled by the service restarting and the active daemon
switching to another instance.

Though this should cover all errors, there is the risk that an alarm or
a set of alarms can be processed and notifications are sent out multiple
times. To minimize this risk, a number of techniques are used:

- Timeouts are implemented for all notification types.
- An alarm TTL is utilized. Any alarm older than the TTL is not
  processed.
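The TTL check bounds how stale a duplicated alarm can be. A minimal sketch, assuming a hypothetical `within_ttl` helper and TTL value (neither is the project's actual code or default):

```python
# Illustrative alarm-TTL check (hypothetical helper and value, not the
# project's actual code): any alarm older than the TTL is skipped, so a
# redelivered alarm can only produce a duplicate within the TTL window.
import time

ALARM_TTL_SECONDS = 14400  # hypothetical TTL value

def within_ttl(alarm_timestamp, now=None, ttl=ALARM_TTL_SECONDS):
    """Return True if the alarm is young enough to process."""
    now = time.time() if now is None else now
    return (now - alarm_timestamp) <= ttl

now = 1_000_000.0
print(within_ttl(now - 60, now=now))       # True: one minute old
print(within_ttl(now - 90_000, now=now))   # False: older than the TTL
```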

Operation
=========

``oslo.config`` is used for handling configuration options. A sample
configuration file ``etc/monasca/notification.conf.sample`` can be
generated by running:

::

    tox -e genconfig

Monitoring
----------

StatsD is incorporated into the daemon and will send all stats to the
StatsD server launched by monasca-agent. Default host and port points to
**localhost:8125**.

- Counters

  - ConsumedFromKafka
  - AlarmsFailedParse
  - AlarmsNoNotification
  - NotificationsCreated
  - NotificationsSentSMTP
  - NotificationsSentWebhook
  - NotificationsSentPagerduty
  - NotificationsSentFailed
  - NotificationsInvalidType
  - AlarmsFinished
  - PublishedToKafka

- Timers

  - ConfigDBTime
  - SendNotificationTime
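The counters above travel over the plain StatsD text protocol. A minimal sketch of building and emitting one counter datagram; this is a hypothetical illustration, not the statsd client the daemon actually uses:

```python
# Minimal StatsD counter emission (hypothetical sketch, not the daemon's
# actual statsd client). StatsD counters use the text format
# "<name>:<value>|c" over UDP; UDP is fire-and-forget, so emitting stats
# costs little even if nothing listens on localhost:8125.
import socket

def statsd_payload(name, value=1):
    """Build a StatsD counter payload such as b'ConsumedFromKafka:1|c'."""
    return f"{name}:{value}|c".encode()

def send_counter(name, value=1, host="localhost", port=8125):
    """Fire one counter datagram at the StatsD server."""
    payload = statsd_payload(name, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload

print(statsd_payload("ConsumedFromKafka"))   # b'ConsumedFromKafka:1|c'
```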

Future Considerations
=====================

- More extensive load testing is needed:

  - How fast is the mysql db? How much load do we put on it? Initially
    I think it makes most sense to read notification details for each
    alarm but eventually I may want to cache that info.
  - How expensive are commits to Kafka for every message we read?
    Should we commit every N messages?
  - How efficient is the default Kafka consumer batch size?
  - Currently we can get ~200 notifications per second per
    NotificationEngine instance using webhooks to a local http server.
    Is that fast enough?
  - Are we putting too much load on Kafka at ~200 commits per second?

.. |Team and repository tags| image:: https://governance.openstack.org/tc/badges/monasca-notification.svg
   :target: https://governance.openstack.org/tc/reference/tags/index.html
@ -4,6 +4,7 @@ bandit==1.4.0
configparser==3.5.0
coverage==4.0
debtcollector==1.2.0
docutils==0.11
extras==1.0.0
fixtures==3.0.0
flake8==2.5.5
@ -8,7 +8,7 @@ classifier=
    License :: OSI Approved :: Apache Software License
    Topic :: System :: Monitoring
keywords = openstack monitoring email
description-file = README.md
description-file = README.rst
home-page = https://github.com/stackforge/monasca-notification
license = Apache
@ -35,5 +35,5 @@ universal = 1

[extras]
jira_plugin =
    jira
    jira>=1.0.3
    Jinja2>=2.10 # BSD License (3 clause)
@ -15,3 +15,4 @@ testrepository>=0.0.18 # Apache-2.0/BSD
SQLAlchemy!=1.1.5,!=1.1.6,!=1.1.7,!=1.1.8,>=1.0.10 # MIT
PyMySQL>=0.7.6 # MIT License
psycopg2>=2.6.2 # LGPL/ZPL
docutils>=0.11 # OSI-Approved Open Source, Public Domain
1 tox.ini
@ -43,6 +43,7 @@ basepython = python3
commands =
  {[testenv:flake8]commands}
  {[testenv:bandit]commands}
  python setup.py check --restructuredtext --strict

[testenv:venv]
basepython = python3