Fix loading of notification plugins
Inconsistent naming of plugins can cause an import error. This commit standardises the naming so that all imports are parsed correctly.
Story: 2005545
Task: 30689
Change-Id: Ife27fed83d28d47cc99ee07c4a8c0c4dac32c2da
This engine reads alarms from Kafka and then notifies the customer using the configured notification method. Multiple notification and retry engines can run in parallel, up to one per available Kafka partition. Zookeeper is used to negotiate access to the Kafka partitions whenever a new process joins or leaves the working set.
The notification engine generates notifications using the following steps:
The notification engine uses three Kafka topics:
A retry engine runs in parallel with the notification engine and gives any failed notification a configurable number of extra chances at success.
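The retry budget described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the in-memory queue stands in for the Kafka retry topic, and the names `retry_failed` and `MAX_RETRIES`, along with the value 3, are assumptions made for the example.

```python
from collections import deque

MAX_RETRIES = 3  # hypothetical value; the real limit comes from configuration


def retry_failed(retry_queue, send, max_retries=MAX_RETRIES):
    """Give each failed notification up to `max_retries` extra send attempts.

    `retry_queue` holds (notification, retries_used) pairs and `send`
    returns True on success. Both are stand-ins for illustration.
    """
    sent, dropped = [], []
    queue = deque(retry_queue)
    while queue:
        notification, retries_used = queue.popleft()
        if send(notification):
            sent.append(notification)
        elif retries_used + 1 < max_retries:
            # Still has chances left: put it back with an incremented count.
            queue.append((notification, retries_used + 1))
        else:
            # Retry budget exhausted: give up on this notification.
            dropped.append(notification)
    return sent, dropped


attempts = {}


def flaky_send(notification):
    # "flaky" succeeds on its third attempt; anything else always fails.
    attempts[notification] = attempts.get(notification, 0) + 1
    return notification == "flaky" and attempts[notification] >= 3


sent, dropped = retry_failed([("flaky", 0), ("dead", 0)], flaky_send)
```

In this run `flaky` is delivered on its last allowed attempt, while `dead` is dropped after using up all of its attempts.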
The retry engine generates notifications using the following steps:
The retry engine uses two Kafka topics:
When reading from the alarm topic, nothing is committed at read time; the commit happens only after an alarm has been processed. This allows processing to continue even when some notifications are slow. In the event of a catastrophic failure, some notifications may already have been sent for alarms that have not yet been acknowledged. This is an acceptable failure mode: it is better to send a notification twice than not at all.
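The commit-after-processing behaviour can be illustrated with a small simulation. It is only a sketch under stated assumptions: `InMemoryPartition` is a hypothetical stand-in for a Kafka partition, and `run_engine` and its `crash_after` parameter exist purely to demonstrate a failure between sending and committing.

```python
class InMemoryPartition:
    """Hypothetical stand-in for one Kafka partition with a committed offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message to read after a restart

    def read_from_committed(self):
        # Yield (offset, message) pairs starting at the last committed offset.
        for offset in range(self.committed, len(self.messages)):
            yield offset, self.messages[offset]

    def commit(self, offset):
        # Mark everything up to and including `offset` as processed.
        self.committed = offset + 1


def run_engine(partition, send, crash_after=None):
    """Send a notification per alarm, committing only after the send."""
    for count, (offset, alarm) in enumerate(partition.read_from_committed()):
        send(alarm)
        if crash_after is not None and count == crash_after:
            raise RuntimeError("simulated crash after send, before commit")
        partition.commit(offset)


sent = []
partition = InMemoryPartition(["alarm-1", "alarm-2", "alarm-3"])
try:
    run_engine(partition, sent.append, crash_after=1)
except RuntimeError:
    pass  # a process supervisor would restart the daemon here
run_engine(partition, sent.append)  # restart resumes at the committed offset
```

After the restart, `alarm-2` is sent a second time because its offset was never committed, while no alarm is skipped.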
The general process when a major error is encountered is to exit the daemon, which should allow the other processes to renegotiate access to the Kafka partitions. It is also assumed that the notification engine will be run by a process supervisor, which will restart it in case of a failure. In this way, any errors that are not easy to recover from are handled automatically by the service restarting and the active daemon switching to another instance.
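A rough sketch of this fail-fast pattern is shown below. The function name `run_daemon` and the callable it receives are hypothetical and do not reflect the project's actual entry point; the point is only that an unrecoverable error is logged and turned into a non-zero exit status.

```python
import logging

log = logging.getLogger(__name__)


def run_daemon(process_forever):
    """Run the engine loop and return a process exit status.

    `process_forever` is a hypothetical callable standing in for the
    engine's main loop. Any unexpected exception is logged and reported
    as a non-zero status instead of being retried in-process.
    """
    try:
        process_forever()
    except Exception:
        log.exception("Unrecoverable error, exiting so the service can be restarted")
        return 1
    return 0


# A real entry point would end with something like:
#     sys.exit(run_daemon(engine.run))
```

Exiting with a non-zero status leaves recovery to the process supervisor, and the remaining processes can renegotiate the Kafka partitions in the meantime.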
Though this should cover all errors, there is a risk that an alarm or a set of alarms is processed and its notifications are sent out multiple times. To minimize this risk, a number of techniques are used:
oslo.config is used for handling configuration options. A sample configuration file, `etc/monasca/notification.conf.sample`, can be generated by running:
tox -e genconfig
To run the service using the default config file location of `/etc/monasca/notification.conf`:
monasca-notification
To run the service and explicitly specify the config file:
monasca-notification --config-file /etc/monasca/monasca-notification.conf
StatsD is incorporated into the daemon and sends all stats to the StatsD server launched by monasca-agent. The default host and port point to localhost:8125.
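For readers unfamiliar with StatsD, the sketch below shows what emitting a counter to that default address looks like on the wire, using the plain StatsD `name:value|c` datagram format over UDP. It is not the client code the daemon uses; the helper names and the metric name `sent_notification_count` are assumptions for illustration.

```python
import socket

STATSD_ADDR = ("localhost", 8125)  # default host and port mentioned above


def format_counter(name, value=1):
    # Plain StatsD counter datagram: "<name>:<value>|c"
    return "{}:{}|c".format(name, value).encode("ascii")


def send_counter(name, value=1, addr=STATSD_ADDR):
    """Send one counter increment over UDP and return the payload sent."""
    payload = format_counter(name, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP is fire-and-forget: nothing is raised if no server is listening.
        sock.sendto(payload, addr)
    return payload
```

Calling `send_counter("sent_notification_count")` would emit the datagram `sent_notification_count:1|c` to localhost:8125.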
More extensive load testing is needed:
- How fast is the MySQL database, and how much load do we put on it? Initially I think it makes the most sense to read notification details for each alarm, but eventually I may want to cache that information.
- How expensive are commits to Kafka for every message we read? Should we commit every N messages?
- How efficient is the default Kafka consumer batch size?
- Currently we can get ~200 notifications per second per NotificationEngine instance using webhooks to a local HTTP server. Is that fast enough?
- Are we putting too much load on Kafka at ~200 commits per second?
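To make the "commit every N messages" question concrete, here is a hedged sketch of what batching commits could look like. It is an illustration of the trade-off, not a proposal taken from the code base: `process_with_batched_commits`, `handle`, `commit` and the batch size of 10 are all hypothetical.

```python
def process_with_batched_commits(messages, handle, commit, commit_every=10):
    """Handle every message but commit only once per `commit_every` messages.

    `handle` and `commit` are hypothetical callables standing in for
    sending a notification and committing a Kafka offset.
    """
    pending = 0
    last_offset = None
    for offset, message in enumerate(messages):
        handle(message)
        last_offset = offset
        pending += 1
        if pending >= commit_every:
            commit(offset)
            pending = 0
    if pending:
        # Flush the final partial batch so no processed offset is left uncommitted.
        commit(last_offset)


handled = []
commits = []
process_with_batched_commits(range(200), handled.append, commits.append)
```

With 200 messages and a batch size of 10, this issues 20 commits instead of 200. The cost is that a crash can replay up to `commit_every - 1` notifications that were already sent, which widens the duplicate window accepted in the failure-mode discussion earlier.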