Add NOTIFY throttling and Mitaka directory

Minor typos fix

related-bug: #1436210
related-bug: #1498462
Change-Id: If15594099eb7cf74f1c534a05884c2d2501e57e6
Federico Ceratto 2015-11-17 16:53:30 +00:00
parent 2da64ceb36
commit 3e3439c5c8
3 changed files with 140 additions and 3 deletions


@ -28,6 +28,14 @@ Liberty approved specs:
specs/liberty/*
Mitaka approved specs:
.. toctree::
:glob:
:maxdepth: 1
specs/mitaka/*
==================
Indices and tables


@ -14,7 +14,7 @@ database.
Problem description
===================
Once deleted, domains are not removed immediatly from the database, mostly for
Once deleted, domains are not removed immediately from the database, mostly for
billing reasons. They are flagged as deleted in the "deleted" database column
and the "deleted_at" column is populated with a timestamp.
@ -45,7 +45,7 @@ plugin. The task will select a group of domains and send a RPC call to Central.
Central will run a query against the database to purge any deleted domain if
needed and log the number of purged domains.
Configuration paramenters:
Configuration parameters:
Purging run frequency.
Default: hourly. Users might want to run it frequently to minimize the cycle duration.
@ -122,7 +122,7 @@ Milestones
----------
Target Milestone for completion:
Libery-3
Liberty-3
Work Items
----------


@ -0,0 +1,129 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode
=============================
Bulk zone update throttling
=============================
https://blueprints.launchpad.net/designate/+spec/notify-throttling
Implement a mechanism to throttle the delivery of NOTIFY transactions when
a large number of zones are updated at the same time.
Problem description
===================
If a large number of zones are updated in a short time, a correspondingly
large number of NOTIFY transactions is sent to the nameservers with no delay,
leading to a burst of incoming AXFR requests.
This might overload bottlenecks in MiniDNS and the storage layer in terms of
CPU, I/O or network bandwidth.
A typical trigger is the update of an NS record in a Pool containing many zones.
The autonomous refreshing of zones performed by resolvers can also trigger a
similar burst of AXFR. This can happen on recently started resolvers, where the
refresh timers can share the same values across many zones.
Related to bug https://bugs.launchpad.net/designate/+bug/1498462
Proposed change
===============
Implement a mechanism for enqueuing and delayed delivery of notify transactions
at a configurable throttle speed.
Also, implement staggering of zone refresh requests by randomizing the refresh
interval.
API Changes
-----------
Expose the count of zones flagged for delayed notify in the Admin
API as "/reports/counts/zones_pending_notify".
Central Changes
---------------
Implement support for a new database column "pending_notify" and set it to
True every time a Pool NS record is updated.
Storage Changes
---------------
Add a new boolean database column "pending_notify" on Zones.
Implement a migration script to add the column to existing databases,
defaulting to False. In future, the column might default to True.
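As a rough illustration of the migration described above, the following minimal sketch adds the column to an existing table using Python's built-in ``sqlite3`` module. The table layout (``zones`` with ``id`` and ``name``) and the helper name ``add_pending_notify_column`` are assumptions made for this example only; the actual migration would use the project's own migration tooling and schema.

```python
import sqlite3


def add_pending_notify_column(conn):
    """Add a boolean pending_notify column that defaults to False (0).

    Hypothetical helper: the real schema and migration framework differ.
    """
    conn.execute(
        "ALTER TABLE zones "
        "ADD COLUMN pending_notify BOOLEAN NOT NULL DEFAULT 0"
    )
    conn.commit()


# Simulate an existing database that already contains a zone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zones (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO zones (name) VALUES ('example.org.')")

add_pending_notify_column(conn)

# Existing rows pick up the default value, so nothing is queued by accident.
rows = conn.execute("SELECT name, pending_notify FROM zones").fetchall()
```

Existing zones receive the default value, so the migration on its own does not enqueue any Notify.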
Other Changes
-------------
Implement a Task in Zone Manager to periodically fetch a set of zones that need
to receive a Notify, starting with the oldest in terms of last update time.
The task frequency and the maximum set size can be configured to throttle the
number of outgoing Notify transactions.
Zone Manager will reset the "pending_notify" flag once done.
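The periodic task described above could look roughly like the sketch below. It works on an in-memory list instead of the database, and the names ``Zone``, ``run_notify_task``, ``max_batch_size`` and ``send_notify`` are hypothetical, chosen for illustration rather than taken from Designate.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    # Simplified stand-in for a zone row; field names are assumptions.
    name: str
    updated_at: float
    pending_notify: bool = False


def run_notify_task(zones, max_batch_size, send_notify):
    """Run one iteration of the throttled Notify task.

    Picks at most max_batch_size pending zones, oldest update first,
    sends a Notify for each and resets its pending_notify flag.
    """
    pending = sorted(
        (z for z in zones if z.pending_notify),
        key=lambda z: z.updated_at,
    )
    batch = pending[:max_batch_size]
    for zone in batch:
        send_notify(zone)
        zone.pending_notify = False
    return batch


# Example: five zones are pending, but each run handles only two.
zones = [Zone("zone%d.example." % i, float(i), True) for i in range(5)]
sent = []
run_notify_task(zones, max_batch_size=2, send_notify=sent.append)
first_run = [z.name for z in sent]
remaining = sum(1 for z in zones if z.pending_notify)
```

Each run drains a bounded batch, so the outgoing Notify rate is limited to roughly the maximum set size divided by the task interval.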
Alternatives
------------
N/A
Implementation
==============
The throttling queue is implemented as a new database column containing a
boolean flag. See Central Changes and Storage Changes.
Also, new zones will be created with a uniformly random refresh time between a minimum and a maximum value.
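A minimal sketch of the refresh staggering, assuming the minimum and maximum are expressed in whole seconds. The function name ``pick_refresh_time`` and the example bounds are hypothetical and not part of the spec.

```python
import random


def pick_refresh_time(min_refresh, max_refresh, rng=random):
    """Return a uniformly random refresh time in [min_refresh, max_refresh].

    Both bounds are inclusive and expressed in seconds.
    """
    if min_refresh > max_refresh:
        raise ValueError("min_refresh must not exceed max_refresh")
    return rng.randint(min_refresh, max_refresh)


# Example with assumed bounds of 1 to 2 hours and a seeded generator,
# so that the run is repeatable.
rng = random.Random(42)
refresh_times = [pick_refresh_time(3600, 7200, rng) for _ in range(1000)]
```

Spreading the values across the interval keeps zones created at the same time from refreshing in lockstep.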
Design considerations
---------------------
The throttling queue could be implemented outside of the database:
- No need to create an extra database column
- No increased database I/O
We propose using the database for the following reasons:
- Zone Manager is the best candidate to handle the delayed Notify. Currently there is no way for Central to send a list of Zones to Zone Manager other than through the database
- The queue can support delayed Notify for changes other than Pool NS record updates
- Ability to monitor the queue size and ETA to inform the user and for debugging
- A persistent queue can survive Zone Manager unhandled exceptions or restarts
- The increased database load is negligible compared to the existing traffic
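To illustrate the queue size and ETA monitoring mentioned in the list above, the time needed to drain the queue can be estimated from the pending count and the two throttling parameters. This is a sketch under the assumption that every task run handles a full batch; ``estimate_drain_seconds`` and its parameter names are hypothetical.

```python
import math


def estimate_drain_seconds(pending_count, max_batch_size, task_interval):
    """Estimate how long the Notify queue takes to drain, in seconds.

    Assumes one batch of at most max_batch_size zones is processed
    every task_interval seconds.
    """
    if max_batch_size <= 0 or task_interval <= 0:
        raise ValueError("max_batch_size and task_interval must be positive")
    runs = math.ceil(pending_count / max_batch_size)
    return runs * task_interval


# 1000 pending zones, 50 zones per run, one run every 10 seconds:
# 20 runs are needed, so the estimate is 200 seconds.
eta = estimate_drain_seconds(1000, 50, 10)
```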
Risk analysis
-------------
- Zone Manager fails to run the Notify delivery task. The nameservers will eventually refresh the zone anyway. Impact: slow update propagation. Mitigation: expose the notification queue length to the user through the Admin API and by logging.
- A big notification queue takes a considerable time to be handled. Impact: potentially prevents more urgent changes from being delivered quickly. Mitigation: encourage users to configure the throttling parameters; provide sensible default values. Implementing a concept of notification priority seems unnecessary.
Assignee(s)
-----------
Primary assignee:
Federico Ceratto https://launchpad.net/~federico-ceratto
Milestones
----------
Target Milestone for completion:
Liberty-3
Work Items
----------
- Implement refresh time staggering
- Implement Notify throttling
- Add throttle parameters to configuration files
- Document throttling mechanism
- Write unit and functional tests
- Test throttling and staggering on devstack
Dependencies
============
N/A