Add NOTIFY throttling and Mitaka directory

Minor typos fix

related-bug: #1436210
related-bug: #1498462
Change-Id: If15594099eb7cf74f1c534a05884c2d2501e57e6
2015-11-17 16:53:30 +00:00 · 2015-11-17 16:53:30 +00:00 · 3e3439c5c8
parent 2da64ceb36
commit 3e3439c5c8
3 changed files with 140 additions and 3 deletions
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -28,6 +28,14 @@ Liberty approved specs:

   specs/liberty/*

+Mitaka approved specs:
+
+.. toctree::
+   :glob:
+   :maxdepth: 1
+
+   specs/mitaka/*
+

 ==================
 Indices and tables
--- a/specs/liberty/deleted-domains-purging.rst
+++ b/specs/liberty/deleted-domains-purging.rst
@@ -14,7 +14,7 @@ database.
 Problem description
 ===================

-Once deleted, domains are not removed immediatly from the database, mostly for
+Once deleted, domains are not removed immediately from the database, mostly for
 billing reasons. They are flagged as deleted in the "deleted" database column
 and the "deleted_at" column is populated with a timestamp.

@@ -45,7 +45,7 @@ plugin. The task will select a group of domains and send a RPC call to Central.
 Central will run a query against the database to purge any deleted domain if
 needed and log the number of purged domains.

-Configuration paramenters:
+Configuration parameters:

 Purging run frequency.
  Default: hourly. Users might want to run it frequently to minimize the cycle duration.
@@ -122,7 +122,7 @@ Milestones
 ----------

 Target Milestone for completion:
-  Libery-3
+  Liberty-3

 Work Items
 ----------
--- a/specs/mitaka/notify-throttling.rst
+++ b/specs/mitaka/notify-throttling.rst
@@ -0,0 +1,129 @@
+..
+
+This work is licensed under a Creative Commons Attribution 3.0 Unported License.
+http://creativecommons.org/licenses/by/3.0/legalcode
+
+=============================
+Bulk zone update throttling
+=============================
+
+https://blueprints.launchpad.net/designate/+spec/notify-throttling
+
+Implement a mechanism to throttle the delivery of NOTIFY transactions when
+a large number of zones are updated at the same time.
+
+
+Problem description
+===================
+
+If a large number of zones are updated in a short time, a correspondingly
+large number of NOTIFY transactions is sent to the nameservers with no delay,
+leading to a burst of incoming AXFR requests.
+This burst might hit bottlenecks in MiniDNS and the storage layer in terms of
+CPU, I/O or network bandwidth.
+
+A typical trigger is the update of an NS record in a Pool containing many zones.
+
+The autonomous refreshing of zones performed by resolvers can also trigger a
+similar burst of AXFR requests. This can happen on recently started resolvers,
+whose refresh timers can share the same values across many zones.
+
+Related to bug https://bugs.launchpad.net/designate/+bug/1498462
+
+Proposed change
+===============
+
+Implement a mechanism for enqueuing NOTIFY transactions and delivering them
+with a delay, at a configurable rate.
+
+Also, implement staggering of zone refresh requests by randomizing the refresh
+interval.
+
+API Changes
+-----------
+
+Expose the count of zones flagged for delayed notify in the Admin
+API as "/reports/counts/zones_pending_notify".
+
+Central Changes
+---------------
+
+Implement support for a new database column "pending_notify" and set it to
+True every time a Pool NS record is updated.
+
+Storage Changes
+---------------
+
+Add a new boolean database column "pending_notify" on Zones.
+Implement a migration script to add the column to existing databases,
+defaulting to False. In the future, the column might default to True.
+
+Other Changes
+-------------
+
+Implement a Task in Zone Manager to periodically fetch a set of zones that need
+to receive a NOTIFY, starting with the oldest in terms of last update time.
+The task frequency and the maximum set size can be configured to throttle the
+number of outgoing NOTIFY transactions.
+Zone Manager will reset the "pending_notify" flag once done.
+
+Alternatives
+------------
+
+N/A
+
+Implementation
+==============
+
+The throttling queue is implemented as a new database column containing a
+boolean flag. See Central Changes and Storage Changes.
+
+Also, new zones will be created with a uniformly random refresh time between a minimum and a maximum value.
+
+
+Design considerations
+---------------------
+
+The throttling queue could be implemented outside of the database:
+- No need to create an extra database column
+- No increased database I/O
+
+We propose using the database for the following reasons:
+- Zone Manager is the best candidate to handle the delayed Notify. Currently there is no way for Central to send a list of Zones to Zone Manager other than through the database
+- The queue can support delayed Notify for changes other than Pool NS record updates
+- Ability to monitor the queue size and ETA to inform the user and for debugging
+- A persistent queue can survive Zone Manager unhandled exceptions or restarts
+- The increased database load is negligible compared to the existing traffic
+
+Risk analysis
+-------------
+
+- Zone Manager fails to run the Notify delivery task. The nameservers will eventually refresh the zone anyway. Impact: slow update propagation. Mitigation: expose the notification queue length to the user through Admin API and by logging.
+- A big notification queue takes a considerable time to be handled. Impact: potentially prevents more urgent changes from being delivered quickly. Mitigation: encourage users to configure the throttling parameters and provide sensible default values. Implementing a concept of notification priority seems unnecessary.
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Federico Ceratto https://launchpad.net/~federico-ceratto
+
+Milestones
+----------
+
+Target Milestone for completion:
+  Liberty-3
+
+Work Items
+----------
+
+- Implement refresh time staggering
+- Implement Notify throttling
+- Add throttle parameters to configuration files
+- Document throttling mechanism
+- Write unit and functional tests
+- Test throttling and staggering on devstack
+
+Dependencies
+============
+
+N/A
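
The periodic task described under "Other Changes" and the refresh staggering described under "Implementation" can be sketched in Python as follows. This is a minimal illustration and not code from this commit: the `Zone` dataclass, the functions `select_pending_batch`, `run_notify_cycle` and `staggered_refresh`, and the parameters `batch_size`, `min_refresh` and `max_refresh` are all hypothetical names. Only the "pending_notify" flag, the oldest-first ordering and the uniformly random refresh time come from the spec, and an in-memory list stands in for the database.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Zone:
    # Hypothetical in-memory stand-in for a row of the zones table.
    name: str
    updated_at: float            # last update time, seconds since the epoch
    pending_notify: bool = False


def select_pending_batch(zones: List[Zone], batch_size: int) -> List[Zone]:
    """Return at most batch_size flagged zones, oldest update first."""
    pending = [z for z in zones if z.pending_notify]
    pending.sort(key=lambda z: z.updated_at)
    return pending[:batch_size]


def run_notify_cycle(zones: List[Zone], batch_size: int,
                     send_notify: Callable[[Zone], None]) -> int:
    """One run of the periodic task: notify a batch, then clear its flags."""
    batch = select_pending_batch(zones, batch_size)
    for zone in batch:
        send_notify(zone)            # assumed callback that emits the NOTIFY
        zone.pending_notify = False  # reset the flag once done
    return len(batch)


def staggered_refresh(min_refresh: int, max_refresh: int, rng=None) -> int:
    """Pick a uniformly random refresh time in [min_refresh, max_refresh]."""
    rng = rng or random
    return rng.randint(min_refresh, max_refresh)
```

Under these assumptions, a task that runs every T seconds with a batch size of B sends at most B/T NOTIFY transactions per second on average; for example, 100 zones every 5 seconds caps the rate at 20 per second. Zones left flagged after one run are picked up by later runs, so the queue drains in order of last update time.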