Reload configuration files on SIGHUP signal

There is no need to restart the glance api service when a user modifies configuration files. An operator or user can send a SIGHUP signal to the glance service, which will reload the configuration file. Change-Id: Ie1862af08dbdd06653c1c515603b4e1a68fc7875
2014-10-09 06:09:07 -07:00 · 2014-10-09 06:09:07 -07:00 · 36839191e3
commit 36839191e3
parent b9313819c3
1 changed files with 176 additions and 0 deletions
--- a/specs/kilo/sighup-conf-reload.rst
+++ b/specs/kilo/sighup-conf-reload.rst
@@ -0,0 +1,176 @@
+===========================================
+Reload configuration files on SIGHUP signal
+===========================================
+
+https://blueprints.launchpad.net/glance/+spec/sighup-conf-reload
+
+We propose to eliminate the need to restart the glance api service when
+configuration files are modified. An operator can send a SIGHUP signal to the
+glance service, which will then reload the configuration file.
+
+Problem description
+===================
+
+In a production environment, an administrator may modify glance-api.conf
+configuration parameters such as filesystem_store_datadirs (to add capacity
+with more disks when storage is almost full), the number of workers, or the
+log configuration. The glance services must then be restarted explicitly for
+these changes to take effect. Restarting a service breaks the connections of
+users currently connected to it, which is undesirable from the users' point
+of view.
+
+Proposed change
+===============
+
+Add the ability to dynamically change configuration settings of a running
+glance server with no impact to service.
+
+A running glance server consists of a parent process and one or
+more child processes.
+
+On receipt of a SIGHUP signal the parent process will:
+
+- reload the configuration
+- send a SIGHUP to the original child processes
+- start new child processes with the new configuration
+- keep its listening socket open
+
+On receipt of a SIGHUP signal each original child process will:
+
+- close the listening socket so as not to accept new requests
+- complete any in-flight requests
+- exit
+
+This approach is based on nginx's behaviour and avoids some of the
+disadvantages of oslo's current Launcher reload:
+
+- Race conditions: Launcher does not shut down eventlet cleanly, so existing
+  requests can fail.
+- If all child processes are busy there can be a lengthy delay when new
+  requests are not processed.
+- Long lived pre-SIGHUP idle client connections can stall request
+  processing indefinitely.
+- Not all parameters can be changed, e.g. the number of workers.
+- The wsgi pipeline cannot be changed, for example to enable caching.
+
+Alternatives
+------------
+
+An alternative may be to attempt to save and then restore long-running tasks
+using taskflow. The process restart would then only need to deal with
+short-lived requests (e.g. API DB lookups), so regular restarts would cause
+no user-visible downtime.
+
+Data model impact
+-----------------
+
+None
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+
+None
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+If the reload takes too long (e.g., >50ms) then the API requests will be
+noticeably delayed.
+
+We propose that the current worker processes stop accepting requests and
+finish the requests they are already handling, while the parent process
+spawns new child processes with the new configuration. For a while, the
+glance node may therefore run twice as many child processes as it is
+configured to run. This could impact performance, especially on an
+underpowered node that is already configured to run as many child processes
+as it can handle without degradation.
+
+In the author's opinion, it is the responsibility of the operator to make sure
+the node will not be over-provisioned with child processes (workers). If an
+operator wants to run a node with no headroom for additional child processes,
+the author suggests that such an operator not use dynamic configuration via
+SIGHUP. Instead, such an operator should use the old-fashioned technique of
+restarting the api service.
+
+.. _other_deployer:
+
+Other deployer impact
+---------------------
+
+The impact of configuration changes to some parameters, such as workers,
+host and port, needs to be documented.
+
+
+Developer impact
+----------------
+
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  stuart-mclaren
+
+Other contributors:
+  abhishek-kekane
+
+Reviewers
+---------
+
+Core reviewer(s):
+  nikhil-komawar
+  flaper87
+
+Other reviewer(s):
+  icordasc
+
+
+Work Items
+----------
+
+- Add handler for SIGHUP signal
+- Reload configuration parameters
+- Unit and functional tests for coverage
+
+
+Dependencies
+============
+
+None
+
+
+Testing
+=======
+
+None
+
+
+Documentation Impact
+====================
+
+Please refer to :ref:`other_deployer`
+
+
+References
+==========
+
+https://etherpad.openstack.org/p/sighup-conf-reload
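The parent-side sequence listed under "Proposed change" (reload the configuration, send SIGHUP to the original children, start new children, keep the listening socket open) can be sketched in Python roughly as follows. This is a toy model under stated assumptions, not the glance implementation: the class `Parent` and the callables `load_config`, `spawn` and `send_signal` are hypothetical names introduced only for illustration, and no real processes or sockets are created. It assumes a POSIX platform where `signal.SIGHUP` is defined.

```python
import signal
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Parent:
    """Toy model of the parent process (hypothetical, for illustration)."""

    # Hypothetical hooks; a real server might re-read glance-api.conf,
    # fork a worker, and call os.kill respectively.
    load_config: Callable[[], Dict[str, int]]
    spawn: Callable[[Dict[str, int]], int]      # returns the child's pid
    send_signal: Callable[[int, int], None]     # (pid, signal number)
    socket_open: bool = True                    # stands in for the listening socket
    children: List[int] = field(default_factory=list)
    conf: Dict[str, int] = field(default_factory=dict)

    def _spawn_workers(self) -> List[int]:
        # Start as many children as the current configuration asks for.
        return [self.spawn(self.conf) for _ in range(self.conf["workers"])]

    def start(self) -> None:
        self.conf = self.load_config()
        self.children = self._spawn_workers()

    def on_sighup(self, signum: int = signal.SIGHUP, frame=None) -> None:
        old_children = self.children
        # 1. Reload the configuration.
        self.conf = self.load_config()
        # 2. Ask the original children to drain in-flight requests and exit.
        for pid in old_children:
            self.send_signal(pid, signal.SIGHUP)
        # 3. Start new children with the new configuration.
        self.children = self._spawn_workers()
        # 4. The listening socket is deliberately left untouched, so new
        #    requests can still be accepted while the old children drain.
```

Because `on_sighup` accepts `(signum, frame)`, a real parent process could register it with `signal.signal(signal.SIGHUP, parent.on_sighup)`. Between steps 2 and 3 the old and new children coexist, which is the temporary doubling of worker processes discussed under "Performance Impact".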