glance-specs/specs/kilo/sighup-conf-reload.rst

===========================================
Reload configuration files on SIGHUP signal
===========================================

https://blueprints.launchpad.net/glance/+spec/sighup-conf-reload

We propose to eliminate the need to restart the glance api service when
configuration files are modified. Operator can send SIGHUP signal to glance
service which will reload the configuration file.

Problem description
===================

In a production environment, an administrator will modify the glance-api.conf
configuration parameters like filesystem_store_datadirs when the storage
is almost full to add more capacity by adding more disks, or to increase
the number of workers or log configuration etc. Then they need to restart the
glance services explicitly for these changes to be loaded. Restarting
service would break users connected to it which is not good from users point
of view.

Proposed change
===============

Add the ability to dynamically change configuration settings of a running
glance server with no impact to service.

A running glance server consists of a parent process and one or
more child processes.

On receipt of a SIGHUP signal the parent process will:

- reload the configuration
- send a SIGHUP to the original child processes
- start new child processes with the new configuration
- its listening socket will not be closed

On receipt of a SIGHUP signal each original child process will:

- close the listening socket so as not to accept new requests
- complete any in-flight requests
- exit

This approach is based on nginx's behaviour and avoids some of the
disadvantages of the current oslo's Launcher reload:

- Race conditions: Launcher does not shutdown eventlet cleanly, existing
  requests can fail.
- If all child processes are busy there can be a lengthy delay when new
  requests are not processed.
- Long lived pre-SIGHUP idle client connections can stall request
  processing indefinitely.
- Not all parameters can be changed, eg number of workers.
- The wsgi pipeline cannot be changed, for example to enable caching.

Alternatives
------------

An alternative may be to attempt to save and then restore long running tasks
using taskflow. The process restart would then only need to deal with
short lived requests (e.g. API DB lookups) and then no user visible downtime
is required for regular restarts

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

If the reload takes too long (e.g., >50ms) then the API requests will be
noticeably delayed.

We are proposing current worker processes to stop accepting requests and
continue with what they are doing, while the parent process starts and
spawn new child processes with the new configuration. So there is a
possibility that the glance node will be running twice as many child processes
as it is configured to run for a while. It could impact performance,
especially if it is an underpowered node that is already configured to run
as many child processes as it can handle without degradation.

In the author's opinion, it is the responsibility of the operator to make sure
the node will not be over-provisioned with child processes (workers). If an
operator wants to run a node with no headroom for additional child processes,
the author suggests that such an operator not use dynamic configuration via
SIGHUP. Instead, such an operator should use the old fashioned technique of
restarting the api service.

.. _other_deployer:

Other deployer impact
---------------------

Need to document the impact of config changes for some params like workers,
host, port etc.


Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  stuart-mclaren

Other contributors:
  abhishek-kekane

Reviewers
---------

Core reviewer(s):
  nikhil-komawar
  flaper87

Other reviewer(s):
  icordasc


Work Items
----------

- Add handler for SIGHUP signal
- Reload configuration parameters
- Unit and functional tests for coverage


Dependencies
============

None


Testing
=======

None


Documentation Impact
====================

Please refer to :ref:`other_deployer`


References
==========

https://etherpad.openstack.org/p/sighup-conf-reload