81acd5df24
Sending signal ``SIGUSR2`` to a conductor process will now trigger a drain shutdown. This is similar to a ``SIGTERM`` graceful shutdown but the timeout is determined by ``[DEFAULT]drain_shutdown_timeout`` which defaults to ``1800`` seconds. This is enough time for running tasks on existing reserved nodes to either complete or reach their own failure timeout.

During the drain period the conductor needs to be removed from the hash ring to prevent new tasks from starting. Other conductors also need to not fail reserved nodes on the draining conductor which would appear to be orphaned. This is achieved by running the conductor keepalive heartbeat for this period, but setting the ``online`` state to ``False``.

When this feature was proposed, SIGINT was suggested as the signal to use to trigger a drain shutdown. However this is already used by oslo_service fast exit[1] so using this for drain would be a change in existing behaviour.

[1] https://opendev.org/openstack/oslo.service/src/branch/master/oslo_service/service.py#L340

Change-Id: I777898f5a14844c9ac9967168f33d55c4f97dfb9
14 lines
777 B
YAML
---
features:
  - |
    Sending signal ``SIGUSR2`` to a conductor process will now trigger a drain
    shutdown. This is similar to a ``SIGTERM`` graceful shutdown but the timeout
    is determined by ``[DEFAULT]drain_shutdown_timeout`` which defaults to
    ``1800`` seconds. This is enough time for running tasks on existing reserved
    nodes to either complete or reach their own failure timeout.

    During the drain period the conductor will be removed from the hash ring to
    prevent new tasks from starting. Other conductors will no longer fail
    reserved nodes on the draining conductor, which previously appeared to be
    orphaned. This is achieved by running the conductor keepalive heartbeat for
    this period, but setting the ``online`` state to ``False``.
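
The drain behaviour described in this note can be modelled with a small Python sketch. This is an illustration only, not Ironic's implementation: the class ``ToyConductor`` and the names ``handle_sigusr2``, ``install_handler``, ``heartbeat`` and ``drain`` are hypothetical, and the hash ring is reduced to a recorded ``online`` flag. The sketch assumes a drain ends when no reserved nodes remain or when the timeout (1800 seconds by default, as in the note) expires, with heartbeats continuing in the meantime while reporting ``online`` as ``False``.

```python
import signal
import threading
import time


class ToyConductor:
    """Hypothetical stand-in for a conductor process (not Ironic code)."""

    def __init__(self, drain_shutdown_timeout=1800.0):
        # Mirrors the default of [DEFAULT]drain_shutdown_timeout in the note.
        self.drain_shutdown_timeout = drain_shutdown_timeout
        self.online = True
        self.reserved_nodes = set()
        # Records the ``online`` value reported with each heartbeat.
        self.heartbeats = []
        self.drain_requested = threading.Event()

    def handle_sigusr2(self, signum, frame):
        # Keep the handler minimal: only flag that a drain was requested.
        self.drain_requested.set()

    def heartbeat(self):
        # Assumption: peers treat any recent heartbeat as proof of life,
        # and use the online flag to decide hash ring membership.
        self.heartbeats.append(self.online)

    def drain(self, poll_interval=0.01):
        """Heartbeat as offline until nodes are released or time runs out.

        Returns True if all reserved nodes were released in time.
        """
        self.online = False
        deadline = time.monotonic() + self.drain_shutdown_timeout
        while self.reserved_nodes and time.monotonic() < deadline:
            self.heartbeat()
            time.sleep(poll_interval)
        return not self.reserved_nodes


def install_handler(conductor):
    # Must be called from the main thread on a POSIX platform.
    signal.signal(signal.SIGUSR2, conductor.handle_sigusr2)
```

In this model, an operator would call ``install_handler`` at start-up and later send ``SIGUSR2`` to the process; the main loop would then notice ``drain_requested`` and call ``drain()``. Reporting ``online`` as ``False`` while still heartbeating is what lets the other conductors tell a draining peer apart from a dead one.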