ironic/releasenotes/notes/drain-5eafd17e0868e21a.yaml
Steve Baker 81acd5df24 Implement drain shutdown support
Sending signal ``SIGUSR2`` to a conductor process will now trigger a
drain shutdown. This is similar to a ``SIGTERM`` graceful shutdown but
the timeout is determined by ``[DEFAULT]drain_shutdown_timeout`` which
defaults to ``1800`` seconds. This is enough time for running tasks on
existing reserved nodes to either complete or reach their own failure
timeout.
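
As a rough illustration of how the two signals differ, the Python sketch
below picks a timeout per signal and then polls until no reservations
remain. It is not the ironic implementation: ``GRACEFUL_SHUTDOWN_TIMEOUT``,
``shutdown_timeout_for`` and ``wait_for_reserved_nodes`` are hypothetical
names, and the 60 second graceful value is an assumption made only for
this example.

```python
import signal
import time

DRAIN_SHUTDOWN_TIMEOUT = 1800    # default of [DEFAULT]drain_shutdown_timeout
GRACEFUL_SHUTDOWN_TIMEOUT = 60   # assumed placeholder, not taken from ironic


def shutdown_timeout_for(signum):
    """Return the wait budget in seconds for a shutdown signal."""
    if signum == signal.SIGUSR2:      # drain shutdown (POSIX only)
        return DRAIN_SHUTDOWN_TIMEOUT
    if signum == signal.SIGTERM:      # graceful shutdown
        return GRACEFUL_SHUTDOWN_TIMEOUT
    raise ValueError("not a shutdown signal: %r" % (signum,))


def wait_for_reserved_nodes(count_reserved, timeout, poll_interval=1.0):
    """Poll until no nodes are reserved or the timeout expires.

    count_reserved is a hypothetical callable returning the number of
    nodes still reserved by this conductor. Returns True if the
    conductor drained in time, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while count_reserved() > 0:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

An operator would trigger the drain by sending the signal to the
conductor process, for example with ``kill -USR2 <pid>``.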

During the drain period the conductor needs to be removed from the hash
ring to prevent new tasks from starting. Other conductors must also not
fail reserved nodes on the draining conductor, which would otherwise
appear to be orphaned. This is achieved by running the conductor
keepalive heartbeat for this period, but setting the ``online`` state to
``False``.
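
The effect of heartbeating with ``online`` set to ``False`` can be
sketched with a toy model. This is an assumption-laden illustration, not
ironic's code: ``ConductorRecord``, ``HEARTBEAT_TIMEOUT``,
``hash_ring_members`` and ``orphan_candidates`` are hypothetical names
chosen for the example.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 60  # assumed value, in seconds


@dataclass
class ConductorRecord:
    hostname: str
    online: bool
    updated_at: float  # time of the last keepalive heartbeat, in seconds


def is_alive(record, now):
    """A conductor counts as alive while its heartbeat is recent."""
    return now - record.updated_at <= HEARTBEAT_TIMEOUT


def hash_ring_members(records, now):
    """Conductors eligible for new tasks: online and still heartbeating."""
    return [r.hostname for r in records if r.online and is_alive(r, now)]


def orphan_candidates(records, now):
    """Conductors whose reservations may be treated as orphaned."""
    return [r.hostname for r in records if not is_alive(r, now)]


now = 1000.0
records = [
    ConductorRecord("active", online=True, updated_at=990.0),
    ConductorRecord("draining", online=False, updated_at=995.0),
    ConductorRecord("dead", online=True, updated_at=100.0),
]
ring = hash_ring_members(records, now)
orphans = orphan_candidates(records, now)
```

In this model the draining conductor is absent from ``ring``, so it
receives no new tasks, and absent from ``orphans``, so its reserved nodes
are left alone.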

When this feature was proposed, ``SIGINT`` was suggested as the signal to
use to trigger a drain shutdown. However, this is already used by the
oslo_service fast exit [1], so using it for drain would change existing
behaviour.

[1] https://opendev.org/openstack/oslo.service/src/branch/master/oslo_service/service.py#L340

Change-Id: I777898f5a14844c9ac9967168f33d55c4f97dfb9
2023-11-13 10:38:18 +13:00

---
features:
  - |
    Sending signal ``SIGUSR2`` to a conductor process will now trigger a drain
    shutdown. This is similar to a ``SIGTERM`` graceful shutdown but the timeout
    is determined by ``[DEFAULT]drain_shutdown_timeout`` which defaults to
    ``1800`` seconds. This is enough time for running tasks on existing reserved
    nodes to either complete or reach their own failure timeout.

    During the drain period the conductor will be removed from the hash ring to
    prevent new tasks from starting. Other conductors will no longer fail
    reserved nodes on the draining conductor, which previously appeared to be
    orphaned. This is achieved by running the conductor keepalive heartbeat for
    this period, but setting the ``online`` state to ``False``.