nova/releasenotes/notes/compute-node-auto-disable-303eb9b0fdb4f3f1.yaml
Dan Smith f93f675a85 Make compute auto-disable itself if builds are failing
This implements an auto-disable feature in nova-compute, where we
automatically set our service record to disabled if a certain number
of consecutive instance builds fail.
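
The counting behavior described above can be sketched as follows. This
is a minimal illustration, not the code in this change: the class
`BuildFailureTracker` and its methods are hypothetical names, and two
details are assumptions of the sketch: that a threshold of zero turns
the feature off, and that the service is disabled once the streak
reaches the threshold.

```python
class BuildFailureTracker:
    """Hypothetical helper: tracks consecutive build failures."""

    def __init__(self, threshold):
        # Assumption: a threshold of 0 (or less) means "never auto-disable".
        self.threshold = threshold
        self.consecutive_failures = 0
        self.disabled = False

    def record_build(self, succeeded):
        """Record one build result; return True if the service is disabled."""
        if succeeded:
            # Any successful build breaks the failure streak.
            self.consecutive_failures = 0
            return self.disabled
        self.consecutive_failures += 1
        # Assumption: disable once the streak reaches the threshold.
        if self.threshold > 0 and self.consecutive_failures >= self.threshold:
            self.disabled = True
        return self.disabled

    def enable(self):
        """Models an admin manually re-enabling the service."""
        self.disabled = False
        self.consecutive_failures = 0


tracker = BuildFailureTracker(threshold=3)
for result in (False, False, True, False, False, False):
    tracker.record_build(result)
print(tracker.disabled)
```

The single success in the middle resets the streak, so only the three
failures at the end trip the threshold. The service then stays disabled
until `enable()` is called, mirroring the manual re-enable step.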

While this is useful in general, disabling a failing compute will
become more important once scheduler retries after unknown failures
become either impossible or scoped to a single cell. Because a compute
that is consistently failing will look very attractive to the
scheduler, it may become a build magnet that, in the absence of
retries, would effectively kill all builds in a cloud until fixed.

Change-Id: I02b7cd87d399d487dd1d650540f503a70bc27749
2017-05-15 09:22:30 -07:00

---
features:
  - |
    The ``nova-compute`` worker can automatically disable itself in the
    service database if consecutive build failures exceed a set threshold. The
    ``[compute]/consecutive_build_service_disable_threshold`` configuration option
    allows setting the threshold for this behavior, or disabling it entirely if
    desired.
    The intent is that an admin will examine the issue before manually
    re-enabling the service, which will avoid that compute node becoming a
    black hole build magnet.
upgrade:
  - |
    The new configuration option
    ``[compute]/consecutive_build_service_disable_threshold``
    defaults to a nonzero value, which means multiple failed builds will
    result in a compute node auto-disabling itself.