f93f675a85
This implements an auto-disable feature in nova-compute: the service automatically sets its own service record to disabled if it fails to build a certain number of instances consecutively. While this is useful in general, disabling a failing compute becomes more important in a future where scheduler retries due to unknown failures may become either impossible or scoped to a single cell. Since a compute that is consistently failing will look very attractive to the scheduler, it may become a build magnet that, in the absence of retries, would effectively kill all builds in a cloud until fixed.

Change-Id: I02b7cd87d399d487dd1d650540f503a70bc27749
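The behavior described above can be illustrated with a minimal sketch of a consecutive-failure counter. This is not nova's actual code: the class `BuildFailureTracker`, its method names, and the `disable_service` callback are hypothetical, and the sketch assumes that a threshold of 0 turns the feature off and that the service is disabled once the count reaches the threshold.

```python
class BuildFailureTracker:
    """Hypothetical illustration of consecutive-failure tracking.

    Not nova's implementation; all names here are assumptions.
    """

    def __init__(self, threshold, disable_service):
        # Assumption: a threshold of 0 disables the feature entirely.
        self.threshold = threshold
        # Callback that would mark the service record as disabled.
        self.disable_service = disable_service
        self.consecutive_failures = 0

    def record_success(self):
        # Any successful build breaks the streak of failures.
        self.consecutive_failures = 0

    def record_failure(self):
        # Count the failure; disable once the streak reaches the threshold.
        self.consecutive_failures += 1
        if self.threshold > 0 and self.consecutive_failures >= self.threshold:
            self.disable_service()
            return True
        return False
```

With a threshold of 3, two failures followed by a success reset the streak, so the callback fires only after three further failures in a row. Re-enabling is deliberately left out of the sketch, since the intent is for an admin to investigate and re-enable the service manually.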
17 lines
754 B
YAML
---
features:
  - |
    The `nova-compute` worker can automatically disable itself in the
    service database if consecutive build failures exceed a set threshold. The
    ``[compute]/consecutive_build_service_disable_threshold`` configuration option
    allows setting the threshold for this behavior, or disabling it entirely if
    desired.
    The intent is that an admin will examine the issue before manually
    re-enabling the service, which will avoid that compute node becoming a
    black hole build magnet.
upgrade:
  - |
    The new configuration option
    ``[compute]/consecutive_build_service_disable_threshold``
    defaults to a nonzero value, which means multiple failed builds will
    result in a compute node auto-disabling itself.