nova/nova
Dan Smith 91e29079a0 Change consecutive build failure limit to a weigher
There is concern over the ability of compute nodes to reasonably
determine which events should count against their consecutive build
failures. Since a compute may erroneously disable itself in
response to mundane or otherwise intentional user-triggered events,
this patch adds a scheduler weigher that considers the build failure
counter and can negatively weigh hosts with recent failures. This
avoids taking computes fully out of rotation, instead treating them as
less likely to be picked for a subsequent scheduling
operation.

This introduces a new conf option to control this weight. The default
is set high to maintain the existing behavior of picking nodes that
are not experiencing high failure rates, and resetting the counter as
soon as a single successful build occurs. This is a minimal visible
change from the existing behavior with the default configuration.
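
The behavior described above can be sketched in a few lines. This is a
minimal, self-contained illustration and not the code from this patch:
the names HostState, BuildFailureWeigherSketch, rank and record_success
are hypothetical, and the multiplier value is chosen only for the
example.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical stand-in for the scheduler's view of a compute host."""
    host: str
    failed_builds: int = 0


class BuildFailureWeigherSketch:
    """Penalize hosts in proportion to their recent build failures."""

    def __init__(self, multiplier):
        # Assumed to come from the new conf option; larger means harsher.
        self.multiplier = multiplier

    def weigh(self, host_state):
        # Negative weight: more failures, less desirable. Never excludes a host.
        return -self.multiplier * host_state.failed_builds


def rank(hosts, weigher):
    """Return hosts ordered from most to least desirable."""
    return sorted(hosts, key=weigher.weigh, reverse=True)


def record_success(host_state):
    """A single successful build resets the failure counter."""
    host_state.failed_builds = 0


hosts = [HostState("a"), HostState("b", failed_builds=2), HostState("c", failed_builds=1)]
weigher = BuildFailureWeigherSketch(multiplier=10.0)
before = [h.host for h in rank(hosts, weigher)]

record_success(hosts[1])
after = [h.host for h in rank(hosts, weigher)]
```

Host "b" stays eligible throughout; it is only ranked lower until a
build succeeds on it.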

The rationale behind the default value for this weigher comes from the
values likely to be generated by its peer weighers. The RAM and Disk
weighers will increase the score by the number of available megabytes
of memory (range in thousands) and disk (range in millions). With the
default value of 1000000 for the build failure weigher, a node with a
small number (fewer than ten) of failures becomes less desirable than
competing nodes with similar amounts of available disk and no
failures, even when it has many terabytes of available disk.
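
As a rough worked example of the arithmetic above, assume a simplified
additive score in which free RAM and free disk (both in megabytes) add
to the score and each failure subtracts the default weight. The score
function and the host figures are hypothetical, and any scaling the
scheduler applies to individual weighers is ignored here.

```python
BUILD_FAILURE_WEIGHT = 1_000_000  # default value quoted in this commit message


def score(free_ram_mb, free_disk_mb, failed_builds,
          multiplier=BUILD_FAILURE_WEIGHT):
    """Simplified score: free RAM plus free disk minus a per-failure penalty."""
    return free_ram_mb + free_disk_mb - multiplier * failed_builds


# Two hypothetical hosts: the flaky one has about 1 TB more free disk
# but three recent build failures.
hosts = {
    "healthy": score(free_ram_mb=8_000, free_disk_mb=4_000_000, failed_builds=0),
    "flaky": score(free_ram_mb=8_000, free_disk_mb=5_000_000, failed_builds=3),
}
best = max(hosts, key=hosts.get)
```

Under these assumptions each failure offsets roughly one terabyte of
free disk, so the healthy host scores higher despite having less disk.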

Change-Id: I71c56fe770f8c3f66db97fa542fdfdf2b9865fb8
Related-Bug: #1742102
2018-06-06 15:18:50 -07:00
api Merge "[placement] default to accept of application/json when */*" 2018-06-01 21:45:58 +00:00
cells Add instance action record for snapshot instances 2017-12-11 17:46:38 +08:00
cmd Merge "Metadata-API fails to retrieve avz for instances created before Pike" 2018-05-30 21:52:08 +00:00
common
compute Change consecutive build failure limit to a weigher 2018-06-06 15:18:50 -07:00
conductor Add user_id to RequestSpec 2018-05-01 11:08:43 -04:00
conf Change consecutive build failure limit to a weigher 2018-06-06 15:18:50 -07:00
console Merge "Convert xenapi's xvp console to processutils." 2018-05-09 04:57:55 +00:00
consoleauth
db Remove unused function 2018-05-29 16:40:31 +08:00
hacking Implement granular policy rules for placement 2018-05-17 11:12:16 -04:00
image Workaround glanceclient bug when CONF.glance.api_servers not set 2018-02-08 09:06:48 -05:00
ipv6
keymgr Remove deprecated keymgr code 2017-09-11 15:48:30 -04:00
locale Imported Translations from Zanata 2018-04-11 06:17:52 +00:00
network network: update pci request spec to handle trusted tags 2018-05-31 13:55:40 -04:00
notifications Remove deprecated monkey_patch config options 2018-05-16 11:40:41 -04:00
objects Merge "libvirt: add vf_trusted field for network metadata" 2018-06-01 12:12:53 +00:00
pci network: update pci request spec to handle trusted tags 2018-05-31 13:55:40 -04:00
policies Merge "Add host/hostId to instance action events API" 2018-04-26 20:42:07 +00:00
privsep Merge "Move image conversion to privsep." 2018-05-16 14:45:34 +00:00
scheduler Change consecutive build failure limit to a weigher 2018-06-06 15:18:50 -07:00
servicegroup
tests Change consecutive build failure limit to a weigher 2018-06-06 15:18:50 -07:00
virt Merge "libvirt: place emulator threads on CONF.compute.cpu_shared_set" 2018-06-01 17:25:12 +00:00
vnc
volume Log a more useful error when cinder auth isn't configured 2018-04-06 14:52:13 -04:00
__init__.py
availability_zones.py
baserpc.py
block_device.py Add uuid column to BlockDeviceMapping 2017-12-17 14:28:35 +00:00
cache_utils.py
config.py
context.py Remove RequestContext.instance_lock_checked 2018-04-11 11:46:19 -04:00
crypto.py Convert certificate generation to processutils. 2018-05-02 19:18:41 +10:00
debugger.py
exception.py Merge "PowerVM Driver: vSCSI Fibre Channel volume adapter" 2018-05-31 20:46:45 +00:00
exception_wrapper.py
filters.py
hooks.py
i18n.py correct referenced url in comments 2018-01-18 09:16:37 +08:00
loadables.py
manager.py conf: Remove 'db_driver' config opt 2018-03-16 17:23:16 +00:00
policy.py
profiler.py
quota.py Restrict CONF.quota.driver to DB and noop quota drivers 2018-06-01 15:44:52 +00:00
rc_fields.py Make ResourceClass.normalize_name handle sharp S 2018-04-10 12:24:40 -05:00
rpc.py Remove useless run_periodic_tasks call in ClientRouter 2018-03-20 23:54:21 +00:00
safe_utils.py
service.py Deprecate running API services under eventlet 2018-05-16 03:48:32 +00:00
service_auth.py Fix NoneType error when [service_user] is misconfigured 2017-11-28 12:22:30 -06:00
test.py Change consecutive build failure limit to a weigher 2018-06-06 15:18:50 -07:00
utils.py Remove deprecated monkey_patch config options 2018-05-16 11:40:41 -04:00
version.py
weights.py
wsgi.py Refactor WSGI apps and utils to limit imports 2018-03-06 22:05:12 +00:00