nova/nova
melanie witt 620e5da840 Lowercase ironic driver hash ring and ignore case in cache
Recently we had a customer case where attempts to add new ironic nodes
to an existing undercloud resulted in half of the nodes failing to be
detected and added to nova. Ironic API returned all of the newly added
nodes when called by the driver, but half of the nodes were not
returned to the compute manager by the driver.

Only one nova-compute service was managing all of the ironic nodes in
this typical all-in-one undercloud deployment.

After days of investigation and examination of a database dump from the
customer, we noticed that at some point the customer had changed the
hostname of the machine from something containing uppercase letters to
the same name but all lowercase. The nova-compute service record had
the mixed case name and the CONF.host (socket.gethostname()) had the
lowercase name.

The hash ring logic adds all of the nova-compute service hostnames plus
CONF.host to the hash ring. The ironic driver then reports only the
nodes it owns, which it determines by retrieving a service hostname
from the ring based on a hash of each ironic node UUID.

Because of the machine hostname change, the hash ring contained, for
example: {'MachineHostName', 'machinehostname'} when it should have
contained only one hostname. With two hostnames in the ring, roughly
half of the node UUIDs hashed to the stale mixed-case name, which no
longer matched CONF.host, so the driver recognized only about half of
the nodes as its own. The other half of the new nodes were excluded
and never added as compute nodes.
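
The effect can be reproduced with a toy model. The sketch below is an
illustration only, not the driver's code: owner() and owned_nodes() are
hypothetical helpers, a simple SHA-256 modulo hash stands in for the
real hash ring, and the node UUIDs are made up.

```python
import hashlib
import uuid


def owner(node_uuid, members):
    """Pick a ring member for a node using a toy modulo hash.

    Stand-in for a real consistent hash ring, for illustration only.
    """
    ordered = sorted(members)
    digest = hashlib.sha256(node_uuid.encode("utf-8")).hexdigest()
    return ordered[int(digest, 16) % len(ordered)]


def owned_nodes(host, node_uuids, members):
    """Return the nodes that the toy ring assigns to ``host``."""
    return [n for n in node_uuids if owner(n, members) == host]


# Hypothetical data: 100 nodes with deterministic UUIDs.
nodes = [str(uuid.UUID(int=i)) for i in range(100)]

# Service record name plus CONF.host, differing only in case.
ring = {"MachineHostName", "machinehostname"}

# Nodes the running service (CONF.host) recognizes as its own.
mine = owned_nodes("machinehostname", nodes, ring)

# Nodes assigned to the stale name, which no running service claims.
orphaned = owned_nodes("MachineHostName", nodes, ring)
```

Every node lands in exactly one of the two lists, so whatever hashes
to the stale name is silently dropped. With a single ring member, the
same helper returns all of the nodes.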

This adds lowercasing of hosts that are added to the hash ring and
ignores case when comparing the CONF.host to the hash ring members
to avoid unnecessary pain and confusion for users that make hostname
changes that are otherwise functionally harmless.
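
As a rough sketch of the idea (assumed helper names, not the actual
patch), normalizing case when building the member set and when
comparing against CONF.host could look like this:

```python
def build_ring_members(service_hosts, conf_host):
    """Return the lowercased set of hosts to place on the hash ring.

    Hypothetical helper: ``service_hosts`` are the hostnames from the
    nova-compute service records and ``conf_host`` is CONF.host.
    """
    members = {host.lower() for host in service_hosts}
    members.add(conf_host.lower())
    return members


def is_local_host(ring_host, conf_host):
    """Compare a ring member to CONF.host, ignoring case."""
    return ring_host.lower() == conf_host.lower()


# The two spellings collapse into a single ring member.
members = build_ring_members(["MachineHostName"], "machinehostname")
```

With the names collapsed to one member, every node hashes to a host
that the running service recognizes as itself.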

This also logs the set of hash ring members at level DEBUG to make
hash ring related problems easier to debug.

Closes-Bug: #1866380

Change-Id: I617fd59de327de05a198f12b75a381f21945afb0
(cherry picked from commit 7145100ee4)
(cherry picked from commit 588b0484bf)
(cherry picked from commit 8f8667a8dd)
(cherry picked from commit 019e3da75b)
2020-05-01 14:52:24 +00:00
CA
api Block deleting compute services with in-progress migrations 2020-02-12 16:21:13 +00:00
cells Add instance action record for snapshot instances 2017-12-11 17:46:38 +08:00
cmd fix cellv2 delete_host 2019-07-25 13:45:06 +03:00
common
compute Unplug VIFs as part of cleanup of networks 2020-03-27 18:02:50 +00:00
conductor Avoid circular reference during serialization 2020-03-15 11:45:04 +00:00
conf Fail to live migration if instance has a NUMA topology 2019-07-02 14:25:26 +00:00
console Mask the token used to allow access to consoles 2020-02-14 15:13:46 +01:00
consoleauth Mask the token used to allow access to consoles 2020-02-14 15:13:46 +01:00
db Add retry_on_deadlock to migration_update DB API 2020-03-11 01:22:01 +00:00
hacking trivial: Rename 'policy_check' -> 'policy' 2017-10-25 17:56:40 +01:00
image Fix regression in glance client call 2019-04-23 15:28:37 +00:00
ipv6
keymgr Remove deprecated keymgr code 2017-09-11 15:48:30 -04:00
locale Imported Translations from Zanata 2018-03-01 06:16:22 +00:00
network Improve metadata server performance with large security groups 2019-12-05 11:41:02 -05:00
notifications Merge "Remove noisy DEBUG log" into stable/queens 2018-09-21 12:01:58 +00:00
objects Fix listing deleted servers with a marker 2019-11-25 16:40:39 -05:00
pci PCI: do not force remove allocated devices 2019-02-05 23:29:54 +00:00
policies Add policy rule to block image-backed servers with 0 root disk flavor 2018-06-18 13:51:41 -04:00
privsep stable-only: fix typo in IVS related privsep method 2018-10-03 19:28:30 +00:00
scheduler Fix false ERROR message at compute restart 2019-11-29 19:41:09 +00:00
servicegroup Fix service list for disabled compute using MC driver 2018-09-16 19:12:55 +00:00
tests Lowercase ironic driver hash ring and ignore case in cache 2020-05-01 14:52:24 +00:00
virt Lowercase ironic driver hash ring and ignore case in cache 2020-05-01 14:52:24 +00:00
vnc
volume Fix exception translation when creating volume 2019-10-11 17:38:01 +08:00
__init__.py
availability_zones.py
baserpc.py
block_device.py Add uuid column to BlockDeviceMapping 2017-12-17 14:28:35 +00:00
cache_utils.py
config.py Set default of oslo.privsep.daemon logging to INFO level 2018-09-15 02:21:10 +00:00
context.py Allow cinderv2 endpoints within the request context catalog 2018-06-05 10:04:06 +01:00
crypto.py
debugger.py
exception.py Fixes multi-registry config in Quobyte driver 2019-10-07 14:00:02 +00:00
exception_wrapper.py rename binary to source in versioned notifications 2017-07-25 17:36:04 +02:00
filters.py
hooks.py
i18n.py correct referenced url in comments 2018-01-18 09:16:37 +08:00
loadables.py
manager.py
policy.py Add policy granularity to the Flavors API 2017-07-19 15:56:47 -04:00
profiler.py
quota.py Fix server_group_members quota check 2018-07-11 15:04:34 -04:00
rpc.py Remove dead code of api.fault notification sending 2017-10-09 17:29:40 +02:00
safe_utils.py Allow wrapping of closures 2017-07-20 10:07:52 +01:00
service.py Move conductor wait_until_ready() delay before manager init 2018-09-01 17:25:02 -04:00
service_auth.py Fix NoneType error when [service_user] is misconfigured 2017-11-28 12:22:30 -06:00
test.py Fix the request context in ServiceFixture 2018-09-04 19:36:26 +00:00
utils.py Make supports_direct_io work on 4096b sector size 2018-11-21 10:47:30 +00:00
version.py
weights.py
wsgi.py