ironic/releasenotes/notes/hash-ring-race-da0d584de1f46788.yaml
Dmitry Tantsur 5471157e4f Fixes a race condition in the hash ring code
The current hash ring code suffers from several problems:

1. The cache is reset on every get_topic_for call, which means that the
   cache is never reused between API calls. This was originally done to
   work around a new conductor not being visible until the API process
   restarted. The hash rings are now refreshed periodically instead.

   This patch removes the cache reset. To avoid waiting up to 3 minutes
   (the old 180-second interval) before a new driver can be used:
   1) the hash ring cache is always rebuilt when a ring is not found,
   2) the hash_ring_reset_interval option was changed to 15 seconds.

2. The reset of the cache races with the hot path in the get_ring call.
   It is possible that the reset happens after the class-level cache
   variable is checked but before it is used, yielding None.

   This patch copies the class-level variable into a local variable
   before checking it, which ensures that None is never returned.
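
Both fixes can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Ironic's actual code: `RingCache`, `load_rings` (a stand-in for the database-backed loader, returning a dict of driver name to ring) and `reset()` are hypothetical names, and the periodic refresh is modeled as an age check performed on access.

```python
import threading
import time


class RingCache:
    """Hypothetical process-wide hash ring cache (illustration only)."""

    # Class-level state shared by all instances, like the cache in the patch.
    _rings = None
    _updated_at = 0.0
    _lock = threading.Lock()

    def __init__(self, load_rings, reset_interval=15):
        # load_rings: hypothetical callable returning {driver_name: ring}.
        self._load_rings = load_rings
        self._reset_interval = reset_interval

    def _is_fresh(self):
        return time.monotonic() - type(self)._updated_at < self._reset_interval

    def _rebuild(self):
        # Caller holds the lock. Return the local value, not cls._rings,
        # because reset() may set the class attribute to None at any time.
        rings = self._load_rings()
        cls = type(self)
        cls._rings = rings
        cls._updated_at = time.monotonic()
        return rings

    @property
    def rings(self):
        # Hot path, no lock: snapshot the class variable into a local first,
        # so a concurrent reset() between the check and the return cannot
        # make this property return None.
        rings = type(self)._rings
        if rings is not None and self._is_fresh():
            return rings
        with type(self)._lock:
            rings = type(self)._rings
            if rings is None or not self._is_fresh():
                rings = self._rebuild()
            return rings

    def get_ring(self, driver):
        rings = self.rings
        if driver in rings:
            return rings[driver]
        # Ring not found (e.g. a driver served by a newly started conductor):
        # rebuild immediately instead of waiting for the refresh interval.
        with type(self)._lock:
            rings = self._rebuild()
        if driver not in rings:
            raise LookupError(f"no hash ring for driver {driver!r}")
        return rings[driver]

    @classmethod
    def reset(cls):
        # Drop the cache; the next access rebuilds it. No lock is taken here,
        # which is why readers must work on a local snapshot.
        cls._rings = None


if __name__ == "__main__":
    loads = []

    def load_rings():
        loads.append(1)
        return {"ipmi": "ring #%d" % len(loads)}

    cache = RingCache(load_rings)
    print(cache.get_ring("ipmi"))  # first access builds the rings
    print(cache.get_ring("ipmi"))  # served from the cache, no rebuild
    print(len(loads))              # number of times the loader ran
```

The lock only serializes rebuilds. What actually closes the race is that every code path tests and returns a local copy of `_rings` (including `_rebuild()`), so an unlocked `reset()` can never turn a value that was already checked into None.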

Finally, some logging was added to the modified code to make this kind
of problem easier to debug in the future.

Change-Id: I6e18c6ec23a053b59c76fcadd52b13d84d81b4fb
Story: #2003966
Task: #26896
Partial-Bug: #1792872
2018-10-05 07:36:30 -04:00

---
fixes:
  - |
    Fixes a race condition in the hash ring implementation that could cause
    an internal server error on any request. See `story 2003966
    <https://storyboard.openstack.org/#!/story/2003966>`_ for details.
upgrade:
  - |
    The ``hash_ring_reset_interval`` configuration option was changed from 180
    to 15 seconds. Previously, this option was essentially ignored on the API
    side, because the hash ring was reset on each API access. The lower value
    minimizes the probability of a request being routed to the wrong conductor
    when the ring needs rebalancing.