Ignored error state cache for new requests

It is possible for inspector to fail upon inspection if database
connectivity was lost on a prior action. This is because the last
database transport is potentially bad and fails upon load for the
transaction. The cache can then end up with an "error" state entry,
and a retry can then fail because the entry is already in error state.

Because there really are no guarantees regarding database failures,
the best thing to do is to not trust the prior cache state if it is
in error, and to reset it to starting upon new introspection requests.
This prevents operators from *having* to perform process restarts to
force all loads to be from the database, unless they manage to have a
multi-inspector cluster and get another inspector node to inspect in
the meantime.

Change-Id: I04ae1d54028862642d043f3a8f3af99405863325
Story: 2008344
Task: 41246
Related: rhbz#1947147
(cherry picked from commit d972dc93cd)
Julia Kreger 2021-04-07 10:46:14 -07:00
parent d697c7c816
commit 36c5dcc1a0
3 changed files with 25 additions and 1 deletions


@@ -679,7 +679,13 @@ def start_introspection(uuid, **kwargs):
                       node_info=node_info)
             state = istate.States.starting
         else:
-            state = node_info.state
+            recorded_state = node_info.state
+            if istate.States.error == recorded_state:
+                # If there was a failure, return to starting state to avoid
+                # letting the cache block new runs from occurring.
+                state = istate.States.starting
+            else:
+                state = recorded_state
         return add_node(uuid, state, **kwargs)


@@ -1276,6 +1276,17 @@ class TestStartIntrospection(test_base.NodeTest):
                                self.node_info.uuid)
         self.assertFalse(add_node_mock.called)

+    @prepare_mocks
+    def test_ensure_start_on_error(self, fsm_event_mock,
+                                   add_node_mock):
+        def side_effect(*args):
+            self.node_info._state = istate.States.error
+
+        fsm_event_mock.side_effect = side_effect
+        node_cache.start_introspection(self.node.uuid)
+        add_node_mock.assert_called_once_with(self.node_info.uuid,
+                                              istate.States.starting)
+

 class TestIntrospectionDataDbStore(test_base.NodeTest):
     def setUp(self):


@@ -0,0 +1,7 @@
+---
+fixes:
+  - |
+    Fixes an issue where an inspection that failed due to a transient
+    failure can cause subsequent retry attempts to inspect to also be
+    perceived as failures. If a prior inspection fails and is in ``error``
+    state, when a new introspection is requested, the state is now
+    appropriately set to ``starting``.
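
The decision made by this change can be illustrated in isolation. The following is only a minimal sketch, not the inspector code itself: the `States` class and the `pick_start_state` helper are hypothetical stand-ins introduced here for illustration, and the real `node_cache.start_introspection` additionally involves the FSM event and the database transaction.

```python
# Hypothetical stand-in for istate.States; only the names used in the
# diff above (starting, error) plus one extra example state are modelled.
class States:
    starting = "starting"
    waiting = "waiting"
    error = "error"


def pick_start_state(recorded_state):
    """Return the state a new introspection request should begin from.

    A cached ``error`` entry is not trusted: it is reset to ``starting``
    so that a stale failure cannot block the new run. Any other recorded
    state is passed through unchanged.
    """
    if recorded_state == States.error:
        return States.starting
    return recorded_state


if __name__ == "__main__":
    for recorded in (States.error, States.waiting, States.starting):
        print(recorded, "->", pick_start_state(recorded))
```

Under these assumptions, only the `error` case is rewritten; every other recorded state reaches `add_node` exactly as before, which keeps the behavioral change as narrow as the commit message describes.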