
NetApp SolidFire: Fix error on cluster workload rebalancing

When SolidFire is under heavy load or being upgraded, the
SolidFire cluster may automatically move connections from primary
to secondary nodes, in order to rebalance cluster workload.

Although this operation occurs very quickly, if an operation is made
to a volume at the same time it is being moved, API calls such as
create snapshot can fail with an xNotPrimary error. Normally the
operation succeeds when it is retried.

This patch fixes this issue by adding the xNotPrimary exception to
our list of retryable exceptions in the SolidFire driver.

Change-Id: I67dd2bfba37adcb7cda5f1cd08ff641410ec3f6b
Closes-Bug: #1891914
(cherry picked from commit 8d0c08b4bb)
(cherry picked from commit 622244a755)
(cherry picked from commit f087536e47)
changes/69/764269/4 15.5.0
Fernando Ferraz 7 months ago
parent
commit
a75f8633b4
2 changed files with 13 additions and 2 deletions
  1. +5 -2  cinder/volume/drivers/solidfire.py
  2. +8 -0  releasenotes/notes/sf-fix-error-on-cluster-rebalancing-515bf41104cd181a.yaml

cinder/volume/drivers/solidfire.py (+5 -2)

@@ -244,9 +244,11 @@ class SolidFireDriver(san.SanISCSIDriver):
                  service is restarted
         2.0.18 - Fix bug #1896112 SolidFire Driver creates duplicate volume
                  when API response is lost
+        2.0.19 - Fix bug #1891914 fix error on cluster workload rebalancing
+                 by adding xNotPrimary to the retryable exception list
     """
-    VERSION = '2.0.18'
+    VERSION = '2.0.19'
     # ThirdPartySystems wiki page
     CI_WIKI_NAME = "NetApp_SolidFire_CI"
@@ -282,7 +284,8 @@ class SolidFireDriver(san.SanISCSIDriver):
                          'xMaxSnapshotsPerNodeExceeded',
                          'xMaxClonesPerNodeExceeded',
                          'xSliceNotRegistered',
-                         'xNotReadyForIO']
+                         'xNotReadyForIO',
+                         'xNotPrimary']
     def __init__(self, *args, **kwargs):
         super(SolidFireDriver, self).__init__(*args, **kwargs)
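
For context, here is a minimal sketch of how a list of retryable error names like the one in this hunk could drive a retry loop. It is not the driver's actual code: the exception classes, `check_response`, `call_with_retry`, `make_flaky_call`, and the assumed response shape (`{'error': {'name': ...}}`) are all hypothetical names chosen for illustration.

```python
import time


class SolidFireRetryableError(Exception):
    """Hypothetical exception for API errors that are worth retrying."""


class SolidFireAPIError(Exception):
    """Hypothetical exception for API errors that should not be retried."""


# Subset of the names shown in the diff, including the new entry.
RETRYABLE_ERRORS = ['xSliceNotRegistered', 'xNotReadyForIO', 'xNotPrimary']


def check_response(response):
    """Return the response, or raise if it reports an error.

    Assumes (for illustration only) that errors look like
    {'error': {'name': 'xNotPrimary'}}.
    """
    error = response.get('error')
    if error is None:
        return response
    if error.get('name') in RETRYABLE_ERRORS:
        raise SolidFireRetryableError(error['name'])
    raise SolidFireAPIError(error.get('name'))


def call_with_retry(api_call, tries=3, delay=0.0):
    """Invoke api_call, retrying only on SolidFireRetryableError."""
    for attempt in range(1, tries + 1):
        try:
            return check_response(api_call())
        except SolidFireRetryableError:
            if attempt == tries:
                raise  # Out of attempts: surface the last error.
            time.sleep(delay)


def make_flaky_call(failures):
    """Build a fake API call that fails with xNotPrimary `failures` times."""
    state = {'calls': 0}

    def api_call():
        state['calls'] += 1
        if state['calls'] <= failures:
            return {'error': {'name': 'xNotPrimary'}}
        return {'result': {'snapshotID': 1}}

    return api_call, state
```

With `xNotPrimary` in the list, a call that fails once while the volume is being moved succeeds on the second attempt, e.g. `call_with_retry(make_flaky_call(1)[0])`. An error name that is not in the list raises `SolidFireAPIError` immediately instead of being retried.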


releasenotes/notes/sf-fix-error-on-cluster-rebalancing-515bf41104cd181a.yaml (+8 -0)

@@ -0,0 +1,8 @@
+---
+fixes:
+  - |
+    NetApp SolidFire driver `Bug #1891914
+    <https://bugs.launchpad.net/cinder/+bug/1891914>`_:
+    Fix an error that might occur on cluster workload rebalancing or
+    system upgrade, when an operation is made to a volume at the same
+    time its connection is being moved to a secondary node.
