cinder/api-ref/source/v3/samples/volumes/v3.69
Gorka Eguileor ef741228d8 Report tri-state shared_targets for NVMe volumes
NVMe-oF drivers that share the subsystem have the same race condition
issue that iSCSI volumes that share targets do.

The race condition is caused by AER messages that trigger automatic
rescans on the connector host side in both cases.

For iSCSI we added a feature on the Open-iSCSI project that allowed
disabling these scans, and added support for it in os-brick.

Since manual scans is a new feature that may be missing in a host's
iSCSI client, cinder has a flag in volumes to indicate when they use
shared targets.  Using that flag os-brick consumers can use the
"guard_connection" context manager to ensure race conditions don't
happen.

The race condition is prevented by os-brick using manual scans if they
are available in the iSCSI client, or a file lock if not.

The problem we face now is that we also want to use the lock for NVMe-oF
volumes that share a subsystem for multiple namespaces (there is no way
to disable automatic scans), but cinder doesn't (and shouldn't) expose
the actual storage protocol on the volume resource, so we need to
leverage the "shared_targets" parameter.

So with a single boolean value we need to encode 3 possible options:

- Don't use locks because targets/subystems are not shared
- Use locks if iSCSI client doesn't support automatic connections
- Always use locks (for example for NVMe-oF)

The only option we have is using the "None" value as well. That way we
can encode 3 different cases.

But we have an additional restriction, "True" is already taken for the
iSCSI case, because there will exist volumes in the database that
already have that value stored.

And making guard_connection always lock when shared_targets is set to
True will introduce the bottleneck from bug (#1800515).

That leaves us with the "None" value to force the use of locks.

So we end up with the following tristate for "shared_targets":

- True to use lock if iSCSI initiator doesn't support manual scans
- False means that os-brick should never lock.
- None means that os-brick should always lock.

The alternative to this encoding would be to have an online data
migration for volumes to change "True" to "None", and accept that there
could be race conditions during the rolling upgrade (because os-brick on
computes will interpret "None" as "False").

Since "in theory" Cinder was only returning True or False for the
"shared_target", we add a new microversion with number 3.69 that returns
null when the value is internally set to None.

The patch also updates the database with a migration, though it looks
like it's not necessary since the DB already allows null values, but it
seems more correct to make sure that's always the case.

This patch doesn't close but #1961102 because the os-brick patch is
needed for that.

Related-Bug: #1961102
Change-Id: I8cda6d9830f39e27ac700b1d8796fe0489fd7c0a
2022-05-24 15:13:23 +02:00
..
volume-create-response.json Report tri-state shared_targets for NVMe volumes 2022-05-24 15:13:23 +02:00
volume-show-response.json Report tri-state shared_targets for NVMe volumes 2022-05-24 15:13:23 +02:00
volume-update-response.json Report tri-state shared_targets for NVMe volumes 2022-05-24 15:13:23 +02:00
volumes-list-detailed-response.json Report tri-state shared_targets for NVMe volumes 2022-05-24 15:13:23 +02:00