ironic-python-agent/releasenotes/notes/prevent-deletion-of-shared-disk-filesystems-4c17c7666d2fe3bc.yaml
Julia Kreger beb7484858 Guard shared device/cluster filesystems
Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow for multi-host shared concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally, ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collisions.

These filesystems are common in use cases where high availability
is required, or where the ability of individual computers to
collaborate on a given workload is critical, such as a group of
hypervisors supporting virtual machines, since shared storage allows
for nearly seamless transfer of a workload from one machine to
another.

Similar technologies are also used for cluster quorum and durable
cluster state sharing; however, those uses are not specifically in
scope here.

Things get difficult because the entire device is not exclusively
locked on these storage fabrics: in some cases locking is handled by
a Distributed Lock Manager on the network, or via special sector
interactions amongst the cluster members which understand and
support the filesystem.

As a result of this I/O and interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it perceives as attached locally.
This is not IPA's fault; often this case occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as it relates to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may require restoration from snapshot or backup
storage, or may ultimately be permanent data loss, depending
on the environment and the usage of that environment.

Included in this patch:
- IBM GPFS - Can be used on a shared block device... apparently, according
             to IBM's documentation. The standard use of GPFS is more
             Ceph-like in design... however GPFS is also a specially
             licensed commercial offering, so it is a red flag if it is
             encountered, and should be investigated by the environment's
             systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
                local block devices. With shared block devices,
                ranges of the disk are locked instead of the whole
                disk, and the ranges are mapped to virtual machine
                disk interfaces.
                It is unknown, due to a lack of information, whether
                this will detect and prevent erasure of VMFS logical
                extent volumes.

Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
2022-07-19 13:24:03 -07:00

---
fixes:
  - |
    Previously, when the ``ironic-python-agent`` would undergo erasure of
    block devices during cleaning, it would automatically attempt to erase
    the contents of any "Shared Device" clustered filesystems which may be
    in use by multiple distinct machines over a storage fabric. In
    particular, IBM GPFS, Red Hat Global File System 2, and VMware Virtual
    Machine File System (VMFS) are now identified, and cleaning is halted.
    This is important because, should an active cluster be using this
    disk, cleaning could potentially cause the cluster to go down, forcing
    restoration from backups. Ideally, infrastructure operators should
    check their environment's storage configuration and un-map any
    clustered filesystems from being visible to Ironic nodes, unless
    explicitly needed and expected. Please see the Ironic-Python-Agent
    troubleshooting documentation for more information.
issues:
  - |
    Logic to guard VMFS filesystems from being destroyed *may* not
    recognize VMFS extents. Operators with examples of partitioning for
    extent usage are encouraged to contact the Ironic community.