beb7484858
Certain filesystems are sometimes used in specialty computing environments where a shared storage infrastructure or fabric exists. These filesystems allow for multi-host shared concurrent read/write access to the underlying block device by *not* locking the entire device for exclusive use. Generally, ranges of the disk are reserved for each interacting node to write to, and locking schemes are used to prevent collisions. These filesystems are common for use cases where high availability is required, or where the ability for individual computers to collaborate on a given workload is critical, such as a group of hypervisors supporting virtual machines, because they can allow for nearly seamless transfer of workload from one machine to another. Similar technologies are also used for cluster quorum and cluster durable state sharing; however, that is not specifically considered in scope.

Where things get difficult is that the entire device is not exclusively locked with these storage fabrics, and in some cases locking is handled by a Distributed Lock Manager on the network, or via special sector interactions amongst the cluster members which understand and support the filesystem.

As a result of this IO/interaction model, an Ironic-Python-Agent performing cleaning can effectively destroy the cluster just by attempting to clean storage which it perceives as attached locally. This is not IPA's fault; often this case occurs when a storage administrator forgot to update LUN masking or volume settings on a SAN as it relates to an individual host in the overall computing environment. The net result of one node cleaning the shared volume may include restoration from snapshot or backup storage, or may ultimately cause permanent data loss, depending on the environment and the usage of that environment.

Included in this patch:

- IBM GPFS - Can be used on a shared block device, apparently, according to IBM's documentation. The standard use of GPFS is more Ceph-like in design; however, GPFS is also a specially licensed commercial offering, so it is a red flag if this is encountered, and should be investigated by the environment's systems operator.

- Red Hat GFS2 - Is used with shared common block devices in clusters.

- VMware VMFS - Is used with shared SAN block devices, as well as local block devices. With shared block devices, ranges of the disk are locked instead of the whole disk, and the ranges are mapped to virtual machine disk interfaces. It is unknown, due to lack of information, if this will detect and prevent erasure of VMFS logical extent volumes.

Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
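Conceptually, the guard behaves like the minimal Python sketch below: inspect the filesystem signatures reported for a block device and refuse to erase it when a clustered filesystem type is present. This is illustrative only; the helper names, the exception class, and the exact filesystem-type strings reported by lsblk are assumptions for this example and do not mirror the actual ironic-python-agent code.

    # Illustrative sketch only -- not the ironic-python-agent implementation.
    import subprocess

    # Hypothetical set of filesystem types treated as "shared device" clusters.
    GUARDED_FSTYPES = {'gfs2', 'gpfs', 'vmfs', 'vmfs_volume_member'}


    class ClusteredFilesystemDetected(Exception):
        """Raised to halt cleaning when a clustered filesystem is found."""


    def fstypes_on_device(device):
        """Return the set of filesystem types lsblk reports for a device."""
        out = subprocess.run(['lsblk', '-no', 'FSTYPE', device],
                             capture_output=True, text=True, check=True).stdout
        return {line.strip().lower()
                for line in out.splitlines() if line.strip()}


    def guard_shared_device(device):
        """Abort before erasure if the device carries a clustered filesystem."""
        found = fstypes_on_device(device) & GUARDED_FSTYPES
        if found:
            raise ClusteredFilesystemDetected(
                '%s appears to hold clustered filesystem(s) %s; refusing to '
                'erase. Check SAN LUN masking / volume mappings.'
                % (device, ', '.join(sorted(found))))

Operators can perform a similar manual check by running lsblk -f on a node before cleaning to confirm no shared LUNs remain mapped to it.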
---
fixes:
  - |
    Previously when the ``ironic-python-agent`` would undergo erasure of block
    devices during cleaning, it would automatically attempt to erase the
    contents of any "Shared Device" clustered filesystems which may be in use
    by multiple distinct machines over a storage fabric. In particular,
    IBM GPFS, Red Hat Global File System 2, and VMware Virtual Machine File
    System (VMFS) are now identified and cleaning is halted. This is important
    because should an active cluster be using this disk, cleaning could
    potentially cause the cluster to go down, forcing restoration from backups.
    Ideally, infrastructure operators should check their environment's storage
    configuration and un-map any clustered filesystems from being visible to
    Ironic nodes, unless explicitly needed and expected. Please see the
    Ironic-Python-Agent troubleshooting documentation for more information.
issues:
  - |
    Logic to guard VMFS filesystems from being destroyed *may* not recognize
    VMFS extents. Operators with examples of partitioning for extent usage
    are encouraged to contact the Ironic community.