Merge "Guard shared device/cluster filesystems"

This commit is contained in:
Zuul
2022-07-20 08:23:55 +00:00
committed by Gerrit Code Review
8 changed files with 499 additions and 9 deletions

View File

@@ -103,3 +103,23 @@ Clean steps
Delete the RAID configuration. This step belongs to the ``raid`` interface
and must be used through the :ironic-doc:`ironic RAID feature
<admin/raid.html>`.
Cleaning safeguards
-------------------
The stock hardware manager contains a number of safeguards to prevent
unsafe conditions from occurring.
Shared Disk Cluster Filesystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When detected, commonly used shared disk cluster filesystems cause the stock
hardware manager's cleaning methods to abort prior to destroying the
contents of the disk.
These filesystems include IBM General Parallel File System (GPFS),
VMware Virtual Machine File System (VMFS), and Red Hat Global File System
(GFS2).
For information on troubleshooting and disabling this check,
see :doc:`/admin/troubleshooting`.

View File

@@ -24,7 +24,7 @@ image boots. Kernel command line parameters are used to do this.
dynamic-login element example:
- Add ``sshkey="ssh-rsa BBA1..."`` to pxe_append_params setting in
- Add ``sshkey="ssh-rsa BBA1..."`` to kernel_append_params setting in
the ``ironic.conf`` file
- Restart the ironic-conductor with the command
``service ironic-conductor restart``
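For reference, a sketch of what the resulting ``ironic.conf`` entry from the
first step might look like (the key and any other kernel parameters shown are
illustrative and should be adapted to your deployment)::

    [pxe]
    kernel_append_params = nofb nomodeset vga=normal sshkey="ssh-rsa BBA1..."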
@@ -76,7 +76,7 @@ are used to do this.
dynamic-login element example::
Generate an ENCRYPTED_PASSWORD with the openssl passwd -1 command
Add rootpwd="$ENCRYPTED_PASSWORD" value on the pxe_append_params setting in /etc/ironic/ironic.conf
Add rootpwd="$ENCRYPTED_PASSWORD" value on the kernel_append_params setting in /etc/ironic/ironic.conf
Restart the ironic-conductor with the command service ironic-conductor restart
Users can also be added to DIB built IPA images with the devuser element [1]_
@@ -118,7 +118,7 @@ Set IPA to debug logging
Debug logging can be enabled in several different ways. The easiest way is to
add ``ipa-debug=1`` to the kernel command line. To do this:
- Append ``ipa-debug=1`` to the pxe_append_params setting in the
- Append ``ipa-debug=1`` to the kernel_append_params setting in the
``ironic.conf`` file
- Restart the ironic-conductor with the command
``service ironic-conductor restart``
@@ -161,6 +161,96 @@ If the system uses systemd then systemctl can be used to restart the service::
sudo systemctl restart ironic-python-agent.service
Cleaning halted with ProtectedDeviceError
=========================================
The IPA service has halted cleaning because one of the block devices within or
attached to the bare metal node contains a class of filesystem which **MAY**
cause irreparable harm to a potentially running cluster if accidentally removed.
These filesystems *may* be used for only local storage and, as a result, be
safe to erase. However, a shared block device may be in use, such as a device
supplied via a Storage Area Network utilizing protocols such as iSCSI or
FibreChannel, and ultimately the Host Bus Adapter (HBA) may not be an entirely
"detectable" entity given the hardware marketplace and aspects such as
"SmartNICs" and Converged Network Adapters with specific offload functions
to support standards like "NVMe over Fabrics" (NVMe-oF).
By default, the agent prevents these filesystems from being deleted and
halts the cleaning process when one is detected. The cleaning process can be
re-triggered via Ironic's state machine once one of the documented settings
has been used to notify the agent that this safeguard is not required.
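When this safeguard triggers, the failure reported for the node contains a
message built from the agent's ``ProtectedDeviceError``, along the lines of
the following (the device name and filesystem type are illustrative)::

    Protected IBM GPFS Partition located on device /dev/sda. This volume or
    contents may be a shared block device. Please consult your storage
    administrator, and restart cleaning after either detaching the volumes, or
    instructing IPA to not protect contents. See Ironic Python Agent
    documentation for more information.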
What filesystems are looked for
-------------------------------
+-------------------------------------------+
| IBM General Parallel Filesystem |
+-------------------------------------------+
| Red Hat Global Filesystem 2 |
+-------------------------------------------+
| VMware Virtual Machine FileSystem (VMFS) |
+-------------------------------------------+
I'm okay with deleting, how do I tell IPA to clean the disk(s)?
---------------------------------------------------------------
Four potential ways exist to signal this to IPA. Please note that all of these
options require either access to the node in Ironic's API or the ability to
modify the Ironic configuration.
Via Ironic
~~~~~~~~~~
.. note:: This option requires that the version of Ironic in use be recent
enough to understand and explicitly provide this option to the Agent.
Inform Ironic to provide the option to the Agent::
baremetal node set --driver-info wipe_special_filesystems=True
Via a node's kernel_append_params setting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This may be set on an individual node by utilizing the
``kernel_append_params`` override in the node's ``driver_info``.
Example::
baremetal node set --driver-info kernel_append_params="ipa-guard-special-filesystems=False"
Alternatively, if you wish to set this only once, you may use
the ``instance_info`` field, which is wiped upon teardown of the node.
Example::
baremetal node set --instance-info kernel_append_params="ipa-guard-special-filesystems=False"
Via Ironic's Boot time PXE parameters (Globally)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Globally, this setting may be passed by modifying the ``ironic.conf``
configuration file on your cluster, adding the
``ipa-guard-special-filesystems=False`` string to the
``[pxe]kernel_append_params`` parameter.
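For example, assuming your existing ``[pxe]kernel_append_params`` value were
``nofb nomodeset vga=normal`` (illustrative only), the resulting entry would
look like::

    [pxe]
    kernel_append_params = nofb nomodeset vga=normal ipa-guard-special-filesystems=False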
.. warning::
If you're running a multi-conductor deployment, all of your ``ironic.conf``
configuration files will need to be updated to match.
Via Ramdisk configuration
~~~~~~~~~~~~~~~~~~~~~~~~~
This option requires modifying the ramdisk and is the most complex, but it may
be advisable if you have a mixed environment where shared clustered
filesystems are a concern on some machines, but not others.
.. warning::
This requires rebuilding your agent ramdisk, and modifying the embedded
configuration file for the ironic-python-agent. If you're at all confused
by this statement, this option is not for you.
Edit ``/etc/ironic_python_agent/ironic_python_agent.conf`` and set the parameter
``[DEFAULT]guard_special_filesystems`` to ``False``.
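The resulting entry in the embedded configuration file would look like::

    [DEFAULT]
    guard_special_filesystems = False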
References
==========

View File

@@ -10,6 +10,13 @@ deploying your own hardware manager.
IPA ships with :doc:`GenericHardwareManager </admin/hardware_managers>`, which
implements basic cleaning and deployment methods compatible with most hardware.
.. warning::
Some functionality inherent in the stock hardware manager cleaning methods
may be useful in custom hardware managers, but it should not be expected to
automatically work in custom managers as well. Examples of this are the
clustered filesystem protections and the cleaning method fallback logic.
Custom hardware manager maintainers should be mindful when overriding the
stock methods.
How are methods executed on HardwareManagers?
---------------------------------------------
Methods that modify hardware are dispatched to each hardware manager in
@@ -149,6 +156,18 @@ function with extra parameters.
be set to whichever manager has a higher hardware support level and then
use the higher priority in the case of a tie.
In some cases, it may be necessary to create a customized cleaning step to
implement a particular pattern of behavior. Those doing such work may want to
leverage the filesystem safety checks, which are part of the stock hardware
managers. For example:
.. code-block:: python

   def custom_erase_devices(self, node, ports):
       for dev in determine_my_devices_to_erase():
           hardware.safety_check_block_device(node, dev.name)
           my_special_cleaning(dev.name)
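In this sketch, ``determine_my_devices_to_erase`` and ``my_special_cleaning``
are placeholders for your own logic. The call to
``hardware.safety_check_block_device`` raises ``ProtectedDeviceError``,
aborting the step, when a guarded shared-disk clustered filesystem is detected
and the safeguard has not been explicitly disabled.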
Custom HardwareManagers and Deploying
-------------------------------------

View File

@@ -315,6 +315,17 @@ cli_opts = [
min=0, max=99, # 100 is when IPA is booted
help='Priority of the inject_files deploy step (disabled '
'by default), an integer between 1 and 99.'),
cfg.BoolOpt('guard-special-filesystems',
default=APARAMS.get('ipa-guard-special-filesystems', True),
help='Guard "special" shared device filesystems from '
'cleaning by the stock hardware manager\'s cleaning '
'methods. If one of these filesystems is detected '
'during cleaning, the cleaning process will be aborted '
'and infrastructure operator intervention may be '
'required as this option is intended to prevent '
'cleaning from inadvertently destroying a running '
'cluster which may be visible over a storage fabric '
'such as FibreChannel.'),
]
CONF.register_cli_opts(cli_opts)

View File

@@ -348,3 +348,22 @@ class HeartbeatConnectionError(IronicAPIError):
def __init__(self, details):
super(HeartbeatConnectionError, self).__init__(details)
class ProtectedDeviceError(CleaningError):
"""Error raised when a cleaning is halted due to a protected device."""
message = 'Protected device located, cleaning aborted.'
def __init__(self, device, what):
details = ('Protected %(what)s located on device %(device)s. '
'This volume or contents may be a shared block device. '
'Please consult your storage administrator, and restart '
'cleaning after either detaching the volumes, or '
'instructing IPA to not protect contents. See Ironic '
'Python Agent documentation for more information.' %
{'what': what,
'device': device})
self.message = details
super(CleaningError, self).__init__(details)

View File

@@ -924,6 +924,10 @@ class HardwareManager(object, metaclass=abc.ABCMeta):
:param node: Ironic node object
:param ports: list of Ironic port objects
:raises: ProtectedDeviceError if a device has been identified which
may require manual intervention due to its contents and the
operational risk involved, as this could also be a sign
of an environmental misconfiguration.
:return: a dictionary in the form {device.name: erasure output}
"""
erase_results = {}
@@ -937,6 +941,7 @@ class HardwareManager(object, metaclass=abc.ABCMeta):
thread_pool = ThreadPool(min(max_pool_size, len(block_devices)))
for block_device in block_devices:
params = {'node': node, 'block_device': block_device}
safety_check_block_device(node, block_device.name)
erase_results[block_device.name] = thread_pool.apply_async(
dispatch_to_managers, ('erase_block_device',), params)
thread_pool.close()
@@ -1541,6 +1546,10 @@ class GenericHardwareManager(HardwareManager):
:param ports: list of Ironic port objects
:raises BlockDeviceEraseError: when there's an error erasing the
block device
:raises: ProtectedDeviceError if a device has been identified which
may require manual intervention due to its contents and the
operational risk involved, as this could also be a sign
of an environmental misconfiguration.
"""
block_devices = self.list_block_devices(include_partitions=True)
# NOTE(coreywright): Reverse sort by device name so a partition (eg
@@ -1549,6 +1558,7 @@ class GenericHardwareManager(HardwareManager):
block_devices.sort(key=lambda dev: dev.name, reverse=True)
erase_errors = {}
for dev in self._list_erasable_devices():
safety_check_block_device(node, dev.name)
try:
disk_utils.destroy_disk_metadata(dev.name, node['uuid'])
except processutils.ProcessExecutionError as e:
@@ -1572,6 +1582,10 @@ class GenericHardwareManager(HardwareManager):
:param ports: list of Ironic port objects
:raises BlockDeviceEraseError: when there's an error erasing the
block device
:raises: ProtectedDeviceError if a device has been identified which
may require manual intervention due to its contents and the
operational risk involved, as this could also be a sign
of an environmental misconfiguration.
"""
erase_errors = {}
info = node.get('driver_internal_info', {})
@@ -1579,6 +1593,7 @@ class GenericHardwareManager(HardwareManager):
LOG.debug("No erasable devices have been found.")
return
for dev in self._list_erasable_devices():
safety_check_block_device(node, dev.name)
try:
if self._is_nvme(dev):
execute_nvme_erase = info.get(
@@ -2957,3 +2972,109 @@ def get_multipath_status():
# as if we directly try and work with the global var, we will be racing
# tests endlessly.
return MULTIPATH_ENABLED
def safety_check_block_device(node, device):
"""Performs safety checking of a block device before destroying.
In order to guard against destruction of file systems such as
shared-disk file systems
(https://en.wikipedia.org/wiki/Clustered_file_system#SHARED-DISK)
or similar filesystems where multiple distinct computers may have
unlocked concurrent IO access to the entire block device or
SAN Logical Unit Number, we need to evaluate the device and block cleaning
from occurring on these filesystems *unless* we have been explicitly
configured to do so.
This is because cleaning is an intentionally destructive operation,
and once started against such a device, given the complexities of
shared disk clustered filesystems where concurrent access is a design
element, in all likelihood the entire cluster can be negatively
impacted, and an operator will be forced to recover from snapshot and
or backups of the volume's contents.
:param node: A node, or cached node object.
:param device: String representing the path to the block
device to be checked.
:raises: ProtectedDeviceError when a device is identified with
one of these known clustered filesystems, and the overall
settings have not indicated for the agent to skip such
safety checks.
"""
# NOTE(TheJulia): While this seems super rare, I found out after this
# thread of discussion started that I have customers which have done
# this and wiped out SAN volumes and their contents unintentionally
# as a result of these filesystems not being guarded.
# For those not familiar with shared disk clustered filesystems, think
# of it as if you were impacting a Ceph cluster, except you are suddenly
# removing the underlying disks from the OSDs, and the entire cluster
# goes down.
if not CONF.guard_special_filesystems:
return
di_info = node.get('driver_internal_info', {})
if not di_info.get('wipe_special_filesystems', True):
return
report, _e = il_utils.execute('lsblk', '-Pbia',
'-oFSTYPE,UUID,PTUUID,PARTTYPE,PARTUUID',
device)
lines = report.splitlines()
identified_fs_types = []
identified_ids = []
for line in lines:
# Split into KEY=VAL pairs
vals = shlex.split(line)
if not vals:
continue
for key, val in (v.split('=', 1) for v in vals):
if key == 'FSTYPE':
identified_fs_types.append(val)
if key in ['UUID', 'PTUUID', 'PARTTYPE', 'PARTUUID']:
identified_ids.append(val)
# Ignore block types not specified
_check_for_special_partitions_filesystems(
device,
identified_ids,
identified_fs_types)
def _check_for_special_partitions_filesystems(device, ids, fs_types):
"""Compare supplied IDs, Types to known items, and raise if found.
:param device: The block device in use, specifically for logging.
:param ids: A list of IDs found to check.
:param fs_types: A list of FS types found to check.
:raises: ProtectedDeviceError should a partition label or metadata
be discovered which suggests a shared disk clustered filesystem
is in use.
"""
guarded_ids = {
# Apparently GPFS can use shared volumes...
'37AFFC90-EF7D-4E96-91C3-2D7AE055B174': 'IBM GPFS Partition',
# Shared volume parallel filesystem
'AA31E02A-400F-11DB-9590-000C2911D1B8': 'VMware VMFS Partition (GPT)',
'0xfb': 'VMware VMFS Partition (MBR)',
}
for key, value in guarded_ids.items():
for id_value in ids:
if key == id_value:
raise errors.ProtectedDeviceError(
device=device,
what=value)
guarded_fs_types = {
'gfs2': 'Red Hat Global File System 2',
}
for key, value in guarded_fs_types.items():
for fs in fs_types:
if key == fs:
raise errors.ProtectedDeviceError(
device=device,
what=value)

View File

@@ -1710,10 +1710,12 @@ class TestGenericHardwareManager(base.IronicAgentTest):
mocked_listdir.assert_has_calls(expected_calls)
mocked_mpath.assert_called_once_with()
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(hardware, 'ThreadPool', autospec=True)
@mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
def test_erase_devices_no_parallel_by_default(self, mocked_dispatch,
mock_threadpool):
mock_threadpool,
mock_safety_check):
# NOTE(TheJulia): This test was previously more elaborate and
# had a high failure rate on py37 and py38. So instead, lets just
@@ -1732,10 +1734,43 @@ class TestGenericHardwareManager(base.IronicAgentTest):
calls = [mock.call(1)]
self.hardware.erase_devices({}, [])
mock_threadpool.assert_has_calls(calls)
mock_safety_check.assert_has_calls([
mock.call({}, '/dev/sdj'),
mock.call({}, '/dev/hdaa')
])
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(hardware, 'ThreadPool', autospec=True)
@mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
def test_erase_devices_no_parallel_by_default_protected_device(
self, mocked_dispatch,
mock_threadpool,
mock_safety_check):
mock_safety_check.side_effect = errors.ProtectedDeviceError(
device='foo',
what='bar')
self.hardware.list_block_devices = mock.Mock()
self.hardware.list_block_devices.return_value = [
hardware.BlockDevice('/dev/sdj', 'big', 1073741824, True),
hardware.BlockDevice('/dev/hdaa', 'small', 65535, False),
]
calls = [mock.call(1)]
self.assertRaises(errors.ProtectedDeviceError,
self.hardware.erase_devices, {}, [])
mock_threadpool.assert_has_calls(calls)
mock_safety_check.assert_has_calls([
mock.call({}, '/dev/sdj'),
])
mock_threadpool.apply_async.assert_not_called()
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch('multiprocessing.pool.ThreadPool.apply_async', autospec=True)
@mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
def test_erase_devices_concurrency(self, mocked_dispatch, mocked_async):
def test_erase_devices_concurrency(self, mocked_dispatch, mocked_async,
mock_safety_check):
internal_info = self.node['driver_internal_info']
internal_info['disk_erasure_concurrency'] = 10
mocked_dispatch.return_value = 'erased device'
@@ -1760,9 +1795,15 @@ class TestGenericHardwareManager(base.IronicAgentTest):
for dev in (blkdev1, blkdev2)]
mocked_async.assert_has_calls(calls)
self.assertEqual(expected, result)
mock_safety_check.assert_has_calls([
mock.call(self.node, '/dev/sdj'),
mock.call(self.node, '/dev/hdaa'),
])
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(hardware, 'ThreadPool', autospec=True)
def test_erase_devices_concurrency_pool_size(self, mocked_pool):
def test_erase_devices_concurrency_pool_size(self, mocked_pool,
mock_safety_check):
self.hardware.list_block_devices = mock.Mock()
self.hardware.list_block_devices.return_value = [
hardware.BlockDevice('/dev/sdj', 'big', 1073741824, True),
@@ -1782,6 +1823,10 @@ class TestGenericHardwareManager(base.IronicAgentTest):
self.hardware.erase_devices(self.node, [])
mocked_pool.assert_called_with(1)
mock_safety_check.assert_has_calls([
mock.call(self.node, '/dev/sdj'),
mock.call(self.node, '/dev/hdaa'),
])
@mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
def test_erase_devices_without_disk(self, mocked_dispatch):
@@ -2441,6 +2486,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
mock.call('/sys/fs/pstore/' + arg) for arg in pstore_entries
])
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(il_utils, 'execute', autospec=True)
@mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
@@ -2449,7 +2495,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
'_list_erasable_devices', autospec=True)
def test_erase_devices_express(
self, mock_list_erasable_devices, mock_nvme_erase,
mock_destroy_disk_metadata, mock_execute):
mock_destroy_disk_metadata, mock_execute, mock_safety_check):
block_devices = [
hardware.BlockDevice('/dev/sda', 'sata', 65535, False),
hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False),
@@ -2466,7 +2512,44 @@ class TestGenericHardwareManager(base.IronicAgentTest):
mock.call('/dev/md0', self.node['uuid'])],
mock_destroy_disk_metadata.call_args_list)
mock_list_erasable_devices.assert_called_with(self.hardware)
mock_safety_check.assert_has_calls([
mock.call(self.node, '/dev/sda'),
mock.call(self.node, '/dev/md0'),
mock.call(self.node, '/dev/nvme0n1'),
mock.call(self.node, '/dev/nvme1n1')
])
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(il_utils, 'execute', autospec=True)
@mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
'_nvme_erase', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
'_list_erasable_devices', autospec=True)
def test_erase_devices_express_stops_on_safety_failure(
self, mock_list_erasable_devices, mock_nvme_erase,
mock_destroy_disk_metadata, mock_execute, mock_safety_check):
mock_safety_check.side_effect = errors.ProtectedDeviceError(
device='foo',
what='bar')
block_devices = [
hardware.BlockDevice('/dev/sda', 'sata', 65535, False),
hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False),
hardware.BlockDevice('/dev/nvme0n1', 'nvme', 32767, False),
hardware.BlockDevice('/dev/nvme1n1', 'nvme', 32767, False)
]
mock_list_erasable_devices.return_value = list(block_devices)
self.assertRaises(errors.ProtectedDeviceError,
self.hardware.erase_devices_express, self.node, [])
mock_nvme_erase.assert_not_called()
mock_destroy_disk_metadata.assert_not_called()
mock_list_erasable_devices.assert_called_with(self.hardware)
mock_safety_check.assert_has_calls([
mock.call(self.node, '/dev/sda')
])
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(il_utils, 'execute', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
'_is_virtual_media_device', autospec=True)
@@ -2475,7 +2558,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
@mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
def test_erase_devices_metadata(
self, mock_metadata, mock_list_devs, mock__is_vmedia,
mock_execute):
mock_execute, mock_safety_check):
block_devices = [
hardware.BlockDevice('/dev/sr0', 'vmedia', 12345, True),
hardware.BlockDevice('/dev/sdb2', 'raid-member', 32767, False),
@@ -2509,7 +2592,62 @@ class TestGenericHardwareManager(base.IronicAgentTest):
mock.call(self.hardware, block_devices[2]),
mock.call(self.hardware, block_devices[5])],
mock__is_vmedia.call_args_list)
mock_safety_check.assert_has_calls([
mock.call(self.node, '/dev/sda1'),
mock.call(self.node, '/dev/sda'),
# This is kind of redundant code/pattern behavior wise
# but you never know what someone has done...
mock.call(self.node, '/dev/md0')
])
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(il_utils, 'execute', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
'_is_virtual_media_device', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
'list_block_devices', autospec=True)
@mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
def test_erase_devices_metadata_safety_check(
self, mock_metadata, mock_list_devs, mock__is_vmedia,
mock_execute, mock_safety_check):
block_devices = [
hardware.BlockDevice('/dev/sr0', 'vmedia', 12345, True),
hardware.BlockDevice('/dev/sdb2', 'raid-member', 32767, False),
hardware.BlockDevice('/dev/sda', 'small', 65535, False),
hardware.BlockDevice('/dev/sda1', '', 32767, False),
hardware.BlockDevice('/dev/sda2', 'raid-member', 32767, False),
hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False)
]
# NOTE(coreywright): Don't return the list, but a copy of it, because
# we depend on its elements' order when referencing it later during
# verification, but the method under test sorts the list changing it.
mock_list_devs.return_value = list(block_devices)
mock__is_vmedia.side_effect = lambda _, dev: dev.name == '/dev/sr0'
mock_execute.side_effect = [
('sdb2 linux_raid_member host:1 f9978968', ''),
('sda2 linux_raid_member host:1 f9978969', ''),
('sda1', ''), ('sda', ''), ('md0', '')]
mock_safety_check.side_effect = [
None,
errors.ProtectedDeviceError(
device='foo',
what='bar')
]
self.assertRaises(errors.ProtectedDeviceError,
self.hardware.erase_devices_metadata,
self.node, [])
self.assertEqual([mock.call('/dev/sda1', self.node['uuid'])],
mock_metadata.call_args_list)
mock_list_devs.assert_called_with(self.hardware,
include_partitions=True)
mock_safety_check.assert_has_calls([
mock.call(self.node, '/dev/sda1'),
mock.call(self.node, '/dev/sda'),
])
@mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
'_is_linux_raid_member', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
@@ -2519,7 +2657,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
@mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
def test_erase_devices_metadata_error(
self, mock_metadata, mock_list_devs, mock__is_vmedia,
mock__is_raid_member):
mock__is_raid_member, mock_safety_check):
block_devices = [
hardware.BlockDevice('/dev/sda', 'small', 65535, False),
hardware.BlockDevice('/dev/sdb', 'big', 10737418240, True),
@@ -2553,6 +2691,10 @@ class TestGenericHardwareManager(base.IronicAgentTest):
self.assertEqual([mock.call(self.hardware, block_devices[1]),
mock.call(self.hardware, block_devices[0])],
mock__is_vmedia.call_args_list)
mock_safety_check.assert_has_calls([
mock.call(self.node, '/dev/sdb'),
mock.call(self.node, '/dev/sda')
])
@mock.patch.object(il_utils, 'execute', autospec=True)
def test__is_linux_raid_member(self, mocked_execute):
@@ -4991,3 +5133,51 @@ class TestAPIClientSaveAndUse(base.IronicAgentTest):
calls = [mock.call('list_hardware_info'),
mock.call('wait_for_disks')]
mock_dispatch.assert_has_calls(calls)
@mock.patch.object(il_utils, 'execute', autospec=True)
class TestProtectedDiskSafetyChecks(base.IronicAgentTest):
def test_special_filesystem_guard_not_enabled(self, mock_execute):
CONF.set_override('guard_special_filesystems', False)
hardware.safety_check_block_device({}, '/dev/foo')
mock_execute.assert_not_called()
def test_special_filesystem_guard_node_indicates_skip(self, mock_execute):
node = {
'driver_internal_info': {
'wipe_special_filesystems': False
}
}
mock_execute.return_value = ('', '')
hardware.safety_check_block_device(node, '/dev/foo')
mock_execute.assert_not_called()
def test_special_filesystem_guard_enabled_no_results(self, mock_execute):
mock_execute.return_value = ('', '')
hardware.safety_check_block_device({}, '/dev/foo')
def test_special_filesystem_guard_raises(self, mock_execute):
GFS2 = 'FSTYPE="gfs2"\n'
GPFS1 = 'UUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
GPFS2 = 'PTUUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
GPFS3 = 'PARTTYPE="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
GPFS4 = 'PARTUUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
VMFS1 = 'UUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
VMFS2 = 'PTUUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
VMFS3 = 'PARTTYPE="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
VMFS4 = 'PARTUUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
VMFS5 = 'UUID="0xfb"\n'
VMFS6 = 'PTUUID="0xfb"\n'
VMFS7 = 'PARTTYPE="0xfb"\n'
VMFS8 = 'PARTUUID="0xfb"\n'
expected_failures = [GFS2, GPFS1, GPFS2, GPFS3, GPFS4, VMFS1, VMFS2,
VMFS3, VMFS4, VMFS5, VMFS6, VMFS7, VMFS8]
for failure in expected_failures:
mock_execute.reset_mock()
mock_execute.return_value = (failure, '')
self.assertRaises(errors.ProtectedDeviceError,
hardware.safety_check_block_device,
{}, '/dev/foo')
self.assertEqual(1, mock_execute.call_count)

View File

@@ -0,0 +1,20 @@
---
fixes:
- |
Previously, when the ``ironic-python-agent`` performed erasure of block
devices during cleaning, it would automatically attempt to erase the
contents of any "Shared Device" clustered filesystems which may be in use
by multiple distinct machines over a storage fabric. Filesystems of this
class, specifically IBM GPFS, Red Hat Global File System 2, and VMware
Virtual Machine File System (VMFS), are now identified, and cleaning is
halted when one is detected. This is important because, should an active
cluster be using the disk, cleaning could potentially cause the cluster to
go down, forcing restoration from backups. Ideally, infrastructure
operators should check their environment's storage configuration and
un-map any clustered filesystems from being visible to Ironic nodes,
unless explicitly needed and expected. Please see the Ironic-Python-Agent
troubleshooting documentation for more information.
issues:
- |
Logic to guard VMFS filesystems from being destroyed *may* not recognize
VMFS extents. Operators with examples of partitioning for extent usage
are encouraged to contact the Ironic community.