Merge "Guard shared device/cluster filesystems"
@@ -103,3 +103,23 @@ Clean steps
Delete the RAID configuration. This step belongs to the ``raid`` interface
and must be used through the :ironic-doc:`ironic RAID feature
<admin/raid.html>`.

Cleaning safeguards
-------------------

The stock hardware manager contains a number of safeguards to prevent
unsafe conditions from occurring.

Shared Disk Cluster Filesystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a commonly used shared-disk cluster filesystem is detected, cleaning
processes in stock hardware manager methods abort before destroying the
contents of the disk.

These filesystems include IBM General Parallel File System (GPFS),
VMware Virtual Machine File System (VMFS), and Red Hat Global File System
(GFS2).

For information on troubleshooting and disabling this check,
see :doc:`/admin/troubleshooting`.
@@ -24,7 +24,7 @@ image boots. Kernel command line parameters are used to do this.
dynamic-login element example:

-- Add ``sshkey="ssh-rsa BBA1..."`` to pxe_append_params setting in
+- Add ``sshkey="ssh-rsa BBA1..."`` to kernel_append_params setting in
  the ``ironic.conf`` file
- Restart the ironic-conductor with the command
  ``service ironic-conductor restart``
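
A minimal sketch of the resulting ``ironic.conf`` entry (the surrounding
parameters are illustrative, not required)::

  [pxe]
  kernel_append_params = nofb nomodeset sshkey="ssh-rsa BBA1..."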
@@ -76,7 +76,7 @@ are used to do this.
dynamic-login element example::

    Generate an ENCRYPTED_PASSWORD with the openssl passwd -1 command
-   Add rootpwd="$ENCRYPTED_PASSWORD" value on the pxe_append_params setting in /etc/ironic/ironic.conf
+   Add rootpwd="$ENCRYPTED_PASSWORD" value on the kernel_append_params setting in /etc/ironic/ironic.conf
    Restart the ironic-conductor with the command service ironic-conductor restart
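
A minimal shell sketch of those steps (assumes a sysvinit-style service, as
in the example above)::

    ENCRYPTED_PASSWORD=$(openssl passwd -1)
    # Append rootpwd="$ENCRYPTED_PASSWORD" to the kernel_append_params
    # setting in /etc/ironic/ironic.conf, then restart the conductor:
    sudo service ironic-conductor restart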

Users can also be added to DIB built IPA images with the devuser element [1]_
@@ -118,7 +118,7 @@ Set IPA to debug logging
Debug logging can be enabled in several different ways. The easiest way is to
add ``ipa-debug=1`` to the kernel command line. To do this:

-- Append ``ipa-debug=1`` to the pxe_append_params setting in the
+- Append ``ipa-debug=1`` to the kernel_append_params setting in the
  ``ironic.conf`` file
- Restart the ironic-conductor with the command
  ``service ironic-conductor restart``
@@ -161,6 +161,96 @@ If the system uses systemd then systemctl can be used to restart the service::

    sudo systemctl restart ironic-python-agent.service

Cleaning halted with ProtectedDeviceError
=========================================

The IPA service halts cleaning when one of the block devices within, or
attached to, the bare metal node contains a class of filesystem which **MAY**
cause irreparable harm to a potentially running cluster if accidentally
removed.

These filesystems *may* be used only for local storage, and as a result may
be safe to erase. However, a shared block device may be in use, such as a
device supplied via a Storage Area Network utilizing protocols such as iSCSI
or FibreChannel. Ultimately, the Host Bus Adapter (HBA) may not be an
entirely "detectable" entity given the hardware marketplace and aspects such
as "SmartNICs" and Converged Network Adapters with specific offload functions
to support standards like "NVMe over Fabrics" (NVMe-oF).

By default, the agent will prevent these filesystems from being deleted and
will halt the cleaning process when they are detected. The cleaning process
can be re-triggered via Ironic's state machine once one of the documented
settings has been used to notify the agent that no action is required.
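
One way to re-trigger cleaning, after applying one of the settings below, is
to cycle the node through the state machine (node name illustrative)::

  baremetal node manage node-0
  baremetal node provide node-0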

What filesystems are looked for
-------------------------------

+-------------------------------------------+
| IBM General Parallel File System (GPFS)   |
+-------------------------------------------+
| Red Hat Global File System 2 (GFS2)       |
+-------------------------------------------+
| VMware Virtual Machine File System (VMFS) |
+-------------------------------------------+

I'm okay with deleting, how do I tell IPA to clean the disk(s)?
---------------------------------------------------------------

Four potential ways exist to signal to IPA. Please note, all of these options
require either access to the node in Ironic's API or the ability to modify
Ironic's configuration.

Via Ironic
~~~~~~~~~~

.. note:: This option requires that the version of Ironic be recent enough
   to understand and explicitly provide this option to the Agent.

Inform Ironic to provide the option to the Agent::

  baremetal node set <node> --driver-info wipe_special_filesystems=True

Via a node's kernel_append_params setting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This may be set on a per-node level by utilizing the ``kernel_append_params``
override setting. Example::

  baremetal node set <node> --driver-info kernel_append_params="ipa-guard-special-filesystems=False"

Alternatively, if you wish to set this only once, you may use
the ``instance_info`` field, which is wiped upon teardown of the node.
Example::

  baremetal node set <node> --instance-info kernel_append_params="ipa-guard-special-filesystems=False"

Via Ironic's Boot time PXE parameters (Globally)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Globally, this setting may be passed by modifying the ``ironic.conf``
configuration file on your cluster, adding the
``ipa-guard-special-filesystems=False`` string to the
``[pxe]kernel_append_params`` parameter.
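
A minimal sketch of the resulting configuration (any other parameters on the
line are illustrative)::

  [pxe]
  kernel_append_params = nofb nomodeset ipa-guard-special-filesystems=False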

.. warning::
   If you're running a multi-conductor deployment, all of your ``ironic.conf``
   configuration files will need to be updated to match.

Via Ramdisk configuration
~~~~~~~~~~~~~~~~~~~~~~~~~

This option requires modifying the ramdisk, and is the most complex, but may
be advisable if you have a mixed-environment cluster where shared clustered
filesystems may be a concern on some machines, but not others.

.. warning::
   This requires rebuilding your agent ramdisk and modifying the embedded
   configuration file for the ironic-python-agent. If you are at all confused
   by this statement, this option is not for you.

Edit ``/etc/ironic_python_agent/ironic_python_agent.conf`` and set the
parameter ``[DEFAULT]guard_special_filesystems`` to ``False``.
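
For example, the embedded configuration file would contain::

  [DEFAULT]
  guard_special_filesystems = False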

References
==========
@@ -10,6 +10,13 @@ deploying your own hardware manager.
IPA ships with :doc:`GenericHardwareManager </admin/hardware_managers>`, which
implements basic cleaning and deployment methods compatible with most hardware.

.. warning::
   Some functionality inherent in the stock hardware manager cleaning methods
   may be useful in custom hardware managers, but should not be expected to
   work automatically in them. Examples include clustered filesystem
   protections and cleaning method fallback logic. Custom hardware manager
   maintainers should be mindful when overriding the stock methods.

How are methods executed on HardwareManagers?
---------------------------------------------

Methods that modify hardware are dispatched to each hardware manager in
@@ -149,6 +156,18 @@ function with extra parameters.
be set to whichever manager has a higher hardware support level and then
use the higher priority in the case of a tie.

In some cases, it may be necessary to create a customized cleaning step to
implement a particular pattern of behavior. Those doing such work may want to
leverage the filesystem safety checks which are part of the stock hardware
managers.

.. code-block:: python

    def custom_erase_devices(self, node, ports):
        for dev in determine_my_devices_to_erase():
            # Halts cleaning with ProtectedDeviceError if a guarded
            # clustered filesystem is detected on the device.
            hardware.safety_check_block_device(node, dev.name)
            my_special_cleaning(dev.name)
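
The check raises ``ProtectedDeviceError`` when a guarded filesystem is found,
which halts the cleaning step; the custom step does not need to handle the
exception itself.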

Custom HardwareManagers and Deploying
-------------------------------------
@@ -315,6 +315,17 @@ cli_opts = [
                min=0, max=99,  # 100 is when IPA is booted
                help='Priority of the inject_files deploy step (disabled '
                     'by default), an integer between 1 and 99.'),
    cfg.BoolOpt('guard-special-filesystems',
                default=APARAMS.get('ipa-guard-special-filesystems', True),
                help='Guard "special" shared device filesystems from '
                     'cleaning by the stock hardware manager\'s cleaning '
                     'methods. If one of these filesystems is detected '
                     'during cleaning, the cleaning process will be aborted '
                     'and infrastructure operator intervention may be '
                     'required, as this option is intended to prevent '
                     'cleaning from inadvertently destroying a running '
                     'cluster which may be visible over a storage fabric '
                     'such as FibreChannel.'),
]

CONF.register_cli_opts(cli_opts)
@@ -348,3 +348,22 @@ class HeartbeatConnectionError(IronicAPIError):

    def __init__(self, details):
        super(HeartbeatConnectionError, self).__init__(details)


class ProtectedDeviceError(CleaningError):
    """Error raised when cleaning is halted due to a protected device."""

    message = 'Protected device located, cleaning aborted.'

    def __init__(self, device, what):
        details = ('Protected %(what)s located on device %(device)s. '
                   'This volume or its contents may be a shared block '
                   'device. Please consult your storage administrator, '
                   'and restart cleaning after either detaching the '
                   'volumes or instructing IPA to not protect contents. '
                   'See the Ironic Python Agent documentation for more '
                   'information.' %
                   {'what': what,
                    'device': device})

        self.message = details
        super(ProtectedDeviceError, self).__init__(details)
@@ -924,6 +924,10 @@ class HardwareManager(object, metaclass=abc.ABCMeta):

        :param node: Ironic node object
        :param ports: list of Ironic port objects
        :raises: ProtectedDeviceError if a device has been identified which
                 may require manual intervention due to the contents and
                 operational risk, as it could also be a sign
                 of an environmental misconfiguration.
        :return: a dictionary in the form {device.name: erasure output}
        """
        erase_results = {}
@@ -937,6 +941,7 @@ class HardwareManager(object, metaclass=abc.ABCMeta):
        thread_pool = ThreadPool(min(max_pool_size, len(block_devices)))
        for block_device in block_devices:
            params = {'node': node, 'block_device': block_device}
            safety_check_block_device(node, block_device.name)
            erase_results[block_device.name] = thread_pool.apply_async(
                dispatch_to_managers, ('erase_block_device',), params)
        thread_pool.close()
@@ -1541,6 +1546,10 @@ class GenericHardwareManager(HardwareManager):
        :param ports: list of Ironic port objects
        :raises BlockDeviceEraseError: when there's an error erasing the
                block device
        :raises: ProtectedDeviceError if a device has been identified which
                 may require manual intervention due to the contents and
                 operational risk, as it could also be a sign
                 of an environmental misconfiguration.
        """
        block_devices = self.list_block_devices(include_partitions=True)
        # NOTE(coreywright): Reverse sort by device name so a partition (eg
@@ -1549,6 +1558,7 @@ class GenericHardwareManager(HardwareManager):
        block_devices.sort(key=lambda dev: dev.name, reverse=True)
        erase_errors = {}
        for dev in self._list_erasable_devices():
            safety_check_block_device(node, dev.name)
            try:
                disk_utils.destroy_disk_metadata(dev.name, node['uuid'])
            except processutils.ProcessExecutionError as e:
@@ -1572,6 +1582,10 @@ class GenericHardwareManager(HardwareManager):
        :param ports: list of Ironic port objects
        :raises BlockDeviceEraseError: when there's an error erasing the
                block device
        :raises: ProtectedDeviceError if a device has been identified which
                 may require manual intervention due to the contents and
                 operational risk, as it could also be a sign
                 of an environmental misconfiguration.
        """
        erase_errors = {}
        info = node.get('driver_internal_info', {})
@@ -1579,6 +1593,7 @@ class GenericHardwareManager(HardwareManager):
            LOG.debug("No erasable devices have been found.")
            return
        for dev in self._list_erasable_devices():
            safety_check_block_device(node, dev.name)
            try:
                if self._is_nvme(dev):
                    execute_nvme_erase = info.get(
@@ -2957,3 +2972,109 @@ def get_multipath_status():
    # as if we directly try and work with the global var, we will be racing
    # tests endlessly.
    return MULTIPATH_ENABLED


def safety_check_block_device(node, device):
    """Performs safety checking of a block device before destroying.

    In order to guard against destruction of file systems such as
    shared-disk file systems
    (https://en.wikipedia.org/wiki/Clustered_file_system#SHARED-DISK)
    or similar filesystems where multiple distinct computers may have
    unlocked concurrent IO access to the entire block device or
    SAN Logical Unit Number, we need to evaluate and block cleaning
    from occurring on these filesystems *unless* we have been explicitly
    configured to do so.

    This is because cleaning is an intentionally destructive operation,
    and once started against such a device, given the complexities of
    shared disk clustered filesystems where concurrent access is a design
    element, in all likelihood the entire cluster can be negatively
    impacted, and an operator will be forced to recover from snapshots
    and/or backups of the volume's contents.

    :param node: A node, or cached node object.
    :param device: String representing the path to the block
                   device to be checked.
    :raises: ProtectedDeviceError when a device is identified with
             one of these known clustered filesystems, and the overall
             settings have not indicated for the agent to skip such
             safety checks.
    """
    # NOTE(TheJulia): While this seems super rare, I found out after this
    # thread of discussion started that I have customers which have done
    # this and wiped out SAN volumes and their contents unintentionally
    # as a result of these filesystems not being guarded.
    # For those not familiar with shared disk clustered filesystems, think
    # of it as though you're impacting a Ceph cluster, except you're suddenly
    # removing the underlying disks from the OSD, and the entire cluster
    # goes down.

    if not CONF.guard_special_filesystems:
        return
    di_info = node.get('driver_internal_info', {})
    if not di_info.get('wipe_special_filesystems', True):
        return
    report, _e = il_utils.execute('lsblk', '-Pbia',
                                  '-oFSTYPE,UUID,PTUUID,PARTTYPE,PARTUUID',
                                  device)

    lines = report.splitlines()
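    # Illustrative sample of one lsblk -Pbia report line (values assumed,
    # not taken from the source):
    #   FSTYPE="gfs2" UUID="" PTUUID="" PARTTYPE="" PARTUUID=""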
    identified_fs_types = []
    identified_ids = []
    for line in lines:
        # Split into KEY=VAL pairs
        vals = shlex.split(line)
        if not vals:
            continue
        for key, val in (v.split('=', 1) for v in vals):
            if key == 'FSTYPE':
                identified_fs_types.append(val)
            if key in ['UUID', 'PTUUID', 'PARTTYPE', 'PARTUUID']:
                identified_ids.append(val)
        # Ignore block types not specified

    _check_for_special_partitions_filesystems(
        device,
        identified_ids,
        identified_fs_types)
def _check_for_special_partitions_filesystems(device, ids, fs_types):
    """Compare supplied IDs and types to known items, and raise if found.

    :param device: The block device in use, specifically for logging.
    :param ids: A list of the IDs found to check.
    :param fs_types: A list of FS types found to check.
    :raises: ProtectedDeviceError should a partition label or metadata
             be discovered which suggests a shared disk clustered filesystem
             has been discovered.
    """

    guarded_ids = {
        # Apparently GPFS can use shared volumes....
        '37AFFC90-EF7D-4E96-91C3-2D7AE055B174': 'IBM GPFS Partition',
        # Shared volume parallel filesystem
        'AA31E02A-400F-11DB-9590-000C2911D1B8': 'VMware VMFS Partition (GPT)',
        '0xfb': 'VMware VMFS Partition (MBR)',
    }
    for key, value in guarded_ids.items():
        for id_value in ids:
            if key == id_value:
                raise errors.ProtectedDeviceError(
                    device=device,
                    what=value)

    guarded_fs_types = {
        'gfs2': 'Red Hat Global File System 2',
    }

    for key, value in guarded_fs_types.items():
        for fs in fs_types:
            if key == fs:
                raise errors.ProtectedDeviceError(
                    device=device,
                    what=value)
@@ -1710,10 +1710,12 @@ class TestGenericHardwareManager(base.IronicAgentTest):
        mocked_listdir.assert_has_calls(expected_calls)
        mocked_mpath.assert_called_once_with()

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch.object(hardware, 'ThreadPool', autospec=True)
    @mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
    def test_erase_devices_no_parallel_by_default(self, mocked_dispatch,
-                                                 mock_threadpool):
+                                                 mock_threadpool,
+                                                 mock_safety_check):

        # NOTE(TheJulia): This test was previously more elaborate and
        # had a high failure rate on py37 and py38. So instead, let's just
@@ -1732,10 +1734,43 @@ class TestGenericHardwareManager(base.IronicAgentTest):
        calls = [mock.call(1)]
        self.hardware.erase_devices({}, [])
        mock_threadpool.assert_has_calls(calls)
        mock_safety_check.assert_has_calls([
            mock.call({}, '/dev/sdj'),
            mock.call({}, '/dev/hdaa')
        ])

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch.object(hardware, 'ThreadPool', autospec=True)
    @mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
    def test_erase_devices_no_parallel_by_default_protected_device(
            self, mocked_dispatch,
            mock_threadpool,
            mock_safety_check):
        mock_safety_check.side_effect = errors.ProtectedDeviceError(
            device='foo',
            what='bar')

        self.hardware.list_block_devices = mock.Mock()

        self.hardware.list_block_devices.return_value = [
            hardware.BlockDevice('/dev/sdj', 'big', 1073741824, True),
            hardware.BlockDevice('/dev/hdaa', 'small', 65535, False),
        ]

        calls = [mock.call(1)]
        self.assertRaises(errors.ProtectedDeviceError,
                          self.hardware.erase_devices, {}, [])
        mock_threadpool.assert_has_calls(calls)
        mock_safety_check.assert_has_calls([
            mock.call({}, '/dev/sdj'),
        ])
        mock_threadpool.apply_async.assert_not_called()

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch('multiprocessing.pool.ThreadPool.apply_async', autospec=True)
    @mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
-   def test_erase_devices_concurrency(self, mocked_dispatch, mocked_async):
+   def test_erase_devices_concurrency(self, mocked_dispatch, mocked_async,
+                                      mock_safety_check):
        internal_info = self.node['driver_internal_info']
        internal_info['disk_erasure_concurrency'] = 10
        mocked_dispatch.return_value = 'erased device'
@@ -1760,9 +1795,15 @@ class TestGenericHardwareManager(base.IronicAgentTest):
                 for dev in (blkdev1, blkdev2)]
        mocked_async.assert_has_calls(calls)
        self.assertEqual(expected, result)
        mock_safety_check.assert_has_calls([
            mock.call(self.node, '/dev/sdj'),
            mock.call(self.node, '/dev/hdaa'),
        ])

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch.object(hardware, 'ThreadPool', autospec=True)
-   def test_erase_devices_concurrency_pool_size(self, mocked_pool):
+   def test_erase_devices_concurrency_pool_size(self, mocked_pool,
+                                                mock_safety_check):
        self.hardware.list_block_devices = mock.Mock()
        self.hardware.list_block_devices.return_value = [
            hardware.BlockDevice('/dev/sdj', 'big', 1073741824, True),
@@ -1782,6 +1823,10 @@ class TestGenericHardwareManager(base.IronicAgentTest):

        self.hardware.erase_devices(self.node, [])
        mocked_pool.assert_called_with(1)
        mock_safety_check.assert_has_calls([
            mock.call(self.node, '/dev/sdj'),
            mock.call(self.node, '/dev/hdaa'),
        ])

    @mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
    def test_erase_devices_without_disk(self, mocked_dispatch):
@@ -2441,6 +2486,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
            mock.call('/sys/fs/pstore/' + arg) for arg in pstore_entries
        ])

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch.object(il_utils, 'execute', autospec=True)
    @mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
    @mock.patch.object(hardware.GenericHardwareManager,
@@ -2449,7 +2495,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
                       '_list_erasable_devices', autospec=True)
    def test_erase_devices_express(
            self, mock_list_erasable_devices, mock_nvme_erase,
-           mock_destroy_disk_metadata, mock_execute):
+           mock_destroy_disk_metadata, mock_execute, mock_safety_check):
        block_devices = [
            hardware.BlockDevice('/dev/sda', 'sata', 65535, False),
            hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False),
@@ -2466,7 +2512,44 @@ class TestGenericHardwareManager(base.IronicAgentTest):
                          mock.call('/dev/md0', self.node['uuid'])],
                         mock_destroy_disk_metadata.call_args_list)
        mock_list_erasable_devices.assert_called_with(self.hardware)
        mock_safety_check.assert_has_calls([
            mock.call(self.node, '/dev/sda'),
            mock.call(self.node, '/dev/md0'),
            mock.call(self.node, '/dev/nvme0n1'),
            mock.call(self.node, '/dev/nvme1n1')
        ])

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch.object(il_utils, 'execute', autospec=True)
    @mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
    @mock.patch.object(hardware.GenericHardwareManager,
                       '_nvme_erase', autospec=True)
    @mock.patch.object(hardware.GenericHardwareManager,
                       '_list_erasable_devices', autospec=True)
    def test_erase_devices_express_stops_on_safety_failure(
            self, mock_list_erasable_devices, mock_nvme_erase,
            mock_destroy_disk_metadata, mock_execute, mock_safety_check):
        mock_safety_check.side_effect = errors.ProtectedDeviceError(
            device='foo',
            what='bar')
        block_devices = [
            hardware.BlockDevice('/dev/sda', 'sata', 65535, False),
            hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False),
            hardware.BlockDevice('/dev/nvme0n1', 'nvme', 32767, False),
            hardware.BlockDevice('/dev/nvme1n1', 'nvme', 32767, False)
        ]
        mock_list_erasable_devices.return_value = list(block_devices)

        self.assertRaises(errors.ProtectedDeviceError,
                          self.hardware.erase_devices_express, self.node, [])
        mock_nvme_erase.assert_not_called()
        mock_destroy_disk_metadata.assert_not_called()
        mock_list_erasable_devices.assert_called_with(self.hardware)
        mock_safety_check.assert_has_calls([
            mock.call(self.node, '/dev/sda')
        ])

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch.object(il_utils, 'execute', autospec=True)
    @mock.patch.object(hardware.GenericHardwareManager,
                       '_is_virtual_media_device', autospec=True)
@@ -2475,7 +2558,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
    @mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
    def test_erase_devices_metadata(
            self, mock_metadata, mock_list_devs, mock__is_vmedia,
-           mock_execute):
+           mock_execute, mock_safety_check):
        block_devices = [
            hardware.BlockDevice('/dev/sr0', 'vmedia', 12345, True),
            hardware.BlockDevice('/dev/sdb2', 'raid-member', 32767, False),
@@ -2509,7 +2592,62 @@ class TestGenericHardwareManager(base.IronicAgentTest):
                          mock.call(self.hardware, block_devices[2]),
                          mock.call(self.hardware, block_devices[5])],
                         mock__is_vmedia.call_args_list)
        mock_safety_check.assert_has_calls([
            mock.call(self.node, '/dev/sda1'),
            mock.call(self.node, '/dev/sda'),
            # This is kind of redundant code/pattern behavior-wise,
            # but you never know what someone has done...
            mock.call(self.node, '/dev/md0')
        ])

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch.object(il_utils, 'execute', autospec=True)
    @mock.patch.object(hardware.GenericHardwareManager,
                       '_is_virtual_media_device', autospec=True)
    @mock.patch.object(hardware.GenericHardwareManager,
                       'list_block_devices', autospec=True)
    @mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
    def test_erase_devices_metadata_safety_check(
            self, mock_metadata, mock_list_devs, mock__is_vmedia,
            mock_execute, mock_safety_check):
        block_devices = [
            hardware.BlockDevice('/dev/sr0', 'vmedia', 12345, True),
            hardware.BlockDevice('/dev/sdb2', 'raid-member', 32767, False),
            hardware.BlockDevice('/dev/sda', 'small', 65535, False),
            hardware.BlockDevice('/dev/sda1', '', 32767, False),
            hardware.BlockDevice('/dev/sda2', 'raid-member', 32767, False),
            hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False)
        ]
        # NOTE(coreywright): Don't return the list, but a copy of it, because
        # we depend on its elements' order when referencing it later during
        # verification, but the method under test sorts the list changing it.
        mock_list_devs.return_value = list(block_devices)
        mock__is_vmedia.side_effect = lambda _, dev: dev.name == '/dev/sr0'
        mock_execute.side_effect = [
            ('sdb2 linux_raid_member host:1 f9978968', ''),
            ('sda2 linux_raid_member host:1 f9978969', ''),
            ('sda1', ''), ('sda', ''), ('md0', '')]
        mock_safety_check.side_effect = [
            None,
            errors.ProtectedDeviceError(
                device='foo',
                what='bar')
        ]

        self.assertRaises(errors.ProtectedDeviceError,
                          self.hardware.erase_devices_metadata,
                          self.node, [])

        self.assertEqual([mock.call('/dev/sda1', self.node['uuid'])],
                         mock_metadata.call_args_list)
        mock_list_devs.assert_called_with(self.hardware,
                                          include_partitions=True)
        mock_safety_check.assert_has_calls([
            mock.call(self.node, '/dev/sda1'),
            mock.call(self.node, '/dev/sda'),
        ])

    @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
    @mock.patch.object(hardware.GenericHardwareManager,
                       '_is_linux_raid_member', autospec=True)
    @mock.patch.object(hardware.GenericHardwareManager,
@@ -2519,7 +2657,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
    @mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
    def test_erase_devices_metadata_error(
            self, mock_metadata, mock_list_devs, mock__is_vmedia,
-           mock__is_raid_member):
+           mock__is_raid_member, mock_safety_check):
        block_devices = [
            hardware.BlockDevice('/dev/sda', 'small', 65535, False),
            hardware.BlockDevice('/dev/sdb', 'big', 10737418240, True),
@@ -2553,6 +2691,10 @@ class TestGenericHardwareManager(base.IronicAgentTest):
        self.assertEqual([mock.call(self.hardware, block_devices[1]),
                          mock.call(self.hardware, block_devices[0])],
                         mock__is_vmedia.call_args_list)
        mock_safety_check.assert_has_calls([
            mock.call(self.node, '/dev/sdb'),
            mock.call(self.node, '/dev/sda')
        ])

    @mock.patch.object(il_utils, 'execute', autospec=True)
    def test__is_linux_raid_member(self, mocked_execute):
@@ -4991,3 +5133,51 @@ class TestAPIClientSaveAndUse(base.IronicAgentTest):
        calls = [mock.call('list_hardware_info'),
                 mock.call('wait_for_disks')]
        mock_dispatch.assert_has_calls(calls)


@mock.patch.object(il_utils, 'execute', autospec=True)
class TestProtectedDiskSafetyChecks(base.IronicAgentTest):

    def test_special_filesystem_guard_not_enabled(self, mock_execute):
        CONF.set_override('guard_special_filesystems', False)
        hardware.safety_check_block_device({}, '/dev/foo')
        mock_execute.assert_not_called()

    def test_special_filesystem_guard_node_indicates_skip(self, mock_execute):
        node = {
            'driver_internal_info': {
                'wipe_special_filesystems': False
            }
        }
        mock_execute.return_value = ('', '')
        hardware.safety_check_block_device(node, '/dev/foo')
        mock_execute.assert_not_called()

    def test_special_filesystem_guard_enabled_no_results(self, mock_execute):
        mock_execute.return_value = ('', '')
        hardware.safety_check_block_device({}, '/dev/foo')

    def test_special_filesystem_guard_raises(self, mock_execute):
        GFS2 = 'FSTYPE="gfs2"\n'
        GPFS1 = 'UUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
        GPFS2 = 'PTUUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
        GPFS3 = 'PARTTYPE="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
        GPFS4 = 'PARTUUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
        VMFS1 = 'UUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
        VMFS2 = 'PTUUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
        VMFS3 = 'PARTTYPE="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
        VMFS4 = 'PARTUUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
        VMFS5 = 'UUID="0xfb"\n'
        VMFS6 = 'PTUUID="0xfb"\n'
        VMFS7 = 'PARTTYPE="0xfb"\n'
        VMFS8 = 'PARTUUID="0xfb"\n'

        expected_failures = [GFS2, GPFS1, GPFS2, GPFS3, GPFS4, VMFS1, VMFS2,
                             VMFS3, VMFS4, VMFS5, VMFS6, VMFS7, VMFS8]
        for failure in expected_failures:
            mock_execute.reset_mock()
            mock_execute.return_value = (failure, '')
            self.assertRaises(errors.ProtectedDeviceError,
                              hardware.safety_check_block_device,
                              {}, '/dev/foo')
            self.assertEqual(1, mock_execute.call_count)
@@ -0,0 +1,20 @@
---
fixes:
  - |
    Previously, when the ``ironic-python-agent`` would undergo erasure of
    block devices during cleaning, it would automatically attempt to erase
    the contents of any "shared device" clustered filesystems which may be
    in use by multiple distinct machines over a storage fabric. Now
    IBM GPFS, Red Hat Global File System 2, and VMware Virtual Machine File
    System (VMFS) are identified, and cleaning is halted. This is important
    because, should an active cluster be using the disk, cleaning could
    potentially cause the cluster to go down, forcing restoration from
    backups. Ideally, infrastructure operators should check their
    environment's storage configuration and un-map any clustered filesystems
    from being visible to Ironic nodes, unless explicitly needed and
    expected. Please see the Ironic-Python-Agent troubleshooting
    documentation for more information.
issues:
  - |
    Logic to guard VMFS filesystems from being destroyed *may* not recognize
    VMFS extents. Operators with examples of partitioning for extent usage
    are encouraged to contact the Ironic community.