Built-in hardware managers
GenericHardwareManager
This is the default hardware manager for ironic-python-agent. It
provides support for hardware-inventory
and the default deploy, clean, and
service steps.
Deploy steps
deploy.write_image(node, ports, image_info, configdrive=None)
-
A deploy step backing the write_image deploy step of the direct deploy interface <admin/interfaces/deploy.html#direct-deploy>. Should not be used explicitly, but can be overridden to provide a custom way of writing an image.
deploy.erase_devices_metadata(node, ports)
-
Erases partition tables from all recognized disk devices. Can be used with software RAID since it requires empty holder disks.
raid.apply_configuration(node, ports, raid_config, delete_existing=True)
-
Apply a software RAID configuration. It belongs to the raid interface and must be used through the ironic RAID feature <admin/raid.html>.
Injecting files
deploy.inject_files(node, ports, files, verify_ca=True)
This optional deploy step (introduced in the Wallaby release series)
allows injecting arbitrary files into the node. The list of files is
built from the optional inject_files
property of the node
concatenated with the explicit files
argument. Each item in
the list is a dictionary with the following fields:
path
(required)-
An absolute path to the file on the target partition. All missing directories will be created.
partition
-
Specifies the target partition in one of 3 ways:
- A number is treated as a partition index (starting with 1) on the root device.
- A path is treated as a block device path (e.g. /dev/sda1 or /dev/disk/by-partlabel/<something>).
- If missing, the agent will try to find a partition containing the first component of the path on the root device. E.g. for /etc/sysctl.d/my.conf, look for a partition containing /etc.
deleted
-
If True, the file is deleted, not created. Incompatible with content.
content
-
Data to write. Incompatible with deleted. Can take two forms:
- A URL of the content. Can use Python-style formatting to build a node-specific URL, e.g. http://server/{node[uuid]}/{ports[0][address]}.
- Base64 encoded binary contents.
mode, owner, group
-
Numeric mode, owner ID and group ID of the file.
dirmode
-
Numeric mode of the leaf directory if it has to be created.
This deploy step is disabled by default and can be enabled via a
deploy template or via the ipa-inject-files-priority
kernel
parameter.
Known limitations:
- Names are not supported for owner and group.
- LVM is not supported.
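The field rules above can be sketched as a small helper that builds items for the files argument. This is only an illustration, not code from the agent: make_file_entry, the sample paths, the node UUID, and the MAC address are hypothetical, and passing mode as a plain integer is an assumption about the expected representation.

```python
import base64


def make_file_entry(path, content=None, deleted=False, partition=None,
                    mode=None, owner=None, group=None, dirmode=None):
    """Build one item for the files list (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    if deleted and content is not None:
        raise ValueError("content and deleted are incompatible")
    entry = {"path": path}
    if deleted:
        entry["deleted"] = True
    elif isinstance(content, bytes):
        # Binary contents are sent Base64 encoded.
        entry["content"] = base64.b64encode(content).decode("ascii")
    elif content is not None:
        # A str is assumed to be a URL, possibly with format fields.
        entry["content"] = content
    optional = {"partition": partition, "mode": mode, "owner": owner,
                "group": group, "dirmode": dirmode}
    entry.update({k: v for k, v in optional.items() if v is not None})
    return entry


files = [
    # 0o644 is the integer 420; owner and group must be numeric IDs.
    make_file_entry("/etc/sysctl.d/my.conf", content=b"vm.swappiness = 10\n",
                    mode=0o644, owner=0, group=0),
    make_file_entry("/etc/motd", content="http://server/{node[uuid]}/motd"),
    make_file_entry("/etc/old.conf", deleted=True, partition=2),
]

# How a URL template expands with Python-style formatting.
node = {"uuid": "1be26c0b-03f2-4d2e-ae87-c02d7f33c123"}
ports = [{"address": "52:54:00:12:34:56"}]
url = "http://server/{node[uuid]}/{ports[0][address]}".format(
    node=node, ports=ports)
```

The helper rejects the one combination the text calls out as invalid (content together with deleted) before anything is sent to the node.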
Clean steps
deploy.burnin_cpu
-
Stress-test the CPUs of a node via stress-ng for a configurable amount of time. Disabled by default.
deploy.burnin_disk
-
Stress-test the disks of a node via fio. Disabled by default.
deploy.burnin_memory
-
Stress-test the memory of a node via stress-ng for a configurable amount of time. Disabled by default.
deploy.burnin_network
-
Stress-test the network of a pair of nodes via fio for a configurable amount of time. Disabled by default.
deploy.erase_devices
-
Securely erases all information from all recognized disk devices. Relatively fast when secure ATA erase is available, otherwise can take hours, especially in a virtual environment. Enabled by default.
deploy.erase_devices_metadata
-
Erases partition tables from all recognized disk devices. Can be used as an alternative to the much longer erase_devices step.
deploy.erase_pstore
-
Erases entries from pstore, the kernel's oops/panic logger. Disabled by default. Can be enabled via priority overrides.
raid.create_configuration
-
Create a RAID configuration. This step belongs to the raid interface and must be used through the ironic RAID feature <admin/raid.html>.
raid.delete_configuration
-
Delete the RAID configuration. This step belongs to the raid interface and must be used through the ironic RAID feature <admin/raid.html>.
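As an illustration of how the clean steps listed above might be requested explicitly, the sketch below builds a manual cleaning request body. The interface/step layout and the "target" and "clean_steps" keys follow the ironic provisioning API as an assumption not stated in this document, and manual_clean_request is a hypothetical helper.

```python
import json


def manual_clean_request(steps):
    """Build a manual cleaning body from (interface, step) pairs."""
    return {
        "target": "clean",
        "clean_steps": [
            {"interface": interface, "step": name}
            for interface, name in steps
        ],
    }


body = manual_clean_request([
    ("deploy", "erase_devices_metadata"),
    ("deploy", "erase_pstore"),
])
payload = json.dumps(body, indent=2)
print(payload)
```

Steps that are disabled by default, such as erase_pstore, only run automatically when their priority is overridden; naming them explicitly like this is the other way to run them.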
Service steps
Service steps can be invoked by the operator of a bare metal node to modify a deployed bare metal instance, or to perform some intermediate action on it, outside the realm of normal use. The interaction is similar to cleaning, and some of the cleaning and deployment steps are also available as service steps.
deploy.burnin_cpu
-
Stress-test the CPUs of a node via stress-ng for a configurable amount of time.
deploy.burnin_memory
-
Stress-test the memory of a node via stress-ng for a configurable amount of time.
deploy.burnin_network
-
Stress-test the network of a pair of nodes via fio for a configurable amount of time.
raid.create_configuration
-
Create a RAID configuration. This step belongs to the raid interface and must be used through the ironic RAID feature <admin/raid.html>.
raid.apply_configuration(node, ports, raid_config, delete_existing=True)
-
Apply a software RAID configuration. It belongs to the raid interface and must be used through the ironic RAID feature <admin/raid.html>.
raid.delete_configuration
-
Delete the RAID configuration. This step belongs to the raid interface and must be used through the ironic RAID feature <admin/raid.html>.
deploy.write_image(node, ports, image_info, configdrive=None)
-
A step backing the write_image deploy step of the direct deploy interface <admin/interfaces/deploy.html#direct-deploy>. Should not be used explicitly, but can be overridden to provide a custom way of writing an image.
deploy.inject_files(node, ports, files, verify_ca=True)
-
A step to inject files into a system. This step is documented in detail in the "Injecting files" section above.
Note
The Ironic developers chose to limit the steps available as service steps so that the risk of data destruction is generally minimized. That said, it can be reasonable to reconfigure RAID devices through local hardware managers, or to write the base OS image, as part of a service operation, so caution should be taken. If additional data erasure steps are needed, consider moving the node through cleaning to remove the workload. If you have a use case that is not covered, please feel free to reach out to the Ironic developers so we can understand and enable it.
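Because only a subset of steps is exposed for servicing, it can help to check a planned step list against that subset before submitting it. The sketch below hard-codes the GenericHardwareManager service steps listed in this section; unsupported_service_steps and the interface/step dictionary layout are assumptions made for illustration.

```python
# Service steps of the GenericHardwareManager as listed in this section.
SERVICE_STEPS = {
    ("deploy", "burnin_cpu"),
    ("deploy", "burnin_memory"),
    ("deploy", "burnin_network"),
    ("raid", "create_configuration"),
    ("raid", "apply_configuration"),
    ("raid", "delete_configuration"),
    ("deploy", "write_image"),
    ("deploy", "inject_files"),
}


def unsupported_service_steps(requested):
    """Return the requested steps that are not listed as service steps."""
    return [
        step for step in requested
        if (step["interface"], step["step"]) not in SERVICE_STEPS
    ]


requested = [
    {"interface": "deploy", "step": "burnin_cpu"},
    {"interface": "deploy", "step": "erase_devices"},
]
rejected = unsupported_service_steps(requested)
```

Here erase_devices is reported as unsupported, which matches the intent of keeping data erasure in cleaning rather than servicing.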
Cleaning safeguards
The stock hardware manager contains a number of safeguards to prevent unsafe conditions from occurring.
Devices Skip List
A list of devices that Ironic does not touch during the cleaning and
deployment process can be specified in the node properties field under
skip_block_devices
. This should be a list of dictionaries
containing hints to identify the drives. For example:
'skip_block_devices': [{'name': '/dev/vda', 'vendor': '0x1af4'}]
To prevent software RAID devices from being deleted, add their volume name (defined in the target_raid_config) to the list.
Note: use one dictionary with a single value for each of the logical disks. For example:
'skip_block_devices': [{'volume_name': 'large'}, {'volume_name': 'temp'}]
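To make the idea of hint dictionaries concrete, here is a simplified sketch of how a skip list could be applied to a device inventory. It assumes plain equality on every key of a hint; the agent's actual matching may be richer, and matches_hint, filter_skipped, and the sample devices are hypothetical.

```python
def matches_hint(device, hint):
    """True when every key in a non-empty hint equals the device value."""
    return bool(hint) and all(
        device.get(key) == value for key, value in hint.items()
    )


def filter_skipped(devices, skip_block_devices):
    """Return the devices that no hint in the skip list matches."""
    return [
        device for device in devices
        if not any(matches_hint(device, hint) for hint in skip_block_devices)
    ]


devices = [
    {"name": "/dev/vda", "vendor": "0x1af4"},
    {"name": "/dev/vdb", "vendor": "0x1af4"},
]
skip_block_devices = [{"name": "/dev/vda", "vendor": "0x1af4"}]
remaining = filter_skipped(devices, skip_block_devices)
```

With the example hint from the text, /dev/vda is left untouched and only /dev/vdb remains eligible for cleaning.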
Shared Disk Cluster Filesystems
Commonly used shared disk cluster filesystems, when detected, cause cleaning processes on stock hardware manager methods to abort prior to destroying the contents on the disk.
These filesystems include IBM General Parallel File System (GPFS), VMware Virtual Machine File System (VMFS), and Red Hat Global File System (GFS2).
For information on troubleshooting, and disabling this check, see
/admin/troubleshooting
.
Custom hardware managers
MellanoxDeviceHardwareManager
This is a custom hardware manager for ironic-python-agent. It provides support for Nvidia/Mellanox NICs.
- You can get the firmware binaries for all Nvidia/Mellanox NICs from Nvidia firmware downloads
- You can get the deviceID from the Nvidia/Mellanox NICs list
- You can also check the MFT documentation for some supported parameters
Clean steps
update_nvidia_nic_firmware_image(node, ports, images)
A clean step used to update Nvidia/Mellanox NIC firmware images from the required parameter images list. It is disabled by default. Each image in the list is a dictionary with the following fields:
url
(required)
checksum
(required)-
Checksum of the provided image.
checksumType
(required)-
Checksum type; one of md5, sha256, or sha512.
componentFlavor
(required)-
The PSID of the NIC.
version
(required)-
Version of the firmware image. It must be the same as in the image file.
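A short sketch of assembling one images entry, including computing the checksum with hashlib, may help. The URL, PSID, version string, and the stand-in firmware bytes are all made-up placeholders, and firmware_image_entry is a hypothetical helper rather than part of the hardware manager.

```python
import hashlib

CHECKSUM_TYPES = ("md5", "sha256", "sha512")


def firmware_image_entry(url, data, psid, version, checksum_type="sha256"):
    """Build one item for the images list (hypothetical helper)."""
    if checksum_type not in CHECKSUM_TYPES:
        raise ValueError("checksumType must be md5, sha256, or sha512")
    return {
        "url": url,
        "checksum": hashlib.new(checksum_type, data).hexdigest(),
        "checksumType": checksum_type,
        "componentFlavor": psid,
        "version": version,
    }


# Stand-in bytes; in practice these would be read from the firmware file.
firmware_bytes = b"placeholder firmware contents"
images = [
    firmware_image_entry(
        url="http://example.com/firmware/fw-example.bin",  # placeholder
        data=firmware_bytes,
        psid="MT_0000000000",   # placeholder PSID
        version="0.0.0000",     # placeholder; must match the image file
    ),
]
```

Computing the checksum from the same bytes that are published at the URL avoids a mismatch between checksum and checksumType.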
update_nvidia_nic_firmware_settings(node, ports, settings)
A clean step used to update Nvidia/Mellanox NIC firmware settings from the required parameter settings list. It is disabled by default. Each setting in the list is a dictionary with the following fields:
deviceID
(required)-
The ID of the NIC
globalConfig
-
The global configuration for the NIC
function0Config
-
The per-function configuration of the first port of the NIC
function1Config
-
The per-function configuration of the second port of the NIC
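The shape of the settings parameter can be sketched with a small validator. The device ID and the configuration parameter names and values below are illustrative assumptions only; check the MFT documentation for parameters that apply to a given NIC. validate_settings is a hypothetical helper.

```python
ALLOWED_FIELDS = {
    "deviceID", "globalConfig", "function0Config", "function1Config",
}


def validate_settings(settings):
    """Check each settings item for the required and known fields."""
    for index, item in enumerate(settings):
        if "deviceID" not in item:
            raise ValueError(f"settings[{index}] is missing deviceID")
        unknown = sorted(set(item) - ALLOWED_FIELDS)
        if unknown:
            raise ValueError(f"settings[{index}] has unknown fields: {unknown}")
    return settings


settings = validate_settings([
    {
        "deviceID": "1017",  # illustrative device ID
        # Illustrative parameter names and values, not recommendations.
        "globalConfig": {"SRIOV_EN": True, "NUM_OF_VFS": 8},
        "function0Config": {"PF_TOTAL_SF": 100},
        "function1Config": {"PF_TOTAL_SF": 100},
    },
])
```

Only deviceID is required; the three configuration dictionaries are optional and can be supplied independently.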
Service steps
The clean steps supported by the MellanoxDeviceHardwareManager are also available as service steps, for infrastructure operators who wish to apply new firmware to a running machine.