Container hardware manager enables some neat use cases; now we have documentation of them. Hoorah! Assisted-by: claude code Change-Id: I853d88f36cd82fb4e07f31c935c449950af47198 Signed-off-by: Jay Faulkner <jay@jvf.cc>
Container-Based Steps
Overview
The Container Hardware Manager in ironic-python-agent (IPA) allows running OCI-compatible containers as steps on bare metal nodes. This enables operators to package arbitrary tools -- firmware updaters, diagnostic suites, compliance scanners -- as container images and execute them during any step-based workflow, such as cleaning, deployment, or servicing.
Basics
A workflow for implementing container-based steps with runbooks is:
- The operator builds an IPA ramdisk with the ironic-python-agent-podman diskimage-builder element, which installs podman and the Container Hardware Manager into the ramdisk.
- The Ironic conductor sends [agent_containers] configuration to IPA via the lookup/heartbeat endpoint. This allows conductor-side settings to override any build-time defaults in the ramdisk.
- When a runbook triggers a container_clean_step, IPA uses podman (or docker) to pull and run the specified container image on the bare metal node.
- The container runs with host networking by default, executes its task, and exits. IPA reports the result back to the conductor.
Prerequisites
IPA ramdisk with podman support
The IPA ramdisk must be built with the
ironic-python-agent-podman diskimage-builder (DIB) element.
This element is currently Debian-based only.
export DIB_ALLOW_ARBITRARY_CONTAINERS=true
export DIB_RUNNER=podman
disk-image-create ironic-python-agent-ramdisk \
    ironic-python-agent-podman \
    debian -o ipa-with-podman
Key DIB environment variables:
DIB_ALLOW_ARBITRARY_CONTAINERS
    Set to true to allow any container image. Set to false (default) to restrict to a specific allowlist. Environments which permit non-admin roles to create and execute runbooks should not set this to true for security reasons.
DIB_ALLOWED_CONTAINERS
    Comma-separated list of allowed container image URLs. Only used when DIB_ALLOW_ARBITRARY_CONTAINERS is false.
DIB_RUNNER
    Container runtime: podman (default) or docker.
Container registry access
The container registry hosting your images must be accessible from
the cleaning network. If using a private registry, ensure credentials
and TLS certificates are configured in the ramdisk or passed via
pull_options.
Ironic Conductor Configuration
The [agent_containers] configuration group controls how
the conductor instructs IPA to handle containers. These settings are
sent to IPA at lookup time, so changes take effect without rebuilding
the ramdisk.
[agent_containers]
# Allow any container image (default: false)
allow_arbitrary_containers = false
# Allowlist of container images (used when above is false)
allowed_containers = docker://registry.example.com/firmware-tool:latest,docker://registry.example.com/diag-suite:v2
# Container runtime (default: podman)
runner = podman
# Options passed to the pull command
pull_options = --tls-verify=false
# Options passed to the run command
run_options = --rm --network=host --tls-verify=false
Warning
Setting allow_arbitrary_containers = true allows
any container image to be pulled and executed with
host-level network access on the bare metal node. Only enable this in
trusted environments. Prefer using allowed_containers to
maintain an explicit allowlist.
See also: agent_containers.allow_arbitrary_containers,
agent_containers.allowed_containers,
agent_containers.runner, agent_containers.pull_options, agent_containers.run_options.
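The interaction between allow_arbitrary_containers and allowed_containers can be pictured with a short Python sketch. It illustrates the documented behavior only and is not IPA's actual implementation: the function names parse_allowlist and is_container_allowed are hypothetical, and exact string matching of image URLs is an assumption of the sketch.

```python
def parse_allowlist(raw: str) -> list[str]:
    """Split a comma-separated allowed_containers value into image URLs."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def is_container_allowed(url: str, allow_arbitrary: bool, allowed: list[str]) -> bool:
    """Hypothetical check: accept any image when arbitrary containers are
    allowed, otherwise require an exact match against the allowlist
    (exact matching is an assumption of this sketch)."""
    if allow_arbitrary:
        return True
    return url in allowed


allowed = parse_allowlist(
    "docker://registry.example.com/firmware-tool:latest,"
    "docker://registry.example.com/diag-suite:v2"
)

# An allowlisted image is accepted; an unknown one is rejected.
print(is_container_allowed("docker://registry.example.com/diag-suite:v2", False, allowed))
print(is_container_allowed("docker://registry.example.com/other:latest", False, allowed))
```

Keeping the decision in a single function makes it easy to see that the allowlist is consulted only when arbitrary containers are disabled.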
Example Container-based Runbooks
The built-in step
The Container Hardware Manager exposes a built-in cleaning step
called container_clean_step on the deploy
interface. This step has a default priority of 0, meaning
it only runs when explicitly invoked via manual cleaning, servicing, or
a runbook.
The step accepts the following arguments:
container_url (required)
    The full container image URL, e.g. docker://registry.example.com/firmware-tool:latest.
pull_options (optional)
    Override the default pull options for this specific container.
run_options (optional)
    Override the default run options for this specific container.
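When many runbooks share the same step shape, the step list can be generated rather than written by hand. The following is a minimal Python sketch, assuming only the three arguments listed above; the helper name container_step is hypothetical and is not part of Ironic or IPA.

```python
import json


def container_step(container_url, order, pull_options=None, run_options=None):
    """Build one runbook step dict for container_clean_step.

    Only container_url is required; the optional overrides are included
    only when a value is given.
    """
    if not container_url:
        raise ValueError("container_url is required")
    args = {"container_url": container_url}
    if pull_options is not None:
        args["pull_options"] = pull_options
    if run_options is not None:
        args["run_options"] = run_options
    return {
        "interface": "deploy",
        "step": "container_clean_step",
        "args": args,
        "order": order,
    }


steps = [
    container_step("docker://registry.example.com/diag-suite:v2", order=1),
    container_step(
        "docker://registry.example.com/firmware-tool:latest",
        order=2,
        run_options="--rm --network=host --privileged",
    ),
]

# The printed JSON can be used as the value of the --steps argument.
print(json.dumps(steps, indent=2))
```

Omitting the optional keys when they are unset leaves the conductor-side defaults from [agent_containers] in effect for that step.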
Single-container runbook
This example creates a runbook that runs a single firmware update container:
baremetal runbook create \
    --name CUSTOM_CONTAINER_FW_UPDATE \
    --steps '[
      {
        "interface": "deploy",
        "step": "container_clean_step",
        "args": {
          "container_url": "docker://registry.example.com/firmware-tool:latest"
        },
        "order": 1
      }
    ]'
Multi-container runbook
Runbooks can combine multiple container steps with traditional steps. This example runs a diagnostic container, then a firmware updater, and finishes with a standard disk metadata erase:
baremetal runbook create \
    --name CUSTOM_CONTAINER_CLEAN \
    --steps '[
      {
        "interface": "deploy",
        "step": "container_clean_step",
        "args": {
          "container_url": "docker://registry.example.com/diag-suite:v2"
        },
        "order": 1
      },
      {
        "interface": "deploy",
        "step": "container_clean_step",
        "args": {
          "container_url": "docker://registry.example.com/firmware-tool:latest",
          "run_options": "--rm --network=host --privileged"
        },
        "order": 2
      },
      {
        "interface": "deploy",
        "step": "erase_devices_metadata",
        "args": {},
        "order": 3
      }
    ]'
Adding traits to nodes
Runbooks are matched to nodes via traits. Add the matching trait to all nodes that should use the runbook:
baremetal node add trait <node> CUSTOM_CONTAINER_CLEAN
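Before relying on a runbook across a fleet, it can help to list the nodes that still lack the matching trait. This is a minimal Python sketch over an in-memory inventory; the inventory contents and the function name nodes_missing_trait are hypothetical, and gathering the real trait data from Ironic is left out.

```python
def nodes_missing_trait(node_traits: dict[str, set[str]], runbook_name: str) -> list[str]:
    """Return the names of nodes whose trait set lacks the runbook name."""
    return sorted(
        name for name, traits in node_traits.items() if runbook_name not in traits
    )


# Hypothetical inventory: node name -> traits already set on that node.
inventory = {
    "node-0": {"CUSTOM_CONTAINER_CLEAN", "CUSTOM_GPU"},
    "node-1": {"CUSTOM_GPU"},
    "node-2": set(),
}

# Each node listed here still needs the trait added before the runbook applies.
print(nodes_missing_trait(inventory, "CUSTOM_CONTAINER_CLEAN"))
```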
Using the Runbook
Manual cleaning
Trigger the runbook on a node in manageable state:
baremetal node clean <node> --runbook CUSTOM_CONTAINER_CLEAN
Automated cleaning
To use container-based steps for automated cleaning, configure the
conductor to use runbook-based or hybrid cleaning and assign the
runbook. See runbook-cleaning for full details on the available
configuration levels (per-node, per-resource-class, global).
A minimal example using the global default:
[conductor]
automated_clean = true
automated_cleaning_step_source = runbook
automated_cleaning_runbook = CUSTOM_CONTAINER_CLEAN
All nodes must have the matching trait
(CUSTOM_CONTAINER_CLEAN) unless trait validation is
disabled via conductor.automated_cleaning_runbook_validate_traits.
Servicing
Container steps also work with servicing. Trigger a container runbook on an
active node:
baremetal node service <node> --runbook CUSTOM_CONTAINER_CLEAN
Alternative Methods
Operators can also use container-based steps that are hardcoded in the ramdisk through configuration.
ironic-python-agent can be configured, via a YAML configuration file, to expose arbitrary container-based steps for use in workflows, including automated cleaning.
For example:
steps:
  - name: manage_container_cleanup
    image: docker://172.24.4.1:5000/cleaning-image:latest
    interface: deploy
    reboot_requested: true
    pull_options:
      - --tls-verify=false
    run_options:
      - --rm
      - --network=host
      - --tls-verify=false
    abortable: true
    priority: 20
  - name: manage_container_cleanup2
    image: docker://172.24.4.1:5000/cleaning-image2:latest
    interface: deploy
    reboot_requested: true
    pull_options:
      - --tls-verify=false
    run_options:
      - --rm
      - --network=host
      - --tls-verify=false
    abortable: true
    priority: 10
By placing a file in your IPA ramdisk with these contents in the path
indicated by agent_containers.container_steps_file,
cleaning steps manage_container_cleanup and
manage_container_cleanup2 will be reported as available
cleaning steps at the indicated priority.
This is useful for high-security environments that would rather accept the cost of rebuilding a ramdisk than allow the choice of cleaning containers to be made at runtime.
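The priority values in the steps file decide where each step falls in automated cleaning. The Python sketch below assumes Ironic's usual convention that automated cleaning runs steps from highest priority to lowest and skips steps with priority 0; the function name automated_order is hypothetical, and the dictionaries stand in for the parsed contents of the file (only name and priority are reproduced).

```python
def automated_order(steps):
    """Order steps under the stated assumption: highest priority first,
    with priority 0 (disabled) entries skipped."""
    enabled = [step for step in steps if step.get("priority", 0) > 0]
    ranked = sorted(enabled, key=lambda step: step["priority"], reverse=True)
    return [step["name"] for step in ranked]


# Stand-in for the parsed steps file shown above.
steps_file = {
    "steps": [
        {"name": "manage_container_cleanup2", "priority": 10},
        {"name": "manage_container_cleanup", "priority": 20},
    ]
}

print(automated_order(steps_file["steps"]))
```

In a real cleaning run these steps are interleaved with steps from other hardware managers, so the sketch shows only their order relative to each other.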
Security Considerations
- Prefer allowlisting over allow_arbitrary_containers = true. The allowlist (allowed_containers) restricts which images IPA will accept, reducing the risk of running untrusted code.
- TLS verification -- the default pull_options and run_options include --tls-verify=false for development convenience. In production, remove this flag and ensure proper TLS certificates are available in the ramdisk.
- Container privileges -- by default, containers run with --network=host, giving them full access to the node's network stack. Review run_options and consider adding --read-only or dropping capabilities where possible.
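A review of pull_options and run_options can be partly automated. The following Python sketch flags a few options mentioned in this document; the RISKY_FLAGS table and the function name review_options are illustrative assumptions, not an exhaustive or official policy.

```python
import shlex

# Flags this sketch treats as worth a second look; the list is illustrative.
RISKY_FLAGS = {
    "--tls-verify=false": "TLS verification is disabled",
    "--privileged": "container runs with extended privileges",
    "--network=host": "container shares the node's network stack",
}


def review_options(options: str) -> list[str]:
    """Return one finding per risky flag present in an option string."""
    findings = []
    for token in shlex.split(options):
        if token in RISKY_FLAGS:
            findings.append(f"{token}: {RISKY_FLAGS[token]}")
    return findings


# Example: the run_options value from the configuration shown earlier.
for finding in review_options("--rm --network=host --tls-verify=false"):
    print(finding)
```

A check like this could run against runbook definitions before they are created, so that risky options are an explicit decision rather than a copied default.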
Troubleshooting
Container pull failures
    Check that the container registry is accessible from the cleaning network. Verify the image URL in the runbook step. If using TLS, ensure certificates are configured correctly in the ramdisk or add --tls-verify=false to pull_options for testing.
Step not found: container_clean_step
    The IPA ramdisk was not built with the ironic-python-agent-podman element. Rebuild the ramdisk with podman support as described in Prerequisites.
Container rejected by allowlist
    The container URL does not match any entry in allowed_containers and allow_arbitrary_containers is false. Either add the image to the allowlist or set allow_arbitrary_containers = true in [agent_containers].
Trait mismatch
    The node does not have a trait matching the runbook name. Add the trait with baremetal node add trait <node> <RUNBOOK_NAME>.