Sending signal ``SIGUSR2`` to a conductor process will now trigger a
drain shutdown. This is similar to a ``SIGTERM`` graceful shutdown but
the timeout is determined by ``[DEFAULT]drain_shutdown_timeout`` which
defaults to ``1800`` seconds. This is enough time for running tasks on
existing reserved nodes to either complete or reach their own failure
timeout.
During the drain period the conductor needs to be removed from the hash
ring to prevent new tasks from starting. Other conductors also need to
not fail reserved nodes on the draining conductor which would appear to
be orphaned. This is achieved by running the conductor keepalive
heartbeat for this period, but setting the ``online`` state to
``False``.
When this feature was proposed, SIGINT was suggested as the signal to
use to trigger a drain shutdown. However this is already used by
oslo_service fast exit[1] so using this for drain would be a change in
existing behaviour.
[1] https://opendev.org/openstack/oslo.service/src/branch/master/oslo_service/service.py#L340
Change-Id: I777898f5a14844c9ac9967168f33d55c4f97dfb9
We don't use pysnmp anymore, and we likely won't update this doc
in the future if libraries change again.
Change-Id: If3a3bf02167f187a0e4f8f0d20a77621b5def3eb
DEFAULT_IMAGE_NAME no longer exists in devstack; also our config
now always leads to multiple networks being created, so remove
the fork in the instructions.
Change-Id: I1b3e134c4d5a9633028af367e89ecc44699561cb
In-process token cache is deprecated since 4.2.0 release
and may be removed. Add the setting of memcache for
the auth_token token cache.
Change-Id: I23ad1d9fb1b33160452ab353972fa1274cde363d
Adds basic testing for PXE/iPXE boot secenarios where the OVN
DHCP service is used instead of dnsmasq.
Also adds a release note and documentation to cover the details
and caveats of using ovn as we have discovered through this process.
Change-Id: I28cd20a7f271220d8ca335895ca9e302452fd069
We have discovered hardware that only applies boot mode / secure boot
changes during a reboot. Furthermore, the same hardware cannot update
both at the same time. To err on the safe side, reboot and wait for
the value to change if it's not changed immediately.
Co-Authored-By: Jacob Anders <janders@redhat.com>
Change-Id: I318940a76be531f453f0f5cf31a59cba16febf57
Adds service steps on a variety of internal interfaces,
and begins to tie documentation together to provide clarity
on the use and purpose of service steps.
Change-Id: Ifd7241f06648c8d73c1b97fcf08673496f049f45
Introduce config to allow setting default ramdisks per-architecture.
The hierarchy of the parameters is:
Node config > config by architecture > general config
Change-Id: I95dfece3e8f7bcd3121ac808985cb61997877a51
This is a significant improvement and update to Ironic contributor
documentation, as an attempt to make it easier for new Ironic
contributors to onboard.
It is not perfect, but it's significantly better than the existing
documentation.
What this change does:
- Improve dev-quickstart guide, make it easier to find
devstack configurations.
- Removes information that can bit-rot over time and replaces with
more generic information.
- Provides an actual working, tested, Ironic+Nova devstack conf
What hasn't been done:
- Testing of Ironic BFV or Multitenant networking devstack confs
- Validation that the local development method still works
- There is a ton more information about how to use these testing
envs (both bifrost and devstack) which could be added.
- System prerequsities and Python prerequisites under the unit
tests section has bitrotted considerably; they have not been
significantly modified and will be fixed in a future commit.
Change-Id: I0cdfe50042fabb6b65633961fc418aff5d6ebfe3
A huge list of initial work for service steps
* Adds service_step verb
* Adds service_step db/object/API field on the node object for the
status.
* Increments the API version to 1.87 for both changes.
* Increments the RPC API version to 1.57.
* Adds initial testing to facilitate ensurance that supplied steps
are passed through and executed upon.
Does not:
* Have tests for starting the agent ramdisk, although this is
relatively boiler plate.
* Have a collection of pre-decorated steps available for immediate
consumption.
Change-Id: I5b9dd928f24dff7877a4ab8dc7b743058cace994
Adds support for SHA256 and SHA512 checksums to be passed
to firmware upgrade steps for the ilo hardware type.
Change-Id: I5455c4bfa4741a35b0ddada37298c897887e6cea
Adds a wait step to allow for finer grained workflows
and forcing interruptions which may be needed in some
cases with specialized hardware.
Change-Id: Idc338b761ebe35a4635022a324ca5acbf29fc462
The versions only differ in the first paragraph, and the supposedly
common parts actually have different code paths for different distros.
Also be realistic about which distros we support.
Change-Id: Ifcc19a20d42f384300cadf442951739be8682047
Adds the logic and testing to handle vendor interfaces to be able
to be called as steps, as well as adds the ipmitool send_raw
vendor passthru method to be able to be called as a step.
Change-Id: I741a4173f1d150298008d3190e4c3998402a8b86
* Updates API version to 1.85 to permit an ``unhold`` verb
* Adds the ``deploy hold`` and ``clean hold`` provision states
to the internal state machine.
* Adds on documentation on steps to help provide greater clarity
to Ironic's users on how to utilize steps. It should be noted
this documentation also includes the power state reserved step
names from the DPU functionality patch.
* Fixes the state machine diagram. Changes type to PNG as SVG
rendering is broken due to python libraries utilized for SVG
generation which do not work on more recent Python versions.
Change-Id: I34f58f4e77e7757b89247fd64f5fcde26f679453
This change creates all necessary parts to processing inspection data:
* New API /v1/continue_inspection
Depending on the API version, either behaves like the inspector's API
or (new version) adds the lookup functionality on top.
The lookup process is migrated from ironic-inspector with minor changes.
It takes MAC addresses, BMC addresses and (optionally) a node UUID and
tries to find a single node in INSPECTWAIT state that satisfies all
of these. Any failure results in HTTP 404.
To make lookup faster, the resolved BMC addresses are cached in advance.
* New RPC continue_inspection
Essentially, checks the provision state again and delegates to the
inspect interface.
* New inspect interface call continue_inspection
The base version does nothing. Since we don't yet have in-band
inspection in Ironic proper, the only actual implementation is added
to the existing "inspector" interface that works by doing a call
to ironic-inspector.
Story: #2010275
Task: #46208
Change-Id: Ia3f5bb9d1845d6b8fab30232a72b5a360a5a56d2
Allows steps to be executed on child nodes, and adds
the reserved power_on, power_off, and reboot step names.
Change-Id: I4673214d2ed066aa8b95a35513b144668ade3e2b
Adds the parent node support and tests in one change
including all DB/Model/API changes along with RBAC and
basic API tests.
* Updates the API version to 1.83
* Adds parent_node and related index to the nodes table.
* Adds new API parameters to list by parent node relationship.
Depends-On: https://review.opendev.org/c/openstack/ironic/+/883967
Change-Id: I8d64fee7105718199986db4994e13352d639f04f
The troubleshooting kernel command line option nomodeset
unfortunately changes the way framebuffer interactions work
with graphics devices which in some cases can result in kernel
memory to be used for graphics updates. When this happens on
some specific hardware common in rack mount servers with baseboard
management controllers, this can cause the memory bus to become
locked for a brief time while the graphics update is occuring.
This locked memory bus means disk IO can become blocked,
and network cards can overflow their buffers resulting in
packet loss on top of the latency incurred by the graphics
update executing.
As such, we've removed the nomodeset option from default usage and
added a note describing its removal to the documentation along
with a release note.
Change-Id: I9084d88c3ec6f13bd64b8707892758fa87dd7f86
Unused by Nova and unlike memory_mb/local_gb also by Ironic (actually,
our usage of local_gb is worth double-checking as well, but at the very
least it's referenced by inspection implementations).
Change-Id: Ie8b0d9f58f4dcd102c183c30ae7f5acf68a5e4c3
Even if a glance image is raw, we still recalculate the checksum after
"converting" it to raw. This process may take exceptionally long.
Change-Id: Id93d518b8d2b8064ff901f1a0452abd825e366c0