This reverts commit 05f2c8b79f0d6b7e9200bbc531ff621d2029da2e.
The CentOS Stream images include extra, unnecessary libraries and
packages, which substantially increase the ramdisk size. This is
causing failures in CI: the compressed image grew by about 100MB,
and the uncompressed Stream images are 1.1GB.
Change-Id: Icc3a18ed12d309fd9a00f02d5e703dfeda50e86b
All the tox jobs are based on openstack-tox, so we should convert
ironic-tox-unit-with-driver-libs too.
Change-Id: I20836d586edccfb8cd8fed1f3a89f1497ff96943
The baremetal client encodes boolean patch values as strings
("True", "False") but there is no unit test coverage which confirms
that this actually works. This change adds that test coverage.
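A test of this kind could look like the minimal sketch below. The
coerce_patch_bool helper and the patch paths are hypothetical and only
illustrate the idea; they are not the actual Ironic API code or the
test added by this change.

```python
import unittest


def coerce_patch_bool(value):
    """Hypothetical helper: accept a real bool or its string encoding."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise ValueError("not a boolean: %r" % (value,))


class TestBooleanPatchValues(unittest.TestCase):

    def test_string_encoded_booleans(self):
        # Illustrative JSON patch with booleans encoded as strings,
        # the way the commit message says the client sends them.
        patch = [
            {"op": "replace", "path": "/maintenance", "value": "True"},
            {"op": "replace", "path": "/protected", "value": "False"},
        ]
        values = [coerce_patch_bool(p["value"]) for p in patch]
        self.assertEqual([True, False], values)

    def test_invalid_string_rejected(self):
        self.assertRaises(ValueError, coerce_patch_bool, "maybe")
```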
Change-Id: I9e428ad973e88d3e1ef1e04e49a7b00a4e2d43fd
As a follow-up to the review feedback in [1], the type-specific fields
arguments are removed and the type is inferred from the versioned
object fields.
Story: 1651346
Task: 10551
[1] https://review.opendev.org/751160
Change-Id: I89a65214ab7d550d0b4a327dd033c27399ae13bf
There are some Ironic execution workflows where there is not an easy way
to retry, such as when attempting to hand off the processing of an async
task to a conductor. Task handoff can require releasing a lock on the
node, so the next entity processing the task can acquire the lock
itself. However, this is vulnerable to race conditions, as there is no
uniform retry mechanism built in to such handoffs. Consider the
continue_node_deploy/clean logic, which does this:

    method = 'continue_node_%s' % operation
    # Need to release the lock to let the conductor take it
    task.release_resources()
    getattr(rpc, method)(task.context, uuid, topic=topic)

If another process obtains a lock between the releasing of resources and
the acquiring of the lock during the continue_node_* operation, and
holds the lock longer than the max attempt * interval window (which
defaults to 3 seconds), then the handoff will never complete. Beyond
that, because there is no proper queue for processes waiting on the
lock, there is no fairness, so it's also possible that instead of one
long lock being held, the lock is obtained and held for a short window
several times by other competing processes.
This manifests as nodes occasionally getting stuck in the "DEPLOYING"
state during a deploy. For example, a user may attempt to open or access
the serial console before the deploy is complete--the serial console
process obtains a lock and starves the conductor of the lock, so the
conductor cannot finish the deploy. It's also possible a long heartbeat
or badly-timed sequence of heartbeats could do the same.
To fix this, this commit introduces the concept of a "patient" lock,
which retries indefinitely until it no longer encounters the NodeLocked
exception. This overrides any other retry behavior.
.. note::
   There may be other cases where such a lock is desired.
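The difference between a bounded retry and a patient lock can be
sketched as follows. This is an illustration only, not the code from
this commit: the NodeLocked class is a local stand-in, and
acquire_lock, reserve, and the default values (chosen to match the
3 second window mentioned above) are assumptions.

```python
import time


class NodeLocked(Exception):
    """Stand-in for the exception raised while a node is reserved."""


def acquire_lock(reserve, max_attempts=3, interval=1.0, patient=False,
                 sleep=time.sleep):
    """Call reserve() until it succeeds.

    reserve is a hypothetical callable that raises NodeLocked while
    another process holds the lock. When patient is True, max_attempts
    is ignored and the loop retries until the lock is obtained.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return reserve()
        except NodeLocked:
            # A bounded lock gives up once the attempts are exhausted;
            # a patient lock keeps waiting.
            if not patient and attempt >= max_attempts:
                raise
            sleep(interval)
```

With patient=False a competing lock holder that outlasts
max_attempts * interval causes NodeLocked to propagate, which is the
failure mode described above. With patient=True the caller simply keeps
waiting.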
Story: #2008323
Change-Id: I9937fab18a50111ec56a3fd023cdb9d510a1e990
This change removes unused code and concludes the conversion of the
REST API from WSME-based to plain JSON.
Change-Id: Ib04c759f86d9758b67a75648b5971f5a80c77ecb
Story: 1651346
Task: 10551
Patching the port internal_info was mistakenly allowed by the recently
landed JSON conversion change [1]. This is now fixed, and the comment
has been updated to explain why internal_info needs to be part of
the patch schema.
[1] https://review.opendev.org/750120
Change-Id: Ieab085cfd9731e180f741b17a27ea540dabbf62e
Previously, disk labels were not populated unless explicitly set by
an API user. This led to a dangerous case that sometimes worked but
was ultimately wrong: setting up a UEFI-booting machine with a BIOS
MBR partition table.
Not all systems support this, but UEFI systems are supposed to
support GPT partition tables.
We now fall back to GPT when no explicit override is set and the
machine is set to UEFI mode.
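The fallback described above amounts to the following minimal sketch.
The function name, its arguments, and the string values are
hypothetical; the actual change may read these settings from elsewhere.

```python
def choose_disk_label(boot_mode, explicit_label=None):
    """Pick a disk label, preferring an explicit override.

    Returns None when nothing is chosen here, leaving the decision to
    whatever default applies downstream (an assumption of this sketch).
    """
    if explicit_label:
        # An explicit setting from the API user always wins.
        return explicit_label
    if boot_mode == "uefi":
        # UEFI systems are expected to support GPT partition tables.
        return "gpt"
    return None
```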
Change-Id: I001d8c6ee3b1d6c466c71ea5179bdbca9bdd692d