12557 Commits

Author SHA1 Message Date
Zuul
08ce71d0f8 Merge "Add a list of children nodes unit test" 2023-08-17 15:50:06 +00:00
Zuul
7cf0881cab Merge "Add job to test with SQLAlchemy master (2.x)" 2023-08-16 03:13:45 +00:00
Zuul
8e2aab8291 Merge "Fix several issues in the lock/release database code" 2023-08-14 23:09:10 +00:00
Zuul
9b181b83a8 Merge "Support sha256/sha512 with the ilo firmware upgrade logic" 2023-08-14 16:36:21 +00:00
Zuul
a83f501ded Merge "tox: Remove basepython" 2023-08-10 22:21:55 +00:00
Zuul
2dec169587 Merge "Fix missing oslo.versionedobjects library option" 2023-08-08 12:57:29 +00:00
Zuul
e2011518f1 Merge "Prevent MissingAttribute error when supportedApplyTime missing" 2023-08-08 07:59:03 +00:00
Takashi Kajinami
0bda652262 Fix missing oslo.versionedobjects library option
This ensures the options for oslo.versionedobjects library are
included in the file generated by oslo-config-generator.

Change-Id: Ib63c4dd1c14905ec200e67a8fe9ba5f20b160b08
2023-08-08 15:05:22 +09:00
Jacob Anders
f93712d7a6 Prevent MissingAttribute error when supportedApplyTime missing
On some hardware, the supportedApplyTime attribute may not be listed
under the Redfish BIOS settings URL. This patch handles that case
to prevent failures when attempting to update BIOS settings.

Change-Id: I40359973fd832146cb2b179bfa447a308078e83d
2023-08-08 13:18:31 +10:00
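A minimal sketch of the defensive lookup described above. It assumes the BIOS resource is available as a plain dictionary parsed from the Redfish JSON, and that apply-time support is published under the `@Redfish.Settings` annotation as `SupportedApplyTimes`. The helper name `get_supported_apply_times` is hypothetical and is not sushy's or ironic's API.

```python
from typing import Any, List, Mapping, Optional


def get_supported_apply_times(bios: Mapping[str, Any]) -> Optional[List[str]]:
    """Return the advertised apply times, or None when the BMC omits them.

    Hypothetical helper; the key names are assumed from the Redfish
    @Redfish.Settings annotation for illustration only.
    """
    settings = bios.get("@Redfish.Settings") or {}
    times = settings.get("SupportedApplyTimes")
    # A missing attribute means "unknown" rather than an error, so the
    # caller can fall back to a default apply time instead of failing.
    return list(times) if times is not None else None
```

The caller then branches on `None` instead of catching an attribute error from deep inside the settings update.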
Julia Kreger
23f4a7d993 Support sha256/sha512 with the ilo firmware upgrade logic
Adds support for SHA256 and SHA512 checksums to be passed
to firmware upgrade steps for the ilo hardware type.

Change-Id: I5455c4bfa4741a35b0ddada37298c897887e6cea
2023-08-07 15:20:14 +00:00
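As an illustration of accepting SHA256 and SHA512 checksums, the sketch below verifies a firmware payload with `hashlib`. It assumes the algorithm is inferred from the length of the hex digest (32 for MD5, 64 for SHA256, 128 for SHA512); that selection rule and the name `verify_checksum` are hypothetical and not necessarily how the ilo driver decides.

```python
import hashlib

# Hex digest length -> hashlib algorithm name (assumed selection rule).
_ALGO_BY_LENGTH = {32: "md5", 64: "sha256", 128: "sha512"}


def verify_checksum(data: bytes, expected: str) -> bool:
    """Return True if ``data`` matches the hex checksum ``expected``."""
    expected = expected.strip().lower()
    algo = _ALGO_BY_LENGTH.get(len(expected))
    if algo is None:
        raise ValueError(f"unrecognised checksum length: {len(expected)}")
    return hashlib.new(algo, data).hexdigest() == expected
```

Passing the algorithm explicitly alongside the checksum would avoid the length heuristic; the heuristic is shown only because it keeps the example short.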
likui
3eca0d8713 tox: Remove basepython
Python 2 is EOL. No environment should be defaulting to it. Our CI
environments certainly aren't.

Change-Id: I7b59edcb2e258b774bc71ad8e9873653d3e0b9a4
2023-08-02 16:57:32 +08:00
Zuul
2528bf6621 Merge "Fix typo in deploy_templates docs" 2023-08-01 07:05:12 +00:00
Jay Faulkner
d2c11df694 Fix typo in deploy_templates docs
There is no /v1/deploy_template; should be /v1/deploy_templates.

Change-Id: Iab0c7be8dd54f2b4d01b0655d615381610eb84d4
2023-07-31 14:35:18 -07:00
Zuul
30881b3281 Merge "Very basic in-band inspection with the "agent" interface" 2023-07-31 18:51:56 +00:00
Zuul
ff28b87a6d Merge "Correct two mistakes in the /continue_inspection API" 2023-07-31 18:43:24 +00:00
Zuul
2b2a80617a Merge "Add python3.10 support in testing runtime" 2023-07-31 15:48:37 +00:00
Zuul
cb96585167 Merge "Add the initial skeleton of the agent inspect interface" 2023-07-31 11:51:37 +00:00
likui
d97deb84e6 Add python3.10 support in testing runtime
In the 2023.2 cycle testing runtime [1], projects started adding Python 3.10.

[1] https://governance.openstack.org/tc/reference/runtimes/2023.2.html

Change-Id: Ifcde8bfb691e4a5a51db82a0607fba4b474fd32d
2023-07-31 15:18:20 +08:00
Zuul
ee77f0b40d Merge "DB: Select upon delete for allocations" 2023-07-31 00:38:35 +00:00
Zuul
eb76784416 Merge "Log when a periodic is completed" 2023-07-31 00:13:45 +00:00
Julia Kreger
bb9b9001ad DB: Select upon delete for allocations
When an allocation is created, its creation continues
in the background. The client may request deletion of the
allocation *before* its creation has entirely
completed. That is fine, but the challenge is that we may encounter a
locked row if we're asked to delete while creation is in progress.

So, we'll query with with_for_update[0], which should block until
the lock is released; the lock is only released when the original
locking transaction closes out[1][2].

[0]: https://docs.sqlalchemy.org/en/14/core/selectable.html#sqlalchemy.sql.expression.GenerativeSelect.with_for_update
[1]: https://dev.mysql.com/doc/refman/5.7/en/select.html
[2]: https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html

Change-Id: I0b68f054c951655b01f0cd776feb5a8c768471ab
Closes-Bug: #2028866
2023-07-28 20:16:52 +00:00
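To see what a locking read looks like, the sketch below builds a `SELECT ... FOR UPDATE` with SQLAlchemy's `with_for_update()` and compiles it for MySQL without connecting to a database. The trimmed `allocations` table is hypothetical and much smaller than ironic's real model.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select
from sqlalchemy.dialects import mysql

metadata = MetaData()
# Hypothetical, trimmed-down allocations table for illustration only.
allocations = Table(
    "allocations",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("uuid", String(36)),
)


def locking_select(allocation_id: int):
    """Build a SELECT ... FOR UPDATE for a single allocation row.

    On InnoDB this read waits while another transaction holds the row
    lock, and proceeds once that transaction commits or rolls back.
    """
    return (
        select(allocations)
        .where(allocations.c.id == allocation_id)
        .with_for_update()
    )


# Compile against the MySQL dialect to inspect the emitted SQL.
sql = str(locking_select(1).compile(dialect=mysql.dialect()))
```

The delete path would run this select inside its own transaction before issuing the DELETE, so it queues behind the creating transaction rather than failing on a locked row.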
Zuul
ab04ec8ce3 Merge "DB: Streamline allocation interactions" 2023-07-28 20:09:42 +00:00
Zuul
b769a8199a Merge "Add wait step" 2023-07-28 05:16:26 +00:00
Zuul
96b1718b42 Merge "Enable vendor interfaces to be called as steps" 2023-07-27 17:19:09 +00:00
Julia Kreger
cc9af373e7 DB: Streamline allocation interactions
We've discovered we can deadlock on allocations. Reviewing
the code of both the test and the underlying DB layer, it is a
"multiple things contribute" scenario, but the first step here is to
streamline the allocation update process so we re-query after
closing out the transaction.

Change-Id: I46e78813787703819a61f69d4243271ec07e0983
Partial-Bug: #2028866
2023-07-27 09:33:03 -07:00
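The "re-query after closing out the transaction" pattern can be sketched with the standard library's `sqlite3` module as a stand-in for ironic's SQLAlchemy layer. The table, columns and function name are hypothetical.

```python
import sqlite3


def update_allocation(conn: sqlite3.Connection, alloc_id: int, state: str) -> dict:
    """Update an allocation, commit, then re-read it with a fresh query.

    The write transaction is closed before the row is fetched again, so
    no lock is held while the caller works with the result.
    Table and column names are hypothetical.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE allocations SET state = ? WHERE id = ?", (state, alloc_id)
        )
    # The transaction has been committed; this read runs outside of it.
    row = conn.execute(
        "SELECT id, state FROM allocations WHERE id = ?", (alloc_id,)
    ).fetchone()
    return {"id": row[0], "state": row[1]}
```

Keeping the write transaction as short as possible narrows the window in which another thread can collide with it.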
Julia Kreger
81b35931d4 Add a list of children nodes unit test
Somehow, we missed adding a test for this early on. While
validating the API client changes, I discovered that we seemed to
be missing unit tests for the method, because I couldn't get it
to work for the client, but then I realized I was just mixing up
how it is meant to be used. C'est la vie!

Change-Id: I382ad3a011f3151abedb896f58c75e8b76b357fa
2023-07-26 12:03:55 -07:00
Zuul
40356e2eae Merge "Fix ks_template property to be processed only for anaconda deploy" 2023-07-26 17:06:12 +00:00
Zuul
2b9d3638b1 Merge "Document caveats of running with SQLite" 2023-07-26 14:16:58 +00:00
Julia Kreger
8fc8372e74 Add wait step
Adds a wait step to allow for finer grained workflows
and forcing interruptions which may be needed in some
cases with specialized hardware.

Change-Id: Idc338b761ebe35a4635022a324ca5acbf29fc462
2023-07-24 22:42:20 +00:00
Lana Kaleif
36665a0895 Fix ks_template property to be processed only for anaconda deploy
Ramdisk and anaconda deploys share image processing via
ironic.common.pxe_utils.get_instance_image_info(). This method
unnecessarily processes the ks_template property when the ramdisk
deploy is in use, and in doing so calls _get_image_properties(),
which requires the image_source property to exist.
For ramdisk deploy, having a kernel and ramdisk is enough.
This patch adds conditions to process the ks_template property only
for anaconda deploy.

Change-Id: I5f88d3b1da1c17bc26d49370cc6ce74644d13679
2023-07-24 08:31:51 -04:00
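A simplified sketch of the conditional described above. `collect_instance_image_info` is a hypothetical stand-in for `get_instance_image_info()`, reduced to the branching logic: only the anaconda path touches `image_source` and `ks_template`, so a ramdisk deploy with just a kernel and ramdisk no longer fails.

```python
def collect_instance_image_info(deploy_interface: str, instance_info: dict) -> dict:
    """Hypothetical, simplified stand-in for get_instance_image_info()."""
    info = {
        "kernel": instance_info["kernel"],
        "ramdisk": instance_info["ramdisk"],
    }
    if deploy_interface == "anaconda":
        # Only anaconda needs image properties (looked up via image_source)
        # and the kickstart template; ramdisk deploys skip both.
        info["image_source"] = instance_info["image_source"]
        info["ks_template"] = instance_info.get("ks_template")
    return info
```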
Julia Kreger
2d8986bda4 Fix retry logic logging
logging has a function, debug, and a constant, DEBUG, which are entirely
different. Tenacity requires the integer level value, which is passed in for
logging actions.

This mistake is rooted in the examples published for tenacity logging. The lower
level caller doesn't log in many cases, and our testing didn't catch this precise
case because we were only validating that logging was called, not that logging
worked. We also didn't see any of these errors while the patch was running, as
CI resource contention had lessened and many of the chances of lock conflicts
had been reduced by other fixes.

Duplicates the retry test *without* the logging mock, as not mocking the logging
would have exposed the breakage.

Change-Id: I4a65a044e90aff3cffae24f191e425bc75b5fb74
2023-07-20 11:36:36 -07:00
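The difference between `logging.debug` and `logging.DEBUG` can be reproduced with the standard library alone. The sketch below is a simplified stand-in for tenacity's `before_sleep_log()` hook, assuming (as the commit implies) that the hook passes its level straight to `Logger.log()`, which raises `TypeError` for a non-integer level. It also shows why a test that mocks the logger would not notice the bug.

```python
import logging


def before_sleep_log(logger: logging.Logger, log_level: int):
    """Simplified stand-in for tenacity's before_sleep_log() hook.

    The level ends up in logger.log(), which requires an integer level
    such as logging.DEBUG.
    """
    def hook(attempt_number: int) -> None:
        logger.log(log_level, "Retrying, attempt %d", attempt_number)
    return hook


log = logging.getLogger("retry-demo")
good = before_sleep_log(log, logging.DEBUG)  # integer constant: works
bad = before_sleep_log(log, logging.debug)   # module function: wrong type

good(1)
try:
    bad(1)
except TypeError as exc:
    print("broken hook:", exc)  # level must be an integer
```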
Zuul
ef73871524 Merge "Firmware Interface" 2023-07-19 06:00:30 +00:00
Zuul
3e4cd314b4 Merge "Retry SQLite DB write failures due to locks" 2023-07-19 04:13:03 +00:00
Zuul
b78f379997 Merge "Revert "Fix IRONIC_IMAGE_NAME=non-existent-image"" 2023-07-19 01:31:24 +00:00
Zuul
cd30d52c79 Merge "Add additional logging on iLO power failure" 2023-07-18 20:56:17 +00:00
Julia Kreger
091edb0631 Retry SQLite DB write failures due to locks
Adds a database retry decorator to capture and retry exceptions
rooted in SQLite locking. These locking errors are rooted in
the fact that essentially, we can only have one distinct writer
at a time. This writer becomes transaction oriented as well.

Unfortunately with our green threads and API surface, we run into
cases where we have background operations (mainly, periodic tasks...)
and API surface transactions which need to operate against the DB
as well. Because we can't say one task or another (realistically
speaking) can have exclusive control and access, then we run into
database locking errors.

So when we encounter a lock error, we retry.

Adds two additional configuration parameters to the database
configuration section, to allow this capability to be further
tuned, as file IO performance is *surely* a contributing factor
to our locking issues as we mostly see them with a loaded CI
system where other issues begin to crop up.

The new parameters are as follows:
* sqlite_retries, a boolean value allowing the retry logic
  to be disabled. This can largely be ignored, but is available
  as it was logical to include.
* sqlite_max_wait_for_retry, an integer value, default 30 seconds,
  controlling how long to wait when retrying SQLite database operations
  which are failing due to a "database is locked" error.

The retry logic uses the tenacity library, and performs an
exponential backoff. Setting the amount of time to a very large
number is not advisable, so the default of 30 seconds was
deemed reasonable.

Change-Id: Ifeb92e9f23a94f2d96bb495fe63a71df9865fef3
2023-07-18 13:14:45 +00:00
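The retry behaviour can be sketched without tenacity using only the standard library. This is a stand-in, not the decorator from the change: it retries only `sqlite3.OperationalError` whose message contains "database is locked", doubles the delay after each failure, and treats `max_wait` as a cap on the total time spent sleeping. Interpreting sqlite_max_wait_for_retry as a total cap is an assumption, and the decorator and parameter names are hypothetical. The injectable `sleep` argument exists only to make the sketch testable.

```python
import functools
import sqlite3
import time


def retry_on_sqlite_lock(max_wait: float = 30.0, base: float = 0.1, sleep=time.sleep):
    """Retry a DB call while SQLite reports "database is locked".

    Waits grow exponentially (base, 2*base, 4*base, ...); once the next
    wait would push the total past ``max_wait`` seconds, the error is
    re-raised. Names and semantics are illustrative assumptions.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            waited, delay = 0.0, base
            while True:
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    locked = "database is locked" in str(exc)
                    if not locked or waited + delay > max_wait:
                        raise  # other errors, or out of retry budget
                    sleep(delay)
                    waited += delay
                    delay *= 2
        return wrapper
    return decorator
```

With tenacity itself, the same policy would typically combine `wait_exponential` with `stop_after_delay` and a retry predicate on the exception message; a sqlite_retries-style switch would simply skip the wrapper.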
Zuul
2699a4ebbc Merge "Stop splitting installation docs per distros" 2023-07-18 12:45:20 +00:00
Julia Kreger
e80e029b48 Log when a periodic is completed
When debugging the logs, it is difficult to know when
actions complete for a specific periodic task.

Example: Power sync, which includes an explicit sleep(0) to
yield control. The explicit sleep *is* important as it is
a low priority task. Where this quickly mixes things up
is when we're hunting database locking issues, and we
*must* verify that we *have exited* part of the code
completely.

Change-Id: Ie9eddedf6bca603845ff14e1ccbda665e3b9e5bd
2023-07-16 13:08:38 -07:00
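A minimal sketch of logging a periodic task's start and completion. The `log_completion` decorator is hypothetical and does not reflect how ironic registers periodics; it only shows that logging in a `finally` block records the exit even when the task raises, which is what makes "have we left this code?" answerable from the logs.

```python
import functools
import logging

LOG = logging.getLogger(__name__)


def log_completion(func):
    """Log when a periodic task starts and when it has fully exited."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        LOG.debug("Periodic task %s starting", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if the task raised, so the exit is always logged.
            LOG.debug("Periodic task %s completed", func.__name__)
    return wrapper
```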
Julia Kreger
0099d1812d Don't actually heartbeat with sqlite
Disables the internal heartbeat mechanism when ironic has been
configured to utilize a SQLite database backend.

This is done to lessen the possibility of a
"database is locked" error, which can occur when two
distinct threads attempt to write to the database
at the same time with open writers.

The keepalive heartbeat process was identified as
a major source of these write operations, as it was writing
every ten seconds by default, which would also collide with
periodic tasks.

Change-Id: I7b6d7a78ba2910f22673ad8e72e255f321d3fdff
2023-07-14 09:22:01 -07:00
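One way to detect a SQLite backend is to look at the dialect in the database connection URL (the `[database] connection` option from oslo.db). The sketch below is a hypothetical check, not ironic's implementation; it strips an optional `+driver` suffix so that URLs such as `mysql+pymysql://...` still resolve to their dialect name.

```python
from urllib.parse import urlsplit


def should_heartbeat(connection_url: str) -> bool:
    """Hypothetical check: skip the keepalive writer on SQLite backends."""
    # "sqlite:///ironic.db" -> "sqlite"; "mysql+pymysql://..." -> "mysql".
    dialect = urlsplit(connection_url).scheme.split("+", 1)[0]
    return dialect != "sqlite"
```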
Dmitry Tantsur
267e61bbc7 Document caveats of running with SQLite
Change-Id: I5d182814b07a2d50345ee365c56a0f00724f6e47
2023-07-14 12:31:59 +02:00
Dmitry Tantsur
f4be664a86 Stop splitting installation docs per distros
The versions only differ in the first paragraph, and the supposedly
common parts actually have different code paths for different distros.

Also be realistic about which distros we support.

Change-Id: Ifcc19a20d42f384300cadf442951739be8682047
2023-07-14 11:38:10 +02:00
Julia Kreger
76c075269d Enable vendor interfaces to be called as steps
Adds the logic and testing needed for vendor interface methods to be
called as steps, and allows the ipmitool send_raw
vendor passthru method to be called as a step.

Change-Id: I741a4173f1d150298008d3190e4c3998402a8b86
2023-07-13 07:40:53 -07:00
Jay Faulkner
1335402f42 Add additional logging on iLO power failure
Currently, there's no information printed about the nature of the failure
when we cannot properly change the power status on an iLO server. Now,
the actual server state and the expected server state at the
BMC level are logged, helping with troubleshooting edge cases.

Related-bug: 2021995
Change-Id: I77dc69ef4dd42e5ad674f5c00a4500027ef030ec
2023-07-13 14:02:54 +00:00
Zuul
54da324900 Merge "DB: Fix result set locking with periodics" 2023-07-13 12:21:58 +00:00
Dmitry Tantsur
afada321d8 Very basic in-band inspection with the "agent" interface
Only port creation/updating/deletion logic has been replicated from
ironic-inspector, as well as the add_ports and keep_ports options.

In future patches, the added code will become part of processing
hooks.

Change-Id: I69d6a1a53c5bf9e0f41d1a5bce7215edeea54b22
2023-07-13 09:50:11 +02:00
Dmitry Tantsur
6efa2119e4 Add the initial skeleton of the agent inspect interface
No real inspection is done: it only accepts data and returns success.

Common code has been extracted from the existing inspector-based
implementation.

Change-Id: I7462bb2e0449fb1098fe59e394b5c583fea89bac
2023-07-13 09:50:03 +02:00
Zuul
e2273d2b81 Merge "Disable spanning tree" 2023-07-12 19:55:24 +00:00
Julia Kreger
fb978dab1c DB: Fix result set locking with periodics
An issue previously existed where periodics would cause an open
transaction to exist with the database which would cause issues
when attempting to write to the database.

This issue has been fixed by assembling the data to return to
the calling method such that no open transaction
remains: the data retrieved from the database is copied,
thus detaching it from the transaction.

Closes-Bug: #2027405
Change-Id: I6401193b04fd3be78c37433bfdd0ccbd92aac8da
2023-07-12 12:07:09 -07:00
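The fix copies query results out before returning them. The sketch below shows the same idea with the standard library's `sqlite3` module as a stand-in for ironic's SQLAlchemy query code: rows are materialized into plain dictionaries, the cursor is closed, and any implicit transaction is ended before the periodic receives the data. The `nodes` table and its columns are hypothetical.

```python
import sqlite3
from typing import Dict, List


def list_nodes_for_periodic(conn: sqlite3.Connection) -> List[Dict[str, object]]:
    """Return plain-dict copies of rows so no transaction stays open.

    Table and column names are hypothetical.
    """
    cur = conn.execute("SELECT id, power_state FROM nodes ORDER BY id")
    try:
        # Copy each row into an independent dict while the cursor is open.
        results = [{"id": r[0], "power_state": r[1]} for r in cur.fetchall()]
    finally:
        cur.close()
    conn.commit()  # end any implicit transaction before handing data back
    return results
```

Because the periodic iterates over copies, a slow loop body (for example, one that yields with sleep(0)) no longer keeps the read side of the database busy.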
Stephen Finucane
8d2e93f30c Add job to test with SQLAlchemy master (2.x)
Change-Id: I75c021b02187ee0b746c49eb545b79b3a40608b1
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2023-07-11 10:50:55 -07:00
Zuul
416ea8711e Merge "CI: Change migrations timeout to be >60 seconds" 2023-07-11 15:03:57 +00:00