Currently, IntelCnaHardwareManager inherits GenericHardwareManager
which makes it a new "GenericHardwareManager" with "MAINLINE" priority.
This causes all other hardware-managers with lower priority than
"MAINLINE" never be used. To fix this, make IntelCnaHardwareManager
inherit basic HardwareManager.
Change-Id: I28b665d8841b0b2e83b132e1f25df95e03e7ba10
Story: 2008142
Task: 40882
(cherry picked from commit 4b0ef13d08)
The listen_port and listen_host directives are intended to allow
deployers of IPA to change the port and host IPA listens on. These
configs have not been obeyed since the migration to the oslo.service
wsgi server.
Story: 2008016
Task: 40668
Change-Id: I76235a6e6ffdf80a0f5476f577b055223cdf1585
(cherry picked from commit 7d0ad36ebd)
Tinyipa is not that tiny anymore and we need to increase the base
memory for VMs in jobs that use it.
Change-Id: I7e2d93ef566f4f2beb54caf445c7810a01aec1de
(cherry picked from commit 2a5ba7ef68)
Agent lookups can fail as we presently use logging.exception,
better known in our code as LOG.exception, which can also generate
other fun issues on journald based systems where additional errors
could be raised resulting in us being unable to troubleshoot the
the actual issue.
Because of the mis-use of LOG.exception and the default behavior
of the backoff retry handler, the retry logic was also not
functional as any error no matter how small caused IPA to
just exit.
Change-Id: Ic4608b7c6ff9773d1403926efb3d59869c71343b
Story: 2007968
Task: 40465
(cherry picked from commit 5eab9bced6)
When no root_device hint is set, an MDRAID partition can be incorrectly
selected as the root device which causes installation of the bootloader
to the physical disks behind the MDRAID volume to fail. See the notes
in the referenced Story for more detail.
This change adds a little more specificity to the listing of block
devices.
Change-Id: I66db457e71a0586723ee753bef961aec5bf58827
Story: 2007905
Task: 40303
(cherry picked from commit 5e95b1321d)
Eventlet, when monkey patching occurs, replaces the base
dns resolver methods. This can lead to compatability issues,
and un-expected exceptions being raised during the process
of monkey patching. Such as one if there are no resolvers.
As such, since we don't really need monkey patching of DNS,
and setting the flag should make the inspector CI jobs happier
where we don't need nor use DNS, AND tinycore may not be setting
a resolver configuration at all, which is the root of the failure
upon monkey patching that casues IPA to fail on start in certian
circumstances.
As a note, this has been performed on other projects due to
bugs. See Id9fe265d67f6e9ea5090bebcacae4a7a9150c5c2.
Change-Id: Ib8f7b844b1bfffff16f88ebbb6ef5ddbe61d5a30
Story: 2007936
Task: 40394
(cherry picked from commit 9830f3cb0f)
AgentRAID expects it and fails with TypeError if it's not provided.
Change-Id: Id84ac129bba97540338e25f0027aa0a0f51bde52
Story: #2006963
(cherry picked from commit f03d72019a)
delete_configuration still fetches all devices as it needs to clean
ones with broken RAID.
Story: #2007907
Task: #40307
Change-Id: I4b0be2b0755108490f9cd3c4f3b71a5e036761a1
(cherry picked from commit 1f3b70c4e9)
When we added software raid support, we started calling bootloader
installation. As time went on, we ehnanced that code path for non
RAID cases in order to ensure that UEFI nvram was setup
for the instance to boot properly.
Somewhere in this process, we missed a possible failure case where
the iscsi client tgtadm may return failures. Obviously, the correct
path is to not call iscsi teardown if we don't need to.
Since it was always semi-opportunistic teardown, we can't blindly
catch any error, and if we started iSCSI and failed to tear the
connection down, we might want to still fail, so this change
moves the logic over to use a flag on the agent object which
one extension to set the flag and the other to read it and take
action based upon that.
Change-Id: Id3b1ae5e59282f4109f6246d5614d44c93aefa7c
Story: 2007937
Task: 40395
(cherry picked from commit 2a56ee03b6)
- Increase the number of VM's since we are running two tests.
- Do no run ipa-tempest-wholedisk-* (the partition jobs are covering
this since tempest runs wholedisk and partition tests)
- Remove `partition` from the job names
Change-Id: Ic82fc9dc7b658dab52c06b3619a0118acd111bc5
(cherry picked from commit 6bde89e4dd)
Caches hardware information collected during inspection
so that the initial lookup can occur without any delay.
Also adds logging to track how long inventory collection takes.
Conflicts:
ironic_python_agent/tests/unit/base.py
Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com>
Change-Id: I3e0d237d37219e783d81913fa6cc490492b3f96a
(cherry picked from commit c76b8b2c21)
This has been a popular guidance, and diskimage-builder has recently
started following it.
Change-Id: I794c846fb191c15b0a30546bf64d624dfbde0fd4
(cherry picked from commit ba3caa6c64)
Follow-up to commit c5b97eb781.
Two things slipped through the cracks:
* ImageDownloadError was instantiated incorrectly, resulting in a wrong
error message. This was uncovered by using assertRaisesRegext in tests.
* We allowed calling write(None). This was uncovered by avoiding sleep(4)
in tests and enabling more failed calls before timeout.
Change-Id: If5e798c5461ea3e474a153574b0db2da96f2dfa8
(cherry picked from commit 00ad03b709)
We log them as completed when they start executing.
Also fix a problem in remove_large_keys that prevented items
with defaultdict from being logged.
Change-Id: I34a06cc85f55c693416f8c4c9877d55d6affafc9
(cherry picked from commit 0eee26ea66)
(cherry picked from commit c9876dd937)
The download retry interval was previously five seconds which is
not long enough to recover after a hard network connectivity break
where we may be reliant upon network port forwarding hold-down
timers or even routing protocol route propogation to recover
communication.
Previously the time value was 5 seconds, with 3 attempts, meaning
15 seconds total ignoring the error detection timeouts.
Now it is 10 seconds, with 10 attempts, meaning 100 seconds before
the error detection timeouts.
Change-Id: I6d11edc9a3156f2bdc21c3d432ecc7625d652699
(cherry picked from commit c77a7df851)
Instead of just trying to get the connection and handler
for the download, lets try to retry the whole action of
of downloading.
Change-Id: I9217792d32e6f33c70f146a9b7d3ef58c5644d8a
(cherry picked from commit 159ab9f0ce)
Socket read operations can be blocking and may not timeout as
expected when thinking of timeouts at the beginning of a
socket request. This can occur when streaming file contents
down to the agent and there is a hard connectivity break.
In other words, we could be in a situation like:
- read(fd, len) - Gets data
- Select returns context to the program, we do things with data.
** hard connectivity break for next 90 seconds**
- read(fd, len) - We drain the in-memory buffer side of the socket.
- Select returns context, we do things with our remaining data
** Server retransmits **
** Server times out due to no ack **
** Server closes socket and issues a FIN,RST packet to the client **
** Connectivity restored, Client never got FIN,RST **
** Client socket still waiting for more data **
- read(fd, len) - No data returned
- Select returns, yet we have no data to act on as the buffer is
empty OR the buffered data doesn't meet our requried read len value.
tl;dr noop
- read(fd, len) <-- We continue to try and read until the socket is
recognized as dead, which could be a long time.
NOTE: The above read()s are python's read() on an contents being
streamed. Lower level reads exist, but brains will hurt
if we try to cover the dynamics at that level.
As such, we need to keep an eye on when the last time we
received a packet, and treat that as if we have timed out
or not. Requests periodically yeilds back even when no data
has been received, in order to allow the caller to wall
clock the progress/status and take appropriate action.
When we exceed the timeout time value with our wall clock,
we will fail the download.
Change-Id: I7214fc9dbd903789c9e39ee809f05454aeb5a240
(cherry picked from commit c5b97eb781)
It does not return anything, so it makes no point for it to be
synchronous. Ironic always calls it with wait=True, so there is
no problem with backward compatibility either.
Change-Id: I44fec2e0cb54486328ce71263613d8592e384870
(cherry picked from commit 7e5fe1121e)
Currently running of ipa-centos8-stable-ussuri image causes 100%
cpu usage while cleaning. Proposed change fixes this behavior and
significantly speeds up cleaning.
Change-Id: I2ba9a69f22b11830d8ff1bc346b17bf1a52f25b0
Story: #2007696
Task: #39809
(cherry picked from commit 952489020e)
The only reason the current 2 GiB nodes work is because DIB started
removing linux-firmware from its images. Unfortunately, we need this
package on bare metal, and readding it brings to 3 GiB consump
Change-Id: Ic0f2274b29abe7470b835cfc305898643204e1f6
(cherry picked from commit 67dd91dbea)
Update the URL to the upper-constraints file to point to the redirect
rule on releases.openstack.org so that anyone working on this branch
will switch to the correct upper-constraints list automatically when
the requirements repository branches.
Until the requirements repository has as stable/ussuri branch, tests will
continue to use the upper-constraints list on master.
Change-Id: I2b664bb05c67bc4ae367c76ac3d41283515313fb
If the server is stuck for any reason, the download will hang for
a potentially long time. Provide a timeout (defaults to 60 seconds)
and 2 retries on failure.
Change-Id: Ie53519266edd914fdbfa82fe52b4a55151e5ec5f
Full py3 compatible version.
Add all Python3 modules to stdlib list.
Also includes fix to an enum34 dependency bug.
Change-Id: Ic50a33cf5dfa45c23dbffaee8c5d9e109fb0a734
This function checks for /sys/firmware/efi. Some tests do not mock
isdir, so they fail on UEFI machines.
Change-Id: I088218ddb88717ac07669d0b97c6cd50208ede8c
For compatibility with out-of-band RAID deploy steps, we need to have
one apply_configuration step, not a create/delete pair.
Change-Id: I55bbed96673c9fa247cafdac9a3ade3a6ff3f38d
Story: #2006963
Now that we no longer support py27, we can use the standard library
unittest.mock module instead of the third party mock lib.
Change-Id: I5fdb2a02ee83c692d46cbe28266fcae033bec6f6
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
DIB builds instance images with EFI partitions that only have the boot
flag, but not esp. According to parted documentation, boot is an alias
for esp on GPT, so accept it as well.
To avoid complexities when parsing parted output, the implementation
is switched to existing utils and ironic-lib functions.
Change-Id: I5f57535e5a89528c38d0879177b59db6c0f5c06e
Story: #2007455
Task: #39423
Currently we fail with HTTP 401 if both the known and the received
tokens are None. This prevents IPA from being updated before ironic.
Story: #2007557
Task: #39419
Change-Id: I80249bd3468b581dc035d72156cbfa2f5f225a1b