Allows nodes with a single IP stack to be deployed from a dual-stack
Ironic.
Detecting advertised address and usable Ironic URLs are done completely
independently which does open some space for a misconfiguration. I hope
it's not likely in the reality, especially since this feature is
targetting advanced standalone users.
Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2
Closes-Bug: #2045548
Currently, if heartbeat fails, we reschedule it after 5 seconds.
This is fine for the first retry, but it can cause a thundering herd
problem when a lot of nodes fail to heartbeat at once.
This change adds jitter to the minimum wait of 5 seconds. The jitter is
not applied for forced heartbeats: they still have a minimum wait of
exactly 5 seconds from the last heartbeat.
The code is re-ordered to move the interval calculation to one place.
Bonus: correctly logging the next interval.
The unit tests have been rewritten to test the heartbeat process step by
step and not rely on the exact sequence of the calls.
Closes-Bug: #2038438
Change-Id: I4c4207b15fb3d48b55e340b7b3b54af833f92cb5
Also fixes my use of set_override, as it is not on the actual
config object. You'd think I'd remember that, since I've done
that before...
Change-Id: I4b578c4319354001cbbd3b3856af96b30fd25555
Use pure json instead of jsonutils.
Borrow encode function from oslo.serialization to be used in the
utils module.
Change-Id: Ied9a2259a4329a86b4f0853bd1fb187563c0a036
Removes multipath base devices from consideration by
default, and instead allows the device-mapper device
managed by multipath to be picked up and utilized
instead.
In effect, allowing us to ignore standby paths *and*
leverage multiple concurrent IO paths if so offered
via ALUA.
In reality, anyone who has previously built IPA with
multipath tooling might not have encountered issues
previously because they used Active/Active SAN storage
environments. They would have worked because the IO lock
would have been exchanged between controllers and paths.
However, Active/Passive environments will block passive
paths from access, ultimately preventing new locks from
being established without proper negotiation. Ultimately
requiring multipathing *and* the agent to be smart enough
to know to disqualify underlying paths to backend storage
volumes.
An additional benefit of this is active/active MPIO devices
will, as long as ``multipath`` is present inside the ramdisk,
no longer possibly result in duplicate IO wipes occuring
accross numerous devices.
Story: #2010003
Task: #45108
Resolves: rhbz#2076622
Resolves: rhbz#2070519
Change-Id: I0fd6356f036d5ff17510fb838eaf418164cdfc92
Even if journald is present, there is no guarantee that IPA logs there
(this is the case in container-based ramdisks).
Change-Id: Iceeab0010827728711e19e5b031ccac55fe1efde
This means we do not have to rely on modprobe idempotency as
much and it's less code duplication, which is always nice.
Signed-off-by: Jonas Schäfer <jonas.schaefer@cloudandheat.com>
Change-Id: I996aba47bc54309e15e7d56e4a96b23b8deb5c9c
The IPA sends heartbeats to the conductor periodically and when
requested, e.g. at the end of asynchronous commands. In order
to avoid to send such notifications in too quick succession,
e.g. when two asynchronous commands finish at the same time or
when the periodic heartbeat was just sent right before a command
ended, this patch proposes to coalesce heartbeats which are
close together timewise and send only one for all of them
in a time interval of 5 seconds.
Co-Authored-By: Arne Wiebalck <arne.wiebalck@cern.ch>
Story: #2008983
Task: 42633
Change-Id: Idfbce44065e1e5a8b730b94741b2604c51f0ab14
The command params can be huge when configdrive is used. There is no
point in sending them back, Ironic does not use them anyhow.
Story: #2008904
Task: #42479
Change-Id: I6e3db5db2042ca3fb5dafacfacf036fd7fc2fc4c
This is very convenient for debugging and is something ironic and
ironic-inspector already do.
Register SSL options earlier so that they're accounted for.
Change-Id: I56aca8eec1dfeb065ac657452a7076a9e3d17cc3
Heartbeating in IPA has used select.poll() for years to workaround
a bug where changing the time in the ramdisk could cause heartbeats
to stop and never resume.
Now that IPA syncs time at start and exit, this workaround is no
longer needed. So instead, we'll revert to using threading.Event()
in order to make the code simpler and easier to understand.
Since we need this to be an eventlet-event, and not a standard-thread
event, also monkey_patch threading.
Additionally, there were a few completely unused backoff interval
values set, that were never applied. In respect of maintaining the
5+ years old behavior of not doing error backoffs, that code was
removed instead of being made to work.
Change-Id: Ibcde99de64bb7e95d5df63a42a4ca4999f0c4c9b
Adds a new flag (on by default) that enables generating a TLS
certificate and sending it to ironic via heartbeat. Whether
ironic supports auto-generated certificates is determined by
checking its API version.
Change-Id: I01f83dd04cfec2adc9e2a6b9c531391773ed36e5
Depends-On: https://review.opendev.org/747136
Depends-On: https://review.opendev.org/749975
Story: #2007214
Task: #40604
This change enables operators to set [DEFAULT]listen_tls to
true configure IPA to be host its WSGI server over TLS using
existing SSL support in oslo.service.
In addition to configuring this in IPA, a deployer will need to
also set [ssl]cert_file, [ssl]key_file, and optionally
[ssl]ca_file in their ipa config, in addition to embedding those
files into the IPA ramdisk in order for this to be functional.
In order to make this change work, we also need to monkey patch
socket library early, or else oslo.service will end up passing an
unpatched socket to the eventlet wsgi server, which causes
deadlocks.
Change-Id: Ib7decae410915f3c27b045ee08538c94d455b030
The listen_port and listen_host directives are intended to allow
deployers of IPA to change the port and host IPA listens on. These
configs have not been obeyed since the migration to the oslo.service
wsgi server.
Story: 2008016
Task: 40668
Change-Id: I76235a6e6ffdf80a0f5476f577b055223cdf1585
Caches hardware information collected during inspection
so that the initial lookup can occur without any delay.
Also adds logging to track how long inventory collection takes.
Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com>
Change-Id: I3e0d237d37219e783d81913fa6cc490492b3f96a
Adds a new poll extension to provide get_hardware_info and get_node_info
interfaces.
get_hardware_info will be used for node validation by ironic deploy
drivers.
get_node_info will be used for sending lookup data to IPA.
standalone mode is assumed as debug only, but it's not the case
considering the poll mode will be introduced, slightly updates the
description, also prevents the mdns lookup when standalone is true.
Story: 1526486
Task: 28724
Change-Id: I5ad772a18cc4584585c5a7b6fb127547cece1998
The agent token was being passed through upon class invocation
as opposed through direct configuration loading due to how IPA
is designed to load items upon start-up.
As a result, we somewhere forgot to save the token to the api
client so heartbeats were being rejected when being forced
to be enabled by default.
This has been corrected an an additional test class has been
added to help ensure that this operates as expected.
Change-Id: I1151945d3dc6a685d62ad49183919ef4a81962f8
Adds support to the agent to receive, store, and return
that token to ironic's API, when supported.
This feature allows ironic and ultimately the agent to
authenticate interactions, when supported, to prevent
malicious abuse of the API endpoint.
Sem-Ver: feature
Change-Id: I6db9117a38be946b785e6f5e75ada1bfdff560ba
This change adds a new introspection data field 'configuration'
with two lists: managers and collectors.
Change-Id: Ice0d7e6ecff3f319bc3a4f41617059fd6914e31c
WSME is no longer maintained and Pecan is an overkill for our (purely
internal) API. This change rewrites the API in Werkzeug (the library
underneath Flask). I don't use Flask here since it's also an overkill
for API with 4 meaningful endpoints.
Change-Id: Ifed45f70869adf00e795202a53a2a53c9c57ef30
This change enables IPA to receive API endpoints and configuration
via multicast DNS.
Story: #2005393
Task: #30382
Change-Id: Ibbf07052bea8f5c0305dda098b2879bcbc2fece5
The agent stop function will write a byte string 'a' to the pipe
as a signal for the run function to end process.
The run function is expecting a literal string.
In python 2.x the byte string will automatically be converted to
literal, while python 3.x won't do the conversion, causing the
process to never stop.
This patch will fix that behavior, allowing the IPA to correctly
stop using python 3.x.
Story: 2004928
Task: 29308
Change-Id: Iad16e8bed2436d961dea8ddaec1c2724225b4097
This patch addresses few minor comments in commit
a659306272542dd38420cb118cc7b04b1e8cf377
Change-Id: Id5b48e3cc96c8807c471c947da3e233cebdf687e
Related-Bug: #1526449
Prevent IPA from picking up the IPv6 link-local address
as a callback_url in cases where it gets tried before other
addressing methods havn't complete yet. In this scenario IPA
sleeps for 10 seconds and then retries giving the nic a chance to
configure its routable IP address.
Change-Id: Ic53334c630180f0d77bb0231e548d2c44bfe55ca
Closes-Bug: #1732692
This patch adds support for rescue mode with DHCP tenant networks in
CoreOS. Applying network config from a configdrive is not yet supported
but will be in a future patch.
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Co-Authored-By: Taku Izumi <izumi.taku@jp.fujitsu.com>
Co-Authored-By: Annie Lezil <annie.lezil@gmail.com>
Co-Authored-By: Aparna <aparnavtce@gmail.com>
Co-Authored-By: Shivanand Tendulker <stendulker@gmail.com>
Change-Id: I7898ff22800dedba73d7fbfb3801378867abe183
Partial-Bug: 1526449
Minor change to a unit test; the names of the mock arguments to the
unit test method are not consistent with the actual ordering of the
mock decorators. This fixes it.
Change-Id: Id9e0dd1614703760b2fe143b2029f9bf6067420a
This patch is changing the _wait_for_disks() method behavior to wait to
a specific disk if any device hints is specified. There are cases where
the deployment might fail or succeed randomly depending on the order and
time that the disks shows up.
If no root device hints is specified, the method will just wait for any
suitable disk to show up, like before.
The _wait_for_disks call was made into a proper hardware manager method.
It is now also called each time the cached node is updated, not only
on start up. This is to ensure that we wait for the device, matching
root device hints (which are part of the node).
The loop was corrected to avoid redundant sleeps and warnings.
Finally, this patch adds more logging around detecting the root device.
Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com>
Change-Id: I10ca70d6a390ed802505c0d10d440dfb52beb56c
Closes-Bug: #1670916
Oslo.config deprecated parameter enforce_type and change its
default value to True in Ifa552de0a994e40388cbc9f7dbaa55700ca276b0.
Remove the usage of it to avoid DeprecationWarning: "Using the
'enforce_type' argument is deprecated in version '4.0' and will be
removed in version '5.0': The argument enforce_type has changed its
default value to True and then will be removed completely."
Change-Id: I0f0fb540c43edde64e489915c5199da40a0da9c1
Related--Bug: #1517839
Adds an extra field ``biosdevname`` to network interface inventory
collected by ``default`` inspection collector (which collects the whole
inventory returned by hardware manager) of ironic-python-agent.
This feature requires biosdevname utility to collect the bios given NIC
names. The tooling module for tinyIPA is created for the same purpose.
For CoreOS IPA pxe images, biosdevname tooling module is limited,
because Docker repository is created and embedded into CoreOS pxe
images. The Docker repository uses debian to download the packages.
Debian does not have biosdevname package.
Adds an export variable TINYIPA_REQUIRE_BIOSDEVNAME. Set this
variable to ``true`` in your shell before building tinyIPA.
Closes-Bug: #1635351
Change-Id: Ia96af59e2a74868cac59e5a88cfbb3be60d85687
This change introduces a new base test class that mocks out
utils.execute and forces an exception if it gets called.
This has rooted out many tests that were doing this as a side effect of
calling other functions, doing things like modprobe and running iscsi
on the host's actual machine.
The tests are all now appropriately patched in places where this was
happening, and the new base class permanently prevents this from
accidentally happening again.
If you really want to call utils.execute() then you need to re-mock it
in your unit test.
Change-Id: Idf87d09a9c01a6bfe2767f8becabe65c02983518
Add missing 'autospec' keyword argument to mock.patch and
mock.patch.object calls. Use 'autospec=True' except for a few cases
where it fails because the mocked function is a @classmethod and it
doesn't work. In that case explicity set it to 'autospec=False'
Change-Id: I620dce91abaa4440e1803aeefb3e93c0b65d1419
Use the flake8 plugin flake8-import-order to check import ordering. It
can do it automatically and don't need reviewers to check it.
Change-Id: I946457e9079ce0b54c7fe0ad554d024a1c61dce0
Parse the output of "ip route get $IP" taking
IPv6 into consideration. Also wrap the IP address
in square brackets if it is IPv6.
Change-Id: Ifc44e5aa3c5b814b6ceba04461bb68fe1d75c22b
Closes-Bug: #1650533
This patch adds a new hardware manager, which will disable the embedded
LLDP agent on Intel CNA network cards in order to allow the gathering of
LLDP data during the inspection process.
Change-Id: I572756ac6a7bf67a7f446738ba9d145e1c1bdc48
Closes-Bug: #1623659
In test_agent.py change variable names from 'mocked_' to 'mock_' to
follow convention we use in openstack/ironic.
Change the mock variable names 'wsgi_server_cls' and
'mocked_server_maker' in some of the unit tests to
'mock_make_server'. It is the convention to have 'mock' in the
variable names for mock objects.
Not changing the other unit test files at this time.
Change-Id: I844071c928f92778e8ce0bfbdb8fb89fc26ee43b
Currently, if IPA is booted without an ironic api url, it will default
to localhost and fail to connect. Instead, we now explicitly fail and
print a log message if no api callback url is provided.
Change-Id: I0271be94ba7febc6abd5bf3343f6fa179bc1a6a4
Closes-Bug: #1643966
This patch dispatches out the network_interface_info
to allow vendor hardware managers to plug the spacific
implementation. It also move neworking releated methods
form hardware to netutils
Related-Bug: #1532534
Change-Id: Idcd25c4753c009b5ba70bea97ee4eb83391a77a9
Lookup/Heartbeat via vendor passthru was deprecated in Newton.
This patch removes the corresponding functionality from IPA,
and also removes handling of 'ipa-driver-name' kernel parameter,
as it was only used in code related to old passthru.
Change-Id: I2c7989063ab3e4c0bae33f05d6d2ed857a2d9944
Closes-Bug: #1640533
Use the namedtuple class to improve code readability by creating a Host
class with namedtuple to store the 'hostname' and 'port'
Replace foo[0] with foo.hostname, and foo[1] with foo.port to make code
more readable.
Change-Id: Ie2b5f9cf89e7ccbbcf0a2573dab6f6c5d14c018b
Falls back to vendor passthru on receiving 404.
Also fixes logging around lookup: log traceback on unexpected
exceptions, log successful lookup and replace % with ,
Change-Id: I7160c99ca63585fc333482fa578fdf5e0962b2b6
Depends-On: I9080c07b03103cd7a323e2fc01be821733b07eea
Partial-Bug: #1570841