67 Commits

Author SHA1 Message Date
Dmitry Tantsur
0d4ae976c2
Support several API and Inspector URLs
Allows nodes with a single IP stack to be deployed from a dual-stack
Ironic.

Detecting advertised address and usable Ironic URLs are done completely
independently which does open some space for a misconfiguration. I hope
it's not likely in the reality, especially since this feature is
targetting advanced standalone users.

Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2
Closes-Bug: #2045548
2024-01-09 16:43:23 +01:00
Dmitry Tantsur
2ab8364649
Add a jitter to heartbeat retries
Currently, if heartbeat fails, we reschedule it after 5 seconds.
This is fine for the first retry, but it can cause a thundering herd
problem when a lot of nodes fail to heartbeat at once.

This change adds jitter to the minimum wait of 5 seconds. The jitter is
not applied for forced heartbeats: they still have a minimum wait of
exactly 5 seconds from the last heartbeat.

The code is re-ordered to move the interval calculation to one place.
Bonus: correctly logging the next interval.

The unit tests have been rewritten to test the heartbeat process step by
step and not rely on the exact sequence of the calls.

Closes-Bug: #2038438
Change-Id: I4c4207b15fb3d48b55e340b7b3b54af833f92cb5
2023-12-13 17:34:24 +01:00
Julia Kreger
e6fd7e753e Allow md5 to be disabled from the conductor
Also fixes my use of set_override, as it is not on the actual
config object. You'd think I'd remember that, since I've done
that before...

Change-Id: I4b578c4319354001cbbd3b3856af96b30fd25555
2023-05-25 07:59:07 -07:00
Riccardo Pittau
64ffd2ee80 Remove oslo.serialization dependency
Use pure json instead of jsonutils.

Borrow encode function from oslo.serialization to be used in the
utils module.

Change-Id: Ied9a2259a4329a86b4f0853bd1fb187563c0a036
2022-06-17 09:37:35 +02:00
Julia Kreger
014d37743a Multipath Hardware path handling
Removes multipath base devices from consideration by
default, and instead allows the device-mapper device
managed by multipath to be picked up and utilized
instead.

In effect, allowing us to ignore standby paths *and*
leverage multiple concurrent IO paths if so offered
via ALUA.

In reality, anyone who has previously built IPA with
multipath tooling might not have encountered issues
previously because they used Active/Active SAN storage
environments. They would have worked because the IO lock
would have been exchanged between controllers and paths.
However, Active/Passive environments will block passive
paths from access, ultimately preventing new locks from
being established without proper negotiation. Ultimately
requiring multipathing *and* the agent to be smart enough
to know to disqualify underlying paths to backend storage
volumes.

An additional benefit of this is active/active MPIO devices
will, as long as ``multipath`` is present inside the ramdisk,
no longer possibly result in duplicate IO wipes occuring
accross numerous devices.

Story: #2010003
Task: #45108
Resolves: rhbz#2076622
Resolves: rhbz#2070519
Change-Id: I0fd6356f036d5ff17510fb838eaf418164cdfc92
2022-05-18 20:26:39 -03:00
Dmitry Tantsur
2cedaa53c2 Always include the oslo_log log file in ramdisk logs
Even if journald is present, there is no guarantee that IPA logs there
(this is the case in container-based ramdisks).

Change-Id: Iceeab0010827728711e19e5b031ccac55fe1efde
2021-10-28 18:32:40 +02:00
Jonas Schäfer
6441db61ce Move loading of IPMI module loading to a single point
This means we do not have to rely on modprobe idempotency as
much and it's less code duplication, which is always nice.

Signed-off-by: Jonas Schäfer <jonas.schaefer@cloudandheat.com>

Change-Id: I996aba47bc54309e15e7d56e4a96b23b8deb5c9c
2021-08-06 13:14:45 +02:00
Dmitry Tantsur
b605943796 Coalesce heartbeats
The IPA sends heartbeats to the conductor periodically and when
requested, e.g. at the end of asynchronous commands. In order
to avoid to send such notifications in too quick succession,
e.g. when two asynchronous commands finish at the same time or
when the periodic heartbeat was just sent right before a command
ended, this patch proposes to coalesce heartbeats which are
close together timewise and send only one for all of them
in a time interval of 5 seconds.

Co-Authored-By: Arne Wiebalck <arne.wiebalck@cern.ch>

Story: #2008983
Task: 42633

Change-Id: Idfbce44065e1e5a8b730b94741b2604c51f0ab14
2021-06-18 17:19:30 +02:00
Dmitry Tantsur
51aa31070a Do not serialize command_params
The command params can be huge when configdrive is used. There is no
point in sending them back, Ironic does not use them anyhow.

Story: #2008904
Task: #42479
Change-Id: I6e3db5db2042ca3fb5dafacfacf036fd7fc2fc4c
2021-05-18 12:59:28 +02:00
Dmitry Tantsur
c585603ee6 Log configuration options on start-up
This is very convenient for debugging and is something ironic and
ironic-inspector already do.

Register SSL options earlier so that they're accounted for.

Change-Id: I56aca8eec1dfeb065ac657452a7076a9e3d17cc3
2020-11-11 16:38:10 +01:00
Zuul
faeb9441d3 Merge "Simplify heartbeating by removing use of select()" 2020-09-29 15:47:08 +00:00
Jay Faulkner
a01646f56b Simplify heartbeating by removing use of select()
Heartbeating in IPA has used select.poll() for years to workaround
a bug where changing the time in the ramdisk could cause heartbeats
to stop and never resume.

Now that IPA syncs time at start and exit, this workaround is no
longer needed. So instead, we'll revert to using threading.Event()
in order to make the code simpler and easier to understand.

Since we need this to be an eventlet-event, and not a standard-thread
event, also monkey_patch threading.

Additionally, there were a few completely unused backoff interval
values set, that were never applied. In respect of maintaining the
5+ years old behavior of not doing error backoffs, that code was
removed instead of being made to work.

Change-Id: Ibcde99de64bb7e95d5df63a42a4ca4999f0c4c9b
2020-09-22 16:59:47 +00:00
Dmitry Tantsur
021e0a6a46 Generate a TLS certificate and send it to ironic
Adds a new flag (on by default) that enables generating a TLS
certificate and sending it to ironic via heartbeat. Whether
ironic supports auto-generated certificates is determined by
checking its API version.

Change-Id: I01f83dd04cfec2adc9e2a6b9c531391773ed36e5
Depends-On: https://review.opendev.org/747136
Depends-On: https://review.opendev.org/749975
Story: #2007214
Task: #40604
2020-09-11 17:46:52 +02:00
Jay Faulkner
1d11f0b7dd If listen_tls is true, enable TLS on wsgi server
This change enables operators to set [DEFAULT]listen_tls to
true configure IPA to be host its WSGI server over TLS using
existing SSL support in oslo.service.

In addition to configuring this in IPA, a deployer will need to
also set [ssl]cert_file, [ssl]key_file, and optionally
[ssl]ca_file in their ipa config, in addition to embedding those
files into the IPA ramdisk in order for this to be functional.

In order to make this change work, we also need to monkey patch
socket library early, or else oslo.service will end up passing an
unpatched socket to the eventlet wsgi server, which causes
deadlocks.

Change-Id: Ib7decae410915f3c27b045ee08538c94d455b030
2020-09-02 16:07:42 -07:00
Jay Faulkner
7d0ad36ebd Make WSGI server respect listen_* directives
The listen_port and listen_host directives are intended to allow
deployers of IPA to change the port and host IPA listens on. These
configs have not been obeyed since the migration to the oslo.service
wsgi server.

Story: 2008016
Task: 40668
Change-Id: I76235a6e6ffdf80a0f5476f577b055223cdf1585
2020-08-31 14:37:38 +00:00
Zuul
bfb395837d Merge "Adds poll mode deployment support" 2020-07-22 19:53:31 +00:00
Julia Kreger
c76b8b2c21 Limit Inspection->Lookup->Heartbeat lag
Caches hardware information collected during inspection
so that the initial lookup can occur without any delay.

Also adds logging to track how long inventory collection takes.

Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com>
Change-Id: I3e0d237d37219e783d81913fa6cc490492b3f96a
2020-07-03 10:32:26 +02:00
Kaifeng Wang
61c95554ff Adds poll mode deployment support
Adds a new poll extension to provide get_hardware_info and get_node_info
interfaces.

get_hardware_info will be used for node validation by ironic deploy
drivers.

get_node_info will be used for sending lookup data to IPA.

standalone mode is assumed as debug only, but it's not the case
considering the poll mode will be introduced, slightly updates the
description, also prevents the mdns lookup when standalone is true.

Story: 1526486
Task: 28724

Change-Id: I5ad772a18cc4584585c5a7b6fb127547cece1998
2020-06-21 16:44:00 +08:00
Riccardo Pittau
d5d62c8dbf Use unittest mock from standard library
Drop the third party mock library to use unittest mock from
standard library.

Change-Id: Ib64b661572e4869a24865c02a6c84a6603930394
2020-04-06 14:35:50 +02:00
Julia Kreger
c97a71d6f3 Fix agent token vmedia use
The agent token was being passed through upon class invocation
as opposed through direct configuration loading due to how IPA
is designed to load items upon start-up.

As a result, we somewhere forgot to save the token to the api
client so heartbeats were being rejected when being forced
to be enabled by default.

This has been corrected an an additional test class has been
added to help ensure that this operates as expected.

Change-Id: I1151945d3dc6a685d62ad49183919ef4a81962f8
2020-03-20 08:53:53 +00:00
Julia Kreger
af5f05a0ee Agent token support
Adds support to the agent to receive, store, and return
that token to ironic's API, when supported.

This feature allows ironic and ultimately the agent to
authenticate interactions, when supported, to prevent
malicious abuse of the API endpoint.

Sem-Ver: feature
Change-Id: I6db9117a38be946b785e6f5e75ada1bfdff560ba
2020-03-12 10:35:17 -07:00
Dmitry Tantsur
31b73b4984 Expose collector and hardware manager names via introspection data
This change adds a new introspection data field 'configuration'
with two lists: managers and collectors.

Change-Id: Ice0d7e6ecff3f319bc3a4f41617059fd6914e31c
2020-01-22 11:15:38 +01:00
Dmitry Tantsur
f1b2df908a Replace WSME and Pecan with Werkzeug
WSME is no longer maintained and Pecan is an overkill for our (purely
internal) API. This change rewrites the API in Werkzeug (the library
underneath Flask). I don't use Flask here since it's also an overkill
for API with 4 meaningful endpoints.

Change-Id: Ifed45f70869adf00e795202a53a2a53c9c57ef30
2019-12-04 16:50:47 +01:00
Dmitry Tantsur
c7c307c497 Software RAID: try to assemble RAID on start-up
TinyIPA does not do it, which prevents deployment on RAID devices.

Story: #2004581
Task: #36196
Change-Id: I6c1d90b4669102ab59588ab15f7166626c4d72be
2019-08-09 13:28:25 +02:00
Dmitry Tantsur
5c5328ccaa Supports fetching API endpoints from mDNS
This change enables IPA to receive API endpoints and configuration
via multicast DNS.

Story: #2005393
Task: #30382
Change-Id: Ibbf07052bea8f5c0305dda098b2879bcbc2fece5
2019-05-29 16:58:24 +02:00
Riccardo Pittau
d525f8a07f Making ironic-python-agent able to stop with python 3.x
The agent stop function will write a byte string 'a' to the pipe
as a signal for the run function to end process.
The run function is expecting a literal string.
In python 2.x the byte string will automatically be converted to
literal, while python 3.x won't do the conversion, causing the
process to never stop.
This patch will fix that behavior, allowing the IPA to correctly
stop using python 3.x.

Story: 2004928
Task: 29308
Change-Id: Iad16e8bed2436d961dea8ddaec1c2724225b4097
2019-02-04 09:55:08 +01:00
Shivanand Tendulker
f08636fe8b Follow-up patch for rescue extension for CoreOS
This patch addresses few minor comments in commit
a659306272542dd38420cb118cc7b04b1e8cf377

Change-Id: Id5b48e3cc96c8807c471c947da3e233cebdf687e
Related-Bug: #1526449
2018-01-30 19:00:13 +00:00
Zuul
893c63f24a Merge "Rescue extension for CoreOS with DHCP tenant networks" 2017-12-11 21:14:09 +00:00
Derek Higgins
214790d17e Ignore IPv6 link local addresses
Prevent IPA from picking up the IPv6 link-local address
as a callback_url in cases where it gets tried before other
addressing methods havn't complete yet. In this scenario IPA
sleeps for 10 seconds and then retries giving the nic a chance to
configure its routable IP address.

Change-Id: Ic53334c630180f0d77bb0231e548d2c44bfe55ca
Closes-Bug: #1732692
2017-11-21 10:11:21 +00:00
Mario Villaplana
a659306272 Rescue extension for CoreOS with DHCP tenant networks
This patch adds support for rescue mode with DHCP tenant networks in
CoreOS. Applying network config from a configdrive is not yet supported
but will be in a future patch.

Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Co-Authored-By: Taku Izumi <izumi.taku@jp.fujitsu.com>
Co-Authored-By: Annie Lezil <annie.lezil@gmail.com>
Co-Authored-By: Aparna <aparnavtce@gmail.com>
Co-Authored-By: Shivanand Tendulker <stendulker@gmail.com>
Change-Id: I7898ff22800dedba73d7fbfb3801378867abe183
Partial-Bug: 1526449
2017-11-06 04:48:58 -05:00
Ruby Loo
b433fb07ea Unit test has incorrect mock order
Minor change to a unit test; the names of the mock arguments to the
unit test method are not consistent with the actual ordering of the
mock decorators. This fixes it.

Change-Id: Id9e0dd1614703760b2fe143b2029f9bf6067420a
2017-10-18 11:32:23 -04:00
Lucas Alvares Gomes
3189c16a5e Fix waiting for target disk to appear
This patch is changing the _wait_for_disks() method behavior to wait to
a specific disk if any device hints is specified. There are cases where
the deployment might fail or succeed randomly depending on the order and
time that the disks shows up.

If no root device hints is specified, the method will just wait for any
suitable disk to show up, like before.

The _wait_for_disks call was made into a proper hardware manager method.
It is now also called each time the cached node is updated, not only
on start up. This is to ensure that we wait for the device, matching
root device hints (which are part of the node).

The loop was corrected to avoid redundant sleeps and warnings.

Finally, this patch adds more logging around detecting the root device.

Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com>
Change-Id: I10ca70d6a390ed802505c0d10d440dfb52beb56c
Closes-Bug: #1670916
2017-10-16 15:39:25 +02:00
ChangBo Guo(gcb)
30e0da15ea Remove usage of parameter enforce_type
Oslo.config deprecated parameter enforce_type and change its
default value to True in Ifa552de0a994e40388cbc9f7dbaa55700ca276b0.
Remove the usage of it to avoid DeprecationWarning: "Using the
'enforce_type' argument is deprecated in version '4.0' and will be
removed in version '5.0': The argument enforce_type has changed its
default value to True and then will be removed completely."

Change-Id: I0f0fb540c43edde64e489915c5199da40a0da9c1
Related--Bug: #1517839
2017-06-14 13:47:29 +08:00
Annie Lezil
fdcb0922a5 Collect NIC name given by BIOS
Adds an extra field ``biosdevname`` to network interface inventory
collected by ``default`` inspection collector (which collects the whole
inventory returned by hardware manager) of ironic-python-agent.

This feature requires biosdevname utility to collect the bios given NIC
names. The tooling module for tinyIPA is created for the same purpose.
For CoreOS IPA pxe images, biosdevname tooling module is limited,
because Docker repository is created and embedded into CoreOS pxe
images. The Docker repository uses debian to download the packages.
Debian does not have biosdevname package.

Adds an export variable TINYIPA_REQUIRE_BIOSDEVNAME. Set this
variable to ``true`` in your shell before building tinyIPA.

Closes-Bug: #1635351
Change-Id: Ia96af59e2a74868cac59e5a88cfbb3be60d85687
2017-05-18 14:44:11 -07:00
Julian Edwards
f57cbccf8b Prevent tests' unmocked access to utils.execute()
This change introduces a new base test class that mocks out
utils.execute and forces an exception if it gets called.
This has rooted out many tests that were doing this as a side effect of
calling other functions, doing things like modprobe and running iscsi
on the host's actual machine.

The tests are all now appropriately patched in places where this was
happening, and the new base class permanently prevents this from
accidentally happening again.

If you really want to call utils.execute() then you need to re-mock it
in your unit test.

Change-Id: Idf87d09a9c01a6bfe2767f8becabe65c02983518
2017-05-15 10:48:43 +10:00
John L. Villalovos
1695cb18c2 Add missing 'autospec' argument to mock.patch
Add missing 'autospec' keyword argument to mock.patch and
mock.patch.object calls. Use 'autospec=True' except for a few cases
where it fails because the mocked function is a @classmethod and it
doesn't work. In that case explicity set it to 'autospec=False'

Change-Id: I620dce91abaa4440e1803aeefb3e93c0b65d1419
2017-03-19 10:04:19 -07:00
John L. Villalovos
949f4f509e Use flake8-import-order
Use the flake8 plugin flake8-import-order to check import ordering. It
can do it automatically and don't need reviewers to check it.

Change-Id: I946457e9079ce0b54c7fe0ad554d024a1c61dce0
2017-02-16 09:46:21 -08:00
Derek Higgins
256b233b4b Add IPv6 unit test for _get_route_source
Follow up from previous patch

Change-Id: Ic0286c4eac57f3ae06b237b700d9dbe95ea8b2c0
2017-01-19 15:24:17 +00:00
Derek Higgins
9f5f664080 Advertise the correct address when using IPv6
Parse the output of "ip route get $IP" taking
IPv6 into consideration. Also wrap the IP address
in square brackets if it is IPv6.

Change-Id: Ifc44e5aa3c5b814b6ceba04461bb68fe1d75c22b
Closes-Bug: #1650533
2017-01-11 11:00:56 +00:00
Szymon Borkowski
ef47d62f43 Add a new Hardware Manager for CNA network card
This patch adds a new hardware manager, which will disable the embedded
LLDP agent on Intel CNA network cards in order to allow the gathering of
LLDP data during the inspection process.

Change-Id: I572756ac6a7bf67a7f446738ba9d145e1c1bdc48
Closes-Bug: #1623659
2016-12-12 17:17:23 +01:00
John L. Villalovos
9ebb70d2fd Update mock variable name in unit tests
In test_agent.py change variable names from 'mocked_' to 'mock_' to
follow convention we use in openstack/ironic.

Change the mock variable names 'wsgi_server_cls' and
'mocked_server_maker' in some of the unit tests to
'mock_make_server'. It is the convention to have 'mock' in the
variable names for mock objects.

Not changing the other unit test files at this time.

Change-Id: I844071c928f92778e8ce0bfbdb8fb89fc26ee43b
2016-12-07 13:37:34 -08:00
Jenkins
9f9b5b5c43 Merge "Skip API related work if no api url configured" 2016-12-07 17:05:39 +00:00
Yufei
dd9253f1b6 Skip API related work if no api url configured
Currently, if IPA is booted without an ironic api url, it will default
to localhost and fail to connect. Instead, we now explicitly fail and
print a log message if no api callback url is provided.

Change-Id: I0271be94ba7febc6abd5bf3343f6fa179bc1a6a4
Closes-Bug: #1643966
2016-12-07 17:04:05 +08:00
Jenkins
f9236682f7 Merge "Dispatched out network interface info to all hardware managers" 2016-11-20 13:05:01 +00:00
Moshe Levi
966db1c18c Dispatched out network interface info to all hardware managers
This patch dispatches out the network_interface_info
to allow vendor hardware managers to plug the spacific
implementation. It also move neworking releated methods
form hardware to netutils
Related-Bug: #1532534

Change-Id: Idcd25c4753c009b5ba70bea97ee4eb83391a77a9
2016-11-17 13:08:03 +02:00
Luong Anh Tuan
ab41106cf6 Python 3 Compatible JSON
In order to be really python3 compatible, the json lib was replaced
with oslo.serialization(1.10 or newer) module jsontuils since it's
the recommended migration to python3 guide.

https://wiki.openstack.org/wiki/Python3#Serialization:_base64.2C_JSON.2C_etc.

Change-Id: I2d8b62e642aba4ccd1b70be7e9b3784a95a6743d
Closes-Bug: #1629068
2016-11-16 08:19:51 +00:00
Pavlo Shchelokovskyy
b033bfd933 Remove old lookup/heartbeat from IPA
Lookup/Heartbeat via vendor passthru was deprecated in Newton.

This patch removes the corresponding functionality from IPA,
and also removes handling of 'ipa-driver-name' kernel parameter,
as it was only used in code related to old passthru.

Change-Id: I2c7989063ab3e4c0bae33f05d6d2ed857a2d9944
Closes-Bug: #1640533
2016-11-09 16:34:44 +00:00
John L. Villalovos
abf98ae84a Use namedtuple to improve code readability
Use the namedtuple class to improve code readability by creating a Host
class with namedtuple to store the 'hostname' and 'port'

Replace foo[0] with foo.hostname, and foo[1] with foo.port to make code
more readable.

Change-Id: Ie2b5f9cf89e7ccbbcf0a2573dab6f6c5d14c018b
2016-08-30 16:12:29 -07:00
Jenkins
87b2b82478 Merge "Use new agent API if available" 2016-08-07 02:44:48 +00:00
Dmitry Tantsur
09265ba4b5 Use new agent API if available
Falls back to vendor passthru on receiving 404.

Also fixes logging around lookup: log traceback on unexpected
exceptions, log successful lookup and replace % with ,

Change-Id: I7160c99ca63585fc333482fa578fdf5e0962b2b6
Depends-On: I9080c07b03103cd7a323e2fc01be821733b07eea
Partial-Bug: #1570841
2016-08-05 12:02:41 +02:00