Commit Graph

54 Commits

Author SHA1 Message Date
Dmitry Tantsur
f824930bbd
Import disk_{utils,partitioner} from ironic-lib
With the iscsi deploy long gone, these modules are only used in IPA and
in fact represent a large part of its critical logic. Having them
separately sometimes makes fixing issues tricky if an interface of
a function needs changing.

This change imports the code mostly as it is, just removing run_as_root and
a deprecated function, as well as moving configuration options to config.py.

Also migrates one relevant function from ironic_lib.utils.

Change-Id: If8fae8210d85c61abb85c388b300e40a75d0531c
2024-03-15 18:45:04 +01:00
Dmitry Tantsur
6cd36a750f
Make inspection URL optional if the collectors are provided
With the new in-band inspection, we can derive the callback URL from
the Ironic URL, there is no need to duplicate it. This change uses
the presence of collectors as a sign to run inspection.

The previous approach of setting an inspection URL, with or without
explicitly setting collectors, still works for compatibility with
ironic-inspector.

Change-Id: Ie4279ee6d2995c9686f1dcdef1d6e5dc1dd20871
2024-01-10 08:55:42 +01:00
Dmitry Tantsur
0d4ae976c2
Support several API and Inspector URLs
Allows nodes with a single IP stack to be deployed from a dual-stack
Ironic.

Detecting advertised address and usable Ironic URLs are done completely
independently which does open some space for a misconfiguration. I hope
it's not likely in the reality, especially since this feature is
targetting advanced standalone users.

Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2
Closes-Bug: #2045548
2024-01-09 16:43:23 +01:00
Zuul
b42f0be422 Merge "implement basic-auth support for user-image download process" 2023-10-13 17:08:28 +00:00
Adam Rozman
70961789a6 implement basic-auth support for user-image download process
This feature was proposed in https://bugs.launchpad.net/ironic-python-agent/+bug/2021947

Change-Id: I9dbfc1402240beb75b6736214753fd86dccae676
2023-10-10 16:25:51 +03:00
Zuul
23c8427224 Merge "Extend the lookup timeout to 600 seconds" 2023-09-22 12:35:00 +00:00
Julia Kreger
4efcce5310 Extend the lookup timeout to 600 seconds
Changes the default lookup timeout to be 600 seconds which
reduces the risk of lookup failing as a write operation
to the backing database is performed upon lookup thanks to
generation of an agent token.

Overall, this is fairly harmless since by default ramdisks
restart the agent if they were not able to successfully
start.

Change-Id: I35c64c0b4f9b3b607df1bc0c4c2a852aa3595cbd
2023-08-24 08:29:07 -07:00
Julia Kreger
b6c263a5dc preserve/handle config drives on 4k block devices
When an underlying block device (or driver) only supports 4KB IO,
this can cause some issues with aspects like using an ISO9660 filesystem
which can only support a maximum of 2KB IO.

The agent will now attempt to mount the filesystem *before* deleting the
supplied file, and should that fail it will mount the configuration drive
file from the ramdisk utilizing a loopback, and then extract the contents
of the ramdisk into a newly created VFAT filesystem which supports 4KB
block IO.

Closes-Bug: #2028002
Change-Id: I336acb8e8eb5a02dde2f5e24c258e23797d200ee
2023-08-24 08:10:22 -07:00
Julia Kreger
78c1343a54 Fix Bandit errors
Bandit 1.7.5 released with a timeout check for all requests and
urllib calls.

Fixed those.

In the process, then exposed a bandit b310 issue, which was already
covered by the code, but explicitly marked it as such.

Also, enables bandit checks to be voting for CI..

Change-Id: If0e87790191f5f3648366d571e1d85dd7393a548
2023-06-06 08:34:55 -07:00
Dmitry Tantsur
c1c5537ba2 Revert disabling MD5 checksums
This was a significant breaking change that was landed despite explicit
disagreement by some community members (myself included). It has already
resulted in an accidental Ironic CI breakage, has broken Bifrost and has
a potential of breaking Metal3. In case of Metal3, MD5 support is a part
of its public API.

While MD5 is a potential security hazard, I don't see the need to hurry
this change without giving the community time to prepare. This change
reverts the new option md5_enabled to True.

Change-Id: I32b291ea162e8eb22429712c15cb5b225a6daafd
2023-05-04 09:26:10 +02:00
Zuul
f37ea85a27 Merge "Disable MD5 image checksums" 2023-05-02 06:41:25 +00:00
Dmitry Tantsur
3e05a03f7c Deprecate LLDP in inventory in favour of a new collector
Binary LLDP data is bloating inventory causing us to disable its collection
by default. For other similar low-level information, such as PCI devices
or DMI data, we already use inspection collectors instead. Now that the
inventory format is shared with out-of-band inspection, having LLDP
there makes even less sense.

This change adds a new collector ``lldp`` to replace the now-deprecated
inventory field.

Change-Id: I56be06a7d1db28407e1128c198c12bea0809d3a3
2023-04-26 19:33:51 +00:00
Julia Kreger
32df26a22a Disable MD5 image checksums
MD5 image checksums have long been supersceeded by the use of a
``os_hash_algo`` and ``os_hash_value`` field as part of the
properties of an image.

In the process of doing this, we determined that checksum via
URL usage was non-trivial and determined that an appropriate
path was to allow the checksum type to be determined as needed.

Change-Id: I26ba8f8c37d663096f558e83028ff463d31bd4e6
2023-04-24 16:54:42 -07:00
Julia Kreger
beb7484858 Guard shared device/cluster filesystems
Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow for multi-host shared concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collissions.

These filesystems are common for use cases where high availability
is required or ability for individual computers to collaborate on a
given workload is critical, such as a group of hypervisors supporting
virtual machines because it can allow for nearly seamless transfer
of workload from one machine to another.

Similar technologies are also used for cluster quorum and cluster
durable state sharing, however that is not specifically considered
in scope.

Where things get difficult is becuase the entire device is not
exclusively locked with the storage fabrics, and in some cases locking
is handled by a Distributed Lock Manager on the network, or via special
sector interactions amongst the cluster members which understand
and support the filesystem.

As a reult of this IO/Interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it percieves as attached locally.
This is not IPA's fault, often this case occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as it relates to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may include restoration from snapshot, backup
storage, or may ultimately cause permenant data loss, depending
on the environment and the usage of that environment.

Included in this patch:
- IBM GPFS - Can be used on a shared block device... apparently according
             to IBM's documentation. The standard use of GPFS is more Ceph
             like in design... however GPFS is also a specially licensed
             commercial offering, so it is a red flag if this is
             encountered, and should be investigated by the environment's
             systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
                local block devices. With shared block devices,
                ranges of the disk are locked instead of the whole
                disk, and the ranges are mapped to virtual machine
                disk interfaces.
                It is unknown, due to lack of information, if this
                will detect and prevent erasure of VMFS logical
                extent volumes.

Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
2022-07-19 13:24:03 -07:00
Jay Faulkner
de726d4acf Do not permit IPA standalone to be enabled by conf
IPA standalone mode is a developer-only option, and if enabled
accidentally on a production agent could cause undesired behavior.

Developers who need this behavior should build a purpose-built agent,
with standalone hardcoded to True in cmd/agent.py.

Change-Id: Icc67dbe15acbbf6fee886f274d2169a0769a5053
2021-03-25 12:45:28 +01:00
Dmitry Tantsur
59cb08fd28 New deploy step for injecting arbitrary files
This change adds a deploy step inject_files that adds a flexible
way to inject files into the instance.

Change-Id: I0e70a2cbc13744195c9493a48662e465ec010dbe
Story: #2008611
Task: #41794
2021-02-16 16:56:52 +01:00
Kaifeng Wang
6072e2d65a Remove lldp-timeout support
The kernel parameter lldp-timeout was deprecated removed in this patch.

Change-Id: I98da49e61d9ed3236cc495d1ab351eba0931473b
2021-01-15 16:13:52 +08:00
Zuul
94b0e97e8b Merge "Generate TLS certificates with validity time in the past" 2020-12-15 20:08:09 +00:00
Dmitry Tantsur
557293ca6a Generate TLS certificates with validity time in the past
Otherwise a slight clock skew may prevent them from working, see
e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1906448.

Change-Id: Icea103af06edef16c0dc4578877dc04cd6ec3b0c
2020-12-10 16:22:13 +01:00
Julia Kreger
7a83773fbc Option to enable bootloader config failure bypass
Some hardware is very well intentioned. However this intention
can result in the UEFI NVRAM table being full which prevents us
from adding new records to the table. We can't be sure what to
delete, so in this case some operators just need the ability to
tell ironic "it is okay if this fails, it will still work."

The added ``ignore_bootloader_failure`` option adds
this capability which can be set per-node either in the agent
configuation via the ramdisk image, or in the pxe_append_params
configuration parameter for the node itself with a
``ipa-ignore-bootloader-failure`` option in order to prevent
the failure from being raised.

Change-Id: If3c83fb2ea2025fce092d495a64f32077c70d2d6
Story: 2008386
Task: 41309
2020-12-10 06:42:48 -08:00
Bob Fournier
6e3f28d720 Bring up VLAN interfaces and include in introspection report
Add the ability to bring up VLAN interfaces and include them in the
introspection report.  A new configuration field is added -
``ipa-enable-vlan-interfaces``, which defines either the VLAN interface
to enable, the interface to use, or 'all' - which indicates all
interfaces.  If the particular VLAN is not provided, IPA will
use the lldp info for the interface to determine which VLANs should
be enabled.

Change-Id: Icb4f66a02b298b4d165ebb58134cd31029e535cc
Story: 2008298
Task: 41183
2020-11-20 10:17:00 -05:00
Julia Kreger
066a96a926 Follow-up to API version setting
Follow-up on Ib96a1057792f45f2e4554671e32c436140463ee8 to
improve some of the wording and review feedback by
Dmitry Tantsur.

Change-Id: Id77b0d72f3d78e5befd05fbdb6b21bc780f4ddfe
2020-10-30 08:28:54 -07:00
Jay Faulkner
80575566b1 Allow manual setting of Ironic API Version
Typically, the Ironic API client in IPA will autodetect the API version
based on the output of a GET of the root of the API. If for some reason
this API endpoint is restricted, or the operator wishes to limit the
Ironic API version IPA uses, they can now set CONF.ironic_api_version to
avoid autodetection and force a version.

Change-Id: Ib96a1057792f45f2e4554671e32c436140463ee8
2020-10-23 15:38:42 +00:00
Dmitry Tantsur
021e0a6a46 Generate a TLS certificate and send it to ironic
Adds a new flag (on by default) that enables generating a TLS
certificate and sending it to ironic via heartbeat. Whether
ironic supports auto-generated certificates is determined by
checking its API version.

Change-Id: I01f83dd04cfec2adc9e2a6b9c531391773ed36e5
Depends-On: https://review.opendev.org/747136
Depends-On: https://review.opendev.org/749975
Story: #2007214
Task: #40604
2020-09-11 17:46:52 +02:00
Jay Faulkner
1d11f0b7dd If listen_tls is true, enable TLS on wsgi server
This change enables operators to set [DEFAULT]listen_tls to
true configure IPA to be host its WSGI server over TLS using
existing SSL support in oslo.service.

In addition to configuring this in IPA, a deployer will need to
also set [ssl]cert_file, [ssl]key_file, and optionally
[ssl]ca_file in their ipa config, in addition to embedding those
files into the IPA ramdisk in order for this to be functional.

In order to make this change work, we also need to monkey patch
socket library early, or else oslo.service will end up passing an
unpatched socket to the eventlet wsgi server, which causes
deadlocks.

Change-Id: Ib7decae410915f3c27b045ee08538c94d455b030
2020-09-02 16:07:42 -07:00
Dmitry Tantsur
d50ff06b6b Enable the logs collection by default
It's incredibly helpful when debugging and most of consumers seem
to enable and rely on it.

Change-Id: I33bf58b3eb16b63b70f2a23e8a04449dc88fd94c
2020-08-19 17:25:24 +02:00
Vladyslav Drok
ba6ca246f5 Add possibility to pass global request ID
It can be done via ipa-global-request-id kernel commandline parameter.

Story: 2007681
Task: 39792
Change-Id: I6f544327d310c976a1625cfb411947591867882a
2020-08-12 15:21:08 +03:00
Dmitry Tantsur
353d09c3b0 Support changing the protocol part of callback_url to https
Adds a new kernel parameter for manual configuration and also creates
foundation for automatic TLS support later.

Change-Id: If341c3a8a268fc8cab6bd6be04b12ca32b31c8d8
Story: #2007214
Task: #40619
2020-08-06 15:14:31 +02:00
Zuul
bfb395837d Merge "Adds poll mode deployment support" 2020-07-22 19:53:31 +00:00
Julia Kreger
c77a7df851 Extend retries to 9, 10 seconds apart.
The download retry interval was previously five seconds which is
not long enough to recover after a hard network connectivity break
where we may be reliant upon network port forwarding hold-down
timers or even routing protocol route propogation to recover
communication.

Previously the time value was 5 seconds, with 3 attempts, meaning
15 seconds total ignoring the error detection timeouts.

Now it is 10 seconds, with 10 attempts, meaning 100 seconds before
the error detection timeouts.

Change-Id: I6d11edc9a3156f2bdc21c3d432ecc7625d652699
2020-06-23 20:27:49 +00:00
Kaifeng Wang
61c95554ff Adds poll mode deployment support
Adds a new poll extension to provide get_hardware_info and get_node_info
interfaces.

get_hardware_info will be used for node validation by ironic deploy
drivers.

get_node_info will be used for sending lookup data to IPA.

standalone mode is assumed as debug only, but it's not the case
considering the poll mode will be introduced, slightly updates the
description, also prevents the mdns lookup when standalone is true.

Story: 1526486
Task: 28724

Change-Id: I5ad772a18cc4584585c5a7b6fb127547cece1998
2020-06-21 16:44:00 +08:00
Dmitry Tantsur
8adb7e1a04 Add timeout and retries when connection to an image server
If the server is stuck for any reason, the download will hang for
a potentially long time. Provide a timeout (defaults to 60 seconds)
and 2 retries on failure.

Change-Id: Ie53519266edd914fdbfa82fe52b4a55151e5ec5f
2020-04-24 10:34:40 +02:00
Julia Kreger
af5f05a0ee Agent token support
Adds support to the agent to receive, store, and return
that token to ironic's API, when supported.

This feature allows ironic and ultimately the agent to
authenticate interactions, when supported, to prevent
malicious abuse of the API endpoint.

Sem-Ver: feature
Change-Id: I6db9117a38be946b785e6f5e75ada1bfdff560ba
2020-03-12 10:35:17 -07:00
Julia Kreger
cee4bfc4bc Add NTP time sync
Attempt to sync the clock and save it to the hardware clock.

This feature supports use of chrony or ntpdate.

Sem-Ver: feature
Change-Id: I178d7614429d582e742d9cba6d0fa3ae099775e3
Story: 1619054
Task: 11591
2020-03-07 09:16:19 -08:00
Kaifeng Wang
4097847a10 Clean up options deprecated prehistory
Configuration options like api-url had been deprecated long time ago[1],
this patch removes it.

[1] https://review.opendev.org/#/c/131632/

Change-Id: Ie448b35a4423066ef44dca7616e716cb5c118881
2019-11-19 08:46:21 +08:00
Derek Higgins
c4bb694082 Bump up ipa-ip-lookup-attempts to 6
To accommodate network setup that takes longer then
30 seconds increase the number of IP lookup attempts. This
allows for IPv6 setup that occurs after DHCP has time out.

Change-Id: I1351e150a63c6247210ca0cbc8ce0abfe82129cd
2019-11-12 03:38:38 +00:00
Julia Kreger
696606f682 manual introspection trigger command
Change-Id: I64e66682c1e54f6edc260a22f46f5f6df8e85af1
Story: 2005896
Task: 33756
2019-07-08 07:43:40 -07:00
Dmitry Tantsur
5c5328ccaa Supports fetching API endpoints from mDNS
This change enables IPA to receive API endpoints and configuration
via multicast DNS.

Story: #2005393
Task: #30382
Change-Id: Ibbf07052bea8f5c0305dda098b2879bcbc2fece5
2019-05-29 16:58:24 +02:00
Bill Dodd
3c30088c1e Add min/max values to integer config options
None of the existing ironic-python-agent integer config options included
min or max values. Added appropriate min/max values for the integer
config options.

Two of the integer options are for ports (listen_port and
advertise_port). These were changed to use the more appropriate
oslo_config cfg.PortOpt instead of cfg.IntOpt. PortOpt has the proper
min and max values built in.

Change-Id: I98709a45d099aea62c9973beb6817591cb445a9c
Story: 1731950
2018-05-23 12:08:42 -05:00
Julia Kreger
3164053f08 Fix gate and bump CoreOS version to latest stable.
Increases the amount of ram for CoreOS IPA to 2GB
as the base CoreOS image is now 310MB.

Bumped CPU count for CoreOS runs to 2 CPUs as the
concurrency helps boot times for the CoreOS ramdisk.

Adds netbase, udev, and open-iscsi to debian jessie container
as they are no longer present in the default container.

Explicitly set path variable for execution in the debian
container as udevadm is in /sbin, and we may not have
/sbin on the path that is passed through to the
chroot.

Also fixed new pep8 test failures.

Story: #1600228
Task: #16287
Change-Id: I488445dfd261b7bca322a0be7b4d8ca6105750a3
2018-05-10 15:50:05 -07:00
Lucas Alvares Gomes
3189c16a5e Fix waiting for target disk to appear
This patch is changing the _wait_for_disks() method behavior to wait to
a specific disk if any device hints is specified. There are cases where
the deployment might fail or succeed randomly depending on the order and
time that the disks shows up.

If no root device hints is specified, the method will just wait for any
suitable disk to show up, like before.

The _wait_for_disks call was made into a proper hardware manager method.
It is now also called each time the cached node is updated, not only
on start up. This is to ensure that we wait for the device, matching
root device hints (which are part of the node).

The loop was corrected to avoid redundant sleeps and warnings.

Finally, this patch adds more logging around detecting the root device.

Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com>
Change-Id: I10ca70d6a390ed802505c0d10d440dfb52beb56c
Closes-Bug: #1670916
2017-10-16 15:39:25 +02:00
Jenkins
fd7f10b993 Merge "Configure and use SSL-related requests options" 2017-02-07 09:57:49 +00:00
Derek Higgins
b4e41e2dd2 Agent: Listen for connections on both IPv4 and IPv6 ports
Allow connections if deploying over a IPv6 network.

Change-Id: Ied2f6be4aa4d1a70524df1df3506e596f6926e5b
Closes-Bug: #1650539
2017-01-19 15:24:11 +00:00
Pavlo Shchelokovskyy
fdd11b54a5 Configure and use SSL-related requests options
This patch adds standard SSL options to IPA config and makes use of them
when making HTTP requests.

For now, a single set of certificates is used when needed.
In the future configuration can be expanded to allow per-service
certificates.

Besides, the 'insecure' option (defaults to False) can be overridden
through kernel command line parameter 'ipa-insecure'.
This will allow running IPA in CI-like environments with self-signed SSL
certificates.

Change-Id: I259d9b3caa9ba1dc3d7382f375b8e086a5348d80
Closes-Bug: #1642515
2017-01-13 11:33:44 +02:00
Joanna Taryma
83a19a4844 Fail IPA startup if no protocol prefix in ironic api address
Add regex validation of api_url specified in configuration file.
Oslo config will raise exception if no supported protocol prefix
is included in Ironic api address in configuration file.
Supported protocols are http and https.

Closes-Bug: #1630785
Change-Id: I437b4ea0a2995921ddede03bc670087fdbbc8b83
2016-12-23 16:13:25 +01:00
Jenkins
4cf29db7e2 Merge "Use oslo-config-generator for sample config" 2016-12-16 19:42:46 +00:00
Pavlo Shchelokovskyy
762f3bf4e6 Use oslo-config-generator for sample config
The old generate_sample.sh is broken already as it refers to
non-existing openstack/common path.

Let's use oslo-config-generator as many other OpenStack projects do.

Also, where applicable, option descriptions are updated with the
corresponding kernel parameters to set those options durig pxe boot.

Change-Id: Id4a0df30ea573d52f3b359f357fe8f4a29751939
2016-12-09 21:01:02 +02:00
Yufei
dd9253f1b6 Skip API related work if no api url configured
Currently, if IPA is booted without an ironic api url, it will default
to localhost and fail to connect. Instead, we now explicitly fail and
print a log message if no api callback url is provided.

Change-Id: I0271be94ba7febc6abd5bf3343f6fa179bc1a6a4
Closes-Bug: #1643966
2016-12-07 17:04:05 +08:00
Pavlo Shchelokovskyy
b033bfd933 Remove old lookup/heartbeat from IPA
Lookup/Heartbeat via vendor passthru was deprecated in Newton.

This patch removes the corresponding functionality from IPA,
and also removes handling of 'ipa-driver-name' kernel parameter,
as it was only used in code related to old passthru.

Change-Id: I2c7989063ab3e4c0bae33f05d6d2ed857a2d9944
Closes-Bug: #1640533
2016-11-09 16:34:44 +00:00
Sam Betts
a7f0af722f Support LLDP data as part of interfaces in inventory
To support multi-tenant networking in Ironic we need to be able to
discover not just the NICs a baremetal machine has but also the physical
connectivity to switches in the network.

This patch collects LLDP (Link Layer Discovery Protocol) data as part of
the list interfaces stage of the generic hardware manager. This
information can then be processed by the ironic inspector to populate
the local link information on each ironic port.

The processing done on this data in ironic python agent is limited, this
is to allow for server side processing hooks to process as much or as
little of the data as they want. This is to allow for multi-vendor
environments that might use different parts of the LLDP packet to use a
generic ramdisk and configure the processing server side using inspector
plugins.

Reserved fields switch_port_descr and switch_chassis_descr have been
deprecated for removal in Ocata in favor of passing the whole packet.

Change-Id: Idae9b1ede1797029da1bd521501b121957ca1f1a
Partial-Bug: #1526403
2016-06-22 18:26:04 +01:00