With the iscsi deploy long gone, these modules are only used in IPA and
in fact represent a large part of its critical logic. Keeping them
separate sometimes makes fixing issues tricky when the interface of
a function needs changing.
This change imports the code mostly as it is, just removing run_as_root and
a deprecated function, as well as moving configuration options to config.py.
Also migrates one relevant function from ironic_lib.utils.
Change-Id: If8fae8210d85c61abb85c388b300e40a75d0531c
With the new in-band inspection, we can derive the callback URL from
the Ironic URL, so there is no need to duplicate it. This change uses
the presence of collectors as a sign to run inspection.
The previous approach of setting an inspection URL, with or without
explicitly setting collectors, still works for compatibility with
ironic-inspector.
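The decision described above can be sketched roughly as follows. This is
only an illustration: the function name, its parameters and the endpoint
path are assumptions made for the example, not the agent's actual code.

```python
from typing import Optional, Sequence
from urllib.parse import urljoin


def resolve_inspection_url(
    api_url: Optional[str],
    collectors: Sequence[str],
    inspection_callback_url: Optional[str] = None,
) -> Optional[str]:
    """Return the URL to send inspection data to, or None to skip inspection."""
    # Compatibility path: an explicitly configured callback URL always wins.
    if inspection_callback_url:
        return inspection_callback_url
    # New path: collectors are configured, so derive the URL from Ironic's.
    if collectors and api_url:
        # Hypothetical endpoint path, used here for illustration only.
        return urljoin(api_url.rstrip("/") + "/", "v1/continue_inspection")
    # No callback URL and no collectors: inspection is not requested.
    return None
```

With no collectors and no explicit URL the helper returns None, which
corresponds to not running inspection at all.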
Change-Id: Ie4279ee6d2995c9686f1dcdef1d6e5dc1dd20871
Allows nodes with a single IP stack to be deployed from a dual-stack
Ironic.
The advertised address and the usable Ironic URLs are detected completely
independently, which does leave some room for misconfiguration. I hope
this is unlikely in practice, especially since this feature is
targeting advanced standalone users.
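As a rough illustration of picking usable URLs on a single-stack node,
the sketch below filters candidate Ironic URLs by IP version. The helper
name and the idea of filtering on literal addresses are assumptions for
the example and may differ from what the agent really does.

```python
import ipaddress
from typing import Iterable, List, Set
from urllib.parse import urlsplit


def filter_urls_by_ip_version(urls: Iterable[str], versions: Set[int]) -> List[str]:
    """Keep URLs whose literal IP host matches one of the given IP versions."""
    usable = []
    for url in urls:
        host = urlsplit(url).hostname
        try:
            version = ipaddress.ip_address(host).version
        except ValueError:
            # Not an IP literal (for example a DNS name): keep it and let
            # name resolution decide later.
            usable.append(url)
            continue
        if version in versions:
            usable.append(url)
    return usable
```

An IPv6-only node would call it with `{6}` and drop the IPv4 literal
URLs while keeping DNS names.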
Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2
Closes-Bug: #2045548
Changes the default lookup timeout to 600 seconds. This
reduces the risk of lookup failing, since a write operation
to the backing database is performed upon lookup due to the
generation of an agent token.
Overall, this is fairly harmless since by default ramdisks
restart the agent if they were not able to successfully
start.
Change-Id: I35c64c0b4f9b3b607df1bc0c4c2a852aa3595cbd
When an underlying block device (or driver) only supports 4KB IO,
this can cause some issues with aspects like using an ISO9660 filesystem
which can only support a maximum of 2KB IO.
The agent will now attempt to mount the filesystem *before* deleting the
supplied file, and should that fail it will mount the configuration drive
file from the ramdisk utilizing a loopback, and then extract the contents
of the configuration drive into a newly created VFAT filesystem which
supports 4KB block IO.
Closes-Bug: #2028002
Change-Id: I336acb8e8eb5a02dde2f5e24c258e23797d200ee
Bandit 1.7.5 was released with a timeout check for all requests and
urllib calls.
Fixed those.
In the process, this exposed a Bandit B310 issue which was already
covered by the code, so it is now explicitly marked as such.
Also enables Bandit checks to be voting in CI.
Change-Id: If0e87790191f5f3648366d571e1d85dd7393a548
This was a significant breaking change that was landed despite explicit
disagreement by some community members (myself included). It has already
resulted in an accidental Ironic CI breakage, has broken Bifrost and has
the potential to break Metal3. In the case of Metal3, MD5 support is a
part of its public API.
While MD5 is a potential security hazard, I don't see the need to hurry
this change without giving the community time to prepare. This change
reverts the new option md5_enabled to True.
Change-Id: I32b291ea162e8eb22429712c15cb5b225a6daafd
Binary LLDP data bloats the inventory, which has caused us to disable its
collection by default. For other similar low-level information, such as PCI devices
or DMI data, we already use inspection collectors instead. Now that the
inventory format is shared with out-of-band inspection, having LLDP
there makes even less sense.
This change adds a new collector ``lldp`` to replace the now-deprecated
inventory field.
Change-Id: I56be06a7d1db28407e1128c198c12bea0809d3a3
MD5 image checksums have long been superseded by the use of the
``os_hash_algo`` and ``os_hash_value`` fields as part of the
properties of an image.
In the process of doing this, we found that checksum via
URL usage was non-trivial and decided that an appropriate
path was to allow the checksum type to be determined as needed.
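One way to determine the checksum type as needed is to infer it from the
length of the hex digest. The sketch below assumes exactly that; the
function names and the length table are illustrative and are not taken
from the patch.

```python
import hashlib

# Assumed mapping from hex digest length to algorithm name.
_ALGO_BY_HEX_LENGTH = {32: "md5", 64: "sha256", 128: "sha512"}


def guess_checksum_algorithm(checksum: str) -> str:
    """Guess the hash algorithm from the length of a hex digest."""
    try:
        return _ALGO_BY_HEX_LENGTH[len(checksum.strip())]
    except KeyError:
        raise ValueError(f"cannot determine algorithm for checksum {checksum!r}")


def verify_checksum(data: bytes, checksum: str) -> bool:
    """Hash data with the guessed algorithm and compare it to the checksum."""
    algorithm = guess_checksum_algorithm(checksum)
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == checksum.strip().lower()
```

A caller that must reject MD5 can check the guessed algorithm before
hashing anything.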
Change-Id: I26ba8f8c37d663096f558e83028ff463d31bd4e6
Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow for multi-host shared concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collisions.
These filesystems are common for use cases where high availability
is required or where the ability for individual computers to collaborate
on a given workload is critical, such as a group of hypervisors supporting
virtual machines, because they can allow for nearly seamless transfer
of workload from one machine to another.
Similar technologies are also used for cluster quorum and cluster
durable state sharing, however that is not specifically considered
in scope.
Things get difficult because the entire device is not
exclusively locked within the storage fabric, and in some cases locking
is handled by a Distributed Lock Manager on the network, or via special
sector interactions amongst the cluster members which understand
and support the filesystem.
As a result of this IO/interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it perceives as attached locally.
This is not IPA's fault. Often this case occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as it relates to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may include restoration from snapshot or backup
storage, or may ultimately cause permanent data loss, depending
on the environment and the usage of that environment.
Included in this patch:
- IBM GPFS - Can be used on a shared block device, apparently, according
  to IBM's documentation. The standard use of GPFS is more Ceph-like
  in design. However, GPFS is also a specially licensed commercial
  offering, so it is a red flag if this is encountered, and it should
  be investigated by the environment's systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
  local block devices. With shared block devices, ranges of the disk
  are locked instead of the whole disk, and the ranges are mapped to
  virtual machine disk interfaces. It is unknown, due to lack of
  information, if this will detect and prevent erasure of VMFS
  logical extent volumes.
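A guard of this kind could look roughly like the sketch below. The
filesystem type strings, the exception class and the function name are
assumptions made for illustration; the real detection logic may differ.

```python
from typing import Optional

# Assumed filesystem type strings as a probing tool might report them;
# these values are illustrative and not taken from the patch.
SHARED_FS_TYPES = {
    "gpfs": "IBM GPFS",
    "gfs2": "Red Hat GFS2",
    "vmfs_volume_member": "VMware VMFS",
}


class ProtectedDeviceError(Exception):
    """Raised when a device appears to hold a shared-disk cluster filesystem."""


def check_safe_to_erase(device: str, fs_type: Optional[str]) -> None:
    """Refuse to erase a device whose filesystem type is on the guarded list."""
    name = SHARED_FS_TYPES.get((fs_type or "").strip().lower())
    if name is not None:
        raise ProtectedDeviceError(
            f"{device} appears to contain {name}; refusing to erase it"
        )
```

Calling the check before any erase operation makes cleaning fail loudly
instead of silently destroying shared storage.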
Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
IPA standalone mode is a developer-only option and, if enabled
accidentally on a production agent, could cause undesired behavior.
Developers who need this behavior should build a purpose-built agent,
with standalone hardcoded to True in cmd/agent.py.
Change-Id: Icc67dbe15acbbf6fee886f274d2169a0769a5053
This change adds a deploy step inject_files that adds a flexible
way to inject files into the instance.
Change-Id: I0e70a2cbc13744195c9493a48662e465ec010dbe
Story: #2008611
Task: #41794
Some hardware is very well intentioned. However this intention
can result in the UEFI NVRAM table being full which prevents us
from adding new records to the table. We can't be sure what to
delete, so in this case some operators just need the ability to
tell ironic "it is okay if this fails, it will still work."
The added ``ignore_bootloader_failure`` option adds
this capability. It can be set per-node either in the agent
configuration via the ramdisk image, or in the pxe_append_params
configuration parameter for the node itself with an
``ipa-ignore-bootloader-failure`` option in order to prevent
the failure from being raised.
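For illustration, reading such a flag from the kernel command line could
be sketched as below. The helper names and the set of accepted truthy
values are assumptions for the example, not the agent's real parsing
code.

```python
from typing import Dict


def parse_kernel_params(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into a key/value mapping."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Bare flags without '=' are stored with an empty value.
        params[key] = value if sep else ""
    return params


def ignore_bootloader_failure(cmdline: str, default: bool = False) -> bool:
    """Return whether bootloader installation failures should be ignored."""
    value = parse_kernel_params(cmdline).get("ipa-ignore-bootloader-failure")
    if value is None:
        return default
    # Assumed truthy spellings; anything else counts as False.
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

In a ramdisk the command line would typically come from /proc/cmdline.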
Change-Id: If3c83fb2ea2025fce092d495a64f32077c70d2d6
Story: 2008386
Task: 41309
Add the ability to bring up VLAN interfaces and include them in the
introspection report. A new configuration field is added -
``ipa-enable-vlan-interfaces``, which defines either the VLAN interface
to enable, the interface to use, or 'all' - which indicates all
interfaces. If a particular VLAN is not provided, IPA will
use the LLDP info for the interface to determine which VLANs should
be enabled.
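The three forms of the value could be parsed along these lines. The
comma-separated list and the ``interface.vlan`` notation are assumptions
made for this sketch, as is the function name.

```python
from typing import List, Optional, Tuple


def parse_vlan_interfaces(value: str) -> List[Tuple[str, Optional[int]]]:
    """Parse entries such as 'eno1.100', 'eno1' or 'all'.

    Returns (interface, vlan_id) pairs; vlan_id is None when the VLANs
    have to be discovered, for example from LLDP data.
    """
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, vlan = entry.partition(".")
        result.append((name, int(vlan) if sep else None))
    return result
```

An entry with no VLAN ID, including 'all', leaves the VLAN selection to
the LLDP lookup described above.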
Change-Id: Icb4f66a02b298b4d165ebb58134cd31029e535cc
Story: 2008298
Task: 41183
Follow-up on Ib96a1057792f45f2e4554671e32c436140463ee8 to
improve some of the wording and address review feedback by
Dmitry Tantsur.
Change-Id: Id77b0d72f3d78e5befd05fbdb6b21bc780f4ddfe
Typically, the Ironic API client in IPA will autodetect the API version
based on the output of a GET of the root of the API. If for some reason
this API endpoint is restricted, or the operator wishes to limit the
Ironic API version IPA uses, they can now set CONF.ironic_api_version to
avoid autodetection and force a version.
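The selection logic amounts to the sketch below, where a configured
value short-circuits autodetection. The function names and the
``autodetect`` callable are hypothetical stand-ins for the client code.

```python
from typing import Callable, Optional, Tuple

Version = Tuple[int, int]


def parse_version(text: str) -> Version:
    """Convert a 'major.minor' string into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def select_api_version(
    configured: Optional[str],
    autodetect: Callable[[], Version],
) -> Version:
    """Use the configured version if set, otherwise ask the API root."""
    if configured:
        # Forced version: the API root is never queried.
        return parse_version(configured)
    return autodetect()
```

Because the callable is only invoked when nothing is configured, a
restricted API root is never contacted once a version is forced.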
Change-Id: Ib96a1057792f45f2e4554671e32c436140463ee8
Adds a new flag (on by default) that enables generating a TLS
certificate and sending it to ironic via heartbeat. Whether
ironic supports auto-generated certificates is determined by
checking its API version.
Change-Id: I01f83dd04cfec2adc9e2a6b9c531391773ed36e5
Depends-On: https://review.opendev.org/747136
Depends-On: https://review.opendev.org/749975
Story: #2007214
Task: #40604
This change enables operators to set [DEFAULT]listen_tls to
true to configure IPA to host its WSGI server over TLS using
the existing SSL support in oslo.service.
In addition to configuring this in IPA, a deployer will need to
also set [ssl]cert_file, [ssl]key_file, and optionally
[ssl]ca_file in their ipa config, in addition to embedding those
files into the IPA ramdisk in order for this to be functional.
In order to make this change work, we also need to monkey patch
the socket library early, or else oslo.service will end up passing an
unpatched socket to the eventlet wsgi server, which causes
deadlocks.
Change-Id: Ib7decae410915f3c27b045ee08538c94d455b030
Adds a new kernel parameter for manual configuration and also creates
a foundation for automatic TLS support later.
Change-Id: If341c3a8a268fc8cab6bd6be04b12ca32b31c8d8
Story: #2007214
Task: #40619
The download retry interval was previously five seconds, which is
not long enough to recover after a hard network connectivity break
where we may be reliant upon network port forwarding hold-down
timers or even routing protocol route propagation to recover
communication.
Previously the time value was 5 seconds, with 3 attempts, meaning
15 seconds total ignoring the error detection timeouts.
Now it is 10 seconds, with 10 attempts, meaning 100 seconds before
the error detection timeouts.
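The arithmetic above, together with a retry loop of this shape, can be
sketched as follows. The function names are hypothetical, and the loop
waits after every failed attempt only to mirror the accounting used in
this message.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def worst_case_wait(interval_s: int, attempts: int) -> int:
    """Seconds spent waiting between attempts, excluding detection timeouts."""
    return interval_s * attempts


def download_with_retries(
    fetch: Callable[[], T],
    attempts: int = 10,
    interval_s: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or the attempts are exhausted."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except OSError as exc:
            last_error = exc
            # Wait after each failure so the network has time to recover.
            sleep(interval_s)
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

Here `worst_case_wait(5, 3)` gives the old 15 seconds and
`worst_case_wait(10, 10)` the new 100 seconds.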
Change-Id: I6d11edc9a3156f2bdc21c3d432ecc7625d652699
Adds a new poll extension to provide get_hardware_info and get_node_info
interfaces.
get_hardware_info will be used for node validation by ironic deploy
drivers.
get_node_info will be used for sending lookup data to IPA.
Standalone mode was assumed to be debug-only, but that is no longer the
case once poll mode is introduced. This change slightly updates the
description and also prevents the mDNS lookup when standalone is true.
Story: 1526486
Task: 28724
Change-Id: I5ad772a18cc4584585c5a7b6fb127547cece1998
If the server is stuck for any reason, the download will hang for
a potentially long time. Provide a timeout (defaults to 60 seconds)
and 2 retries on failure.
Change-Id: Ie53519266edd914fdbfa82fe52b4a55151e5ec5f
Adds support to the agent to receive, store, and return
an agent token to ironic's API, when supported.
This feature allows ironic and ultimately the agent to
authenticate interactions, when supported, to prevent
malicious abuse of the API endpoint.
Sem-Ver: feature
Change-Id: I6db9117a38be946b785e6f5e75ada1bfdff560ba
Attempt to sync the clock and save it to the hardware clock.
This feature supports use of chrony or ntpdate.
Sem-Ver: feature
Change-Id: I178d7614429d582e742d9cba6d0fa3ae099775e3
Story: 1619054
Task: 11591
Configuration options like api-url were deprecated a long time ago [1];
this patch removes them.
[1] https://review.opendev.org/#/c/131632/
Change-Id: Ie448b35a4423066ef44dca7616e716cb5c118881
To accommodate network setup that takes longer than
30 seconds, increase the number of IP lookup attempts. This
allows for IPv6 setup that occurs after DHCP has timed out.
Change-Id: I1351e150a63c6247210ca0cbc8ce0abfe82129cd
This change enables IPA to receive API endpoints and configuration
via multicast DNS.
Story: #2005393
Task: #30382
Change-Id: Ibbf07052bea8f5c0305dda098b2879bcbc2fece5
None of the existing ironic-python-agent integer config options included
min or max values. Added appropriate min/max values for the integer
config options.
Two of the integer options are for ports (listen_port and
advertise_port). These were changed to use the more appropriate
oslo_config cfg.PortOpt instead of cfg.IntOpt. PortOpt has the proper
min and max values built in.
Change-Id: I98709a45d099aea62c9973beb6817591cb445a9c
Story: 1731950
Increases the amount of RAM for CoreOS IPA to 2GB
as the base CoreOS image is now 310MB.
Bumped CPU count for CoreOS runs to 2 CPUs as the
concurrency helps boot times for the CoreOS ramdisk.
Adds netbase, udev, and open-iscsi to debian jessie container
as they are no longer present in the default container.
Explicitly set path variable for execution in the debian
container as udevadm is in /sbin, and we may not have
/sbin on the path that is passed through to the
chroot.
Also fixed new pep8 test failures.
Story: #1600228
Task: #16287
Change-Id: I488445dfd261b7bca322a0be7b4d8ca6105750a3
This patch changes the _wait_for_disks() method to wait for
a specific disk if any root device hints are specified. There are cases
where the deployment might fail or succeed randomly depending on the order
and time in which the disks show up.
If no root device hints are specified, the method will just wait for any
suitable disk to show up, like before.
The _wait_for_disks call was made into a proper hardware manager method.
It is now also called each time the cached node is updated, not only
on start up. This ensures that we wait for the device matching the
root device hints (which are part of the node).
The loop was corrected to avoid redundant sleeps and warnings.
Finally, this patch adds more logging around detecting the root device.
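A waiting loop with these properties might look like the sketch below.
It is a simplified illustration: the function name, its parameters and
the injected callables are hypothetical, not the hardware manager's
actual interface.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_disk(
    list_disks: Callable[[], Iterable[str]],
    matches_hints: Optional[Callable[[str], bool]] = None,
    attempts: int = 10,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a disk matching the hints (or any disk) shows up."""
    for attempt in range(attempts):
        for disk in list_disks():
            # Without hints, any disk that shows up is acceptable.
            if matches_hints is None or matches_hints(disk):
                return disk
        # Do not sleep after the final attempt; nothing would follow it.
        if attempt < attempts - 1:
            sleep(delay_s)
    return None
```

Returning None rather than raising leaves it to the caller to decide
whether a missing disk is fatal.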
Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com>
Change-Id: I10ca70d6a390ed802505c0d10d440dfb52beb56c
Closes-Bug: #1670916
This patch adds standard SSL options to IPA config and makes use of them
when making HTTP requests.
For now, a single set of certificates is used when needed.
In the future, configuration can be expanded to allow per-service
certificates.
In addition, the 'insecure' option (defaults to False) can be overridden
through the kernel command line parameter 'ipa-insecure'.
This will allow running IPA in CI-like environments with self-signed SSL
certificates.
Change-Id: I259d9b3caa9ba1dc3d7382f375b8e086a5348d80
Closes-Bug: #1642515
Add regex validation of the api_url specified in the configuration file.
Oslo config will raise an exception if no supported protocol prefix
is included in the Ironic API address in the configuration file.
Supported protocols are http and https.
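The check is roughly equivalent to the standalone sketch below. The
pattern and the function name are assumptions for illustration; the
actual option relies on oslo.config's own regex support.

```python
import re

# Assumed pattern: an http or https prefix followed by at least one character.
API_URL_PATTERN = re.compile(r"^https?://.+")


def validate_api_url(value: str) -> str:
    """Return the URL unchanged, or raise if the protocol is unsupported."""
    if not API_URL_PATTERN.match(value):
        raise ValueError(
            f"api_url must start with http:// or https://, got {value!r}"
        )
    return value
```

A value such as `ironic.example.com:6385` is rejected at load time
instead of failing later when the agent tries to connect.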
Closes-Bug: #1630785
Change-Id: I437b4ea0a2995921ddede03bc670087fdbbc8b83
The old generate_sample.sh is broken already as it refers to
non-existing openstack/common path.
Let's use oslo-config-generator as many other OpenStack projects do.
Also, where applicable, option descriptions are updated with the
corresponding kernel parameters to set those options during PXE boot.
Change-Id: Id4a0df30ea573d52f3b359f357fe8f4a29751939
Currently, if IPA is booted without an ironic api url, it will default
to localhost and fail to connect. Instead, we now explicitly fail and
print a log message if no api callback url is provided.
Change-Id: I0271be94ba7febc6abd5bf3343f6fa179bc1a6a4
Closes-Bug: #1643966
Lookup/Heartbeat via vendor passthru was deprecated in Newton.
This patch removes the corresponding functionality from IPA,
and also removes handling of 'ipa-driver-name' kernel parameter,
as it was only used in code related to old passthru.
Change-Id: I2c7989063ab3e4c0bae33f05d6d2ed857a2d9944
Closes-Bug: #1640533
To support multi-tenant networking in Ironic we need to be able to
discover not just the NICs a baremetal machine has but also the physical
connectivity to switches in the network.
This patch collects LLDP (Link Layer Discovery Protocol) data as part of
the list interfaces stage of the generic hardware manager. This
information can then be processed by the ironic inspector to populate
the local link information on each ironic port.
The processing done on this data in ironic-python-agent is intentionally
limited so that server-side processing hooks can process as much or as
little of the data as they want. This allows multi-vendor
environments that might use different parts of the LLDP packet to use a
generic ramdisk and configure the processing server-side using inspector
plugins.
Reserved fields switch_port_descr and switch_chassis_descr have been
deprecated for removal in Ocata in favor of passing the whole packet.
Change-Id: Idae9b1ede1797029da1bd521501b121957ca1f1a
Partial-Bug: #1526403