41 Commits

Author SHA1 Message Date
Julia Kreger
beb7484858 Guard shared device/cluster filesystems
Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow for multi-host shared concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collisions.

These filesystems are common in use cases where high availability
is required or where the ability for individual computers to
collaborate on a given workload is critical, such as a group of
hypervisors supporting virtual machines, because they allow nearly
seamless transfer of workload from one machine to another.

Similar technologies are also used for cluster quorum and cluster
durable state sharing, however that is not specifically considered
in scope.

The difficulty is that the storage fabric does not exclusively lock
the entire device. In some cases locking is handled by a Distributed
Lock Manager on the network, or via special sector interactions
amongst the cluster members which understand and support the
filesystem.

As a result of this IO/interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it perceives as attached locally.
This is not IPA's fault. Often this case occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as it relates to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may require restoration from snapshot or backup
storage, or may ultimately cause permanent data loss, depending
on the environment and the usage of that environment.

Included in this patch:
- IBM GPFS - Can be used on a shared block device... apparently according
             to IBM's documentation. The standard use of GPFS is more Ceph
             like in design... however GPFS is also a specially licensed
             commercial offering, so it is a red flag if this is
             encountered, and should be investigated by the environment's
             systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
                local block devices. With shared block devices,
                ranges of the disk are locked instead of the whole
                disk, and the ranges are mapped to virtual machine
                disk interfaces.
                It is unknown, due to lack of information, if this
                will detect and prevent erasure of VMFS logical
                extent volumes.

Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
2022-07-19 13:24:03 -07:00
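The guard described in commit beb7484858 can be pictured as a check on the
detected filesystem type before any erase is attempted. The following is a
minimal sketch only: the type strings, the exception, and the function name
are assumptions made for illustration, not the agent's actual code.

```python
# Hypothetical sketch: the type strings and all names below are assumptions,
# not the agent's real implementation.
GUARDED_FS_TYPES = {
    "gfs2": "Red Hat GFS2",
    "gpfs": "IBM GPFS",
    "VMFS_volume_member": "VMware VMFS",
}


class ProtectedDeviceError(Exception):
    """Raised when a device appears to hold a shared/cluster filesystem."""


def check_safe_to_erase(device, fs_type):
    """Raise if the reported filesystem type is one of the guarded ones."""
    name = GUARDED_FS_TYPES.get(fs_type)
    if name is not None:
        raise ProtectedDeviceError(
            f"{device} appears to hold a {name} filesystem; refusing to erase"
        )


check_safe_to_erase("/dev/sda", "ext4")  # not guarded, returns silently
try:
    check_safe_to_erase("/dev/sdb", "gfs2")
except ProtectedDeviceError as exc:
    print(exc)
```

Failing loudly, rather than skipping the device, matches the intent stated in
the message: an unexpected shared filesystem should be investigated by the
operator.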
Jay Faulkner
de726d4acf Do not permit IPA standalone to be enabled by conf
IPA standalone mode is a developer-only option, and if enabled
accidentally on a production agent could cause undesired behavior.

Developers who need this behavior should build a purpose-built agent,
with standalone hardcoded to True in cmd/agent.py.

Change-Id: Icc67dbe15acbbf6fee886f274d2169a0769a5053
2021-03-25 12:45:28 +01:00
Dmitry Tantsur
59cb08fd28 New deploy step for injecting arbitrary files
This change adds a deploy step inject_files that adds a flexible
way to inject files into the instance.

Change-Id: I0e70a2cbc13744195c9493a48662e465ec010dbe
Story: #2008611
Task: #41794
2021-02-16 16:56:52 +01:00
Kaifeng Wang
6072e2d65a Remove lldp-timeout support
The kernel parameter lldp-timeout was deprecated and is removed in this patch.

Change-Id: I98da49e61d9ed3236cc495d1ab351eba0931473b
2021-01-15 16:13:52 +08:00
Zuul
94b0e97e8b Merge "Generate TLS certificates with validity time in the past" 2020-12-15 20:08:09 +00:00
Dmitry Tantsur
557293ca6a Generate TLS certificates with validity time in the past
Otherwise a slight clock skew may prevent them from working, see
e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1906448.

Change-Id: Icea103af06edef16c0dc4578877dc04cd6ec3b0c
2020-12-10 16:22:13 +01:00
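The idea in commit 557293ca6a is to backdate the start of the validity
window so that a peer whose clock runs slightly behind still accepts the
certificate. A minimal sketch of that calculation follows; the one-hour
backdate and 90-day lifetime are illustrative assumptions, not the values the
agent uses.

```python
from datetime import datetime, timedelta, timezone


def validity_window(now, backdate=timedelta(hours=1),
                    lifetime=timedelta(days=90)):
    """Return (not_before, not_after) with not_before placed in the past.

    The default backdate and lifetime are assumptions for illustration.
    """
    return now - backdate, now + lifetime


now = datetime.now(timezone.utc)
not_before, not_after = validity_window(now)
# A verifier whose clock is five minutes behind still sees a valid cert.
verifier_clock = now - timedelta(minutes=5)
print(not_before <= verifier_clock <= not_after)
```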
Julia Kreger
7a83773fbc Option to enable bootloader config failure bypass
Some hardware is very well intentioned. However this intention
can result in the UEFI NVRAM table being full which prevents us
from adding new records to the table. We can't be sure what to
delete, so in this case some operators just need the ability to
tell ironic "it is okay if this fails, it will still work."

The added ``ignore_bootloader_failure`` option provides this
capability. It can be set per-node, either in the agent
configuration via the ramdisk image, or in the pxe_append_params
configuration parameter for the node itself with an
``ipa-ignore-bootloader-failure`` option, in order to prevent
the failure from being raised.

Change-Id: If3c83fb2ea2025fce092d495a64f32077c70d2d6
Story: 2008386
Task: 41309
2020-12-10 06:42:48 -08:00
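As a rough illustration of how a flag such as
``ipa-ignore-bootloader-failure`` could be read from the kernel command line,
here is a small sketch. The parsing helper and the set of accepted truthy
strings are assumptions; the agent's own parsing may differ.

```python
def parse_kernel_params(cmdline):
    """Split a kernel command line into a {key: value-or-None} dict."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def ignore_bootloader_failure(cmdline, default=False):
    """Return the flag value, falling back to default when it is not set.

    The accepted truthy spellings are an assumption for this sketch.
    """
    value = parse_kernel_params(cmdline).get("ipa-ignore-bootloader-failure")
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


print(ignore_bootloader_failure("quiet ipa-ignore-bootloader-failure=true"))
```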
Bob Fournier
6e3f28d720 Bring up VLAN interfaces and include in introspection report
Add the ability to bring up VLAN interfaces and include them in the
introspection report.  A new configuration field is added -
``ipa-enable-vlan-interfaces``, which defines either the VLAN interface
to enable, the interface to use, or 'all' - which indicates all
interfaces.  If the particular VLAN is not provided, IPA will
use the lldp info for the interface to determine which VLANs should
be enabled.

Change-Id: Icb4f66a02b298b4d165ebb58134cd31029e535cc
Story: 2008298
Task: 41183
2020-11-20 10:17:00 -05:00
Julia Kreger
066a96a926 Follow-up to API version setting
Follow-up on Ib96a1057792f45f2e4554671e32c436140463ee8 to
improve some of the wording and review feedback by
Dmitry Tantsur.

Change-Id: Id77b0d72f3d78e5befd05fbdb6b21bc780f4ddfe
2020-10-30 08:28:54 -07:00
Jay Faulkner
80575566b1 Allow manual setting of Ironic API Version
Typically, the Ironic API client in IPA will autodetect the API version
based on the output of a GET of the root of the API. If for some reason
this API endpoint is restricted, or the operator wishes to limit the
Ironic API version IPA uses, they can now set CONF.ironic_api_version to
avoid autodetection and force a version.

Change-Id: Ib96a1057792f45f2e4554671e32c436140463ee8
2020-10-23 15:38:42 +00:00
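The behaviour described in commit 80575566b1 amounts to "use the configured
version if there is one, otherwise autodetect". A minimal sketch, in which
the function name, the callable used for detection, and the tuple
representation of a version are all assumptions:

```python
def select_api_version(configured, detect):
    """Prefer an operator-configured version; otherwise call detect().

    `configured` is None when the operator has not set anything.
    `detect` is a hypothetical callable that queries the API root.
    """
    if configured is not None:
        return configured
    return detect()


# With a forced version the detection callable is never invoked.
print(select_api_version((1, 31), detect=lambda: (1, 68)))
print(select_api_version(None, detect=lambda: (1, 68)))
```

Skipping the detection call entirely is what makes the override useful when
the API root endpoint is restricted.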
Dmitry Tantsur
021e0a6a46 Generate a TLS certificate and send it to ironic
Adds a new flag (on by default) that enables generating a TLS
certificate and sending it to ironic via heartbeat. Whether
ironic supports auto-generated certificates is determined by
checking its API version.

Change-Id: I01f83dd04cfec2adc9e2a6b9c531391773ed36e5
Depends-On: https://review.opendev.org/747136
Depends-On: https://review.opendev.org/749975
Story: #2007214
Task: #40604
2020-09-11 17:46:52 +02:00
Jay Faulkner
1d11f0b7dd If listen_tls is true, enable TLS on wsgi server
This change enables operators to set [DEFAULT]listen_tls to
true to configure IPA to host its WSGI server over TLS using
existing SSL support in oslo.service.

In addition to configuring this in IPA, a deployer will need to
also set [ssl]cert_file, [ssl]key_file, and optionally
[ssl]ca_file in their ipa config, in addition to embedding those
files into the IPA ramdisk in order for this to be functional.

In order to make this change work, we also need to monkey patch
socket library early, or else oslo.service will end up passing an
unpatched socket to the eventlet wsgi server, which causes
deadlocks.

Change-Id: Ib7decae410915f3c27b045ee08538c94d455b030
2020-09-02 16:07:42 -07:00
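The options named in commit 1d11f0b7dd can be pictured with a small
configuration sample. The sketch below uses the standard library's
configparser purely as a stand-in for oslo.config, and the file paths are
hypothetical.

```python
import configparser

# Hypothetical paths; the files would need to be embedded in the ramdisk.
SAMPLE = """
[DEFAULT]
listen_tls = true

[ssl]
cert_file = /etc/ipa/tls/agent.crt
key_file = /etc/ipa/tls/agent.key
"""


def tls_settings(text):
    """Return the TLS file settings, or None when listen_tls is off."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.getboolean("DEFAULT", "listen_tls", fallback=False):
        return None
    if not parser.has_section("ssl"):
        raise ValueError("listen_tls is set but there is no [ssl] section")
    ssl = parser["ssl"]
    missing = [key for key in ("cert_file", "key_file") if key not in ssl]
    if missing:
        raise ValueError("missing [ssl] options: " + ", ".join(missing))
    return {
        "cert_file": ssl["cert_file"],
        "key_file": ssl["key_file"],
        "ca_file": ssl.get("ca_file"),  # optional, None when absent
    }


print(tls_settings(SAMPLE))
```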
Dmitry Tantsur
d50ff06b6b Enable the logs collection by default
It's incredibly helpful when debugging and most of consumers seem
to enable and rely on it.

Change-Id: I33bf58b3eb16b63b70f2a23e8a04449dc88fd94c
2020-08-19 17:25:24 +02:00
Vladyslav Drok
ba6ca246f5 Add possibility to pass global request ID
It can be done via ipa-global-request-id kernel commandline parameter.

Story: 2007681
Task: 39792
Change-Id: I6f544327d310c976a1625cfb411947591867882a
2020-08-12 15:21:08 +03:00
Dmitry Tantsur
353d09c3b0 Support changing the protocol part of callback_url to https
Adds a new kernel parameter for manual configuration and also creates
foundation for automatic TLS support later.

Change-Id: If341c3a8a268fc8cab6bd6be04b12ca32b31c8d8
Story: #2007214
Task: #40619
2020-08-06 15:14:31 +02:00
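Changing only the protocol part of a URL, as commit 353d09c3b0 describes for
callback_url, can be sketched with the standard library. The helper name and
the example address are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def with_scheme(url, scheme="https"):
    """Return url with its scheme replaced and everything else unchanged."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme=scheme))


print(with_scheme("http://192.0.2.10:9999"))
```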
Zuul
bfb395837d Merge "Adds poll mode deployment support" 2020-07-22 19:53:31 +00:00
Julia Kreger
c77a7df851 Extend retries to 9, 10 seconds apart.
The download retry interval was previously five seconds, which is
not long enough to recover after a hard network connectivity break
where we may be reliant upon network port forwarding hold-down
timers or even routing protocol route propagation to recover
communication.

Previously the time value was 5 seconds, with 3 attempts, meaning
15 seconds total ignoring the error detection timeouts.

Now it is 10 seconds, with 10 attempts, meaning 100 seconds before
the error detection timeouts.

Change-Id: I6d11edc9a3156f2bdc21c3d432ecc7625d652699
2020-06-23 20:27:49 +00:00
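The totals quoted in commit c77a7df851 follow from multiplying the interval
by the number of attempts, ignoring the error detection timeouts as the
message does. A tiny sketch of that arithmetic (the function name is an
assumption):

```python
def total_wait(attempts, interval_seconds):
    """Time covered by the retry intervals, excluding detection timeouts."""
    return attempts * interval_seconds


print(total_wait(3, 5))    # previous settings
print(total_wait(10, 10))  # new settings
```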
Kaifeng Wang
61c95554ff Adds poll mode deployment support
Adds a new poll extension to provide get_hardware_info and get_node_info
interfaces.

get_hardware_info will be used for node validation by ironic deploy
drivers.

get_node_info will be used for sending lookup data to IPA.

Standalone mode was assumed to be debug-only, but that is no longer
the case now that poll mode is being introduced. This change slightly
updates the option description and also prevents the mDNS lookup when
standalone is true.

Story: 1526486
Task: 28724

Change-Id: I5ad772a18cc4584585c5a7b6fb127547cece1998
2020-06-21 16:44:00 +08:00
Dmitry Tantsur
8adb7e1a04 Add timeout and retries when connection to an image server
If the server is stuck for any reason, the download will hang for
a potentially long time. Provide a timeout (defaults to 60 seconds)
and 2 retries on failure.

Change-Id: Ie53519266edd914fdbfa82fe52b4a55151e5ec5f
2020-04-24 10:34:40 +02:00
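The timeout-plus-retries pattern from commit 8adb7e1a04 can be sketched
independently of any HTTP library. Here `fetch` is a hypothetical callable
that accepts a timeout in seconds, and "2 retries" is read as three attempts
in total, which is an interpretation rather than something the message
states.

```python
def fetch_with_retries(fetch, retries=2, timeout=60):
    """Call fetch(timeout), retrying on OSError (includes TimeoutError)."""
    last_error = None
    for _attempt in range(1 + retries):
        try:
            return fetch(timeout)
        except OSError as exc:
            last_error = exc
    raise last_error


calls = []


def flaky(timeout):
    """Fail once with a timeout, then succeed."""
    calls.append(timeout)
    if len(calls) < 2:
        raise TimeoutError("server did not respond")
    return b"image-bytes"


print(fetch_with_retries(flaky))
```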
Julia Kreger
af5f05a0ee Agent token support
Adds support to the agent to receive, store, and return
that token to ironic's API, when supported.

This feature allows ironic and ultimately the agent to
authenticate interactions, when supported, to prevent
malicious abuse of the API endpoint.

Sem-Ver: feature
Change-Id: I6db9117a38be946b785e6f5e75ada1bfdff560ba
2020-03-12 10:35:17 -07:00
Julia Kreger
cee4bfc4bc Add NTP time sync
Attempt to sync the clock and save it to the hardware clock.

This feature supports use of chrony or ntpdate.

Sem-Ver: feature
Change-Id: I178d7614429d582e742d9cba6d0fa3ae099775e3
Story: 1619054
Task: 11591
2020-03-07 09:16:19 -08:00
Kaifeng Wang
4097847a10 Clean up options deprecated prehistory
Configuration options like api-url were deprecated a long time ago[1];
this patch removes them.

[1] https://review.opendev.org/#/c/131632/

Change-Id: Ie448b35a4423066ef44dca7616e716cb5c118881
2019-11-19 08:46:21 +08:00
Derek Higgins
c4bb694082 Bump up ipa-ip-lookup-attempts to 6
To accommodate network setup that takes longer than
30 seconds, increase the number of IP lookup attempts. This
allows for IPv6 setup that occurs after DHCP has timed out.

Change-Id: I1351e150a63c6247210ca0cbc8ce0abfe82129cd
2019-11-12 03:38:38 +00:00
Julia Kreger
696606f682 manual introspection trigger command
Change-Id: I64e66682c1e54f6edc260a22f46f5f6df8e85af1
Story: 2005896
Task: 33756
2019-07-08 07:43:40 -07:00
Dmitry Tantsur
5c5328ccaa Supports fetching API endpoints from mDNS
This change enables IPA to receive API endpoints and configuration
via multicast DNS.

Story: #2005393
Task: #30382
Change-Id: Ibbf07052bea8f5c0305dda098b2879bcbc2fece5
2019-05-29 16:58:24 +02:00
Bill Dodd
3c30088c1e Add min/max values to integer config options
None of the existing ironic-python-agent integer config options included
min or max values. Added appropriate min/max values for the integer
config options.

Two of the integer options are for ports (listen_port and
advertise_port). These were changed to use the more appropriate
oslo_config cfg.PortOpt instead of cfg.IntOpt. PortOpt has the proper
min and max values built in.

Change-Id: I98709a45d099aea62c9973beb6817591cb445a9c
Story: 1731950
2018-05-23 12:08:42 -05:00
Julia Kreger
3164053f08 Fix gate and bump CoreOS version to latest stable.
Increases the amount of ram for CoreOS IPA to 2GB
as the base CoreOS image is now 310MB.

Bumped CPU count for CoreOS runs to 2 CPUs as the
concurrency helps boot times for the CoreOS ramdisk.

Adds netbase, udev, and open-iscsi to debian jessie container
as they are no longer present in the default container.

Explicitly set path variable for execution in the debian
container as udevadm is in /sbin, and we may not have
/sbin on the path that is passed through to the
chroot.

Also fixed new pep8 test failures.

Story: #1600228
Task: #16287
Change-Id: I488445dfd261b7bca322a0be7b4d8ca6105750a3
2018-05-10 15:50:05 -07:00
Lucas Alvares Gomes
3189c16a5e Fix waiting for target disk to appear
This patch changes the _wait_for_disks() method behavior to wait for
a specific disk if any device hints are specified. There are cases where
the deployment might fail or succeed randomly depending on the order and
time that the disks show up.

If no root device hints are specified, the method will just wait for any
suitable disk to show up, like before.

The _wait_for_disks call was made into a proper hardware manager method.
It is now also called each time the cached node is updated, not only
on start up. This is to ensure that we wait for the device, matching
root device hints (which are part of the node).

The loop was corrected to avoid redundant sleeps and warnings.

Finally, this patch adds more logging around detecting the root device.

Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com>
Change-Id: I10ca70d6a390ed802505c0d10d440dfb52beb56c
Closes-Bug: #1670916
2017-10-16 15:39:25 +02:00
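The waiting behaviour described in commit 3189c16a5e can be sketched as a
polling loop that takes an optional predicate for the root device hints and
does not sleep after the final attempt. All names, the dictionary shape of a
disk, and the default attempt and delay values are assumptions for
illustration.

```python
import time


def wait_for_disk(list_disks, matches=None, attempts=10, delay=3,
                  sleep=time.sleep):
    """Poll until a suitable disk appears; return it, or None on timeout.

    list_disks: hypothetical callable returning the currently visible disks.
    matches: optional predicate built from root device hints; when None,
             any disk is acceptable.
    """
    for attempt in range(attempts):
        for disk in list_disks():
            if matches is None or matches(disk):
                return disk
        if attempt < attempts - 1:  # avoid a redundant sleep at the end
            sleep(delay)
    return None


# The hinted disk only shows up on the third poll.
polls = iter([[], [{"name": "/dev/sda"}],
              [{"name": "/dev/sda"}, {"name": "/dev/sdb"}]])
found = wait_for_disk(lambda: next(polls),
                      matches=lambda d: d["name"] == "/dev/sdb",
                      sleep=lambda seconds: None)
print(found)
```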
Jenkins
fd7f10b993 Merge "Configure and use SSL-related requests options" 2017-02-07 09:57:49 +00:00
Derek Higgins
b4e41e2dd2 Agent: Listen for connections on both IPv4 and IPv6 ports
Allow connections if deploying over an IPv6 network.

Change-Id: Ied2f6be4aa4d1a70524df1df3506e596f6926e5b
Closes-Bug: #1650539
2017-01-19 15:24:11 +00:00
Pavlo Shchelokovskyy
fdd11b54a5 Configure and use SSL-related requests options
This patch adds standard SSL options to IPA config and makes use of them
when making HTTP requests.

For now, a single set of certificates is used when needed.
In the future configuration can be expanded to allow per-service
certificates.

Besides, the 'insecure' option (defaults to False) can be overridden
through kernel command line parameter 'ipa-insecure'.
This will allow running IPA in CI-like environments with self-signed SSL
certificates.

Change-Id: I259d9b3caa9ba1dc3d7382f375b8e086a5348d80
Closes-Bug: #1642515
2017-01-13 11:33:44 +02:00
Joanna Taryma
83a19a4844 Fail IPA startup if no protocol prefix in ironic api address
Add regex validation of api_url specified in configuration file.
Oslo config will raise exception if no supported protocol prefix
is included in Ironic api address in configuration file.
Supported protocols are http and https.

Closes-Bug: #1630785
Change-Id: I437b4ea0a2995921ddede03bc670087fdbbc8b83
2016-12-23 16:13:25 +01:00
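The validation described in commit 83a19a4844 boils down to requiring an
http or https prefix. The sketch below shows the idea with a plain regular
expression; the exact pattern and the use of ValueError are assumptions, and
the real change relies on oslo.config to raise the error.

```python
import re

# Assumed pattern for illustration: only http:// and https:// are accepted.
_API_URL_RE = re.compile(r"^https?://", re.IGNORECASE)


def validate_api_url(url):
    """Return url unchanged, or raise ValueError if the prefix is missing."""
    if not _API_URL_RE.match(url):
        raise ValueError(f"api_url must start with http:// or https://: {url!r}")
    return url


print(validate_api_url("https://ironic.example.com:6385"))
```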
Jenkins
4cf29db7e2 Merge "Use oslo-config-generator for sample config" 2016-12-16 19:42:46 +00:00
Pavlo Shchelokovskyy
762f3bf4e6 Use oslo-config-generator for sample config
The old generate_sample.sh is broken already as it refers to
non-existing openstack/common path.

Let's use oslo-config-generator as many other OpenStack projects do.

Also, where applicable, option descriptions are updated with the
corresponding kernel parameters to set those options during pxe boot.

Change-Id: Id4a0df30ea573d52f3b359f357fe8f4a29751939
2016-12-09 21:01:02 +02:00
Yufei
dd9253f1b6 Skip API related work if no api url configured
Currently, if IPA is booted without an ironic api url, it will default
to localhost and fail to connect. Instead, we now explicitly fail and
print a log message if no api callback url is provided.

Change-Id: I0271be94ba7febc6abd5bf3343f6fa179bc1a6a4
Closes-Bug: #1643966
2016-12-07 17:04:05 +08:00
Pavlo Shchelokovskyy
b033bfd933 Remove old lookup/heartbeat from IPA
Lookup/Heartbeat via vendor passthru was deprecated in Newton.

This patch removes the corresponding functionality from IPA,
and also removes handling of 'ipa-driver-name' kernel parameter,
as it was only used in code related to old passthru.

Change-Id: I2c7989063ab3e4c0bae33f05d6d2ed857a2d9944
Closes-Bug: #1640533
2016-11-09 16:34:44 +00:00
Sam Betts
a7f0af722f Support LLDP data as part of interfaces in inventory
To support multi-tenant networking in Ironic we need to be able to
discover not just the NICs a baremetal machine has but also the physical
connectivity to switches in the network.

This patch collects LLDP (Link Layer Discovery Protocol) data as part of
the list interfaces stage of the generic hardware manager. This
information can then be processed by the ironic inspector to populate
the local link information on each ironic port.

The processing done on this data in ironic python agent is limited. This
allows server side processing hooks to process as much or as little of
the data as they want, so that multi-vendor environments that might use
different parts of the LLDP packet can use a generic ramdisk and
configure the processing server side using inspector plugins.

Reserved fields switch_port_descr and switch_chassis_descr have been
deprecated for removal in Ocata in favor of passing the whole packet.

Change-Id: Idae9b1ede1797029da1bd521501b121957ca1f1a
Partial-Bug: #1526403
2016-06-22 18:26:04 +01:00
Jenkins
99a053f654 Merge "Add configuration options for DISK_WAIT" 2016-06-22 02:29:46 +00:00
Yosef Hoffman
13a8c6321e Add configuration options for DISK_WAIT
https://review.openstack.org/#/c/320295/ introduced two internal
variables: _DISK_WAIT_ATTEMPTS and _DISK_WAIT_DELAY. These values are
hardcoded. This patch adds configuration options for these so
that an operator can change them based on their own needs/fleet of
hardware.

Change-Id: I2ba97669ec710fb4a435307466cd8add9c2293ba
Closes-Bug: #1585663
2016-06-20 18:47:26 -04:00
Yosef Hoffman
90c15e10cb lldp-timeout kernel parameter missing ipa- prefix
Every other Ironic python agent kernel parameter is prefixed with "ipa-".
This patch allows users to use the old "lldp-timeout" parameter or the new
"ipa-lldp-timeout" parameter. A warning message is logged if the
"lldp-timeout" parameter is used.

(Also fixed typo while I'm at it.)

Change-Id: Icc05ead31506628e4926be6549916a19cad48db3
Closes-Bug: #1588325
2016-06-03 12:17:55 -04:00
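Accepting both the old and the new parameter name, as commit 90c15e10cb
describes, can be sketched as a lookup with a deprecated fallback. The
function name, the default value, the use of the warnings module instead of
a logger, and the precedence when both names are present are assumptions.

```python
import warnings


def get_lldp_timeout(params, default=30.0):
    """Read the timeout from parsed kernel parameters (a dict of strings).

    The prefixed name wins if both are present (an assumption); the old
    name still works but emits a deprecation warning.
    """
    if "ipa-lldp-timeout" in params:
        return float(params["ipa-lldp-timeout"])
    if "lldp-timeout" in params:
        warnings.warn(
            "lldp-timeout is deprecated; use ipa-lldp-timeout instead",
            DeprecationWarning,
        )
        return float(params["lldp-timeout"])
    return default


print(get_lldp_timeout({"ipa-lldp-timeout": "45"}))
```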
Sam Betts
95e1e4e35a Consolidate IPA configuration into a config module
This patch moves the IPA oslo configs out of the agent cmd into their
own module so that it is safe to import them from other places in the
application without causing circular imports.

Change-Id: I100792bd0d1f369763afaa6f93e144e9967c3048
2016-05-31 15:24:23 +01:00