47 Commits

Author SHA1 Message Date
Jay Faulkner
be8ee50ea1 Inspect non-raw images for safety
When IPA gets a non-raw image, it performs an on-the-fly conversion
using qemu-img convert, as well as running qemu-img frequently to get
basic information about the image before validating it.

Now, we ensure that, before any qemu-img calls are made, we have
inspected the image for safety and pass through the detected format.

If given a disk_format=raw image and image streaming is enabled
(default), we retain the existing behavior of not inspecting it in
any way and streaming it bit-perfect to the device. In this case, we
never use qemu-based tools on the image at all.

If given a disk_format=raw image and image streaming is disabled, this
change fixes a bug where the image may have been converted if it was not
actually raw in the first place. We now stream these bit-perfect to the
device.

Adds two config options:
- [DEFAULT]/disable_deep_image_inspection, which can be set to "True" in
  order to disable all security features. Do not do this.
- [DEFAULT]/permitted_image_formats, default raw,qcow2, for image types
  IPA should accept.

Both of these configuration options are wired up to be set by the lookup
data returned by Ironic at lookup time.

This uses an image format inspection module imported from Nova; this
inspector will eventually live in oslo.utils, at which point we'll
migrate our usage of the inspector to it.

Closes-Bug: #2071740
Change-Id: I5254b80717cb5a7f9084e3eff32a00b968f987b7
2024-09-04 09:21:59 -07:00
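The format gate described in commit be8ee50ea1 could be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the function name `validate_image_format` and the `InvalidImage` exception are hypothetical, and only the two option names and the `raw,qcow2` default come from the commit message.

```python
class InvalidImage(Exception):
    """Hypothetical error for an image whose format is not acceptable."""


def validate_image_format(detected_format,
                          permitted_image_formats=('raw', 'qcow2'),
                          disable_deep_image_inspection=False):
    """Return the detected format if it may be passed on to qemu-img.

    Assumption: `detected_format` was produced by a separate safety
    inspection step, which is not shown here.
    """
    if disable_deep_image_inspection:
        # All checks are skipped; the commit message warns against this.
        return detected_format
    if detected_format not in permitted_image_formats:
        raise InvalidImage(
            'Image format {} is not one of the permitted formats: {}'.format(
                detected_format, ', '.join(permitted_image_formats)))
    return detected_format
```

With the defaults, `validate_image_format('qcow2')` returns the format unchanged, while a format outside the permitted list raises before any conversion tool would be called.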
Jay Faulkner
36e5993a04 [codespell] Fix spelling issues in IPA
This fixes several spelling issues identified by codespell. In some
cases, I may have manually modified a line to make the output more clear
or to correct grammatical issues which were obvious in the codespell
output.

Later changes in this chain will provide the codespell config used to
generate this, as well as adding this commit's SHA, once landed, to a
.git-blame-ignore-revs file to ensure it will not pollute git history
for modern clients.

Related-Bug: 2047654
Change-Id: I240cf8484865c9b748ceb51f3c7b9fd973cb5ada
2023-12-28 10:54:46 -08:00
Julia Kreger
eb95273ffb Add get_service_steps logic to the agent
Initial code patches for service steps have merged in
ironic, and it is now time to add support into the
agent which allows service steps to be raised to
the service.

Updates the default hardware manager version to 1.2,
which has *rarely* been incremented due to oversight.

Change-Id: Iabd2c6c551389ec3c24e94b71245b1250345f7a7
2023-08-31 06:22:22 -07:00
Julia Kreger
beb7484858 Guard shared device/cluster filesystems
Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow for multi-host shared concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collisions.

These filesystems are common for use cases where high availability
is required or ability for individual computers to collaborate on a
given workload is critical, such as a group of hypervisors supporting
virtual machines because it can allow for nearly seamless transfer
of workload from one machine to another.

Similar technologies are also used for cluster quorum and cluster
durable state sharing, however that is not specifically considered
in scope.

Where things get difficult is because the entire device is not
exclusively locked with the storage fabrics, and in some cases locking
is handled by a Distributed Lock Manager on the network, or via special
sector interactions amongst the cluster members which understand
and support the filesystem.

As a result of this IO/Interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it perceives as attached locally.
This is not IPA's fault; often this case occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as it relates to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may include restoration from snapshot, backup
storage, or may ultimately cause permanent data loss, depending
on the environment and the usage of that environment.

Included in this patch:
- IBM GPFS - Can be used on a shared block device... apparently according
             to IBM's documentation. The standard use of GPFS is more Ceph
             like in design... however GPFS is also a specially licensed
             commercial offering, so it is a red flag if this is
             encountered, and should be investigated by the environment's
             systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
                local block devices. With shared block devices,
                ranges of the disk are locked instead of the whole
                disk, and the ranges are mapped to virtual machine
                disk interfaces.
                It is unknown, due to lack of information, if this
                will detect and prevent erasure of VMFS logical
                extent volumes.

Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
2022-07-19 13:24:03 -07:00
Dmitry Tantsur
be3882162e Remove the iscsi extension
Change-Id: I2f0e581575112d6c7ba0d211661cab3e0b6caca6
2021-05-10 12:43:44 +02:00
Dmitry Tantsur
fe6b687968 When reporting that agent is busy, report the executed command
Also make this API return a proper HTTP code (409 instead of 500).

Change-Id: I5d86878b5ed6142ed2630adee78c0867c49b663f
2020-09-18 17:52:49 +02:00
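The change in commit fe6b687968 amounts to giving the "busy" error its own HTTP status and naming the running command. A minimal sketch, assuming a simplified `RESTError` base class with a `status_code` attribute; the class layout and the name `AgentIsBusy` are illustrative assumptions, not taken from the commit message:

```python
class RESTError(Exception):
    """Simplified stand-in for the agent's base error class (assumed)."""
    message = 'An error occurred'
    status_code = 500  # generic server error unless a subclass overrides it

    def __init__(self, details=None):
        self.details = details
        text = self.message
        if details:
            text = '{}: {}'.format(self.message, details)
        super().__init__(text)


class AgentIsBusy(RESTError):
    """Hypothetical error reporting which command keeps the agent busy."""
    message = 'Agent is busy'
    status_code = 409  # Conflict: the request is valid but cannot run now

    def __init__(self, command_name):
        super().__init__('executing command {}'.format(command_name))
```

A caller that maps exceptions to HTTP responses can then read `status_code` from the exception instead of falling back to 500.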
Julia Kreger
f670f704f3 Clarify connection error on heartbeats
Heartbeat connection errors are often a sign of transitory
network failures which may resolve themselves. But an operator
looking at the screen doesn't necessarily know that.

They don't understand that there could have been a network
failure, or a misconfiguration that caused the connectivity
failure and sort of default to "well it failed"
without further clarification.

As such, this patch adds explicit catching of the requests
ConnectionError exception and raises a new internal error
with a more verbose error message in that event to provide
operators with additional clarity.

Change-Id: I4cb2c0d1f577df1c4451308bd86efa8f94390b0c
Story: 2008046
Task: 40709
2020-08-20 13:45:47 -07:00
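The pattern described in commit f670f704f3 is to catch the low-level connection failure and re-raise it with an operator-friendly explanation. A minimal sketch under stated assumptions: Python's built-in `ConnectionError` stands in for the `requests` exception so the example runs without third-party packages, and `HeartbeatConnectionError` and `send_heartbeat` are hypothetical names.

```python
class HeartbeatConnectionError(Exception):
    """Hypothetical internal error with a more verbose message."""

    def __init__(self, details):
        super().__init__(
            'Transitory connection failure occurred while attempting to '
            'heartbeat; this may resolve itself on a later attempt. '
            'Error: {}'.format(details))


def send_heartbeat(post):
    """Call `post` (any callable doing the HTTP request) and translate errors.

    Assumption: the built-in ConnectionError is a stand-in for the
    ConnectionError exception raised by the requests library.
    """
    try:
        return post()
    except ConnectionError as exc:
        # Keep the original exception chained for debugging.
        raise HeartbeatConnectionError(str(exc)) from exc
```

Chaining with `from exc` preserves the original traceback while the outer message tells the operator the failure may be temporary.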
Mark Goddard
1b4ce47921 Add an ability to run in-band deploy steps
Mostly adaptation of cleaning methods.

Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com>
Change-Id: Ife0502391bbece46d619a20a825dfdb191d5c2b4
Story: 2006963
Task: 37791
2020-04-06 10:24:08 +02:00
Julia Kreger
cee4bfc4bc Add NTP time sync
Attempt to sync the clock and save it to the hardware clock.

This feature supports use of chrony or ntpdate.

Sem-Ver: feature
Change-Id: I178d7614429d582e742d9cba6d0fa3ae099775e3
Story: 1619054
Task: 11591
2020-03-07 09:16:19 -08:00
Zuul
09d2db7c39 Merge "Software RAID: Create/delete configurations" 2019-06-05 07:34:01 +00:00
Arne Wiebalck
2db123d318 Software RAID: Create/delete configurations
This patch proposes to extend the IPA to be able to configure software
RAID devices. For this, the {create,delete}_configuration methods of
the GenericHardwareManager are implemented.

Change-Id: Id20302537f7994982c7584af546a7e7520e9612b
Story: #2004581
Task: #29101
2019-06-04 12:33:40 +02:00
Dmitry Tantsur
f821db3a54 Allow image checksum to be a URL
We allow image_source to be a URL, let us also support URLs for checksums.
This change copies handling of multi-file checksum files from metalsmith.

Change-Id: Ie4d7e5c79b76bdd72d50eeb384cf10519278a80c
Story: #2005061
Task: #29605
2019-02-25 14:28:09 +01:00
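For commit f821db3a54, a downloaded checksum file may hold either a bare checksum or several `checksum filename` lines. The sketch below shows one way to pick the right entry; it assumes the common `sha256sum`-style layout, and `find_checksum` is a hypothetical helper rather than the code copied from metalsmith.

```python
def find_checksum(checksum_file_text, image_name):
    """Return the checksum for `image_name` from checksum file contents.

    Assumed layouts: a single bare checksum, or one
    "<checksum>  <file name>" pair per line (optionally "*<file name>").
    """
    lines = [line.strip() for line in checksum_file_text.splitlines()
             if line.strip()]
    # A file holding only one bare checksum applies to any image.
    if len(lines) == 1 and len(lines[0].split()) == 1:
        return lines[0]
    for line in lines:
        checksum, _, name = line.partition(' ')
        # Drop padding and the "*" marker some tools put before the name.
        name = name.strip().lstrip('*')
        if name == image_name:
            return checksum
    raise ValueError('No checksum found for {}'.format(image_name))
```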
Jaganathan Palanisamy
cc9e05da50 NUMA-topology collector
Implement the optional collector for fetching the NUMA topology
details.
Collects RAM, CPU cores, thread siblings and NIC data for
each NUMA node and stores it under the "numa_topology" key.

Closes-bug: #1635253

Co-Authored-By: Jaganathan Palanisamy <jpalanis@redhat.com>

Change-Id: I5a546c009d95f39b7af4d89cf785be8acb8ebc67
Signed-off-by: karthik s <ksundara@redhat.com>
2017-05-16 08:07:58 -04:00
Galyna Zholtkevych
9c2d0cdd85 Correct failure message output when downloading
This fixes unreadable output on download image failure.
Adding new instance variable to exception `ImageDownloadError` class
to avoid redundant logs.

Change-Id: I51782abd572588adfc62745eeab9c559eb8346dd
Closes-Bug: #1657691
2017-03-10 19:16:07 +00:00
Nam Nguyen Hoai
41cf45f126 Fix two typos, "messsage" and "containg"
This patch set updated two wrong words:
+ In error.py file, it should be changed from "messsage" to "message"
+ In utils.py file, it should be changed from "containg" to "containing"

Change-Id: I5ad121ec58ccc6e5f3cc499eca50d16e691f217e
2016-11-22 08:56:33 +07:00
Shivanand Tendulker
3665306dfb Use ironic-lib to create configdrive
The shell script that creates the config drive is being replaced with
Python code in ironic-lib.

Closes-Bug: #1493328

Change-Id: I31108f1173db3fb585386b2949ec880a95305fb6
2016-10-21 03:39:06 +00:00
John L. Villalovos
20d960ff98 Remove Python 2.6 format style
In Python 2.6 it was required to use {0}, {1}...{n} when using the
string format function. In Python 2.7 and Python 3 it is not required.

Change {N} to {} in code.

This brings the code in style alignment with other projects like
ironic and ironic-lib.

Change-Id: I81c4bb67b0974f73905f14b589b3dd0a7131650d
Depends-On: I8f0e5405f3e2d6e35418c73f610ac6b779dd75e5
2016-10-06 09:05:26 -07:00
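The style change in commit 20d960ff98 can be illustrated with a short example; the message text and values here are made up for the illustration.

```python
# Explicit positional indexes, as required by Python 2.6.
old_style = "Error {0}: {1}".format(404, "not found")

# Auto-numbered fields, accepted by Python 2.7 and Python 3.
new_style = "Error {}: {}".format(404, "not found")

# Both spellings render the same string.
assert old_style == new_style
```

Explicit indexes remain useful when an argument is repeated or reordered, so the `{}` form only applies where fields are consumed in order.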
Dmitry Tantsur
6829d34c15 Bind to interface routable to the ironic host, not a random one
Binding to the first interface that has an IP address is error-prone: there is
no guarantee that ironic can reach us via this interface. It is much safer to
detect the interface facing ironic and bind to it.

Unused LookupAgentInterfaceError exception is deleted.

The TinyIPA build also requires iptables dependency at build time to insert the
required kernel modules.

Closes-Bug: #1558956
Change-Id: I9586805e6c7f52a50834bc03efeb72d1faa6cb65
2016-03-21 14:21:12 +00:00
Zhenguo Niu
d25d94b316 Change to use WARNING level for heartbeat conflict errors
It's normal that ironic returns 409 Conflict from time to time, so
it's a bit confusing that we report this with Exception level and
traceback.

Change-Id: I1627c61facc3fadd0f5d9d324150e7d2833c7fbc
Closes-Bug: #1533113
2016-03-06 17:13:02 +08:00
Dmitry Tantsur
c474a5ac6c Support Linux-IO in addition to tgtd
The iSCSI extension now tries to use Linux-IO first (via rtslib)
and falls back to tgtd if Linux-IO can't be used (e.g. in the CoreOS-based
image which uses containers).

Change-Id: I9cc7a30d9c93c445a66d183146e9260c2b096d33
Closes-Bug: #1504562
2015-11-30 18:38:03 +01:00
dparalen
e51ccbe7c3 avoid duplicate text in ISCSIError message
The ISCSIError class defines a class-level message attribute with
value: "Error starting iSCSI target". This attribute is further
processed in RESTError.__init__ method, the ISCSIError super-class, to
create an Exception message concatenating self.message with provided
details argument.  However, the ISCSIError.__init__ method provides a
details attribute prefixed with the same text to the super(ISCSIError,
self).__init__ method.  As a result, the text appears twice:
"ISCSIError: Error starting iSCSI target: Error starting iSCSI target:
ISCSI daemon didn't initialize. Failed with exit code 107. stdout: .
stderr: tgtadm: failed to send request hdr to tgt daemon, Transport
endpoint is not connected"

The patch purpose is to remove the details prefix to avoid duplicate
text in the exception text while honouring ISCSIError.message.

Change-Id: I9e1434ae17da5112527a841ac069ed2285566cca
2015-10-20 08:10:56 +02:00
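The duplication described in commit e51ccbe7c3 can be reproduced with a small sketch. The `RESTError` below is a simplified stand-in that only models the "message plus details" concatenation the commit message describes, and `DuplicatingISCSIError` is a hypothetical name for the pre-fix behaviour.

```python
class RESTError(Exception):
    """Simplified stand-in: joins the class message with the details."""
    message = 'An error occurred'

    def __init__(self, details=None):
        self.details = details
        text = self.message
        if details:
            text = '{}: {}'.format(self.message, details)
        super().__init__(text)


class DuplicatingISCSIError(RESTError):
    """Pre-fix shape: details repeat the prefix already held in `message`."""
    message = 'Error starting iSCSI target'

    def __init__(self, error_msg):
        details = 'Error starting iSCSI target: {}'.format(error_msg)
        super().__init__(details)


class ISCSIError(RESTError):
    """Post-fix shape: pass the details through without a prefix."""
    message = 'Error starting iSCSI target'

    def __init__(self, error_msg):
        super().__init__(error_msg)
```

Rendering both with the same input shows the prefix once for `ISCSIError` and twice for `DuplicatingISCSIError`.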
Josh Gachnang
e17129dbad Add more info to checksum exception
If an image cannot be downloaded for some reason, it is helpful for
operators to have the image path, checksum, and calculated checksums
available easily from the API.

Change-Id: I6a2fb46726245cebd730b5c51d4f25f8465f1658
2015-09-21 15:54:02 -07:00
Dmitry Tantsur
096830414b Add support for inspection using ironic-inspector
Adds a new module ironic_python_agent.inspector and new entry point
for extensions, which will allow vendor-specific inspection.

Inspection is run on service start up just before the lookup.
Due to this early start, and due to the fact we don't even know
MAC address of nodes on inspection (to say nothing about IP addresses),
exception handling is a bit different from other agent features:
we try hard not to error out until we send at least something to inspector.

Change-Id: I00932463d41819fd0a050782e2c88eddf6fc08c6
2015-09-07 18:22:54 +02:00
Josh Gachnang
108599f3f0 Fix printing of errors in IPA
Exception messages weren't being bubbled up to the API because the
base exception class wasn't printing correctly. This adds a string
and representation function to ensure they print properly and show
up correctly when debugging interactively.

Cleaned up the `message` attr on the exception classes. It looks
like they started out all without a period, but started adding them
later. Changed classes that were setting error `details` == `message`
to use the default details provided in RESTError.

Change-Id: I1ce256585c9a574e1d1f857c7dc4c417a56b913b
2015-08-11 14:03:09 -07:00
Jim Rollenhagen
601201d120 Update hacking and fix hacking violations
This does a few things:

* Update hacking to the version in global-requirements. Old hacking was
  installing a version of pbr that was breaking other packages.

* Fix all the hacking/pep8 rules that updating hacking raised.

* Do some general docstring cleanup, while already in there cleaning up
  a bunch of docstrings due to H405 violations.

Change-Id: I1fc1e59d4c3d7b14631f8b576e3f3854bc452188
Closes-Bug: #1461717
2015-06-03 16:58:57 -07:00
Josh Gachnang
5f4fa7f27e Add cleaning/zapping support to IPA
This will add support for in band cleaning operations to IPA and
replace the decom API that was unused.

Adds API support for get_clean_steps, which returns a list of
supported clean steps for the node, and execute_clean_step, which executes
one of the steps returned by get_clean_steps.

Adds versioning and naming for hardware managers, so if a new hardware
manager version is deployed in the middle of cleaning/zapping, the
cleaning/zapping will be restarted to avoid incompatibilities.

blueprint implement-cleaning-states
blueprint inband-raid-configuration
blueprint implement-zaping-states
Depends-On: Ia2500ed5afb72058b4c5e8f41307169381cbce48
Change-Id: I750b80b9bf98b3ddc5643bb4c14a67d2052239af
2015-03-17 17:07:04 -07:00
Lucas Alvares Gomes
d23e0170de Add the image extension (for local boot)
Initially this extension supports installing a bootloader so the user
image can boot from the local disk.

Change-Id: Ia588aafc240b55119c02f1254addc0cf796f88c5
2015-03-04 16:34:17 +00:00
Lucas Alvares Gomes
d3aa7c93aa Add iscsi extension
This extension allows IPA to be used with the PXE/iSCSI methodology of
deployment in Ironic.

Change-Id: I32ec9fa74182c0d03c7ef1b698b1d0c0e3007773
2015-02-26 12:13:00 +00:00
Jay Faulkner
e5d88be8cb Log required troubleshooting info on image dl fail
Currently, we only log the image ID and attempted URL. Now, we log the
status code received and detailed information about how and when things
failed.

Change-Id: I718c7facbe1500d98be78b7b6137e92fdfb2fdf1
Closes-bug: 1420981
Depends-On: I69f6f6eef4ad573f406d64d579a9811c70ac5d28
2015-02-12 07:35:11 -08:00
Michael Turek
0c4aa3dcf2 Make all IPA error classes inherit from RESTError
Currently several IPA error classes inherit from Exception. This
patch makes the base class of those classes RESTError. These
error classes are also restructured to initialize in the same
manner as other classes which inherit from RESTError. Additionally
test cases are added for these error classes.

Change-Id: Ie6235e4cc25f072b789b2e72e4592d4cf02bfedc
Closes-bug: #1410372
2015-01-16 21:56:03 +00:00
Ruby Loo
166f56da94 Consistent way to set details for Error instances
This fixes and cleans up (making it consistent) how error
instances set their details value. The base class RESTError
will set the details value; all subclasses should call their
parent's __init__().

Unit tests were added to test that the Error instances are
initialized correctly.

Change-Id: I2390fa0012f8e4e6d73cbfb188f1733dfe85e65a
Closes-Bug: #1408817
2015-01-15 18:40:29 +00:00
Jenkins
b51b303403 Merge "HardwareManagerMethodNotFound requires a method" 2015-01-15 18:32:11 +00:00
Ruby Loo
f09359760c Error classes invoke their parent's __init__()
This fixes some Error classes so that they are correctly invoking
their parent's __init__() method instead of some other ancestor's
method.

Change-Id: I7cb2fc56792f7516222baf75f76b50509deefcf5
Closes-Bug: 1408813
2015-01-12 21:12:37 +00:00
Jay Faulkner
49d547d97e HardwareManagerMethodNotFound requires a method
If this exception is called, it should contain a method argument. If
not, allow the incorrect call to bubble up rather than setting it to
None. This will ensure this error is never called without the method
argument.

Change-Id: Iedc82b3446d1ee41d6ae94ee43391e12ef4899a7
2015-01-09 17:02:37 +00:00
Jay Faulkner
2bbec5770c Allow use of multiple simultaneous HW managers
Currently we pick the most specific manager and use it. Instead, call
each method on each hardware manager in priority order, and consider the
call successful if the method exists and doesn't throw
IncompatibleHardwareMethodError.

This is an API breaking change for anyone with out-of-tree
HardwareManagers.

Closes-bug: 1408469
Change-Id: I30c65c9259acd4f200cb554e7d688344b7486a58
2015-01-08 15:15:13 -08:00
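The dispatch order described in commit 2bbec5770c can be sketched as follows. This is a minimal illustration assuming the managers are already sorted from most to least specific; `dispatch_to_managers` and the two example manager classes are illustrative names, and the exception classes are simplified stand-ins.

```python
class IncompatibleHardwareMethodError(Exception):
    """A manager has the method but cannot run it on this hardware."""


class HardwareManagerMethodNotFound(Exception):
    """No manager could handle the requested method."""

    def __init__(self, method):
        super().__init__('No hardware manager supports {}'.format(method))


def dispatch_to_managers(managers, method, *args, **kwargs):
    """Try `method` on each manager in priority order; first success wins."""
    for manager in managers:
        func = getattr(manager, method, None)
        if func is None:
            continue  # this manager does not implement the method
        try:
            return func(*args, **kwargs)
        except IncompatibleHardwareMethodError:
            continue  # fall through to the next, less specific manager
    raise HardwareManagerMethodNotFound(method)


class VendorManager:
    """Example high-priority manager that declines the call."""

    def erase_block_device(self, device):
        raise IncompatibleHardwareMethodError()


class GenericManager:
    """Example fallback manager."""

    def erase_block_device(self, device):
        return 'generic erase of {}'.format(device)
```

Calling `dispatch_to_managers([VendorManager(), GenericManager()], 'erase_block_device', 'sda')` falls through to the generic manager because the vendor manager raises `IncompatibleHardwareMethodError`.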
Jim Rollenhagen
a9f2179761 Fix exception that is not properly raised
This commit fixes an exception that was not properly raised, and
also makes the exception more relevant.

This also fixes an outstanding bug where, if the agent
was not associated with a node, get_node_uuid() would fail in an
unexpected manner.

Change-Id: Ifca474a73dd50b5fd2242e5b7e938a5db04f27a8
2014-09-10 14:53:37 -07:00
Ramakrishnan G
8c0584c121 Add vmedia boot support in IPA
This commit adds support for booting IPA from virtual
media cdrom.  When IPA is booted over virtual media cdrom,
the parameters to the IPA are passed in a text file within
the virtual media floppy.

Change-Id: Ia04585416aada85022af73fb2b945bd3895606f0
Closes-Bug: #1358723
2014-09-02 12:51:50 +05:30
Josh Gachnang
83782018f7 Improve Disk Detection
The previous implementation of list_block_devices used blockdev,
which would list partitions, software RAID and other devices as block devices.
By switching to lsblk, the agent can filter down to only physical block
devices, which is all the agent cares about for any of its operations.

This change adds two new fields to the BlockDevice class: model, a string of
the block device's reported model, and rotational, a boolean representing a
spinning disk (True) or a solid state disk (False). This data can be useful
for vendor hardware managers.

Change-Id: I385c3bb378c2c49385bca14a1d7efa074933becf
Closes-Bug: 1344351
2014-07-18 17:42:23 -07:00
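The filtering described in commit 83782018f7 can be sketched by parsing `lsblk` key="value" pair output (the `-P` flag) and keeping only entries of type `disk`. The sample output and the helper name `parse_physical_disks` are made up for illustration; the commit message does not say which lsblk columns or flags the agent uses.

```python
import shlex

# Hypothetical sample of `lsblk -P` style output; not captured from a host.
SAMPLE_LSBLK_OUTPUT = '''\
KNAME="sda" MODEL="Example Disk" SIZE="500107862016" ROTA="1" TYPE="disk"
KNAME="sda1" MODEL="" SIZE="524288000" ROTA="1" TYPE="part"
KNAME="md0" MODEL="" SIZE="499581452288" ROTA="1" TYPE="raid1"
KNAME="nvme0n1" MODEL="Example SSD" SIZE="256060514304" ROTA="0" TYPE="disk"
'''


def parse_physical_disks(lsblk_output):
    """Return dicts for physical disks, skipping partitions and RAID."""
    disks = []
    for line in lsblk_output.splitlines():
        if not line.strip():
            continue
        # shlex handles the quoting, so model names may contain spaces.
        fields = dict(token.split('=', 1) for token in shlex.split(line))
        if fields.get('TYPE') != 'disk':
            continue
        disks.append({
            'name': '/dev/' + fields['KNAME'],
            'model': fields['MODEL'],
            'size': int(fields['SIZE']),
            'rotational': fields['ROTA'] == '1',  # True means spinning disk
        })
    return disks
```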
Jim Rollenhagen
c5df7070af Better errors for execute() failures
Exceptions raised due to processutils.execute() failing now include
stdout and stderr.

Change-Id: Id5d1b5bc51d377f9f3c338cd7303ea800f76e5cd
2014-06-24 06:50:54 -07:00
Ellen Hui
b4f1a0b2d3 Tries to advertise valid default IP
During the first heartbeat, the heartbeater asks the agent to check
its advertised address; if the advertised IP is still the default
(None), the agent tries to replace it with the IP of the first network
interface it finds.  If it fails to find either a network interface or
an IP address, the agent raises an exception.

Change-Id: I6d435d39e99ed0ff5c8b4883b6aa0b356f6cb4ae
Closes-Bug: #1309110
2014-06-10 20:54:34 +00:00
Russell Haering
dff46583d3 Add a HardwareManager method to erase devices
Add erase_devices method to the HardwareManager class. By default this
method iterates block devices, and calls a new abstract
erase_block_device method for each device. This patch includes a
simple implementation of erase_block_device on the
GenericHardwareManager which attempts to issue an ATA secure erase on
supported devices.

Change-Id: I81da065395b8785f636f1b0a0d60c9f1c045441e
2014-06-06 10:27:53 -07:00
Vladimir Kozhukalov
b306626e86 Flow extension uses extension manager from agent
Removed the separate extension manager created for the flow extension.
Instead, the flow extension now uses the same extension manager
instance that is initialized in the agent. This fixes circular
extension loading in stevedore.

Closes-Bug: #1316145
Change-Id: Id339f1876168a41ca43ba7473f3ff6949a233ef3
2014-06-02 15:21:38 +04:00
Alexander Gordeev
ed4460990e Make encoding.serialize() more programmatical
Introduce `serializable_fields` to express which class attributes
to be serialized.

Get rid of OrderedDict, replacing it with a regular dict.

Change-Id: I3f7639dab171d3d62e92d0d1bb6d7b071cf963ad
2014-05-06 18:02:45 +04:00
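The `serializable_fields` idea from commit ed4460990e can be sketched like this. Only the attribute name comes from the commit message; the base class name and the example `BlockDevice` fields are assumptions for the illustration.

```python
class Serializable:
    """Hypothetical base class: subclasses list the attributes to export."""
    serializable_fields = ()

    def serialize(self):
        # A regular dict is enough; no OrderedDict is needed.
        return {name: getattr(self, name)
                for name in self.serializable_fields}


class BlockDevice(Serializable):
    serializable_fields = ('name', 'size')

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self._cache = None  # not listed, so never serialized
```

Declaring the fields on the class keeps the serialization logic in one place instead of repeating a hand-written `serialize()` in every subclass.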
Jim Rollenhagen
2e691d7971 Check configdrive size before writing to partition
Avoids writing a configdrive out to disk that is larger than
the intended partition.

Change-Id: I4e067ccb23ba528d96e4faad39219f67b4178e82
2014-04-25 13:42:47 -07:00
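The guard from commit 2e691d7971 boils down to comparing two sizes before writing. A minimal sketch, assuming both sizes are already known in bytes; the function and exception names are hypothetical.

```python
class ConfigDriveTooLargeError(Exception):
    """Hypothetical error raised before any data is written."""


def check_configdrive_fits(configdrive_size, partition_size):
    """Raise if the configdrive (bytes) exceeds the partition (bytes)."""
    if configdrive_size > partition_size:
        raise ConfigDriveTooLargeError(
            'Configdrive is {} bytes but the partition holds only {} '
            'bytes'.format(configdrive_size, partition_size))
```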
Jim Rollenhagen
3c1d52cbb1 Use # instead of """ for copyright blocks
Reformats copyright messages to be comments rather than
docstring-style blocks.

Change-Id: I4d863f53b67bb49d03bda0952b9e6179b6d23c59
2014-04-10 07:14:06 -07:00
Josh Gachnang
5914e36b30 Replacing teeth/overlord with ipa/ironic 2014-03-19 16:19:52 -07:00
Josh Gachnang
b30d345c2e Renaming to IPA 2014-03-19 15:50:43 -07:00