These flags will be processed in a new ironic-inspector plugin
to support setting capabilities like cpu_vt (virtualization enabled).
Change-Id: I5fe9310c316841eabdd2d5e2ef2ae30afa03d29a
Partial-Bug: #1571580
In order to support a more complex syntax for root device hints (e.g.
operators: greater than, less than, in, etc.) we need to stop relying
on the kernel command line for passing the root device hints. This patch
changes this approach by getting the root device hints from a cached
node object that was set in the hardware module.
Two new functions, "cache_node" and "get_cached_node", were added to the
hardware module. The idea is to facilitate access to a node object
representation from the hardware extension methods without changing
method signatures, which would break compatibility with out-of-tree
hardware managers.
Note that the new "get_cached_node" is just a guard function to
facilitate the tests for the code.
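
A minimal sketch of the caching idea, assuming a module-level variable
holds the node. The NODE global, the dict-shaped node and the
get_root_device_hints() helper are illustrative assumptions, not the
actual IPA code:

```python
# Hypothetical sketch of a module-level node cache; names other than
# cache_node and get_cached_node are assumptions for illustration.
NODE = None


def cache_node(node):
    """Remember the node so hardware methods can read it later."""
    global NODE
    NODE = node


def get_cached_node():
    """Guard function: tests can patch this instead of the global."""
    return NODE


def get_root_device_hints():
    """Read the hints from the cached node, not the kernel command line."""
    node = get_cached_node() or {}
    return node.get('properties', {}).get('root_device', {})


cache_node({'properties': {'root_device': {'model': 'example'}}})
print(get_root_device_hints())
```

Because callers go through get_cached_node(), existing hardware manager
method signatures stay unchanged.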
The function parse_root_device_hints() and its tests were removed since
they are no longer used or needed.
Partial-Bug: #1561137
Change-Id: I830fe7da1a59b46e348213b6f451c2ee55f6008c
Some kernel modules take substantial time to initialize. For example,
with the mpt2sas RAID driver, inspection and deployment randomly fail
due to IPA starting before the driver finishes initialization.
This problem is probably impossible to solve in the generic case, as
modern Linux environments do not have a notion of a "hardware is fully
initialized" moment. All hardware is essentially hotplug.
To solve it at least for the simplest case, this patch adds a wait loop
on start up waiting for at least one suitable disk to appear in inventory.
Note that root device hints are not considered, as the node might not
be known at that moment yet.
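
The wait loop could look roughly like the sketch below. The function
name, the attempt count and the delay are assumptions made for
illustration; list_disks stands in for whatever returns the suitable
disks from the inventory:

```python
import time


def wait_for_disks(list_disks, attempts=10, delay=1.0, sleep=time.sleep):
    """Poll list_disks() until it returns at least one device.

    Returns the devices found, or an empty list once the attempts run out.
    """
    for attempt in range(attempts):
        disks = list_disks()
        if disks:
            return disks
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return []
```

Passing sleep as a parameter keeps the loop testable without real
delays.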
Change-Id: Id163ca28f7c140c302ea04947ded3f3c58b284de
Partial-Bug: #1582797
I would've voted -1 on the patch in question had I reviewed it, and per
standard OpenStack/Ironic procedure, I'm reverting it for re-review and
discussion.
In this case, I don't think the new method in the HWM interface is
needed; evaluate_hardware_support() is intended to handle the
cases in question.
This reverts commit 0962cae1da69a1a2981d5950ad741d91115dac06.
Change-Id: Ic08e44bdf116403444b257ee9f4e5b906f5eac53
Some kernel modules take substantial time to initialize. For example,
with the mpt2sas RAID driver, inspection and deployment randomly fail
due to IPA starting before the driver finishes initialization.
Add a new hardware manager method, initialize_hardware, which is
run on start up before any other hardware manager method is invoked.
The generic implementation calls udev settle and waits for
at least one suitable disk device to appear, with a hardcoded
timeout of 15 seconds. Also preload the IPMI modules instead of
calling modprobe every time the inventory is requested.
Change-Id: If7758bb6e3faac7d05451baa3a26adb8ab9953d5
Partial-Bug: #1582797
Introduce a new parameter in driver_internal_info called
agent_erase_devices_zeroize to control the behavior of shred. This
parameter controls the --zero argument used when invoking shred.
Setting this to false disables the last pass of zeroes, leaving the
device with random data.
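
A sketch of how the flag might map onto the shred command line. The
helper name and the default of True are assumptions; --force, --zero
and --verbose are standard GNU shred options:

```python
def build_shred_command(device, internal_info):
    """Build the shred argument list for one block device.

    internal_info stands in for the node's driver_internal_info dict.
    Defaulting to True when the key is absent is an assumption.
    """
    zeroize = internal_info.get('agent_erase_devices_zeroize', True)
    args = ['shred', '--force']
    if zeroize:
        # Final pass of zeroes, hiding the fact that shredding happened.
        args.append('--zero')
    args.extend(['--verbose', device])
    return args


print(build_shred_command('/dev/sda', {'agent_erase_devices_zeroize': False}))
```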
Change-Id: I7053034f5b5bc6737b535ee601e6fb71284d4a83
Partial-bug: #1568811
Depends-On: Ia7ea8d909df9ae86a6dbd68ba94746b171535eb8
Presently, should the ATA erasure operation fail, IPA halts the
cleaning process and the node goes to the CLEANFAIL state as a result.
This failure could be the result of a previous cleaning failure
that left drive security enabled, so code has been added that
attempts to address this case by unlocking the drive.
In the event that an operator wishes to automatically fall back to
disk scrubbing operations, the capability has been added through
a driver_internal_info field "agent_continue_if_ata_erase_failed"
that can be set to True. It defaults to False, keeping the
same behavior that IPA presently exhibits in the event of ATA
erase operations failing.
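
The fallback decision can be sketched as follows. EraseError and the
ata_erase / shred callables are hypothetical stand-ins for the real
erase routines, and the unlock attempt is left out:

```python
class EraseError(Exception):
    """Hypothetical error raised when an erase operation fails."""


def erase_device(device, internal_info, ata_erase, shred):
    """Try ATA secure erase first, optionally falling back to shred.

    Returns the name of the method that erased the device.
    """
    try:
        ata_erase(device)
        return 'ata'
    except EraseError:
        if not internal_info.get('agent_continue_if_ata_erase_failed',
                                 False):
            # Default behavior: propagate the failure so cleaning halts.
            raise
    shred(device)
    return 'shred'
```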
Partial-Bug: #1536695
Change-Id: I88edd9477f4f05aa55b2fe8efa4bbff1c5573bb1
In the DIB build the DHCP code (provided by the dhcp-all-interfaces element)
races with the service starting IPA. It does not matter for deployment itself,
as we're waiting for the route to the Ironic API to appear. However, for
inspection it may result in reporting back all NICs without IP addresses.
Inspection fails in this case.
This change makes inspection wait for *all* NICs to get their IP addresses up
to a small timeout. The timeout is 60 seconds by default and can be changed
via the new ipa-inspection-dhcp-wait-timeout kernel option (0 to not wait).
After the wait, inspection proceeds in any case, so the worst downside
is making inspection 60 seconds longer.
To avoid waiting for NICs that are not even connected, this change extends the
NetworkInterface class with 'has_carrier' field.
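
A rough sketch of the wait, assuming each interface is a dict with
'has_carrier' and 'ip_address' keys. The function name, the dict shape
and the polling interval are illustrative assumptions:

```python
import time


def wait_for_dhcp(list_interfaces, timeout=60, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Wait until every NIC with a carrier has an IP address.

    Returns True if all connected NICs got an address, False on timeout.
    A timeout of 0 performs a single check and never sleeps.
    """
    deadline = clock() + timeout
    while True:
        pending = [nic for nic in list_interfaces()
                   if nic['has_carrier'] and not nic['ip_address']]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The caller continues with inspection whatever the return value is, so
the result is only useful for logging.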
Closes-Bug: #1564954
Change-Id: I5bf14de4c1c622f4bf6e3eadbe20c44759da5d66
This patch makes the list_all_block_devices() method wait for
udev to settle its event queue prior to listing the devices.
Sometimes the ironic-python-agent service may start before all devices
are detected and end up erroring out because it cannot find a
suitable disk for deployment.
Closes-Bug: #1551300
Change-Id: I1ae2062a711115a1ea14b79ae9ace7ddd2fff9d5
Changed the implementation to strip tokens up until the first 'Size: '
string. This will allow for fewer parsing errors in the first
six lines of the following output:
"dmidecode --type 17 | grep Size" returns:
Maximum Memory Module Size: 4096 MB
Maximum Total Memory Size: 8192 MB
Size: 2048 MB
Size: 2048 MB
Added a condition in the exception handling to address the
issue of the bug on other outputs like:
Installed Size: Not Installed
Enabled Size: Not Installed
Size: No Module Installed
Size: 1024 MB
Common strings like "No Module Installed" and "Not Installed" are
normal. These two strings are hard-coded in the aforementioned
comparison and when found are logged as warnings instead of errors.
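
A sketch of the parsing described above, assuming every populated slot
is reported in MB. The function name is hypothetical, and how the
caller aggregates the values is outside the sketch:

```python
import logging

LOG = logging.getLogger(__name__)

# Values that mean "empty slot" rather than a parsing problem.
NOT_INSTALLED = ('No Module Installed', 'Not Installed')


def parse_size_values(output):
    """Return the MB value found after the first 'Size: ' on each line."""
    sizes = []
    for line in output.splitlines():
        if 'Size: ' not in line:
            continue
        # Drop everything up to and including the first 'Size: '.
        value = line.split('Size: ', 1)[1].strip()
        try:
            number, unit = value.split()
            if unit != 'MB':
                raise ValueError('unexpected unit %s' % unit)
            sizes.append(int(number))
        except ValueError:
            if value in NOT_INSTALLED:
                LOG.warning('Skipping empty memory slot: %s', line.strip())
            else:
                LOG.error('Cannot parse memory size: %s', line.strip())
    return sizes
```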
Change-Id: If3475afcebfc7af7e9256b99924919557c4d909c
Closes-Bug: #1521202
This patch set adds hardware vendor information to the data.
Using this data, we can get hints for detecting the driver.
Change-Id: I39385fd5d616edfad719c255f22642f215bfb532
This patch extends the root device hints to also look at the device
name. It also refactors the tests for root device hints, making
it easier to test a different hint per test.
Change-Id: I48d6456c75bbe6ddf16ac6561e5461ca51eb9c37
Partial-Bug: #1526732
If two hardware managers have the same clean step, for example
'erase_devices' in the GenericHardwareManager and a custom manager,
IPA must determine which step should be kept and which should be run
in order to prevent running the step multiple times.
This patch uses the following filtering logic to decide which step
"wins":
- Keep the step that belongs to HardwareManager with highest
HardwareSupport (larger int) value.
- If equal support level, keep the step with the higher defined
priority (larger int).
- If equal support level and priority, keep the step associated with
the HardwareManager whose name comes earlier in the alphabet.
Other than individual step priority, picking which step to keep does
not actually impact the cleaning run. However, in order to make
testing easier, this change ensures deterministic, predictable
results.
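
The filtering logic above can be sketched as a sort key. The dict
layout (step, priority, manager, support) is an assumption made for
illustration and not the structure IPA uses internally:

```python
def _sort_key(candidate):
    """Smaller key wins: higher support, higher priority, earlier name."""
    return (-candidate['support'], -candidate['priority'],
            candidate['manager'])


def deduplicate_steps(candidates):
    """Keep exactly one candidate per step name."""
    winners = {}
    for candidate in candidates:
        best = winners.get(candidate['step'])
        if best is None or _sort_key(candidate) < _sort_key(best):
            winners[candidate['step']] = candidate
    return winners


steps = [
    {'step': 'erase_devices', 'priority': 10,
     'manager': 'GenericHardwareManager', 'support': 1},
    {'step': 'erase_devices', 'priority': 20,
     'manager': 'CustomManager', 'support': 3},
]
print(deduplicate_steps(steps)['erase_devices']['manager'])
```

Because the key is a total order over (support, priority, name), the
result does not depend on the order in which managers are loaded.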
Co-Authored-By: Mario Villaplana <mario.villaplana@gmail.com>
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Co-Authored-By: Brad Morgan <brad@morgabra.com>
Change-Id: Iaeea4200c38ee22cab72ba81c1dbae3389e675e4
pyudev now raises DeviceNotFoundByFileError, which does not inherit
from EnvironmentError, so our 'except' block in hardware.py no longer
catches the exception. This broke unit tests, but it can also potentially
break the deploy.
This patch updates hardware.py to catch both the old and the new exceptions.
Change-Id: Iaefd6089f6f766a241054d8e132b2f3098c8130d
Closes-Bug: #1522756
This is a follow-up patch fixing some nits left by the review
da9c3b0adc67efa916fc534d975823c0a45948a1. It adds the
wwn_with_extension and wwn_vendor_extension root device hints to the
"serializable_fields" list attribute of the BlockDevice class and fixes
some tests.
Change-Id: I6039be535988319276f9ac355c80997d34328ce8
This patch extends the root device hints to also look at
ID_WWN_WITH_EXTENSION and ID_WWN_VENDOR_EXTENSION from udev.
Prior to this patch the IPA ramdisk only cared about ID_WWN, but on some
platforms with a RAID controller this ID can be the same
for different disks (see bug 1516641).
Closes-Bug: #1516641
Change-Id: Ic3e9a1111dfcc99702190c173562a0dccf5f94c4
This is a follow-up patch for commit
3af9ab36bfae3a369fdb3d2b6d02ac803c39ee17
The review requested that a LOG.debug() message be added.
Change-Id: I36fbd4269c948812f4bee66d0130150afd0c0279
Bring ironic-python-agent in line with the other ironic projects.
Stop ignoring all E12* errors except E129
Stop ignoring E711
Change-Id: Icb9bc198473d1b5e807c20869eb2af7f4d7ac360
This patch updates the get_clean_steps() method to make the
erase_devices step abortable. Erasing devices is something that can be
cancelled without damaging the machine.
When a clean step is aborted, the provision state of the Ironic node
will go to CLEANFAIL. The operator can then do what is needed to
fix the problem (e.g. network booting issues) and restart the cleaning
later on.
Partial-Bug: #1455825
Change-Id: Ic181ac3712810c6f6925e8b627ee79e77ecf4d83
Put the columns to retrieve from lsblk into a list so that future
modifications to columns will require fewer code changes.
Also add a 'block_type' parameter, which defaults to 'disk', to make the
function more flexible if callers want a different block type.
Update and add unit tests
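
A sketch of the idea, assuming lsblk is invoked with key="value" pair
output. The column set, the helper names and the exact flags are
assumptions and may differ from the real hardware.py:

```python
import shlex

# Adding a column here is the only change needed to report a new field.
LSBLK_COLUMNS = ['KNAME', 'MODEL', 'SIZE', 'ROTA', 'TYPE']


def build_lsblk_command(columns=LSBLK_COLUMNS):
    """-P: key="value" pairs, -b: bytes, -d: no children, -i: ASCII."""
    return ['lsblk', '-Pbdi', '-o', ','.join(columns)]


def parse_lsblk_pairs(output, block_type='disk'):
    """Return one dict per device whose TYPE matches block_type."""
    devices = []
    for line in output.splitlines():
        # shlex handles quoted values that contain spaces.
        fields = dict(token.split('=', 1) for token in shlex.split(line))
        if fields.get('TYPE') == block_type:
            devices.append(fields)
    return devices


sample = ('KNAME="sda" MODEL="Example Disk" SIZE="1000" ROTA="1" TYPE="disk"\n'
          'KNAME="sr0" MODEL="DVD" SIZE="0" ROTA="1" TYPE="rom"\n')
print(parse_lsblk_pairs(sample))
```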
Change-Id: If06460e13a5b56dc8d6efca9ff5b58ac6ba1f357
Currently we only use these disk properties for root device hints.
However, they will also be really useful for inspector, in particular
for implementing root device hints there.
Change-Id: I48aa6b6d2d198d16f2f8e387970f7230066cf8a2
Create a SerializableComparable class derived from the Serializable
class.
Added the following functions to the SerializableComparable class:
'__eq__'
'__ne__'
Disable the '__hash__' function in the SerializableComparable class as
some derived classes are mutable.
Use the SerializableComparable class in hardware.py and
extensions/base.py
This should make unit testing users of the class easier when doing a
self.assertEqual() or self.assertNotEqual()
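
A minimal sketch of such a class, assuming equality is defined by
comparing the serialized fields. The Serializable base and the Disk
subclass shown here are simplified stand-ins, not the real encoding.py:

```python
class Serializable(object):
    """Simplified stand-in: serializes the listed attributes to a dict."""
    serializable_fields = ()

    def serialize(self):
        return {f: getattr(self, f) for f in self.serializable_fields}


class SerializableComparable(Serializable):
    # Mutable subclasses must not be hashable, so disable hashing.
    __hash__ = None

    def __eq__(self, other):
        if not isinstance(other, Serializable):
            return NotImplemented
        return self.serialize() == other.serialize()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


class Disk(SerializableComparable):
    """Hypothetical subclass used only to demonstrate the comparison."""
    serializable_fields = ('name', 'size')

    def __init__(self, name, size):
        self.name = name
        self.size = size


print(Disk('sda', 100) == Disk('sda', 100))
```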
Added some initial unit testing for encoding.py
Change-Id: If0f14b3bfe7f1391f65dd730a16a534afed0da82
Adds a new module ironic_python_agent.inspector and new entry point
for extensions, which will allow vendor-specific inspection.
Inspection is run on service start up just before the lookup.
Due to this early start, and due to the fact that we don't even know
the MAC addresses of nodes on inspection (to say nothing of IP addresses),
exception handling is a bit different from other agent features:
we try hard not to error out until we send at least something to inspector.
Change-Id: I00932463d41819fd0a050782e2c88eddf6fc08c6
Instead of silently failing, raise DeviceNotFound when no root device
hints were provided and all found block devices are smaller than 4GB.
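
A sketch of the new behavior, assuming devices are dicts with a 'size'
in bytes, that 4GB means 4 GiB, and that the smallest large-enough
disk is chosen. All three are assumptions made for illustration:

```python
class DeviceNotFound(Exception):
    """Stand-in for the agent's DeviceNotFound error."""


MIN_ROOT_SIZE = 4 * 1024 ** 3  # assumed to mean 4 GiB, in bytes


def pick_default_root_device(devices):
    """Pick a root device when no hints are given, or fail loudly."""
    candidates = sorted(
        (dev for dev in devices if dev['size'] >= MIN_ROOT_SIZE),
        key=lambda dev: dev['size'])
    if not candidates:
        raise DeviceNotFound(
            'No block device of at least %d bytes found' % MIN_ROOT_SIZE)
    return candidates[0]
```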
Change-Id: Idd2e2c5905adf847f00ad15a84a817c3715225dd
Closes-Bug: #1490761
Hardware managers should load at runtime. This will ensure the agent is
ready to respond to API calls before it begins heartbeating. Also, it
means in case of a syntax or other error in a HardwareManager, the agent
will crash before it heartbeats, which is better than it working until a
hardware manager method is needed.
Change-Id: I9403ce7bedc8d5af20b6d84371367253b26b74c2
Closes-bug: 1490008
There is no way for two hardware managers to handle erasing two disks
in two different ways. dispatch_to_managers was designed specifically
for this case, and the default behavior will remain the same for the
GenericHardwareManager (erase_block_device will pick up each disk).
Also return the result of the dispatch calls, so they'll be logged by
Ironic and give more cleaning insight.
Change-Id: I19e9dc8539a0729fbb96cae92fe633e24608fc68
This function is useful in any HardwareManager that interacts with
disks. Subclassing GenericHardwareManager is not ideal for any
hardware manager that interacts with only specific devices.
Change-Id: Ib20e68a8916590513c0a825e44407a110cfbb441
* Added NetworkInterface.ip4_address
* Added HardwareManager.get_bmc_address()
* Added Memory.physical_mb
This is total memory as reported by dmidecode, and yes,
it's different from total, as it includes kernel reserved space.
* Added CPU.architecture
As a side effect, get_cpus was switched to lscpu.
Also fixes a problem where get_cpus reported the current frequency
instead of the maximum one.
Change-Id: I4080d4d551eb0bb995a94ef9a300351910c09fb9
The param was added to the GenericHardwareManager but it wasn't added
to the base class.
This is a breaking API change for the hardware managers.
Change-Id: Ia73fe14308986496e3a4f8d71bc2298a9130cffa
Debugging the agent is a huge pain point. Tracebacks are rarely logged,
error messages are often only returned via the API, and lack of
info logging makes it hard to determine where some failures occur.
Some errors only return a 500 with no error message or logs.
Change-Id: I0a127de6e4abf62e20d5c5ad583ba46738604d2d
Was running into 'expected string, int found' when calling
shred with an Int for iterations.
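
For illustration, the fix amounts to converting the pass count to a
string before it reaches the process execution layer. The helper name
is hypothetical; --iterations is a standard GNU shred option:

```python
def shred_args(device, iterations):
    """Every argv element must be a string, including the pass count."""
    return ['shred', '--force', '--zero', '--verbose',
            '--iterations', str(iterations), device]


print(shred_args('/dev/sda', 1))
```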
Change-Id: Iffce247caba5b0d62ac89b6411402c8d975cfd2f
Closes-Bug: #1469838
Today, there is no option to configure the number of iterations
shred performs when erasing a block device; it always defaults to 1.
This patch adds a configuration option to change the number of passes
used to erase a block device.
Change-Id: I1921d33a6b364c4682b6c9baaf61ac092cfa11d7
Partial-Bug:#1465130
In-band disk erase using shred fails with the error "'module' object has no
attribute 'ProcessExecutionError'". This commit fixes the issue.
Change-Id: Ia0c426074b2f0e9d534ed96a3e213933160edc61
Closes-Bug:#144799
In-band disk erase using shred fails for the agent_ilo driver as it tries to
erase the attached virtual floppy device. This fix skips the virtual
media devices and continues with the other disks.
Change-Id: I26745985382d440f7d4b3fbfffb14545067fcca6
Closes-Bug:#1450298
The docstrings here were all giving WARNINGs or ERRORs during the docs
build, and were generally making for unappealing-looking developer
documentation. I corrected the syntax and did what was necessary to
make the build come out clean.
Change-Id: I74b00a7f125770b0468cff3bdf26d0d52cd054d7
(cherry picked from commit c0921cdff372ce1fd6df1c4ab4eb5463e2cba0e4)