The idea is to retreive USB devices informations via 'lshw' and
return the list to ironic in order to be able to create introspection
rules based on USB devices.
Change-Id: I39d60cb467614fca7a7f701dbe576154213580a5
The new implementation can return it when unable to lock the node.
Other possible errors are 400 and 404 (should not be retried), as well as
5xx (already retried).
Change-Id: I74c2f54a624dc47e8e2d1e67ae4c6a6078e01d2f
With the new in-band inspection, we can derive the callback URL from
the Ironic URL, there is no need to duplicate it. This change uses
the presence of collectors as a sign to run inspection.
The previous approach of setting an inspection URL, with or without
explicitly setting collectors, still works for compatibility with
ironic-inspector.
Change-Id: Ie4279ee6d2995c9686f1dcdef1d6e5dc1dd20871
Allows nodes with a single IP stack to be deployed from a dual-stack
Ironic.
Detecting advertised address and usable Ironic URLs are done completely
independently which does open some space for a misconfiguration. I hope
it's not likely in the reality, especially since this feature is
targetting advanced standalone users.
Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2
Closes-Bug: #2045548
Somehow, it has worked correctly for years, but now I've discovered that
the new inspection is (no longer?) tolerant to the missing header.
While here, copy all headers from the heartbeat code.
Change-Id: I9e5c609eb4435e520bc225dea08aedfdf169744b
Since we moved to exponential wait we increased the amount of time
to run unit tests, now we can configure the max time to wait
- before: Ran: 33 tests in 22.6581 sec.
- after: Ran: 33 tests in 4.0256 sec.
Change-Id: Ibdcfebacad0489d17183e43ceb0d603fce67e72b
* ProxyError is derived from ConnectionError, but it's necessary
to check the Response object to identify.
- Added ProxyError in retry_if_exception_type
- Updated _post_to_inspector to proper handle ProxyError
- Updated the wait to use wait_exponential instead of wait_fixed.
Closes-Bug: 2045429
Change-Id: Iefe3fe581cd4e7c91a0da708e6f6d0fdaacab6fe
IPA now includes information about numa node id when collecting
information about PCI devices.
Closes-bug: #1622940
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Change-Id: I70b0cb3eff66d67bb8168982acbbf335de0599cd
Bandit 1.7.5 released with a timeout check for all requests and
urllib calls.
Fixed those.
In the process, then exposed a bandit b310 issue, which was already
covered by the code, but explicitly marked it as such.
Also, enables bandit checks to be voting for CI..
Change-Id: If0e87790191f5f3648366d571e1d85dd7393a548
Binary LLDP data is bloating inventory causing us to disable its collection
by default. For other similar low-level information, such as PCI devices
or DMI data, we already use inspection collectors instead. Now that the
inventory format is shared with out-of-band inspection, having LLDP
there makes even less sense.
This change adds a new collector ``lldp`` to replace the now-deprecated
inventory field.
Change-Id: I56be06a7d1db28407e1128c198c12bea0809d3a3
Use pure json instead of jsonutils.
Borrow encode function from oslo.serialization to be used in the
utils module.
Change-Id: Ied9a2259a4329a86b4f0853bd1fb187563c0a036
Currently the test takes 5*5=25 seconds. Re-arrange the code so
that it's possible to change the retry delay in tests.
Change-Id: Ia559dad4bc656f8ad6b2cb8cb0137a97e2614db7
A transitory connection failure, such as one caused by
a port being held down for traffic forwarding, can experience
intermittent connectivity failures which result in failed
introspections.
Now the agent retries.
Change-Id: I72c5e3aca000d3854a17f8a461b1a2935e5c0d9b
Collects PCI class, revision, and bus information for the pci-devices
collector, these metrics as well as vendor id and device id are
components which can be used to construct device information like
lspci output, which is how cyborg agent collects accelerator devices.
Accelerator device based scheduling is possible after ironic has such
information in place.
Change-Id: I6c37c554f37dd5f1d21c8fd4fad2a4f44a3c75d7
Story: 2007971
Task: 40474
Caches hardware information collected during inspection
so that the initial lookup can occur without any delay.
Also adds logging to track how long inventory collection takes.
Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com>
Change-Id: I3e0d237d37219e783d81913fa6cc490492b3f96a
This change adds a new introspection data field 'configuration'
with two lists: managers and collectors.
Change-Id: Ice0d7e6ecff3f319bc3a4f41617059fd6914e31c
This change enables IPA to receive API endpoints and configuration
via multicast DNS.
Story: #2005393
Task: #30382
Change-Id: Ibbf07052bea8f5c0305dda098b2879bcbc2fece5
We have seen issues in misconfigured systems where messages from IPA
to ironic-inspector fail with 404 or other standard HTTP error
codes. Although there is an info message reporting the URL, the error
messages do not include the URL of the service IPA is trying to talk
to, which makes debugging the configuration difficult. This change
adds the URL to the error already being reported to improve that
situation.
Change-Id: Ib092ac690d29c385c0564c086ad2fec3df0fb2e0
Signed-off-by: Doug Hellmann <doug@doughellmann.com>
* Remove support for setting IPMI credentials (removed from inspector in Pike)
* Stop sending the ipmi_address field (bmc_address is used instead since Pike)
Change-Id: I1696041db62ba27e5d31e8481cb225a43d7e2a46
Closes-Bug: #1654318
This patch adds standard SSL options to IPA config and makes use of them
when making HTTP requests.
For now, a single set of certificates is used when needed.
In the future configuration can be expanded to allow per-service
certificates.
Besides, the 'insecure' option (defaults to False) can be overridden
through kernel command line parameter 'ipa-insecure'.
This will allow running IPA in CI-like environments with self-signed SSL
certificates.
Change-Id: I259d9b3caa9ba1dc3d7382f375b8e086a5348d80
Closes-Bug: #1642515
Inspector is using inventory directly, so we can remove the bits sending
processed network and scheduling properties.
Change-Id: I6c58bc3c5ea78fd2dbda82b38515f332ce2e8d4a
stevedore no longer raises a KeyError when there's an entrypoint
missing. Instead we must supply a callback that handles that error.
Update inspection code to work with this.
Also bumps stevedore minimum to 1.16 ahead of the global-requirements
bot.
Closes-Bug: #1603542
Change-Id: I12af23f2525ac90e577bdd10bbfbbd9788e9551c
Depends-On: I8aa1ee52ff7de50488acb86e8920da89ddb05771
The log extension is responsible for retrieving logs from the system,
if journalctl is present the logs will come from it, otherwise we
fallback to getting the logs from the /var/log directory + dmesg logs.
In the coreos ramdisk, we need to bind mount /run/log in the container
so the IPA service can have access to the journal.
For the tinyIPA ramdisk, the logs from IPA are now being redirected to
/var/logs/ironic-python-agent.log instead of only going to the default
stdout.
Inspector now shares the same method of collecting logs, extending its
capabilities for non-systemd systems.
Partial-Bug: #1587143
Change-Id: Ie507e2e5c58cffa255bbfb2fa5ffb95cb98ed8c4
Adds a new collector, which gathers list of PCI devices.
Each entry is a dictionary containing 2 keys:
- vendor-id
- product-id
Such information can then be used by the inspector to distinguish
appropriate PCI devices.
Change-Id: Id7521d66410e7d408d7eada692b6123e769ce084
Partial-Bug: #1580893
Adds a new BootInfo object with 2 fields:
* current_boot_mode - bios or uefi, detected from presence of /sys/firmware/efi
as per the following answer: http://askubuntu.com/a/162896
This field will be used for setting the boot_mode capability in ironic-inspector
* pxe_interface - PXE booting interface, if it can be detected.
This fields is already used by ironic-inspector, added here for consistency.
Change-Id: Ib36b592ffaba3bfa055d65c9526607867d302584
Partial-Bug: #1571580
We hoped that checking /sys/class/net/XXX/carrier will allow us
to not wait for interfaces that are not connected at all.
In reality this field turned out to be unreliable. For example, it is
also set to 0 when interface is down or is being configured.
The bug https://bugzilla.redhat.com/show_bug.cgi?id=1327255 shows
the case when carrier is 0 for all interfaces, including one that is
used to post back data, which is obvious non-sense.
This change removes check on carrier for the loop. To avoid 60 seconds
wait for people with several NIC's, it's changed to only wait for the
PXE booting NIC, which obviously must get an IP address.
This makes IP addresses in the inspection data for other NIC's somewhat
unreliable. A new option inspection_dhcp_all_interfaces is introduced
to allow waiting for all NIC's to get IP addresses.
This change should finally fix bug 1564954.
Change-Id: I8b04bf726980fdcf6bd536c6bb28e30ac50658fb
Related-Bug: #1564954
In the DIB build the DHCP code (provided by the dhcp-all-interfaces element)
races with the service starting IPA. It does not matter for deployment itself,
as we're waiting for the route to the Ironic API to appear. However, for
inspection it may result in reporting back all NIC's without IP addresses.
Inspection fails in this case.
This change makes inspection wait for *all* NIC's to get their IP addresses up
to a small timeout. The timeout is 60 seconds by default and can be changed
via the new ipa-inspection-dhcp-wait-timeout kernel option (0 to not wait).
After the wait inspection proceedes in any case, so the worst downside
is making inspection 60 seconds longer.
To avoid waiting for NIC's that are not even connected, this change extends the
NetworkInterface class with 'has_carrier' field.
Closes-Bug: #1564954
Change-Id: I5bf14de4c1c622f4bf6e3eadbe20c44759da5d66
If we do not set this explicitly, tar will warn "journal: implausibly
old time stamp" when the user tries to untar the log files.
Change-Id: I4a5a1ffd4eeca9697cdcf16e02d3ff3c22d7132c
Logging the whole journalctl output is not the best idea. Fortunately,
it does not work right now and fails with a traceback :)
This change adds a new log_stdout argument to utils.execute() and uses it in
the "logs" inspection collector.
Also do not log the logs while logging the collected data.
Change-Id: Ibc726ac2c4f5eb06c73ac4765bb400077b84a6cc
Somehow it didn't pop earlier. Updated tests to contain some creepy
russian letters :)
Closes-Bug: #1517913
Change-Id: I4c6712ea1e813d1f0f0d0aedaccfa1187526e0ec
We are using oslo.log now, but some of the modules still use logging.
We should use oslo.log to keep consistency, besides, oslo.log can
provide fine wrapper for OpenStack projects.
Change-Id: Ibe57e503b88b39e284a9e4b11a1886cd4e8d4ccf
This is a port of downstream inspector ramdisk plugins we found helpful.
* logs - sends journald logs with inspection data.
* extra-hardware - uses hardware-detect utility to collect bigger
hardware inventory and to run benchmarks.
Change-Id: If05402606c45185d618279eef46e68c51209f82b
1. cleanly separate deprecated and non-deprecated properties
2. add root disk to inspection data, so that we can have a proper
fallback when root device hints are not given.
Change-Id: Ie19b82ff2a914873ff4b2395b02643e086b934b1
Adds a new module ironic_python_agent.inspector and new entry point
for extensions, which will allow vendor-specific inspection.
Inspection is run on service start up just before the lookup.
Due to this early start, and due to the fact we don't even know
MAC address of nodes on inspection (to say nothing about IP addresses),
exception handling is a bit different from other agent features:
we try hard not to error out until we send at least something to inspector.
Change-Id: I00932463d41819fd0a050782e2c88eddf6fc08c6