43 Commits

Author SHA1 Message Date
Zuul
b6075156b3 Merge "USB device discovery" 2024-03-28 21:22:53 +00:00
Damien Rannou
3fd68c0848 USB device discovery
The idea is to retrieve USB device information via 'lshw' and
return the list to ironic in order to be able to create introspection
rules based on USB devices.

Change-Id: I39d60cb467614fca7a7f701dbe576154213580a5
2024-02-19 14:49:52 +01:00
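
As a rough illustration of the idea (not the code from this change), the sketch below walks a JSON tree shaped like the output of `lshw -json` and collects nodes whose bus info marks them as USB. The function name `find_usb_devices`, the selection of returned fields, and the sample tree are assumptions made for this example.

```python
import json


def find_usb_devices(node):
    """Recursively collect nodes whose 'businfo' starts with 'usb@'."""
    found = []
    if isinstance(node, list):
        # Some lshw versions wrap the root node in a list.
        for item in node:
            found.extend(find_usb_devices(item))
        return found
    if str(node.get("businfo", "")).startswith("usb@"):
        found.append({
            "product": node.get("product"),
            "vendor": node.get("vendor"),
            "handle": node.get("handle"),
        })
    for child in node.get("children", []):
        found.extend(find_usb_devices(child))
    return found


# Hand-written sample for illustration, not real lshw output.
SAMPLE = json.loads("""
{
  "id": "computer",
  "children": [
    {"id": "core", "children": [
      {"id": "pci", "businfo": "pci@0000:00:14.0", "children": [
        {"id": "usbhost", "businfo": "usb@1", "product": "Host Controller",
         "vendor": "Example", "handle": "USB:1:1", "children": [
          {"id": "usb", "businfo": "usb@1:2", "product": "Example Keyboard",
           "vendor": "Example", "handle": "USB:1:2"}
        ]}
      ]}
    ]}
  ]
}
""")

usb_devices = find_usb_devices(SAMPLE)
```

A list built this way could be attached to the inspection data so that rules can match on product or vendor strings.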
Zuul
6d35c1e949 Merge "Make inspection URL optional if the collectors are provided" 2024-02-07 23:06:34 +00:00
Dmitry Tantsur
0010f5c11a Also retry inspection on HTTP CONFLICT
The new implementation can return it when unable to lock the node.

Other possible errors are 400 and 404 (should not be retried), as well as
5xx (already retried).

Change-Id: I74c2f54a624dc47e8e2d1e67ae4c6a6078e01d2f
2024-01-26 16:21:24 +01:00
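
The status code handling described in this message can be summarized with a small sketch. The function name `should_retry` is hypothetical and the real agent code may structure this decision differently. Only the classification of codes comes from the commit message.

```python
from http import HTTPStatus


def should_retry(status_code):
    """Return True if a failed inspection callback is worth retrying."""
    if status_code == HTTPStatus.CONFLICT:
        # 409: the node could not be locked, so a later attempt may succeed.
        return True
    if status_code >= 500:
        # Server-side errors are treated as transient.
        return True
    # 400, 404 and other client errors will not improve on retry.
    return False
```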
Dmitry Tantsur
6cd36a750f Make inspection URL optional if the collectors are provided
With the new in-band inspection, we can derive the callback URL from
the Ironic URL, there is no need to duplicate it. This change uses
the presence of collectors as a sign to run inspection.

The previous approach of setting an inspection URL, with or without
explicitly setting collectors, still works for compatibility with
ironic-inspector.

Change-Id: Ie4279ee6d2995c9686f1dcdef1d6e5dc1dd20871
2024-01-10 08:55:42 +01:00
Dmitry Tantsur
0d4ae976c2 Support several API and Inspector URLs
Allows nodes with a single IP stack to be deployed from a dual-stack
Ironic.

Detecting the advertised address and the usable Ironic URLs is done
completely independently, which does leave some room for misconfiguration.
I hope it's not likely in practice, especially since this feature is
targeting advanced standalone users.

Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2
Closes-Bug: #2045548
2024-01-09 16:43:23 +01:00
Dmitry Tantsur
2bb74523ae Add missing headers to the inspection callback
Somehow, it has worked correctly for years, but now I've discovered that
the new inspection is not (no longer?) tolerant of the missing header.

While here, copy all headers from the heartbeat code.

Change-Id: I9e5c609eb4435e520bc225dea08aedfdf169744b
2024-01-09 16:38:46 +01:00
Iury Gregory Melo Ferreira
03b6b0a4ab Fix inspector retries to not take a long time
Since we moved to exponential wait, the time needed to run the unit
tests increased. Now we can configure the max time to wait.

- before: Ran: 33 tests in 22.6581 sec.
- after: Ran: 33 tests in 4.0256 sec.

Change-Id: Ibdcfebacad0489d17183e43ceb0d603fce67e72b
2023-12-19 14:26:59 -03:00
Iury Gregory Melo Ferreira
801da9ec1f Retry in ProxyError during post inspector data
* ProxyError is derived from ConnectionError, but it's necessary
to check the Response object to identify it.

- Added ProxyError in retry_if_exception_type
- Updated _post_to_inspector to properly handle ProxyError
- Updated the wait to use wait_exponential instead of wait_fixed.

Closes-Bug: 2045429
Change-Id: Iefe3fe581cd4e7c91a0da708e6f6d0fdaacab6fe
2023-12-06 12:01:35 -03:00
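
To show why the switch from a fixed to an exponential wait changes total retry time, here is a minimal, library-free sketch of a capped exponential backoff schedule. It does not reproduce the exact formula or parameters used by the retry library in the agent. The function name `backoff_delays` and the values of `multiplier` and `max_delay` are assumptions chosen for illustration.

```python
def backoff_delays(attempts, multiplier=1.0, max_delay=30.0):
    """Return the wait before each retry: multiplier * 2**n, capped."""
    return [min(multiplier * 2 ** n, max_delay) for n in range(attempts)]


fixed_total = 5 * 5                       # five retries with a fixed 5 s wait
exponential = backoff_delays(6)           # doubles each time until the cap
capped_for_tests = backoff_delays(6, max_delay=0.5)
```

Lowering the cap, as in `capped_for_tests`, keeps the schedule shape while making every wait short, which is the kind of knob a test suite wants.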
Zhou Ya
76ad06225a Get numa_node info when collecting pci devices info
IPA now includes information about numa node id when collecting
information about PCI devices.

Closes-bug: #1622940
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Change-Id: I70b0cb3eff66d67bb8168982acbbf335de0599cd
2023-10-24 14:27:21 -07:00
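
On Linux, the NUMA affinity of a PCI device is exposed in sysfs as a `numa_node` file under the device directory. The sketch below reads it. The helper name `read_numa_node` and its `sysfs_root` parameter are hypothetical, and this is not necessarily how the collector in this change obtains the value.

```python
import os


def read_numa_node(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node id of a PCI device, or None if unavailable."""
    path = os.path.join(sysfs_root, pci_address, "numa_node")
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        # Missing file or unexpected content: report no NUMA information.
        return None
    # The kernel reports -1 when the device has no NUMA affinity.
    return value if value >= 0 else None
```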
Julia Kreger
78c1343a54 Fix Bandit errors
Bandit 1.7.5 released with a timeout check for all requests and
urllib calls.

Fixed those.

In the process, this exposed a bandit B310 issue, which was already
covered by the code, so it is now explicitly marked as such.

Also enables bandit checks to be voting in CI.

Change-Id: If0e87790191f5f3648366d571e1d85dd7393a548
2023-06-06 08:34:55 -07:00
Dmitry Tantsur
3e05a03f7c Deprecate LLDP in inventory in favour of a new collector
Binary LLDP data is bloating inventory causing us to disable its collection
by default. For other similar low-level information, such as PCI devices
or DMI data, we already use inspection collectors instead. Now that the
inventory format is shared with out-of-band inspection, having LLDP
there makes even less sense.

This change adds a new collector ``lldp`` to replace the now-deprecated
inventory field.

Change-Id: I56be06a7d1db28407e1128c198c12bea0809d3a3
2023-04-26 19:33:51 +00:00
Riccardo Pittau
64ffd2ee80 Remove oslo.serialization dependency
Use pure json instead of jsonutils.

Borrow encode function from oslo.serialization to be used in the
utils module.

Change-Id: Ied9a2259a4329a86b4f0853bd1fb187563c0a036
2022-06-17 09:37:35 +02:00
Dmitry Tantsur
62672de131 Reduce the duration of retries in the inspector tests
Currently the test takes 5*5=25 seconds. Re-arrange the code so
that it's possible to change the retry delay in tests.

Change-Id: Ia559dad4bc656f8ad6b2cb8cb0137a97e2614db7
2020-10-07 12:39:01 +02:00
Julia Kreger
bb27badf76 Add basic retries for inspection
Transitory connection failures, such as those caused by
a port being held down for traffic forwarding, can lead to
intermittent connectivity problems which result in failed
introspections.

Now the agent retries.

Change-Id: I72c5e3aca000d3854a17f8a461b1a2935e5c0d9b
2020-09-14 22:38:18 +00:00
Kaifeng Wang
b424fbfa35 Extends pci devices metrics
Collects PCI class, revision, and bus information for the pci-devices
collector. These metrics, together with vendor id and device id, are
components which can be used to construct device information like
lspci output, which is how the cyborg agent collects accelerator devices.

Accelerator device based scheduling is possible after ironic has such
information in place.

Change-Id: I6c37c554f37dd5f1d21c8fd4fad2a4f44a3c75d7
Story: 2007971
Task: 40474
2020-08-04 23:32:37 +08:00
Julia Kreger
c76b8b2c21 Limit Inspection->Lookup->Heartbeat lag
Caches hardware information collected during inspection
so that the initial lookup can occur without any delay.

Also adds logging to track how long inventory collection takes.

Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com>
Change-Id: I3e0d237d37219e783d81913fa6cc490492b3f96a
2020-07-03 10:32:26 +02:00
Dmitry Tantsur
31b73b4984 Expose collector and hardware manager names via introspection data
This change adds a new introspection data field 'configuration'
with two lists: managers and collectors.

Change-Id: Ice0d7e6ecff3f319bc3a4f41617059fd6914e31c
2020-01-22 11:15:38 +01:00
Julia Kreger
696606f682 manual introspection trigger command
Change-Id: I64e66682c1e54f6edc260a22f46f5f6df8e85af1
Story: 2005896
Task: 33756
2019-07-08 07:43:40 -07:00
Dmitry Tantsur
5c5328ccaa Supports fetching API endpoints from mDNS
This change enables IPA to receive API endpoints and configuration
via multicast DNS.

Story: #2005393
Task: #30382
Change-Id: Ibbf07052bea8f5c0305dda098b2879bcbc2fece5
2019-05-29 16:58:24 +02:00
Doug Hellmann
a2d25de639 show inspection callback url in error messages
We have seen issues in misconfigured systems where messages from IPA
to ironic-inspector fail with 404 or other standard HTTP error
codes. Although there is an info message reporting the URL, the error
messages do not include the URL of the service IPA is trying to talk
to, which makes debugging the configuration difficult. This change
adds the URL to the error already being reported to improve that
situation.

Change-Id: Ib092ac690d29c385c0564c086ad2fec3df0fb2e0
Signed-off-by: Doug Hellmann <doug@doughellmann.com>
2019-05-13 18:33:49 -04:00
Dmitry Tantsur
f153a741e1 Clean up deprecated items in the inspection code
* Remove support for setting IPMI credentials (removed from inspector in Pike)
* Stop sending the ipmi_address field (bmc_address is used instead since Pike)

Change-Id: I1696041db62ba27e5d31e8481cb225a43d7e2a46
Closes-Bug: #1654318
2017-09-19 14:05:13 +02:00
Jenkins
fd7f10b993 Merge "Configure and use SSL-related requests options" 2017-02-07 09:57:49 +00:00
Pavlo Shchelokovskyy
fdd11b54a5 Configure and use SSL-related requests options
This patch adds standard SSL options to IPA config and makes use of them
when making HTTP requests.

For now, a single set of certificates is used when needed.
In the future configuration can be expanded to allow per-service
certificates.

Besides, the 'insecure' option (defaults to False) can be overridden
through kernel command line parameter 'ipa-insecure'.
This will allow running IPA in CI-like environments with self-signed SSL
certificates.

Change-Id: I259d9b3caa9ba1dc3d7382f375b8e086a5348d80
Closes-Bug: #1642515
2017-01-13 11:33:44 +02:00
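
A minimal sketch of how such options might be mapped onto the `verify` and `cert` keyword arguments accepted by the `requests` library. The function name `build_request_kwargs` and its parameter names are assumptions for this example and are not taken from the IPA configuration.

```python
def build_request_kwargs(insecure=False, cafile=None, certfile=None,
                         keyfile=None):
    """Translate SSL settings into keyword arguments for requests calls."""
    if insecure:
        # Skip server certificate verification (self-signed CI setups).
        verify = False
    else:
        # A CA bundle path if given, otherwise the default trust store.
        verify = cafile or True
    kwargs = {"verify": verify}
    if certfile and keyfile:
        # Client certificate and private key in separate files.
        kwargs["cert"] = (certfile, keyfile)
    elif certfile:
        # Single file containing both certificate and key.
        kwargs["cert"] = certfile
    return kwargs
```

The resulting dictionary can be passed along as `requests.post(url, data=payload, **kwargs)`.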
Dmitry Tantsur
10bff0a518 Remove compatibility with old bash-based introspection ramdisk
Inspector is using inventory directly, so we can remove the bits sending
processed network and scheduling properties.

Change-Id: I6c58bc3c5ea78fd2dbda82b38515f332ce2e8d4a
2017-01-09 14:10:47 +01:00
Luong Anh Tuan
ab41106cf6 Python 3 Compatible JSON
In order to be really python3 compatible, the json lib was replaced
with the jsonutils module from oslo.serialization (1.10 or newer), as
recommended by the python3 migration guide.

https://wiki.openstack.org/wiki/Python3#Serialization:_base64.2C_JSON.2C_etc.

Change-Id: I2d8b62e642aba4ccd1b70be7e9b3784a95a6743d
Closes-Bug: #1629068
2016-11-16 08:19:51 +00:00
Jim Rollenhagen
62743fec6a Update to work with latest stevedore
stevedore no longer raises a KeyError when there's an entrypoint
missing. Instead we must supply a callback that handles that error.
Update inspection code to work with this.

Also bumps stevedore minimum to 1.16 ahead of the global-requirements
bot.

Closes-Bug: #1603542
Change-Id: I12af23f2525ac90e577bdd10bbfbbd9788e9551c
Depends-On: I8aa1ee52ff7de50488acb86e8920da89ddb05771
2016-07-17 14:38:00 -04:00
Lucas Alvares Gomes
af81914ce7 Add a log extension
The log extension is responsible for retrieving logs from the system.
If journalctl is present the logs will come from it, otherwise we
fall back to getting the logs from the /var/log directory + dmesg logs.

In the coreos ramdisk, we need to bind mount /run/log in the container
so the IPA service can have access to the journal.

For the tinyIPA ramdisk, the logs from IPA are now being redirected to
/var/logs/ironic-python-agent.log instead of only going to the default
stdout.

Inspector now shares the same method of collecting logs, extending its
capabilities for non-systemd systems.

Partial-Bug: #1587143
Change-Id: Ie507e2e5c58cffa255bbfb2fa5ffb95cb98ed8c4
2016-06-28 17:02:11 +01:00
Szymon Borkowski
f7e080c8bf Add PCI devices collector to inspector
Adds a new collector, which gathers list of PCI devices.
Each entry is a dictionary containing 2 keys:
- vendor-id
- product-id
Such information can then be used by the inspector to distinguish
appropriate PCI devices.

Change-Id: Id7521d66410e7d408d7eada692b6123e769ce084
Partial-Bug: #1580893
2016-06-24 14:50:58 +02:00
Dmitry Tantsur
53b187a4c3 Add boot information into the inventory
Adds a new BootInfo object with 2 fields:

* current_boot_mode - bios or uefi, detected from presence of /sys/firmware/efi
  as per the following answer: http://askubuntu.com/a/162896
  This field will be used for setting the boot_mode capability in ironic-inspector
* pxe_interface - PXE booting interface, if it can be detected.
  This field is already used by ironic-inspector, added here for consistency.

Change-Id: Ib36b592ffaba3bfa055d65c9526607867d302584
Partial-Bug: #1571580
2016-05-26 17:05:11 +02:00
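
The boot mode detection described above comes down to a directory existence check. A minimal sketch follows. The function name `detect_boot_mode` and its `efi_path` parameter are hypothetical and exist mainly to make the check testable.

```python
import os


def detect_boot_mode(efi_path="/sys/firmware/efi"):
    """Return 'uefi' if the EFI firmware directory exists, else 'bios'."""
    return "uefi" if os.path.isdir(efi_path) else "bios"
```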
Dmitry Tantsur
6da6ace384 [inspection] wait for the PXE DHCP by default and remove the carrier check
We hoped that checking /sys/class/net/XXX/carrier would allow us
to not wait for interfaces that are not connected at all.
In reality this field turned out to be unreliable. For example, it is
also set to 0 when the interface is down or is being configured.
The bug https://bugzilla.redhat.com/show_bug.cgi?id=1327255 shows
a case where carrier is 0 for all interfaces, including the one that is
used to post back data, which is obviously nonsense.

This change removes the carrier check from the loop. To avoid a 60-second
wait for people with several NICs, it now only waits for the
PXE booting NIC, which obviously must get an IP address.

This makes IP addresses in the inspection data for other NICs somewhat
unreliable. A new option inspection_dhcp_all_interfaces is introduced
to allow waiting for all NICs to get IP addresses.

This change should finally fix bug 1564954.

Change-Id: I8b04bf726980fdcf6bd536c6bb28e30ac50658fb
Related-Bug: #1564954
2016-05-10 18:12:46 +02:00
Jenkins
2d8e139f03 Merge "Set modification time in tarfile of ramdisk logs" 2016-04-08 12:41:28 +00:00
Dmitry Tantsur
3deb25a3ce Wait for the interfaces to get IP addresses before inspection
In the DIB build the DHCP code (provided by the dhcp-all-interfaces element)
races with the service starting IPA. It does not matter for deployment itself,
as we're waiting for the route to the Ironic API to appear. However, for
inspection it may result in reporting back all NIC's without IP addresses.
Inspection fails in this case.

This change makes inspection wait for *all* NIC's to get their IP addresses up
to a small timeout. The timeout is 60 seconds by default and can be changed
via the new ipa-inspection-dhcp-wait-timeout kernel option (0 to not wait).

After the wait, inspection proceeds in any case, so the worst downside
is making inspection 60 seconds longer.

To avoid waiting for NIC's that are not even connected, this change extends the
NetworkInterface class with 'has_carrier' field.

Closes-Bug: #1564954
Change-Id: I5bf14de4c1c622f4bf6e3eadbe20c44759da5d66
2016-04-05 20:03:33 +02:00
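
The wait described in this message is a bounded polling loop: keep checking until every NIC has an address or the timeout runs out, then carry on regardless. The sketch below shows that shape. The function name `wait_for_addresses`, the `get_missing` callback, and the polling interval are assumptions for illustration and do not mirror the actual agent code.

```python
import time


def wait_for_addresses(get_missing, timeout=60, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until no interface lacks an IP address or the timeout expires.

    get_missing is a callable returning the names of interfaces that
    still have no IP address. The remaining names are returned so the
    caller can log them; inspection is expected to proceed either way.
    """
    missing = get_missing()
    if timeout <= 0:
        # A timeout of 0 means "do not wait at all".
        return missing
    deadline = clock() + timeout
    while missing and clock() < deadline:
        sleep(interval)
        missing = get_missing()
    return missing
```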
Miles Gould
3f715a20fd Set modification time in tarfile of ramdisk logs
If we do not set this explicitly, tar will warn "journal: implausibly
old time stamp" when the user tries to untar the log files.

Change-Id: I4a5a1ffd4eeca9697cdcf16e02d3ff3c22d7132c
2016-04-04 17:29:16 +01:00
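
For context, a `tarfile.TarInfo` created by hand defaults its `mtime` to 0 (the Unix epoch), which is what triggers the warning on extraction. The sketch below packs an in-memory log into a gzipped tarball and sets the timestamp explicitly. The function name `pack_logs` is hypothetical and the agent's own helper may differ.

```python
import io
import tarfile
import time


def pack_logs(name, data):
    """Return a gzipped tarball (bytes) holding one in-memory log file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        # Without this the entry would carry mtime 0 (1970-01-01),
        # which tar reports as an implausibly old time stamp.
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```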
Dmitry Tantsur
58f86d0353 Stop trying to log stdout when fetching logs during inspection
Logging the whole journalctl output is not the best idea. Fortunately,
it does not work right now and fails with a traceback :)

This change adds a new log_stdout argument to utils.execute() and uses it in
the "logs" inspection collector.

Also do not log the logs while logging the collected data.

Change-Id: Ibc726ac2c4f5eb06c73ac4765bb400077b84a6cc
2016-03-08 16:31:18 +01:00
Dmitry Tantsur
5fa258b708 Fix "logs" inspection collector when logs contain non-ascii symbols
Somehow it didn't pop up earlier. Updated tests to contain some creepy
Russian letters :)

Closes-Bug: #1517913
Change-Id: I4c6712ea1e813d1f0f0d0aedaccfa1187526e0ec
2015-12-08 14:32:16 +01:00
Jenkins
2bce5f6065 Merge "Use oslo.log instead of original logging" 2015-11-02 17:44:19 +00:00
ZhiQiang Fan
9e75ba5460 Use oslo.log instead of original logging
We are using oslo.log now, but some of the modules still use logging.
We should use oslo.log for consistency. Besides, oslo.log provides
a fine wrapper for OpenStack projects.

Change-Id: Ibe57e503b88b39e284a9e4b11a1886cd4e8d4ccf
2015-10-24 03:22:36 -06:00
Zhenguo Niu
18d5d6aba3 Replace deprecated LOG.warn with LOG.warning
Change-Id: Ib3d566f6e608ee453659e15cabcf8e9332aedc52
Closes-Bug: #1508442
2015-10-22 14:42:57 +08:00
Dmitry Tantsur
9d6b0864e3 Add "logs" and "extra-hardware" inspection collectors
This is a port of downstream inspector ramdisk plugins we found helpful.
* logs - sends journald logs with inspection data.
* extra-hardware - uses hardware-detect utility to collect bigger
  hardware inventory and to run benchmarks.

Change-Id: If05402606c45185d618279eef46e68c51209f82b
2015-10-01 18:25:30 +02:00
Dmitry Tantsur
3b70647358 inspection: prepare for future deprecations
1. cleanly separate deprecated and non-deprecated properties
2. add root disk to inspection data, so that we can have a proper
   fallback when root device hints are not given.

Change-Id: Ie19b82ff2a914873ff4b2395b02643e086b934b1
2015-09-16 14:26:57 +02:00
Dmitry Tantsur
e3e6000524 Follow-up to inspection patch 096830414b
Change-Id: I7ec05e501ec40802efa14cabe14752972919c7a9
2015-09-16 10:36:33 +00:00
Dmitry Tantsur
096830414b Add support for inspection using ironic-inspector
Adds a new module ironic_python_agent.inspector and new entry point
for extensions, which will allow vendor-specific inspection.

Inspection is run on service start up just before the lookup.
Due to this early start, and due to the fact we don't even know
the MAC addresses of nodes on inspection (to say nothing about IP addresses),
exception handling is a bit different from other agent features:
we try hard not to error out until we send at least something to inspector.

Change-Id: I00932463d41819fd0a050782e2c88eddf6fc08c6
2015-09-07 18:22:54 +02:00