45 Commits

Author SHA1 Message Date
Zuul
6d35c1e949 Merge "Make inspection URL optional if the collectors are provided" 2024-02-07 23:06:34 +00:00
Dmitry Tantsur
0010f5c11a Also retry inspection on HTTP CONFLICT
The new inspection implementation can return 409 CONFLICT when it is unable to lock the node.

Other possible errors are 400 and 404 (should not be retried), as well as
5xx (already retried).

Change-Id: I74c2f54a624dc47e8e2d1e67ae4c6a6078e01d2f
2024-01-26 16:21:24 +01:00
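The retry policy described above can be expressed as a predicate on the HTTP status code. This is a minimal sketch, not the agent's code. The helper name `should_retry` is hypothetical, and the sketch assumes the decision depends only on the status code.

```python
from http import HTTPStatus


def should_retry(status_code: int) -> bool:
    """Hypothetical helper: should a failed inspection callback be retried?"""
    # 409 CONFLICT: the node could not be locked; a later attempt may succeed.
    if status_code == HTTPStatus.CONFLICT:
        return True
    # 5xx: server-side errors, which were already being retried.
    if 500 <= status_code <= 599:
        return True
    # Everything else, including 400 and 404, is treated as permanent.
    return False
```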
Dmitry Tantsur
6cd36a750f Make inspection URL optional if the collectors are provided
With the new in-band inspection, we can derive the callback URL from
the Ironic URL, so there is no need to duplicate it. This change uses
the presence of collectors as a sign to run inspection.

The previous approach of setting an inspection URL, with or without
explicitly setting collectors, still works for compatibility with
ironic-inspector.

Change-Id: Ie4279ee6d2995c9686f1dcdef1d6e5dc1dd20871
2024-01-10 08:55:42 +01:00
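A sketch of the two ideas in this commit: deriving the callback URL from the Ironic URL, and treating configured collectors as the trigger for inspection. The path `v1/continue_inspection` and both function names are assumptions made for illustration; consult the Ironic API reference for the actual endpoint.

```python
from urllib.parse import urljoin

# Assumed callback path for in-band inspection (illustrative only).
CALLBACK_PATH = "v1/continue_inspection"


def inspection_callback_url(ironic_url: str) -> str:
    # A trailing slash makes urljoin append the path instead of replacing
    # the last segment (e.g. a "/baremetal" prefix behind a proxy).
    base = ironic_url if ironic_url.endswith("/") else ironic_url + "/"
    return urljoin(base, CALLBACK_PATH)


def should_inspect(inspection_url, collectors) -> bool:
    # Legacy mode: an explicit inspection URL (ironic-inspector).
    # New mode: configured collectors alone are enough to run inspection.
    return bool(inspection_url) or bool(collectors)
```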
Dmitry Tantsur
0d4ae976c2 Support several API and Inspector URLs
Allows nodes with a single IP stack to be deployed from a dual-stack
Ironic.

Detecting the advertised address and the usable Ironic URLs is done
completely independently, which does leave some room for
misconfiguration. I hope this is unlikely in practice, especially since
this feature targets advanced standalone users.

Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2
Closes-Bug: #2045548
2024-01-09 16:43:23 +01:00
Dmitry Tantsur
2bb74523ae Add missing headers to the inspection callback
Somehow, it has worked correctly for years, but now I've discovered that
the new inspection is not (or no longer?) tolerant of the missing header.

While here, copy all headers from the heartbeat code.

Change-Id: I9e5c609eb4435e520bc225dea08aedfdf169744b
2024-01-09 16:38:46 +01:00
Iury Gregory Melo Ferreira
03b6b0a4ab Fix inspector retries to not take a long time
Since we moved to an exponential wait, the unit tests take much longer
to run; now the maximum wait time can be configured.

- before: Ran: 33 tests in 22.6581 sec.
- after: Ran: 33 tests in 4.0256 sec.

Change-Id: Ibdcfebacad0489d17183e43ceb0d603fce67e72b
2023-12-19 14:26:59 -03:00
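To show why an exponential wait slows the tests down and how a cap helps, here is a hand-written backoff schedule with a maximum delay. The `wait_exponential` name in the ProxyError commit suggests the tenacity library, whose `wait_exponential` also accepts a `max` argument; this stdlib sketch only approximates that idea, and its function and parameter names are hypothetical.

```python
def exponential_delays(attempts: int, multiplier: float = 1.0,
                       max_wait: float = 30.0) -> list:
    """Delays before each retry: multiplier * 2**n, capped at max_wait."""
    return [min(multiplier * 2 ** n, max_wait) for n in range(attempts)]
```

Tests can pass a small `max_wait` so the total sleep time stays bounded regardless of the number of attempts.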
Iury Gregory Melo Ferreira
801da9ec1f Retry in ProxyError during post inspector data
* ProxyError is derived from ConnectionError, but it's necessary
to check the Response object to identify it.

- Added ProxyError in retry_if_exception_type
- Updated _post_to_inspector to properly handle ProxyError
- Updated the wait to use wait_exponential instead of wait_fixed.

Closes-Bug: 2045429
Change-Id: Iefe3fe581cd4e7c91a0da708e6f6d0fdaacab6fe
2023-12-06 12:01:35 -03:00
Zhou Ya
76ad06225a Get numa_node info when collecting pci devices info
IPA now includes the NUMA node ID when collecting
information about PCI devices.

Closes-bug: #1622940
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Change-Id: I70b0cb3eff66d67bb8168982acbbf335de0599cd
2023-10-24 14:27:21 -07:00
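On Linux, a PCI device's NUMA affinity is exposed in sysfs as a `numa_node` file, which contains `-1` when the device has no affinity. A minimal sketch of reading it, assuming the caller passes the device's sysfs directory; the function name is hypothetical.

```python
import os


def read_numa_node(device_dir: str):
    """Return the NUMA node of a PCI device directory, or None if unknown."""
    path = os.path.join(device_dir, "numa_node")
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        # Missing or unreadable file: report no NUMA information.
        return None
    # The kernel reports -1 when the device has no NUMA affinity.
    return value if value >= 0 else None
```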
Julia Kreger
78c1343a54 Fix Bandit errors
Bandit 1.7.5 was released with a check that all requests and urllib
calls set a timeout.

Fixed those.

This in turn exposed a Bandit B310 issue, which was already covered by
the code; it is now explicitly marked as such.

Also makes the Bandit checks voting in CI.

Change-Id: If0e87790191f5f3648366d571e1d85dd7393a548
2023-06-06 08:34:55 -07:00
Dmitry Tantsur
62672de131 Reduce the duration of retries in the inspector tests
Currently the test takes 5*5=25 seconds. Re-arrange the code so
that it's possible to change the retry delay in tests.

Change-Id: Ia559dad4bc656f8ad6b2cb8cb0137a97e2614db7
2020-10-07 12:39:01 +02:00
Julia Kreger
bb27badf76 Add basic retries for inspection
Transitory network problems, such as a port being held down before
traffic forwarding starts, can cause intermittent connectivity
failures which result in failed introspections.

Now the agent retries.

Change-Id: I72c5e3aca000d3854a17f8a461b1a2935e5c0d9b
2020-09-14 22:38:18 +00:00
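A basic fixed-delay retry loop of the kind described here might look like the sketch below. The names are hypothetical, and it catches the builtin `ConnectionError` for self-containment; real code using requests would catch `requests.exceptions.ConnectionError`, which is a different class. The injectable `sleep` is one way to make the delay replaceable in tests, which is the problem the later commit 62672de131 addresses.

```python
import time


def post_with_retries(post, attempts=5, delay=5.0, sleep=time.sleep):
    """Call post() until it succeeds or the attempts are exhausted.

    Only ConnectionError is treated as transient; other errors propagate.
    """
    for attempt in range(1, attempts + 1):
        try:
            return post()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up and surface the last failure
            sleep(delay)
```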
Dmitry Tantsur
d50ff06b6b Enable the logs collection by default
It's incredibly helpful when debugging and most consumers seem
to enable and rely on it.

Change-Id: I33bf58b3eb16b63b70f2a23e8a04449dc88fd94c
2020-08-19 17:25:24 +02:00
Kaifeng Wang
b424fbfa35 Extends pci devices metrics
Collects PCI class, revision, and bus information in the pci-devices
collector. Together with the vendor ID and device ID, these fields can
be used to construct lspci-like device information, which is how the
Cyborg agent collects accelerator devices.

Accelerator-based scheduling becomes possible once Ironic has this
information in place.

Change-Id: I6c37c554f37dd5f1d21c8fd4fad2a4f44a3c75d7
Story: 2007971
Task: 40474
2020-08-04 23:32:37 +08:00
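As an illustration of how these fields combine into lspci-like output, the sketch below formats a device roughly the way `lspci -n` prints it (`bus class: vendor:device (rev xx)`). The dictionary keys are hypothetical, and the class conversion assumes the sysfs `class` value includes the programming-interface byte (e.g. `0x010601`).

```python
def sysfs_class_to_lspci(class_code: str) -> str:
    # sysfs "class" holds class, subclass and prog-if, e.g. "0x010601";
    # `lspci -n` prints only class and subclass, e.g. "0106".
    return f"{int(class_code, 16) >> 8:04x}"


def lspci_line(device: dict) -> str:
    # Hypothetical keys: bus, class, vendor_id, product_id, revision.
    line = (f"{device['bus']} {sysfs_class_to_lspci(device['class'])}: "
            f"{device['vendor_id']}:{device['product_id']}")
    if device.get("revision"):
        line += f" (rev {device['revision']})"
    return line
```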
Julia Kreger
c76b8b2c21 Limit Inspection->Lookup->Heartbeat lag
Caches hardware information collected during inspection
so that the initial lookup can occur without any delay.

Also adds logging to track how long inventory collection takes.

Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com>
Change-Id: I3e0d237d37219e783d81913fa6cc490492b3f96a
2020-07-03 10:32:26 +02:00
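The caching idea can be sketched as a small wrapper that collects the inventory once, logs how long that took, and hands the cached result to later callers such as the lookup. The class name and the `collect` callable are hypothetical stand-ins, not the agent's API.

```python
import logging
import time

LOG = logging.getLogger(__name__)


class InventoryCache:
    """Collect hardware inventory once and reuse it (illustrative only)."""

    def __init__(self, collect):
        self._collect = collect  # hypothetical callable doing the slow work
        self._inventory = None

    def get(self):
        if self._inventory is None:
            start = time.monotonic()
            self._inventory = self._collect()
            LOG.info("Inventory collected in %.2f seconds",
                     time.monotonic() - start)
        return self._inventory

    def invalidate(self):
        # Force the next get() to collect fresh data.
        self._inventory = None
```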
Riccardo Pittau
d5d62c8dbf Use unittest mock from standard library
Drop the third-party mock library in favor of unittest.mock from the
standard library.

Change-Id: Ib64b661572e4869a24865c02a6c84a6603930394
2020-04-06 14:35:50 +02:00
Dmitry Tantsur
31b73b4984 Expose collector and hardware manager names via introspection data
This change adds a new introspection data field 'configuration'
with two lists: managers and collectors.

Change-Id: Ice0d7e6ecff3f319bc3a4f41617059fd6914e31c
2020-01-22 11:15:38 +01:00
Riccardo Pittau
ca7a46b113 Stop using six library
Since we've dropped support for Python 2.7, it's time to look at
the bright future that Python 3.x will bring and stop forcing
compatibility with older versions.
This patch removes the six library from requirements, not
looking back.

Change-Id: I4795417aa649be75ba7162a8cf30eacbb88c7b5e
2019-11-29 10:18:14 +01:00
Julia Kreger
696606f682 manual introspection trigger command
Change-Id: I64e66682c1e54f6edc260a22f46f5f6df8e85af1
Story: 2005896
Task: 33756
2019-07-08 07:43:40 -07:00
Dmitry Tantsur
5c5328ccaa Supports fetching API endpoints from mDNS
This change enables IPA to receive API endpoints and configuration
via multicast DNS.

Story: #2005393
Task: #30382
Change-Id: Ibbf07052bea8f5c0305dda098b2879bcbc2fece5
2019-05-29 16:58:24 +02:00
Arne Wiebalck
fb74b55606 Add secondary sorting by name when guessing root disk
As some BIOSes try to boot only from the "first" disk, Ironic
should order potential disks not only by size, but also by name.
This patch proposes to add secondary sorting by device name when
identifying the root disk.

Change-Id: I4017c839eeb9d00d2b4ad5b90e4e9b65b74296c7
Story: #2004976
Task: #29434
2019-02-11 17:53:47 +01:00
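Secondary sorting amounts to a composite sort key. The sketch below assumes the guess picks the smallest disk that meets a minimum size; the `BlockDevice` shape and the 4 GiB threshold are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlockDevice:
    name: str
    size: int  # bytes


# Assumed minimum root disk size for this sketch.
MIN_ROOT_SIZE = 4 * 1024 ** 3


def guess_root_disk(devices, min_size=MIN_ROOT_SIZE):
    # Sort by size first, then by name, so equally sized disks are ordered
    # deterministically ("/dev/sda" before "/dev/sdb").
    for device in sorted(devices, key=lambda d: (d.size, d.name)):
        if device.size >= min_size:
            return device
    raise LookupError("no disk is large enough to be the root disk")
```

Note that names compare as plain strings, so `/dev/sdaa` sorts before `/dev/sdb`.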
Dmitry Tantsur
f153a741e1 Clean up deprecated items in the inspection code
* Remove support for setting IPMI credentials (removed from inspector in Pike)
* Stop sending the ipmi_address field (bmc_address is used instead since Pike)

Change-Id: I1696041db62ba27e5d31e8481cb225a43d7e2a46
Closes-Bug: #1654318
2017-09-19 14:05:13 +02:00
ChangBo Guo(gcb)
30e0da15ea Remove usage of parameter enforce_type
oslo.config deprecated the parameter enforce_type and changed its
default value to True in Ifa552de0a994e40388cbc9f7dbaa55700ca276b0.
Remove the usage of it to avoid DeprecationWarning: "Using the
'enforce_type' argument is deprecated in version '4.0' and will be
removed in version '5.0': The argument enforce_type has changed its
default value to True and then will be removed completely."

Change-Id: I0f0fb540c43edde64e489915c5199da40a0da9c1
Related-Bug: #1517839
2017-06-14 13:47:29 +08:00
Jenkins
c2687c6223 Merge "Prevent tests' unmocked access to utils.execute()" 2017-05-15 04:25:45 +00:00
Julian Edwards
f57cbccf8b Prevent tests' unmocked access to utils.execute()
This change introduces a new base test class that mocks out
utils.execute and forces an exception if it gets called.
This has rooted out many tests that were doing this as a side effect of
calling other functions, doing things like modprobe and running iscsi
on the actual host machine.

The tests are all now appropriately patched in places where this was
happening, and the new base class permanently prevents this from
accidentally happening again.

If you really want to call utils.execute() then you need to re-mock it
in your unit test.

Change-Id: Idf87d09a9c01a6bfe2767f8becabe65c02983518
2017-05-15 10:48:43 +10:00
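The base-class pattern described above can be sketched with `unittest.mock.patch.object` started in `setUp`. Here `utils` is a hypothetical stand-in namespace rather than the agent's real module, and the test names are illustrative.

```python
import types
import unittest
from unittest import mock

# Hypothetical stand-in for the real utils module.
utils = types.SimpleNamespace(execute=lambda *cmd: ("", ""))


class BaseTestCase(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Any unmocked call to utils.execute() now fails loudly.
        patcher = mock.patch.object(
            utils, "execute",
            side_effect=Exception("Don't call utils.execute() in tests!"))
        self.mock_execute = patcher.start()
        self.addCleanup(patcher.stop)


class ExampleTest(BaseTestCase):
    def test_unmocked_call_is_blocked(self):
        with self.assertRaises(Exception):
            utils.execute("modprobe", "foo")

    def test_remocked_call(self):
        # Re-mock explicitly when a test really needs execute().
        self.mock_execute.side_effect = None
        self.mock_execute.return_value = ("out", "")
        self.assertEqual(utils.execute("ls"), ("out", ""))
```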
Javier Pena
32ed01448b Set valid inspection_dhcp_wait_timeout value in tests
inspection_dhcp_wait_timeout is defined as IntOpt, but its value was
set in tests to 0.01. This was ok until oslo.config started enforcing
types in [1], after that unit tests fail for test_timeout.

Fix this by setting the value to 1 and mocking time-related functions
to avoid a longer wait.

[1] https://review.openstack.org/328692

Change-Id: I732c4aa3d1760c3159d9672e3fae81f8bd72497c
2017-04-25 12:12:49 +02:00
Jenkins
fd7f10b993 Merge "Configure and use SSL-related requests options" 2017-02-07 09:57:49 +00:00
Pavlo Shchelokovskyy
fdd11b54a5 Configure and use SSL-related requests options
This patch adds standard SSL options to IPA config and makes use of them
when making HTTP requests.

For now, a single set of certificates is used when needed.
In the future, the configuration can be expanded to allow per-service
certificates.

In addition, the 'insecure' option (defaults to False) can be overridden
through the kernel command line parameter 'ipa-insecure'.
This will allow running IPA in CI-like environments with self-signed SSL
certificates.

Change-Id: I259d9b3caa9ba1dc3d7382f375b8e086a5348d80
Closes-Bug: #1642515
2017-01-13 11:33:44 +02:00
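The requests library's `verify` argument accepts either a boolean or a path to a CA bundle, so the options described here can be reduced to a single value. A sketch, assuming `ipa-insecure` takes a value such as `1` or `true`; the parsing rules and function names are hypothetical.

```python
def parse_cmdline(cmdline: str) -> dict:
    # Split "key=value" tokens; bare flags map to an empty string.
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else ""
    return params


def requests_verify(cmdline: str, cafile=None, insecure=False):
    """Value for the `verify` argument of requests calls."""
    value = parse_cmdline(cmdline).get("ipa-insecure")
    if value is not None:
        # The kernel parameter overrides the configured 'insecure' option.
        insecure = value.lower() in ("1", "true", "yes")
    if insecure:
        return False  # skip certificate verification
    return cafile if cafile else True
```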
Dmitry Tantsur
10bff0a518 Remove compatibility with old bash-based introspection ramdisk
Inspector is using inventory directly, so we can remove the bits sending
processed network and scheduling properties.

Change-Id: I6c58bc3c5ea78fd2dbda82b38515f332ce2e8d4a
2017-01-09 14:10:47 +01:00
Gábor Antal
4facf2c385 Changed an assert to more specific assert method
Following OpenStack Style Guidelines [1], I changed:
assertFalse(sth in sth) to assertNotIn(sth, sth).

After this change, a more specific message is shown on error.

[1]: http://docs.openstack.org/developer/hacking/#unit-tests-and-assertraises

Change-Id: I5d47d775dcff194693d97db6b797b7b027cbab56
2016-08-29 15:40:44 +02:00
Lucas Alvares Gomes
af81914ce7 Add a log extension
The log extension is responsible for retrieving logs from the system.
If journalctl is present, the logs come from it; otherwise we fall back
to getting the logs from the /var/log directory plus the dmesg logs.

In the CoreOS ramdisk, we need to bind mount /run/log in the container
so the IPA service can have access to the journal.

For the tinyIPA ramdisk, the logs from IPA are now being redirected to
/var/logs/ironic-python-agent.log instead of only going to the default
stdout.

Inspector now shares the same method of collecting logs, extending its
capabilities for non-systemd systems.

Partial-Bug: #1587143
Change-Id: Ie507e2e5c58cffa255bbfb2fa5ffb95cb98ed8c4
2016-06-28 17:02:11 +01:00
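The fallback logic can be sketched as choosing log sources based on whether `journalctl` is on the `PATH`. The returned structure and function name are hypothetical; the journalctl flags (`--full`, `--no-pager`, `-b`) are standard options.

```python
import shutil


def log_sources(have_journalctl=None) -> dict:
    """Decide where to collect logs from (illustrative only)."""
    if have_journalctl is None:
        have_journalctl = shutil.which("journalctl") is not None
    if have_journalctl:
        # Journal for the current boot, without a pager.
        return {"commands": [["journalctl", "--full", "--no-pager", "-b"]],
                "directories": []}
    # Non-systemd fallback: kernel ring buffer plus files under /var/log.
    return {"commands": [["dmesg"]], "directories": ["/var/log"]}
```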
Szymon Borkowski
f7e080c8bf Add PCI devices collector to inspector
Adds a new collector, which gathers a list of PCI devices.
Each entry is a dictionary containing 2 keys:
- vendor-id
- product-id
Such information can then be used by the inspector to distinguish
appropriate PCI devices.

Change-Id: Id7521d66410e7d408d7eada692b6123e769ce084
Partial-Bug: #1580893
2016-06-24 14:50:58 +02:00
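A collector like this can read each device's `vendor` and `device` files under `/sys/bus/pci/devices`. A minimal sketch that produces entries with the two keys named above; the function names are hypothetical, and the sysfs root is a parameter so it can be pointed at a test directory.

```python
import os


def _strip_hex_prefix(value: str) -> str:
    # sysfs reports IDs such as "0x8086"; keep only the hex digits.
    value = value.strip().lower()
    return value[2:] if value.startswith("0x") else value


def collect_pci_devices(sysfs_root: str = "/sys/bus/pci/devices") -> list:
    """Return [{'vendor-id': ..., 'product-id': ...}, ...] for each device."""
    devices = []
    for address in sorted(os.listdir(sysfs_root)):
        device_dir = os.path.join(sysfs_root, address)
        try:
            with open(os.path.join(device_dir, "vendor")) as f:
                vendor = _strip_hex_prefix(f.read())
            with open(os.path.join(device_dir, "device")) as f:
                product = _strip_hex_prefix(f.read())
        except OSError:
            # Skip entries that cannot be read instead of failing inspection.
            continue
        devices.append({"vendor-id": vendor, "product-id": product})
    return devices
```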
Moshe Levi
1ef8c32de0 Replace assertRaisesRegexp with assertRaisesRegex
This patch replaces assertRaisesRegexp, which is deprecated in Python 3,
with assertRaisesRegex.
https://docs.python.org/3.2/library/unittest.html
It also updates the base tests to use oslotest's
BaseTestCase for Python 2.7 and Python 3 compatibility.

Change-Id: I02571946f0643247e208d98dc91ea78cd9d351ee
2016-06-22 00:47:37 +00:00
Dmitry Tantsur
53b187a4c3 Add boot information into the inventory
Adds a new BootInfo object with 2 fields:

* current_boot_mode - bios or uefi, detected from presence of /sys/firmware/efi
  as per the following answer: http://askubuntu.com/a/162896
  This field will be used for setting the boot_mode capability in ironic-inspector
* pxe_interface - PXE booting interface, if it can be detected.
  This field is already used by ironic-inspector and is added here for consistency.

Change-Id: Ib36b592ffaba3bfa055d65c9526607867d302584
Partial-Bug: #1571580
2016-05-26 17:05:11 +02:00
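The detection described for `current_boot_mode` is a single directory check: the kernel exposes `/sys/firmware/efi` only when the system was booted via UEFI. A sketch with the path as a parameter for testing; the function name mirrors the field name but is otherwise hypothetical.

```python
import os


def current_boot_mode(efi_dir: str = "/sys/firmware/efi") -> str:
    # /sys/firmware/efi exists only on systems booted via UEFI.
    return "uefi" if os.path.isdir(efi_dir) else "bios"
```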
Dmitry Tantsur
6da6ace384 [inspection] wait for the PXE DHCP by default and remove the carrier check
We hoped that checking /sys/class/net/XXX/carrier will allow us
to not wait for interfaces that are not connected at all.
In reality this field turned out to be unreliable. For example, it is
also set to 0 when the interface is down or is being configured.
The bug https://bugzilla.redhat.com/show_bug.cgi?id=1327255 shows
the case when carrier is 0 for all interfaces, including the one that is
used to post back data, which is obviously nonsense.

This change removes the carrier check from the loop. To avoid a 60-second
wait for people with several NICs, it's changed to only wait for the
PXE booting NIC, which obviously must get an IP address.

This makes IP addresses in the inspection data for other NICs somewhat
unreliable. A new option inspection_dhcp_all_interfaces is introduced
to allow waiting for all NICs to get IP addresses.

This change should finally fix bug 1564954.

Change-Id: I8b04bf726980fdcf6bd536c6bb28e30ac50658fb
Related-Bug: #1564954
2016-05-10 18:12:46 +02:00
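Choosing which NICs to wait for can be sketched as a filter on MAC addresses. The name-to-MAC mapping, the function name, and the fallback to all interfaces when the PXE MAC is unknown are assumptions for illustration, not behavior stated in the commit.

```python
def interfaces_to_wait_for(interfaces: dict, pxe_mac=None,
                           wait_all: bool = False) -> list:
    """interfaces maps NIC name -> MAC address (hypothetical shape)."""
    # Assumed fallback: wait for every NIC if all are requested or the
    # PXE MAC is unknown.
    if wait_all or not pxe_mac:
        return sorted(interfaces)
    pxe_mac = pxe_mac.lower()
    return [name for name, mac in sorted(interfaces.items())
            if mac.lower() == pxe_mac]
```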
Jenkins
2d8e139f03 Merge "Set modification time in tarfile of ramdisk logs" 2016-04-08 12:41:28 +00:00
Dmitry Tantsur
3deb25a3ce Wait for the interfaces to get IP addresses before inspection
In the DIB build, the DHCP code (provided by the dhcp-all-interfaces element)
races with the service starting IPA. It does not matter for deployment itself,
as we're waiting for the route to the Ironic API to appear. However, for
inspection it may result in reporting back all NICs without IP addresses.
Inspection fails in this case.

This change makes inspection wait for *all* NICs to get their IP addresses up
to a small timeout. The timeout is 60 seconds by default and can be changed
via the new ipa-inspection-dhcp-wait-timeout kernel option (0 to not wait).

After the wait, inspection proceeds in any case, so the worst downside
is making inspection 60 seconds longer.

To avoid waiting for NICs that are not even connected, this change extends the
NetworkInterface class with a 'has_carrier' field.

Closes-Bug: #1564954
Change-Id: I5bf14de4c1c622f4bf6e3eadbe20c44759da5d66
2016-04-05 20:03:33 +02:00
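The bounded wait can be sketched as a polling loop with a deadline, where a timeout of 0 checks once and does not wait. The function name and the `get_addresses` callable (returning a NIC-name-to-address mapping) are hypothetical; `sleep` and `clock` are injectable so the loop can be tested without real delays. The result only reports success; the caller proceeds either way.

```python
import time


def wait_for_addresses(get_addresses, names, timeout=60, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic) -> bool:
    """Return True if every named NIC has an address before the timeout."""
    deadline = clock() + max(timeout, 0)
    while True:
        addresses = get_addresses()
        if all(addresses.get(name) for name in names):
            return True
        if clock() >= deadline:
            return False  # give up; inspection proceeds anyway
        sleep(interval)
```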
Miles Gould
3f715a20fd Set modification time in tarfile of ramdisk logs
If we do not set this explicitly, tar will warn "journal: implausibly
old time stamp" when the user tries to untar the log files.

Change-Id: I4a5a1ffd4eeca9697cdcf16e02d3ff3c22d7132c
2016-04-04 17:29:16 +01:00
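With Python's `tarfile`, a `TarInfo` built by hand has `mtime` 0 (January 1970), which is what triggers the "implausibly old time stamp" warning on extraction. A minimal sketch that sets the time explicitly when packing in-memory logs; the function names are hypothetical.

```python
import io
import tarfile
import time


def add_bytes(tar, name, data, mtime=None):
    info = tarfile.TarInfo(name)
    info.size = len(data)
    # TarInfo defaults mtime to 0; set it so tar does not warn on extraction.
    info.mtime = int(time.time() if mtime is None else mtime)
    tar.addfile(info, io.BytesIO(data))


def logs_tarball(logs: dict) -> bytes:
    """Pack {name: text} into a gzipped tarball held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in logs.items():
            add_bytes(tar, name, text.encode("utf-8"))
    return buf.getvalue()
```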
Jenkins
c3e9aca2c0 Merge " make enforce_type=True in CONF.set_override" 2016-03-21 17:25:46 +00:00
Dmitry Tantsur
58f86d0353 Stop trying to log stdout when fetching logs during inspection
Logging the whole journalctl output is not the best idea. Fortunately,
it does not work right now and fails with a traceback :)

This change adds a new log_stdout argument to utils.execute() and uses it in
the "logs" inspection collector.

Also do not log the logs while logging the collected data.

Change-Id: Ibc726ac2c4f5eb06c73ac4765bb400077b84a6cc
2016-03-08 16:31:18 +01:00
LiuNanke
b563196a37 make enforce_type=True in CONF.set_override
CONF.set_override is used in unit tests to change a config option's
value, but the designated value is never checked for validity. Each
config option has a type such as StrOpt or BoolOpt, and a StrOpt with
the choices parameter only allows values from that set. In short, each
config option limits both its type and its value. In production code,
oslo.config ensures that the user's input is valid, but in unit tests,
test methods can pass even when a wrong type or value is given, if
CONF.set_override is called without enforce_type=True. This commit makes
sure CONF.set_override is called with enforce_type=True.
This commit also fixes the resulting violations.

Note: We can't set enforce_type=True by default in oslo.config yet, as it
may break every project's unit tests. We can switch to enforce_type=True by
default once all projects fix violations as this commit does.

Change-Id: Iba3e7fca01fc91e4396e698fc00cad35ba8f3543
Related-Bug: #1517839
2016-01-12 20:57:31 +08:00
Dmitry Tantsur
5fa258b708 Fix "logs" inspection collector when logs contain non-ascii symbols
Somehow it didn't pop up earlier. Updated tests to contain some creepy
Russian letters :)

Closes-Bug: #1517913
Change-Id: I4c6712ea1e813d1f0f0d0aedaccfa1187526e0ec
2015-12-08 14:32:16 +01:00
Dmitry Tantsur
9d6b0864e3 Add "logs" and "extra-hardware" inspection collectors
This is a port of downstream inspector ramdisk plugins we found helpful.
* logs - sends journald logs with inspection data.
* extra-hardware - uses hardware-detect utility to collect bigger
  hardware inventory and to run benchmarks.

Change-Id: If05402606c45185d618279eef46e68c51209f82b
2015-10-01 18:25:30 +02:00
Dmitry Tantsur
3b70647358 inspection: prepare for future deprecations
1. cleanly separate deprecated and non-deprecated properties
2. add root disk to inspection data, so that we can have a proper
   fallback when root device hints are not given.

Change-Id: Ie19b82ff2a914873ff4b2395b02643e086b934b1
2015-09-16 14:26:57 +02:00
Dmitry Tantsur
e3e6000524 Follow-up to inspection patch 096830414b
Change-Id: I7ec05e501ec40802efa14cabe14752972919c7a9
2015-09-16 10:36:33 +00:00
Dmitry Tantsur
096830414b Add support for inspection using ironic-inspector
Adds a new module ironic_python_agent.inspector and new entry point
for extensions, which will allow vendor-specific inspection.

Inspection is run on service startup just before the lookup.
Due to this early start, and due to the fact that we don't even know the
MAC addresses of nodes during inspection (to say nothing about IP addresses),
exception handling is a bit different from other agent features:
we try hard not to error out until we send at least something to inspector.

Change-Id: I00932463d41819fd0a050782e2c88eddf6fc08c6
2015-09-07 18:22:54 +02:00
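The "try hard not to error out" approach can be sketched as running each collector in isolation and recording failures instead of aborting, so that some data still reaches the inspector. The agent loads collectors through entry points; here they are a plain list of `(name, callable)` pairs, and the function name is hypothetical.

```python
import logging

LOG = logging.getLogger(__name__)


def run_collectors(collectors, data: dict) -> list:
    """Run each collector on data; return names of collectors that failed."""
    failures = []
    for name, collector in collectors:
        try:
            collector(data)
        except Exception:
            # Log and keep going so that partial data can still be sent.
            LOG.exception("Collector %s failed, continuing", name)
            failures.append(name)
    return failures
```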