CI is memory intensive, and we realistically don't need two or
more API workers running for every single WSGI service which
does not set its own specific override value.
This should reduce the memory footprint by an average of six processes
which consume 60-90 MB each.
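A minimal sketch of the kind of cap this implies, assuming devstack's
API_WORKERS variable is the knob being lowered (value illustrative):

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          # Limit worker counts for WSGI services that do not set
          # their own override (value illustrative).
          API_WORKERS: 1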
Change-Id: Ia0a986152c2b9fc9c5ff54cf698a351db452fbbd
Changes a neutron call to be project scoped, as system scope
can't create a resource, and removes the unset which no longer
makes sense now that
I86ffa9cd52454f1c1c72d29b3a0e0caa3e44b829
has merged removing the legacy vars from devstack.
Also renames the internal-use setting of OS_CLOUD to IRONIC_OS_CLOUD,
as some services were still working with system scope, or some sort
of mixed state occurred previously because some of the environment
variables were still present; they have since been removed from
devstack.
This change *does* explicitly set an OS_CLOUD variable as well on
the base ironic job. This is because things like grenade for Xena
will expect the variable to be present.
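A minimal sketch of setting the variable on the base job, assuming it
lands in the job's devstack_localrc (cloud name illustrative):

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          # Grenade on Xena still expects OS_CLOUD to be present
          # (cloud name illustrative).
          OS_CLOUD: devstack-admin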
Depends-On: https://review.opendev.org/c/openstack/devstack/+/818449
Change-Id: I912527d7396a9c6d8ee7e90f0c3fd84461d443c1
Change the default boot mode to UEFI, as discussed at the end
of the Wallaby release cycle and agreed upon a very long time
ago by the Ironic community.
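A minimal sketch of the new default, assuming the IRONIC_BOOT_MODE
devstack variable used elsewhere in these jobs is what carries it:

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          # Boot the test VMs in UEFI mode by default.
          IRONIC_BOOT_MODE: uefi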
Change-Id: I6d735604d56d1687f42d0573a2eed765cbb08aec
Neutron's firewall initialization with OVS seems
to be the source of our pain with ports not being found
by ironic jobs. This is because firewall startup errors
crash out the agent with a RuntimeError while it is deep
in its initial __init__ sequence.
This ultimately seems to be rooted in communication
with OVS itself, but perhaps the easiest solution is
to just disable the firewall.
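A minimal sketch of disabling the firewall in the jobs, assuming the
OVS agent's [securitygroup]/firewall_driver option is the knob
(config file reference illustrative):

  - job:
      name: ironic-base
      vars:
        devstack_local_conf:
          post-config:
            # File reference illustrative; disables the agent firewall.
            /$NEUTRON_CORE_PLUGIN_CONF:
              securitygroup:
                firewall_driver: noop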
Related: https://bugs.launchpad.net/neutron/+bug/1944201
Change-Id: I303989a825a7e35f1cb7b401134fd63553f6791c
Observed an OOM incident causing
ironic-tempest-ipa-partition-pxe_ipmitool to fail.
One VM started; the other seemed to try to start twice, but both times
it stopped shortly into the run, and the base OS had recorded an OOM
failure.
It appears the actual QEMU memory footprint being consumed when
configured at 3GB is upwards of 4GB, which obviously is too big to
fit in our 8GB VM instance.
Dialing back slightly, in hopes it stabilizes the job.
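A minimal sketch of the dial-back, assuming IRONIC_VM_SPECS_RAM is the
setting being reduced (value illustrative):

  - job:
      name: ironic-tempest-ipa-partition-pxe_ipmitool
      vars:
        devstack_localrc:
          # Keep the test VMs plus QEMU overhead within an 8GB CI
          # node (value illustrative).
          IRONIC_VM_SPECS_RAM: 2600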
Change-Id: Id8cef722ed305e96d89b9960a8f60f751f900221
This is part of the work to add jobs which confirm ironic works with
FIPS enabled, but this change is also appropriate for non-FIPS jobs.
Change-Id: I4af4e811104088d28d7be6df53c26e72db039e08
The devstack default limit enforcement for glance defaults
to 1GB, and unfortunately this is too small for many users who need
larger images such as CentOS, which include hardware firmware
images for execution on baremetal, where drivers need the vendor
blobs in order to load/run.
Sets ironic-base to 5GB, and updates examples accordingly.
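A minimal sketch of the bump, assuming devstack's
GLANCE_LIMIT_IMAGE_SIZE_TOTAL variable (in MB) is the limit being
raised:

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          # Allow ~5GB images such as full CentOS images carrying
          # vendor firmware blobs (variable name/units assumed).
          GLANCE_LIMIT_IMAGE_SIZE_TOTAL: 5000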
Depends-On: https://review.opendev.org/c/openstack/devstack/+/801309
Change-Id: I41294eb571d07a270a69e5b816cdbad530749a94
Adds support to the ironic devstack plugin to configure
ironic to be used in a scope-enforcing mode in line with
the Secure RBAC effort. This change also defines two new
integration jobs *and* changes one of the existing
integration jobs.
In these cases, we're testing functional CRUD interactions,
integration with nova, and integration with ironic-inspector.
As other services come online with their plugins and
devstack code able to set the appropriate scope
enforcement configuration, we will be able to change the
overall operating default for all of ironic's jobs and
remove the differences.
This effort identified issues in ironic-tempest-plugin,
tempest, devstack, and required plugin support in
ironic-inspector as well, and is ultimately required
to ensure we do not break the Secure RBAC.
Luckily, it all works.
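A minimal sketch of the kind of scope-enforcing setup implied here,
assuming [oslo_policy] enforce_scope and enforce_new_defaults are the
options the plugin toggles (job name illustrative):

  - job:
      name: ironic-tempest-functional-rbac   # job name illustrative
      parent: ironic-base
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              oslo_policy:
                # Enforce Secure RBAC scopes and the new policy
                # defaults (options assumed).
                enforce_scope: True
                enforce_new_defaults: True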
Change-Id: Ic40e47cb11a6b6e9915efcb12e7912861f25cae7
Currently, the Zuul jobs in zuul.d/ironic-jobs.yaml list their
required-projects items like this (without a leading hostname)
required-projects:
- openstack/ironic
- openstack/ABCD
but not like this (with a leading hostname)
required-projects:
- opendev.org/openstack/ironic
- opendev.org/openstack/ABCD
With the first format, if we have two openstack/ironic entries in
Zuul's tenant configuration file (a Zuul tenant config file in a 3rd
party CI environment usually has 2 entries: one to fetch upstream
code, another for the Gerrit event stream to trigger Zuul jobs), we'll
get a warning in the zuul-scheduler log
Project name 'openstack/ironic' is ambiguous,
please fully qualify the project with a hostname
With the second format, that warning doesn't appear, and Zuul running
in a 3rd party CI environment can reuse the Zuul jobs in
zuul.d/ironic-jobs.yaml in its own jobs.
This commit modifies all Zuul jobs in zuul.d/ironic-jobs.yaml
to use the second format.
Story: 2008724
Task: 42068
Change-Id: I85adf3c8b3deaf0d1b2d58dcd82724c7e412e2db
A recent magnum bug (https://storyboard.openstack.org/#!/story/2008494)
when running with uwsgi has raised an interesting question of whether
Ironic is orphaning RPC clients or not. Ironic's code is slightly
different but also very similar, and similar bugs have been observed
in the past where the Python garbage collection never cleans up the
old connection objects and effectively orphans the connection.
So we should likely at least try to collect some of this information
so we can determine whether this is the case in our CI jobs. Hence
this change, which attempts to collect that data after CI runs.
Change-Id: I4e80c56449c0b3d07b160ae6c933a7a16c63c5c5
The non-base job is designed for the integrated gate and may have
unnecessary side effects. It recently started overriding the OVS agent
bridge settings, breaking our job.
Make the job voting again.
Change-Id: Ied8cafd32c3e634d498467ebe878a411f0b24e6d
All the tox jobs are based on openstack-tox; we should convert
ironic-tox-unit-with-driver-libs too.
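A minimal sketch of the conversion (the tox environment name is
assumed; the final job body may differ):

  - job:
      name: ironic-tox-unit-with-driver-libs
      parent: openstack-tox
      vars:
        # Tox environment name assumed for illustration.
        tox_envlist: unit-with-driver-libs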
Change-Id: I20836d586edccfb8cd8fed1f3a89f1497ff96943
We're seeing cases where cleaning barely manages to finish after
a 2nd PXE retry, failing a job.
Also make the PXE retry timeout consistent between the CI and
local devstack installations.
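A minimal sketch of the kind of adjustment this implies, assuming the
[pxe]/boot_retry_timeout option is the timeout in question (value
illustrative):

  - job:
      name: ironic-base
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              pxe:
                # Give CI the same retry window as local devstack
                # (option and value assumed for illustration).
                boot_retry_timeout: 600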
Change-Id: I6dc7a91d1a482008cf4ec855a60a95ec0a1abe28
As per the Victoria cycle testing runtime and community goal,
we need to migrate upstream CI/CD to Ubuntu Focal (20.04).
Keeping a few jobs running on a Bionic nodeset until
https://storyboard.openstack.org/#!/story/2008185 is fixed;
otherwise the base devstack jobs switching to Focal will block
the gate.
Change-Id: I1106c5c2b400e7db899959550eb1dc92577b319d
Story: #2007865
Task: #40188
This change marks the iscsi deploy interface as deprecated and
stops enabling it by default.
An online data migration is provided for iscsi->direct, provided that:
1) the direct deploy is enabled,
2) image_download_source!=swift.
The CI coverage for iscsi deploy is left only on standalone jobs.
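A minimal sketch of the devstack side of dropping the default,
assuming IRONIC_ENABLED_DEPLOY_INTERFACES is the variable involved
(values illustrative):

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          # iscsi is no longer enabled by default; standalone jobs
          # opt back in explicitly (values illustrative).
          IRONIC_ENABLED_DEPLOY_INTERFACES: "direct,ramdisk"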
Story: #2008114
Task: #40830
Change-Id: I4a66401b24c49c705861e0745867b7fc706a7509
The minimum amount of disk space on CI test nodes
may be approximately 60GB on /opt, now with only 1GB
of available swap space by default.
This means we're constrained on the number of VMs and
their disk storage capacity in some cases.
Change-Id: Ia6dac22081c92bbccc803f233dd53740f6b48abb
Infra's disk/swap availability has apparently been
reduced with the new Focal nodesets, such that we
have ~60GB of disk space and only 1GB of swap.
If we configure more swap, then naturally that
takes away space available for VMs as well.
As such, we should be able to complete grenade
with only four instances, I hope.
Change-Id: I36f8fc8130ed914e8a2c2a11c9679144d931ad73
Currently ironic-base defaults to 2 VMs and our tests try to introspect
all of them. This puts unnecessary strain on the CI systems, so return
the number back to 1.
Change-Id: I820bba1347954b659fd7469ed542f98ef0a6eaf0
As part of the plan to deprecate the iSCSI deploy interface, changing
this option to a value that will work out of the box for more
deployments. The standalone CI jobs are switched to http as well; the
rest of the jobs are left with swift. The explicit indirect jobs are
removed.
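A minimal sketch of the switch for a standalone job, assuming the
[agent]/image_download_source option is the setting referenced (job
name illustrative):

  - job:
      name: ironic-standalone   # job name illustrative
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              agent:
                # Serve instance images over HTTP rather than Swift
                # temp URLs.
                image_download_source: http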
Change-Id: Idc56a70478dfe65e9b936006a5355d6b96e536e1
Story: #2008114
Task: #40831
Removes the deprecated support for token-less agents. Agent tokens
better secure the ironic-python-agent<->ironic interactions by helping
ensure that heartbeat operations are coming from the same node which
originally checked in with Ironic, and that commands coming to an
agent originate from the same ironic deployment which the agent
checked in with to begin with.
Story: 2007025
Task: 40814
Change-Id: Id7a3f402285c654bc4665dcd45bd0730128bf9b0
Tinyipa is not that tiny anymore and we need to increase the base
memory for VMs in jobs that use it.
Change-Id: Ibd7e87c0b5676eef94512285edaca416635a29ef
Enables the settings for the ramdisk ISO booting tests,
including a bootable ISO image that should boot the machine.
NB: The first depends-on is only for temporary testing of another
change which modifies the substrate ramdisk interface. Since this
change pulls in tempest testing for the ISO ramdisk and uses it, we
might as well use it to test whether that change works, as the other
two patches below are known to be in a good state.
Change-Id: I5d4213b0ba4f7884fb542e7d6680f95fc94e112e
The kernel for the UEFI PXE job seems to download
without issue; however, the required ramdisk does not
seem to be making it.
As such, changing the job to use TinyCore to see if the smaller
image helps resolve these issues.
Change-Id: Ie248de2269a63a41b634f7205468edabccc53738
Our ramdisks have swelled, and are taking anywhere from 500-700
seconds to even reach the point where IPA is starting up.
This means that a 900 second build timeout is cutting it close,
and intermittent performance degradation in CI means that a job
may fail simply because it is colliding with the timeout.
One example I deconstructed today where a 900 second timeout was
in effect:
* 08:21:41 Tempest job starts
* 08:21:46 Nova instance requested
* Compute service requests ironic to do the thing.
* Ironic downloads IPA and stages it - ~20-30 seconds
* VM boots and loads ipxe ~30 seconds.
* 08:23:22 - ipxe downloads kernel/ramdisk (time should be completion
unless apache has changed logging behavior for requests.)
* 08:26:28 - Kernel at 120 second marker and done decompressing
the ramdisk.
* ~08:34:30 - Kernel itself hit the six hundred second runtime
marker and hasn't even started IPA.
* 08:35:02 - Ironic declares the deploy failed due to wait timeout.
([conductor]deploy_callback_timeout hit at 700 seconds.)
* 08:35:32 - Nova fails the build saying it can't be scheduled.
(Note: I started adding times to figure out the window for myself, so
they are incomplete above.)
The time we can account for in the job is about 14 minutes or 840
seconds. As such, our existing defaults are just not enough to handle
the ramdisk size AND variance in cloud performance.
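A minimal sketch of the kind of increase this argues for, assuming the
[conductor]deploy_callback_timeout option named above is what gets
raised (value illustrative):

  - job:
      name: ironic-base
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              conductor:
                # 700s was being hit before IPA even started; give
                # slow CI nodes more headroom (value illustrative).
                deploy_callback_timeout: 1800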
Change-Id: I4f9db300e792980059c401fce4c37a68c438d7c0
After the recent changes we're running 5 tests already, some of them
using several VMs. This should cover scheduling to different conductors
well enough; the nova test just adds random failures on top.
This allows reducing the number of test VMs to 3 per testing node
(6 in total), reducing the resource pressure and allowing us to give
each VM a bit more RAM.
Also adding missing VM_SPECS_DISK to the subnode configuration.
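A minimal sketch of the multinode layout this describes, assuming
IRONIC_VM_COUNT and IRONIC_VM_SPECS_DISK are the variables involved
(job name and values illustrative):

  - job:
      name: ironic-tempest-multinode   # job name illustrative
      vars:
        devstack_localrc:
          IRONIC_VM_COUNT: 3
      group-vars:
        subnode:
          devstack_localrc:
            IRONIC_VM_COUNT: 3
            # Previously missing on the subnode (value illustrative).
            IRONIC_VM_SPECS_DISK: 10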
Change-Id: Idde2891b2f15190f327e4298131a6069c58163c0
Since we merged the change to have partition and wholedisk
testing on basic_ops, most of the jobs started requiring 2 VMs
to run the tempest tests.
Let's increase the count on ironic-base so all jobs will default to 2,
removing IRONIC_VM_COUNT=2 from the jobs that use ironic-base as
parent.
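A minimal sketch of the new base-job default (IRONIC_VM_COUNT as named
above):

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          # Partition and wholedisk testing on basic_ops needs two
          # test VMs in most jobs.
          IRONIC_VM_COUNT: 2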
Change-Id: I13da6275c04ffc6237a7f2edf25c03b4ddee936a
Devstack is changing the Neutron default to the OVN backend. This patch
is to make sure the Ironic gate will not get broken by this change, as
currently OVN doesn't support baremetal nodes.
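A minimal sketch of keeping the jobs on the previous backend, assuming
devstack's Q_AGENT variable is how the choice is pinned (an
assumption, not stated in this commit message):

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          # Stay on the OVS agent until OVN supports baremetal
          # (variable and value assumed for illustration).
          Q_AGENT: openvswitch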
Change-Id: I0745e07d32e3455fad2a2249c31f279fd1d38b5b
Signed-off-by: Jakub Libosvar <libosvar@redhat.com>
Aliases the old name of the cross-gating job to the new name
so we can change jobs in other projects without breaking the world.
Change-Id: I9e17f48f83444b5e2cab63a2041e77e860ce6df5
In the last PTG the Neutron team discussed and decided to undeprecate
the neutron-legacy module in DevStack because that's the module being
used (almost) everywhere and it works. The lib/neutron module was an
attempt to refactor the old module, but in the last few years it
hasn't gained any traction, and due to the lack of features and people
to work on it, it's going to be removed from DevStack eventually.
Below is a snippet from the PTG summary email [0] about this topic:
<snippet>
In Devstack there are currently 2 modules which can configure
Neutron. Old one called "lib/neutron-legacy" and the new one called
"lib/neutron". It is like that since many cycles that "lib/neutron-legacy"
is deprecated. But it is still used everywhwere. New module isn't still
finished and isn't working fine. This is very confusing for users as
really maintained and recommended is still "lib/neutron-legacy" module.
During the discussion Sean Collins explained us that originally this
new module was created as an attempt to refactor old module, and to
make Neutron in the Devstack better to maintain. But now we see that
this process failed as new module isn't still used and we don't have
any cycles to work on it. So our final conclusion is to "undeprecate"
old "lib/neutron-legacy" and get rid of the new module.
</snippet>
This patch changes the Ironic jobs to use the old Neutron module in
DevStack.
[0]
http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015368.html
[1]
http://codesearch.openstack.org/?q=neutron-api%3A%20true&i=nope&files=&repos=
Change-Id: Ief043a0a01a800ea2d01a602000f0854df9e629f
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Let's use the default timeout from ironic-base for all jobs
so we can avoid job timeouts in our CI.
Change-Id: I5e753c4bbcb8075a1889754a468d9c3dd8310a08
A large ramdisk image tends to take an undesirable amount
of time performing the initial decompression into memory before
the system is booted and available. This sets the number of CPU cores
by default for all jobs to 2, and only sets that back to 1 where
TinyIPA is being used.
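A minimal sketch of the default, assuming IRONIC_VM_SPECS_CPU is the
relevant variable:

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          # Two vCPUs per test VM to speed up ramdisk decompression;
          # TinyIPA jobs set this back to 1.
          IRONIC_VM_SPECS_CPU: 2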
Change-Id: I88c57a1345edb1b14c760753638ad927641b34a2
This patch updates devstack to automatically
set the new tempest configuration option `boot_mode`,
using the value from the IRONIC_BOOT_MODE variable.
Increase the number of VMs in ironic-tempest-ipa-partition-pxe_ipmitool
and ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa
to 2, since they run cleaning and we now run two tempest tests.
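A minimal sketch of the resulting mapping, assuming the tempest option
lives in a baremetal-related section of tempest.conf (section name
assumed):

  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          IRONIC_BOOT_MODE: uefi
        devstack_local_conf:
          test-config:
            $TEMPEST_CONFIG:
              # Section name assumed for illustration; devstack fills
              # boot_mode from IRONIC_BOOT_MODE automatically.
              baremetal:
                boot_mode: uefi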
Depends-On: https://review.opendev.org/735960
Change-Id: Ic6faf73430e56e2b1ff19a72b1b03f8ef34eff5f
The ovmf package in Bionic doesn't really work in our CI.
As a workaround we use the old package from Xenial, but we can't keep
using it on Ubuntu Focal as well.
This patch aims to convert the UEFI jobs to use Ubuntu Focal as the
base operating system and use the native ovmf package.
Story: 2007785
Task: 40025
Change-Id: I653e5da2672b14eae88c6cab923b8617432f1dc1
Adds the ability to generate network boot templates even for nodes
that use local boot, via the new ``[pxe]enable_netboot_fallback``
option.
This is required to work around situations where switching boot
devices does not work reliably.
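A minimal sketch of enabling the fallback in a CI job (the option is
named above; the job wiring is illustrative):

  - job:
      name: ironic-base
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              pxe:
                # Always render network boot templates so a node can
                # still netboot if switching boot devices fails.
                enable_netboot_fallback: True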
Depends-On: https://review.opendev.org/#/c/736191/
Change-Id: Id80f2d88f9c92ff102340309a526a9b3992c6038
Story: #2007610
Task: #39600
This change achieves functional test coverage for using http_basic
auth for json-rpc requests.
Since json-rpc is aimed at standalone environments, using http_basic
instead of keystone auth for internal requests is a more realistic
test scenario.
For now, ironic-standalone-redfish is left with the inherited keystone
auth strategy.
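A minimal sketch of the configuration being exercised, assuming the
[json_rpc]/auth_strategy option selects http_basic (user file path
illustrative):

  - job:
      name: ironic-standalone   # job name illustrative
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              json_rpc:
                # Authenticate internal JSON-RPC calls with HTTP basic
                # credentials instead of keystone (path illustrative).
                auth_strategy: http_basic
                http_basic_auth_user_file: /etc/ironic/htpasswd-json-rpc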
Change-Id: I993741684eaa8f237ffb20535da7167bc589e72c
Story: 2007656
Task: 39827
The job is not required anymore, since grenade is using
the zuulv3 ironic-grenade job [1].
[1] https://review.opendev.org/#/c/731773/
Change-Id: I616a238bfb7864bf8752ec1475c8f611b2d28493