The Zuul jobs in zuul.d/ironic-jobs.yaml currently list their
required-projects entries without a leading hostname:
required-projects:
- openstack/ironic
- openstack/ABCD
rather than like this, with a leading hostname:
required-projects:
- opendev.org/openstack/ironic
- opendev.org/openstack/ABCD
With the first format, if there are two openstack/ironic entries in
Zuul's tenant configuration file (a tenant config in a third-party CI
environment usually has two entries: one to fetch the upstream code,
another for the Gerrit event stream that triggers Zuul jobs), the
zuul-scheduler log shows this warning:
  Project name 'openstack/ironic' is ambiguous,
  please fully qualify the project with a hostname
With the second format, that warning does not appear, and a Zuul
deployment in a third-party CI environment can reuse the jobs defined
in zuul.d/ironic-jobs.yaml in its own jobs.
This commit modifies all Zuul jobs in zuul.d/ironic-jobs.yaml
to use the second (fully qualified) format.
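For illustration, a job definition after this change looks roughly
like the following (the job name here is only an example):
  - job:
      name: ironic-example-job
      parent: ironic-base
      required-projects:
        - opendev.org/openstack/ironic
        - opendev.org/openstack/ironic-tempest-plugin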
Story: 2008724
Task: 42068
Change-Id: I85adf3c8b3deaf0d1b2d58dcd82724c7e412e2db
As discussed during the upstream ironic community meeting on
Monday Dec 14 2020, the lower-constraints job is being removed.
Change-Id: I116d99014a7bf77ca77b796ea3b759800dd808ce
The non-base job is designed for the integrated gate and may have
unnecessary side effects. It recently started overriding the OVS agent
bridge settings, breaking our job.
Make the job voting again.
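For reference, making a job voting again in Zuul typically amounts to
dropping the voting: false override from the project pipeline entry;
a rough sketch with a placeholder job name:
  - project:
      check:
        jobs:
          - ironic-example-tempest-job  # voting: false override removed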
Change-Id: Ied8cafd32c3e634d498467ebe878a411f0b24e6d
* move pep8 dependencies from test-requirements to tox.ini;
  they're not needed there and are hard to constrain properly.
* add oslo.cache to lower-constraints to avoid a bump of dependencies
Change-Id: Ia5330f3d5778ee62811da081c28a16965e512b55
All the tox jobs are based on openstack-tox; we should convert
ironic-tox-unit-with-driver-libs too.
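The converted job would look roughly like this (the tox_envlist value
is an assumption about the matching tox environment name):
  - job:
      name: ironic-tox-unit-with-driver-libs
      parent: openstack-tox
      vars:
        tox_envlist: unit-with-driver-libs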
Change-Id: I20836d586edccfb8cd8fed1f3a89f1497ff96943
The standalone job at present has a high chance of failure
due to two separate things occurring:
1) Nodes deployed by the RAID tests can be left in a dirty state:
   the RAID configuration remains and is chosen as the root device
   for the next deployment. If such a node is chosen by any job,
   such as rescue or a deployment test that attempts to log in,
   the job fails because it cannot SSH to the node. The fix for this
   is in the ironic-tempest-plugin, but we need to get other fixes
   in to stabilize the gate first.
https://review.opendev.org/#/c/757141/
2) Long-running scenarios in cleaning, such as deployment with RAID
   in the standalone suite, can encounter conditions where the
   conductor tries to send the next command before the present
   configuration command has completed. An example: the image
   download is still running while a heartbeat has occurred in the
   background, and the conductor then seeks to perform a second
   action. This causes the entire deployment to fail, even though
   the condition was transitory.
   This should be a relatively easy fix.
https://review.opendev.org/759906
Change-Id: I6b02be0fa353daac90abf2b1576800c0710f651e
We're seeing cases where cleaning barely manages to finish after
a 2nd PXE retry, failing a job.
Also make the PXE retry timeout consistent between the CI and
local devstack installations.
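A sketch of how such a timeout could be pinned in the job definition,
assuming the option being tuned is ironic's [pxe]boot_retry_timeout
(the value below is only illustrative):
  - job:
      name: ironic-base
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              pxe:
                boot_retry_timeout: 1200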
Change-Id: I6dc7a91d1a482008cf4ec855a60a95ec0a1abe28
As per the Victoria cycle testing runtime and community goal,
we need to migrate upstream CI/CD to Ubuntu Focal (20.04).
A few jobs are kept running on a Bionic nodeset until
https://storyboard.openstack.org/#!/story/2008185 is fixed,
otherwise the base devstack jobs switching to Focal would block
the gate.
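Keeping a job on Bionic comes down to pinning its nodeset, roughly
like this (the job names are placeholders):
  - job:
      name: ironic-example-job-bionic
      parent: ironic-example-job
      nodeset: openstack-single-node-bionic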
Change-Id: I1106c5c2b400e7db899959550eb1dc92577b319d
Story: #2007865
Task: #40188
It was supposed to be made voting shortly after the split, but we
sort of forgot. It provides coverage for things (like ansible deploy)
that we used to have voting jobs for.
Change-Id: Id99586d5e01b940089d55c133d9181db05bfdc7e
This change marks the iscsi deploy interface as deprecated and
stops enabling it by default.
An online data migration is provided for iscsi->direct, provided that:
1) the direct deploy is enabled,
2) image_download_source!=swift.
The CI coverage for iscsi deploy is left only on standalone jobs.
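For illustration, a deployment that satisfies both conditions could be
configured along these lines in a devstack-based job (the job name and
layout are only a sketch; the options shown are ironic's
[DEFAULT]enabled_deploy_interfaces and [agent]image_download_source):
  - job:
      name: ironic-example-job
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              DEFAULT:
                enabled_deploy_interfaces: direct
              agent:
                image_download_source: http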
Story: #2008114
Task: #40830
Change-Id: I4a66401b24c49c705861e0745867b7fc706a7509
The minimum amount of disk space on CI test nodes
may be approximately 60GB on /opt, now with only 1GB
of swap space available by default.
This means we're constrained in the number of VMs and
their disk storage capacity in some cases.
Change-Id: Ia6dac22081c92bbccc803f233dd53740f6b48abb
Infra's disk/swap availability has apparently been reduced
with the new Focal node sets, such that we have ~60GB of
disk space and only 1GB of swap.
If we configure more swap, then naturally that takes away
from the space available for VMs as well.
As such, we should be able to complete grenade
with only four instances, I hope.
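A rough sketch of the adjustment this implies for the grenade job,
assuming the devstack job's configure_swap_size variable is used and
with illustrative values:
  - job:
      name: ironic-grenade
      vars:
        configure_swap_size: 4096
        devstack_localrc:
          IRONIC_VM_COUNT: 4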
Change-Id: I36f8fc8130ed914e8a2c2a11c9679144d931ad73
Currently ironic-base defaults to 2 test VMs and our tests try to
introspect all of them. This puts unnecessary strain on the CI
systems; return the number back to 1.
Change-Id: I820bba1347954b659fd7469ed542f98ef0a6eaf0
As part of the plan to deprecate the iSCSI deploy interface, change
this option to a value that will work out of the box for more
deployments. The standalone CI jobs are switched to http as well,
while the rest of the jobs are left with swift. The explicit indirect
jobs are removed.
Change-Id: Idc56a70478dfe65e9b936006a5355d6b96e536e1
Story: #2008114
Task: #40831
Removes the deprecated support for token-less agents. This better
secures the ironic-python-agent<->ironic interactions, helping to
ensure that heartbeat operations come from the same node which
originally checked in with ironic, and that commands arriving at an
agent originate from the same ironic deployment the agent checked in
with to begin with.
Story: 2007025
Task: 40814
Change-Id: Id7a3f402285c654bc4665dcd45bd0730128bf9b0
Tinyipa is not that tiny anymore and we need to increase the base
memory for VMs in jobs that use it.
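A sketch of the kind of bump described here, assuming
IRONIC_VM_SPECS_RAM is the devstack variable that sizes the test VMs
(the value is only illustrative):
  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          IRONIC_VM_SPECS_RAM: 3072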
Change-Id: Ibd7e87c0b5676eef94512285edaca416635a29ef
Adds the settings to enable the ramdisk ISO booting tests,
including a bootable ISO image that should boot the machine.
NB: The first depends-on is only for temporarily testing another
change, which modifies the underlying ramdisk interface. Since this
change pulls in tempest testing for the ISO ramdisk and uses it, we
might as well use it to test whether that change works, as the other
two patches below are known to be in a good state.
Change-Id: I5d4213b0ba4f7884fb542e7d6680f95fc94e112e
The kernel for the UEFI PXE job seems to download without issue;
however, the required ramdisk does not seem to be making it.
As such, change the job to use TinyCore to see if the smaller
ramdisk helps resolve these issues.
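Roughly, this means pointing the job's ramdisk type at tinyipa, for
example (the job name is a placeholder and IRONIC_RAMDISK_TYPE is
assumed to be the relevant devstack knob):
  - job:
      name: ironic-tempest-uefi-example
      vars:
        devstack_localrc:
          IRONIC_RAMDISK_TYPE: tinyipa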
Change-Id: Ie248de2269a63a41b634f7205468edabccc53738
The default dhcp client in tinycore does not automatically trigger
IPv6 address acquisition.
This is a problem when the random spread of nodes and devstack
causes tinycore to get pulled in for the v6 job.
Change-Id: I635a69dfd7450a218474ccb7cecf1c9e29c0a43c
Our ramdisks have swelled, and are taking anywhere from 500-700
seconds to even reach the point where IPA is starting up.
This means that a 900 second build timeout is cutting it close,
and intermittent performance degradation in CI means that a job
may fail simply because it is colliding with the timeout.
One example I deconstructed today where a 900 second timeout was
in effect:
* 08:21:41 Tempest job starts
* 08:21:46 Nova instance requested
* Compute service requests ironic to do the thing.
* Ironic downloads IPA and stages it - ~20-30 seconds
* VM boots and loads ipxe ~30 seconds.
* 08:23:22 - ipxe downloads kernel/ramdisk (the timestamp should be
  the completion time, unless apache has changed its logging behavior
  for requests.)
* 08:26:28 - Kernel at 120 second marker and done decompressing
the ramdisk.
* ~08:34:30 - Kernel itself hit the six hundred second runtime
marker and hasn't even started IPA.
* 08:35:02 - Ironic declares the deploy failed due to wait timeout.
([conductor]deploy_callback_timeout hit at 700 seconds.)
* 08:35:32 - Nova fails the build saying it can't be scheduled.
(Note: I started adding times to figure out the window for myself, so
they are incomplete above.)
The time we can account for in the job is about 14 minutes or 840
seconds. As such, our existing defaults are just not enough to handle
the ramdisk size AND variance in cloud performance.
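A sketch of the kind of change this calls for, raising ironic's
[conductor]deploy_callback_timeout in the CI jobs (the value is only
illustrative):
  - job:
      name: ironic-base
      vars:
        devstack_local_conf:
          post-config:
            $IRONIC_CONF_FILE:
              conductor:
                deploy_callback_timeout: 1800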
Change-Id: I4f9db300e792980059c401fce4c37a68c438d7c0
After the recent changes we're running 5 tests already, some of them
using several VMs. This should cover scheduling to different conductors
well enough; the nova test just adds random failures on top.
This allows reducing the number of test VMs to 3 per testing node
(6 in total), reducing the resource pressure and allowing us to give
each VM a bit more RAM.
Also adding the missing VM_SPECS_DISK to the subnode configuration.
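For the multinode jobs, the reduction plus the subnode disk setting
would look roughly like this (job name and values are illustrative):
  - job:
      name: ironic-multinode-example
      vars:
        devstack_localrc:
          IRONIC_VM_COUNT: 3
          IRONIC_VM_SPECS_DISK: 10
      group-vars:
        subnode:
          devstack_localrc:
            IRONIC_VM_COUNT: 3
            IRONIC_VM_SPECS_DISK: 10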
Change-Id: Idde2891b2f15190f327e4298131a6069c58163c0
Since we merged the change to have partition and wholedisk
testing on basic_ops, most of the jobs started requiring 2 VMs
to run the tempest tests.
Let's increase the count on ironic-base so all jobs default to 2,
removing IRONIC_VM_COUNT=2 from the jobs that use ironic-base as
parent.
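In effect, the default moves up into ironic-base so that child jobs
no longer need the override; a sketch of the relevant fragment:
  - job:
      name: ironic-base
      vars:
        devstack_localrc:
          IRONIC_VM_COUNT: 2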
Change-Id: I13da6275c04ffc6237a7f2edf25c03b4ddee936a