We have made many improvements to connection handling in Zuul.
Bring those back to Nodepool by copying over the zuul/zk directory
which has our base ZK connection classes.
This will enable us to bring other Zuul classes over, such as the
component registry.
The existing connection-related code is removed and the remaining
model-style code is moved to nodepool.zk.zookeeper. Almost every
file imported the model as nodepool.zk, so import adjustments are
made to compensate while keeping the code more or less as-is.
Change-Id: I9f793d7bbad573cb881dfcfdf11e3013e0f8e4a3
We use glean in the gate tests to enable the network. Because devstack
gives us a DHCP environment, it's possible that glean didn't start but
we still manage to get an address anyway via autoconfiguration. But
the point of having glean is that things work in clouds without DHCP
too. This is most common with new distributions, where glean matching
may not have been updated to configure things fully.
We could replicate a non-DHCP environment, at considerable initial
cost of figuring out how to set up devstack to do that, and then the
ongoing cost of maintaining a fairly bespoke environment. A simpler
path is to just check the glean services and make sure they started
correctly.
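A minimal sketch of such a check (the unit-name pattern and the states tested are assumptions; the real glean units and their post-run states may differ):

```shell
# check_units reads "<unit> <state>" lines, e.g. produced by:
#   systemctl list-units --all 'glean@*' --no-legend | awk '{print $1, $4}'
# and fails if any glean unit ended up in a failed state.
check_units() {
    local failed=0
    while read -r unit state; do
        [ -z "$unit" ] && continue
        if [ "$state" = "failed" ]; then
            echo "glean unit $unit failed to start" >&2
            failed=1
        fi
    done
    return $failed
}
```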
Change-Id: Idb8d67cd2ba6e8c5d7157177e9cfd6be7b99cacd
As described in the dependent change, the testing done here is better
done by the quickstart jobs these days. The dependent change has
removed the tox environment this calls in Zuul. This removes the job
definition and related files from nodepool.
Change-Id: I17e1002012e9ac6abc434454af989f1da1c379b7
Depends-On: https://review.opendev.org/c/zuul/zuul/+/826772
As noted inline, check the kernel flags on booted images to increase
confidence the bootloader is making generic images.
Change-Id: Ic15487f61a8d5f4c0c8f1941815d9649ed730add
After updating images to bullseye
(I21cfbd3935e48be4b92591ea36c7eed301230753) we can use the native
podman packages. These are slightly older, but should be fine for the
intended usages.
Change-Id: Ica62392ebf4a665a04cd65458dda9e0a7545ccc8
Similar to Zuul (I71182e9d3e6e930977a9f983b37743ee3300ec91), the base
images have updated to Bullseye.
This updates various things to get a building Bullseye image.
We have upgraded to 3.9-based images here because OpenDev builds ARM64
wheels for the bullseye+arm64 combination, which we use to speed up
the ARM64 cross-build (we do not have any repository of 3.7- or
3.8-on-bullseye ARM64 wheels, so those combinations are difficult to
use as the cross-build can take a very long time).
Depends-On: https://review.opendev.org/c/openstack/diskimage-builder/+/806318
Change-Id: I21cfbd3935e48be4b92591ea36c7eed301230753
This installs podman inside the nodepool container, which is used by
the dependent change in DIB to extract initial chroot environments
from upstream containers. This eliminates the need to run non-native
tools on build hosts (rpm/zypper on Ubuntu, etc.).
As noted in the config, podman defaults to assuming systemd is
installed and using various systemd interfaces.
Additionally, we map a volume into the container which allows nested
podman to do what it needs to do.
Needed-By: https://review.opendev.org/700083
Change-Id: I6722aa2b32db57e099dae4417955a8a2cd28847e
This adds a simple load testing script that basically maintains a
stable number of pending node requests. Combined with max-ready age
this can be easily used for load testing instance creations and
deletions.
Change-Id: I2f754e88fdc541914f929511c713a43eb910a344
This is intended as an aid for developers (since we have moved the
ZK setup which was in test-setup.sh to a playbook for tox jobs).
Change-Id: I9ca03831a74928ec6875c5f6668cfcfcdedb37fd
Require TLS Zookeeper connections before making the 4.0 release.
Change-Id: I69acdcec0deddfdd191f094f13627ec1618142af
Depends-On: https://review.opendev.org/776696
We had been using the linaro mirror to get arm64 python wheels.
Unfortunately this mirror has been having some reliability issues. The
wheels themselves are served from all our mirrors which means we can
switch to rax.dfw which should be more reliable.
Change-Id: I3da953fe49d8e4600d4835224a33ab558af88a06
We updated python-base and python-builder to include arm64 images in
support of nodepool's arm64 container image. In doing so we have
discovered a number of issues, but the biggest is the slowness of
building python packages in an emulated environment.
In order to speed up package builds we consume the OpenDev linaro
cloud arm64 wheel cache. This doesn't have wheels for every package we
need, but for the things that it does have it will speed up our builds.
One of the risks with this setup is that we're relying on wheels built
for openstack on arm64, and those follow openstack's constraints. In
order to mitigate this risk we set pip install's --prefer-binary flag
in the pip.conf. This means that if openstack's constraints lag what is
available on PyPI, we should use the existing wheels, as long as they
are a valid version according to requirements, rather than trying to
build from sdist.
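The resulting pip.conf is roughly (a sketch; the actual file baked into the image may differ):

```ini
# pip.conf: prefer any acceptable pre-built wheel (e.g. from the
# OpenDev arm64 wheel cache) over building a newer sdist from source.
[install]
prefer-binary = true
```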
Co-Authored-By: James E. Blair <corvus@inaugust.com>
Co-Authored-By: Ian Wienand <iwienand@redhat.com>
Change-Id: I3b358721eebbceafc12daf9d706306634048b196
We see timeouts trying to get this key fairly frequently in the gate.
Store it locally and use that in the container build.
Change-Id: Ifd706849f1fad88c8ec4afc79090df4afb88abb4
This code was already reverted in the zuul images; it doesn't
actually provide the value it claims to add, and it breaks
running under podman.
Revert "Dockerfile: add support for arbritary uid"
This reverts commit da2701e0b1.
Revert "Dockerfile: add user to shadow file too"
This reverts commit 747e957263.
Change-Id: Iff606c65c6a3223f13d963d90455fa895193cce8
There is a lot of logic in the check.sh script of the openstack
functional tests. Extract it into a single location in /tools and
call it from both the install and container tests.
Change-Id: Ib5728f5cee917c73d0da276d36da5776dee279fc
Without an entry in the shadow file, this user can't use sudo with the
following error:
account validation failure, is your account locked
(which I include here for future googling because it's pretty obscure;
you have to have this odd situation, or a pretty broken PAM, to see it).
The "nodepool" user (10001) is in the root group, which is why the
uid_entrypoint script can update the /etc/passwd file. We need to
change the ownership of the /etc/shadow file for this to work. It
feels a bit weird, but there's no password to actually guess anyway.
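A sketch of what this looks like in the Dockerfile (the exact permission change is inferred from the description above; the real image may differ):

```dockerfile
# The uid_entrypoint script runs as uid 10001 in the root group; let it
# rewrite /etc/shadow the same way it rewrites /etc/passwd.  There is
# no real password here, so group-write on shadow is less scary than
# it looks.
RUN chgrp root /etc/shadow && chmod g+w /etc/shadow
```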
Change-Id: I8846757edffe31f96df58999d05727910c9fca43
This change allows you to specify a dib-cmd parameter for disk images,
which overrides the default call to "disk-image-create". This
essentially lets you decide which disk-image-create binary is called
for each configured disk image.
It is inspired by a couple of things:
The "--fake" argument to nodepool-builder has always been a bit of a
wart; a case of testing-only functionality leaking across into the
production code. It would be clearer if the tests used exposed
methods to configure themselves to use the fake builder.
Because disk-image-create is called from the $PATH, it is more
difficult to use nodepool from a virtualenv. You cannot just run
"nodepool-builder"; you have to ". activate" the virtualenv before
running the daemon so that the path is set to find the virtualenv's
disk-image-create.
In addressing activation issues by automatically choosing the
in-virtualenv binary in Ie0e24fa67b948a294aa46f8164b077c8670b4025, it
was pointed out that others are already using wrappers in various ways
where preferring the co-installed virtualenv version would break.
With this, such users can ensure they call the "disk-image-create"
binary they want. We can then make a change to prefer the
co-installed version without fear of breaking.
In theory, there's no reason why a totally separate
"/custom/venv/bin/disk-image-create" would not be valid if you
required a customised dib for some reason for just one image. This is
not currently possible, even with PATH hacks, etc.; all images will
use the same binary to build. It is for this flexibility that I think
this is best at the diskimage level, rather than as, say, a global
setting for the whole builder instance.
Thus add a dib-cmd option for diskimages. In the testing case, this
points to the fake-image-create script, and the --fake command-line
option and related bits are removed.
It should have no backwards compatibility effects; documentation and a
release note are added.
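As a configuration sketch (the image name, elements and path are illustrative only):

```yaml
diskimages:
  - name: fake-image
    # Override the default "disk-image-create" lookup from $PATH;
    # this can point at a wrapper, a specific virtualenv's binary,
    # or (as in the tests) a fake builder script.
    dib-cmd: /custom/venv/bin/disk-image-create
    elements:
      - fedora-minimal
      - vm
```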
Change-Id: I6677e11823df72f8c69973c83039a987b67eb2af
Nodepool container fails with error message:
whoami: extra operand ‘/dev/null’
Try 'whoami --help' for more information.
Change-Id: I7ef5b6527eb08d00b9b27e37b5d5b5dce69bb4ef
Infra has a mirror for Debian Buster now, add boot tests
Depends-On: https://review.openstack.org/649496
Change-Id: Ib1567b2576631c078fe11d0f250aeb4e6f9fa0b3
We currently only need to set up the zNode caches in the
launcher. Within the command-line client and the builders this is just
unnecessary work.
Change-Id: I03aa2a11b75cab3932e4b45c5e964811a7e0b3d4
Replace Fedora 28 with Fedora 29 functional testing.
Note this changes our Red Hat platforms to use NetworkManager for
interface configuration, rather than the legacy scripts. Fedora 29 has
split the legacy scripts into a new package that is marked for future
removal. NetworkManager is the default on CentOS 7 and will also be
on CentOS 8, so it makes sense to use it there too.
Depends-On: https://review.openstack.org/619120
Change-Id: I640838c68a05f3b22683c1e90279725a77678526
Running a bit behind on this transition ... s/27/28/ to update to
Fedora 28. This is the default in dib now.
Change-Id: I648ab9d9ba4bba7323c432c65f3ef056703f4303
We have been running into what appear to be zookeeper performance issues
causing tests to fail. Run the zookeeper on a tmpfs to reduce the impact
of iops to disk.
Other alternatives include using something like eatmydata to make writes
and syncs fast but unsafe.
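One way to express the tmpfs approach (a sketch; the mount point and size are assumptions):

```
# fstab-style entry: ZooKeeper's dataDir on tmpfs, so writes and syncs
# never touch the real disk (contents are lost on reboot, which is
# fine for CI).
tmpfs  /var/lib/zookeeper  tmpfs  size=512m,mode=0755  0  0
```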
Change-Id: Iea5e44af6844281c7f2078da57da9f13691e2642
This allows us to set parameters for server boot on various images.
This is the equivalent of the "--property" flag when using "openstack
server create". Various tools on the booted servers can then query
the config-drive metadata to get this value.
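In the provider configuration this looks roughly like the following (the provider, label and property names are illustrative):

```yaml
providers:
  - name: example-cloud
    pools:
      - name: main
        labels:
          - name: ubuntu-bionic
            diskimage: ubuntu-bionic
            # Equivalent to "openstack server create --property";
            # readable on the node from the config-drive metadata.
            instance-properties:
              nodepool_purpose: ci-test
```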
Needed-By: https://review.openstack.org/604193/
Change-Id: I99c1980f089aa2971ba728b77adfc6f4200e0b77
openSUSE Leap 15 is the latest version of openSUSE; bring an image
online to validate that we can properly build it.
Depends-On: https://review.openstack.org/#/c/572424/
Change-Id: Ib0f48d9788aafd763e857c2d33784c4f75af4c17
Now that debian-stretch is working as expected, we can remove
debian-jessie.
Change-Id: If897757023772bb4549e40e7fcd048998175fb5b
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We are seeing consistent failures where the Trusty root partition is
64MiB shorter than we expect. Unfortunately we don't currently have a
concrete explanation for this but rounding due to alignment is
suspected.
Reduce the expected size to something bigger than the images, but not
so close to 5GiB. Also add a more useful failure message; currently
you have to dig back through the logs to see where it went wrong.
Change-Id: Iba1fafdb1fe0f3c1b751772af939f079c429fcf3
If a build has a systematic failure, we currently just let nodepool
run, looping builds until we hit the overall job timeout (1.5 hours).
This adds a count of output log files; if we see three failed builds,
then assume the problem won't get better and fail early.
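A sketch of the early-exit check (the log location and filename glob are assumptions):

```shell
# Each failed build leaves a log file behind; bail out once we have
# seen three rather than looping until the overall job timeout.
count_failures() {
    # count build log files in the given directory
    find "$1" -maxdepth 1 -name 'fake-image-*.log' 2>/dev/null | wc -l
}

if [ "$(count_failures /var/log/nodepool/builds)" -ge 3 ]; then
    echo "Build does not appear to be recovering; failing early" >&2
    exit 1
fi
```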
Change-Id: Id7e163b4937dd57cc8afbf72ed795f73b46a05b1
It seems that 2MiB of fudge isn't quite enough; Trusty has been seen
to round its root partition size down and we miss it. Increase the
fudge factor.
Change-Id: I26e3bc7b5f68ea6642b8b57119fbd286688d593e
Test that we see the root partition grow.
Increase the root disk size to 5GiB, and check that the booted VM has
grown the disk to at least that. Add disk size tracking so we can
more clearly see what's being built into the images.
Change-Id: I377beffc4896e03f0c2d01c0061c5f8652e8b1d1
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Increase our test coverage for debian-stretch, as this is the latest
stable version of Debian.
Change-Id: I05cbfe9735eb0b3900203fbd423f68483b1cbf5d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Several fixes to this job:
We need to run bindep twice - once for nodepool, once for zuul. Add
invocation of bindep role and also copy install-distro-packages so that
the job works, next step is remove the install-distro-packages from
zuul.
Add a post-run to copy the nodepool logs, so you can diagnose what's
going wrong if the job fails.
Fix up a configuration issue: the job tries to write build logs to
/var/log/nodepool, which it doesn't own; redirect them to the temp area.
Add it as a non-voting check job
Depends-On: https://review.openstack.org/545163
Change-Id: I12db55d3e4c7a71b9af56567858df0a620ee3b73