As noted inline, we are having problems running podman on the
production hosts (why this doesn't happen in the gate is still a
mystery...). Explicitly install uidmap package alongside podman.
Change-Id: Ic7817cf1b1279dfde5b4cf9538f5067176024b73
While iterating over zk nodes with node IDs from the node
cache, there can be runtime errors if the cache is updated during
the iteration process
(``RuntimeError: dictionary changed size during iteration``).
Avoid this by iterating over a copy of the node IDs, rather than the
dict_keys directly.
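A minimal sketch of the pattern (the cache contents here are
stand-ins):

    # Iterating over dict_keys directly can raise
    # "RuntimeError: dictionary changed size during iteration"
    # if another thread adds or removes entries mid-loop.
    cache = {f"{i:010d}": object() for i in range(5)}  # stand-in node cache

    def process(node_id):
        pass  # placeholder for the real per-node work

    # Snapshot the keys first; the loop is then immune to
    # concurrent cache updates.
    for node_id in list(cache.keys()):
        process(node_id)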
Change-Id: Iecd88b4484cf48ea2127348bfb2905443eaaf49f
As described in the dependent change, the testing done here is better
done by the quickstart jobs these days. The dependent change has
removed the tox environment this calls in Zuul. This removes the job
definition and related files from nodepool.
Change-Id: I17e1002012e9ac6abc434454af989f1da1c379b7
Depends-On: https://review.opendev.org/c/zuul/zuul/+/826772
This adds QuotaSupport to all the drivers that don't have it, and
also updates their tests so there is at least one test which exercises
the new tenant quota feature.
Since this is expected to work across all drivers/providers/etc, we
should start including at least rudimentary quota support in every
driver.
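A minimal sketch of what rudimentary quota support can look like in
a driver, assuming helpers along the lines of nodepool's
QuotaSupport mixin and QuotaInformation (treat the module path,
names, and signatures here as assumptions):

    # Hypothetical driver snippet; names are illustrative.
    from nodepool.driver.utils import QuotaInformation, QuotaSupport

    class ExampleProvider(QuotaSupport):
        def quotaNeededByLabel(self, ntype, pool):
            # Even a backend with no real quota API can report one
            # instance per node, so tenant instance limits still work.
            return QuotaInformation(instances=1)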
Change-Id: I891ade226ba588ecdda835b143b7897bb4425bd8
This re-enables the siblings job in nodepool but omits master
openstacksdk which is currently in an incompatible state. This
also fixes the dib gate, which uses this job as the basis for
its functional testing.
Change-Id: Id268993dd88079f54516a020555b3e2b40de8394
The OpenStackSDK project is in the process of making a breaking
API change and has stopped running this job on master. Since there
is no interest in adjusting Nodepool to match before the release,
it is no longer useful to run this job on this repo. Stop running
it until the sdk project is ready to re-establish co-gating.
We also set a constraint to avoid installing SDK 1.0.0 since we
expect that to include the breaking API change.
Change-Id: If9f45e24a71f349a85e94150e6a4d9ee9672173b
If a provider (or its configuration) is sufficiently broken that
the provider manager is unable to start, then the launcher will
go into a loop where it attempts to restart all providers in the
system until it succeeds. During this time, no pool managers are
running, which means all requests are ignored by this launcher.
Nodepool continuously reloads its configuration file, and in case
of an error, the expected behavior is to continue running and allow
the user to correct the configuration and retry after a short delay.
We also expect providers on a launcher to be independent of each
other so that if one fails, the others continue working.
However, since we neither exit nor process node requests if a
provider manager fails to start, an error with one provider can
cause all providers to stop handling requests, with very little
feedback to the operator.
To address this, if a provider manager fails to start, the launcher
will now behave as if the provider were absent from the config file.
It will still emit the error to the log, and it will continuously
attempt to start the provider so that if the error condition abates,
the provider will start.
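A minimal sketch of the approach (all names here are hypothetical,
not the actual launcher code):

    import logging

    log = logging.getLogger("launcher")

    class ProviderManager:
        # Stand-in for a real provider manager.
        def __init__(self, provider):
            self.provider = provider

        def start(self):
            pass  # a real manager connects to the cloud here

    def sync_provider_managers(providers, managers):
        # Called on every config (re)load. A provider whose manager
        # fails to start is left out of `managers`, which is the same
        # as it being absent from the config file; the next sync
        # retries, so a corrected config recovers on its own.
        for name, provider in providers.items():
            if name in managers:
                continue
            try:
                manager = ProviderManager(provider)
                manager.start()
                managers[name] = manager
            except Exception:
                log.exception(
                    "Error starting provider %s; will retry", name)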
If there are no providers online for a label, then as long as any
provider in the system is running, node requests will still be
handled (declined, and possibly failed) while the broken provider
is offline.
If the system contains only a single provider and it is broken, then
no requests will be handled (or failed), which is the current
behavior, and still likely to be the most desirable in that case.
Change-Id: If652e8911993946cee67c4dba5e6f88e55ac7099
Apparently, if nodepool moves too quickly for EC2, we can get a
ClientError when fetching the instance state, because with eventual
consistency the cloud may not yet know about the instance. Guard
against this by sleeping and retrying within the existing retry
loop that waits for the instance to go running.
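A sketch of the guard using boto3 directly (the helper name and
parameters are assumptions; the real driver code differs):

    import time

    import boto3
    from botocore.exceptions import ClientError

    def wait_for_running(instance_id, region, timeout=300, delay=5):
        ec2 = boto3.resource('ec2', region_name=region)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            instance = ec2.Instance(instance_id)
            try:
                if instance.state['Name'] == 'running':
                    return instance
            except ClientError:
                # Eventual consistency: the API may not know about the
                # instance yet; treat that as "not running" and retry.
                pass
            time.sleep(delay)
        raise TimeoutError(f"{instance_id} not running after {timeout}s")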
Story: 2009781
Change-Id: I581888a67e2401b85043c02876acd4df857e13b0
This adds a new statsd gauge which, in addition to the existing provider
limits, exports the currently configured tenant limits. This is in the
form ``nodepool.tenant_limits.TENANT.[cores,ram,instances]``.
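A minimal sketch of emitting the gauge with the Python statsd
client (the client setup and function are assumptions; the metric
name is the form described above):

    import statsd

    client = statsd.StatsClient('localhost', 8125)

    def report_tenant_limits(tenant, cores, ram, instances):
        # One gauge per configured tenant limit.
        client.gauge(f'nodepool.tenant_limits.{tenant}.cores', cores)
        client.gauge(f'nodepool.tenant_limits.{tenant}.ram', ram)
        client.gauge(f'nodepool.tenant_limits.{tenant}.instances', instances)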
Change-Id: I8e10a0974210d25d071dbbd63849a921fc8b79a2
A DibImageFile represents one dib image-file on disk, so the extension
is required (see prior change
I214581ad80b7740e7ca749b574672d2c33b92474 where we modified callers
that were using this interface).
This fixes a bug by removing code: the pathlib with_suffix
replacement is not safe for image names with a period in them.
Consider
>>> pathlib.Path('image-v1.2-foo').with_suffix('.vhd')
PosixPath('image-v1.vhd')
We can now simply unconditionally append the extension in
DibImageFile.to_path().
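The difference, as a sketch (to_path here is a simplified stand-in
for the real method):

    import pathlib

    def to_path(image_name, extension):
        # Appending is always safe; with_suffix() would replace
        # everything after the last period in the name.
        return pathlib.Path(image_name + '.' + extension)

    # pathlib.Path('image-v1.2-foo').with_suffix('.vhd') -> 'image-v1.vhd'
    # to_path('image-v1.2-foo', 'vhd') -> 'image-v1.2-foo.vhd'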
Change-Id: I1bc812ddffacbcc414b8f7f372d9fca78bd87292
This refactors the _buildImage function to not use DibImageFile to
construct the path to the dib output files.
DibImageFile represents one on-disk image. The dib build argument is
different -- you give it the basename in "-o" and the output types,
and it creates "basename.ext" where "ext" represents the various
output types it produced.
This converts _buildImage to that simpler approach and makes it a
bit clearer that the argument is a basename.
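Roughly, the relationship between the "-o" basename and the files
dib produces (paths and types here are illustrative):

    basename = '/opt/nodepool/images/ubuntu-0000000001'
    img_types = ['qcow2', 'vhd']
    cmd = ['disk-image-create', '-t', ','.join(img_types),
           '-o', basename]
    # dib writes one file per requested type:
    #   /opt/nodepool/images/ubuntu-0000000001.qcow2
    #   /opt/nodepool/images/ubuntu-0000000001.vhd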
Change-Id: I214581ad80b7740e7ca749b574672d2c33b92474
The static method from_path is only used from one place and is simply
joining a path; we can inline this and remove it for clarity.
Change-Id: Iade6e024516bf9ce212491d6461e00affb5971a0
To match change I2870450ffd02f55509fcc1297d050b09deafbfb9 in Zuul.
The default domain is changed to zuul, which uncovered a reference
error; that error is fixed here as well.
Change-Id: I71db35252d018feed41d9e87aa702c6daa61902b
This driver supplies "static" nodes that are actually backed by
another nodepool node. The use case is to be able to request a single
large node (a "backing node") from a cloud provider, and then divide
that node up into smaller nodes that are actually used ("requested
nodes"). A backing node can support one or more requested nodes, and
backing nodes should scale up or down as necessary.
Change-Id: I29d78705a87a53ee07dce6022b81a1ce97c54f1d
The most important thing in this release is [1] which fixes bootloader
issues with CentOS and Fedora hosts (see also
Ic15487f61a8d5f4c0c8f1941815d9649ed730add for enhanced testing
relating to this in this repo).
[1] https://review.opendev.org/c/openstack/diskimage-builder/+/818851
Change-Id: I759de000bbd65ea1cc02c062186b417c000d2863
The assertEquals method has been deprecated since it was renamed
to assertEqual in Python 3.2.
https://docs.python.org/3/library/unittest.html#deprecated-aliases
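For example:

    import unittest

    class ExampleTestCase(unittest.TestCase):
        def test_equality(self):
            # assertEquals is a deprecated alias of assertEqual;
            # the non-alias form is preferred.
            self.assertEqual(1 + 1, 2)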
Change-Id: I306d43862eb6c7a36dad1d3a50822c2758fae5fe
As noted inline, check the kernel flags on booted images to increase
confidence the bootloader is making generic images.
Change-Id: Ic15487f61a8d5f4c0c8f1941815d9649ed730add
This point release has a single fix that corrects the way
source-repositories references cached directories with git
(Iadb23454e29d8869e11407e1592007b0f0963e17).
Change-Id: I2d9e3b7949cf005afb8307453791188177b63e36
Under Azure, an admin password is required in order to launch a
VM from a Windows image. Add support for that.
Also, shorten the node name to less than 15 characters in order
to accommodate Windows restrictions.
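A minimal sketch of the truncation (the helper name is
hypothetical; the 15-character limit comes from Windows NetBIOS
computer names):

    def windows_safe_hostname(node_name: str) -> str:
        # Windows computer names are limited to 15 characters.
        return node_name[:15]

    assert windows_safe_hostname('np0000000001-windows-node') == 'np0000000001-wi'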
Change-Id: I899f3e02046ffdb5f9fd19fe90c4bc9afdb01a7c