Update the documentation to match recent changes to the driver.
Also, update the driver to use the standard timeout options.
Change-Id: Iccdb4f4ce5470eebbdee9eccc403497a635e105e
This change prevents a TypeError when deleting pods:
TypeError: delete_namespaced_pod() takes 3 positional arguments but 4 were given
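The error stems from newer kubernetes client releases making the delete options keyword-only. A minimal sketch of the failure mode, using a stand-in function that only imitates the new-style signature (it is not the real client API):

```python
# Stand-in for the new-style client method: only name and namespace are
# accepted positionally; everything else must be a keyword argument.
def delete_namespaced_pod(name, namespace, **kwargs):
    return (name, namespace, kwargs.get("body"))

# Old-style call sites passed the delete options positionally:
#   delete_namespaced_pod("mypod", "default", delete_options)
# which now raises:
#   TypeError: delete_namespaced_pod() takes 2 positional arguments but 3 were given
# The fix is to pass the options as a keyword:
#   delete_namespaced_pod("mypod", "default", body=delete_options)
```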
Change-Id: I6d31af352e3151eb1bdef52472a4b0b5fa00ecd8
This change moves the kubernetes client creation to a common
function to re-use the exception handling logic.
Change-Id: I5bdd369f6c9a78e5f79a926d8690f285fda94af9
Trigger a rebuild of nodepool images with DIB 3.12.0. The main reason
is for people to use the containerfile element to build later Fedora
releases.
Change-Id: Iaeab9f27296c4432071fd2d8649b78b81efe656b
Now that the state-machine-based driver for Azure is complete,
replace the current driver with it. This change is mostly
file renames, with some minor bugfixes to satisfy
edge cases that appear in the tests.
This also updates the fake azure to accommodate the additional
methods used by the new driver, and adds a test for image uploads
and diskimage building.
Change-Id: I6b5cf72501ea83a8a7a2f753ee6ed8d2e484a5d2
The leaked resource detection can race with resource creation and
erroneously delete a newly created resource. To fix this,
verify that a leaked resource shows up in two subsequent runs.
We could do this in the adapter, but if we generalize resource
representation, then we could do it once in the statemachine driver
and make adapter implementations simpler, so that's what this change
does.
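The two-run rule can be sketched roughly like this (the class and method names are hypothetical, not the driver's actual code):

```python
# Sketch of two-pass leak detection: a resource is only deleted if it
# was already flagged as leaked in the previous cleanup run. A resource
# that is mid-creation during the first pass will have been registered
# by the time of the second pass, so it is never seen as leaked twice.
class LeakCleanup:
    def __init__(self):
        self.possibly_leaked = set()

    def cleanup(self, leaked_ids, delete):
        # leaked_ids: resource ids found with no matching ZK record.
        for rid in leaked_ids:
            if rid in self.possibly_leaked:
                delete(rid)
        # Remember this run's suspects for the next run.
        self.possibly_leaked = set(leaked_ids)
```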
Change-Id: I725bbf6901839b91781d738a7a9f07c1ebfa3369
We could conceivably leak disks used for image uploads as well as
the images themselves. Handle that in the cleanup method.
Change-Id: I6e72682995c50a57684cd5a0407a85049cdbda16
This adds support for querying the provider quota as well as
getting resource usage information for hardware profiles.
This also includes a change to the general quota utils for
all drivers:
If a driver raises an exception when querying available quota
and we have a cached value, use the cached value. This avoids
errors from transient provider API/network issues.
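A rough sketch of the cached-value fallback (hypothetical names, simplified from the actual quota utils):

```python
# Fall back to a cached quota value when the provider API raises a
# transient error; only propagate the error if there is no cache yet.
class QuotaCache:
    def __init__(self, query):
        self.query = query  # callable that hits the provider API
        self.cached = None

    def get_available_quota(self):
        try:
            self.cached = self.query()
        except Exception:
            # Transient API/network failure: reuse the last known value
            # if we have one, otherwise re-raise.
            if self.cached is None:
                raise
        return self.cached
```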
Change-Id: I7199ced512226aaf8bdb6106d41796a5cbca97c0
The current config file for azure requires the user to supply a full
object id for the subnet. That looks something like this:
subnet-id: /subscriptions/........-....-....-....-............/resourceGroups/nodepoolRG/providers/Microsoft.Network/virtualNetworks/nodepool/subnets/default
It's not trivial to find this in the web portal either. To make it
easier for users, let's deprecate this in favor of specifying it via
any of the following:
  network:
    resource-group: nodepoolRG
    network: nodepool
    subnet: default
or:
  network:
    network: nodepool
or (the most likely case):
  network: nodepool
Where resource-group will default to the resource group of the provider,
and subnet will default to "default", which is what Azure initially creates.
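With those defaults, the driver can reconstruct the full object id itself. A rough sketch (a hypothetical function, with the subscription id taken as a parameter rather than filled in):

```python
# Resolve the new-style network setting (a bare string or a dict with
# optional keys) into the full subnet object id, applying the
# documented defaults for resource-group and subnet.
def resolve_subnet_id(subscription_id, provider_resource_group, network):
    if isinstance(network, str):
        network = {"network": network}
    rg = network.get("resource-group", provider_resource_group)
    net = network["network"]
    subnet = network.get("subnet", "default")
    return ("/subscriptions/%s/resourceGroups/%s/providers/"
            "Microsoft.Network/virtualNetworks/%s/subnets/%s"
            % (subscription_id, rg, net, subnet))
```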
Change-Id: I423fb6739089a44116ec00ba9b9ba219b5563fc2
Azure supports the following:
Private IPv4 (with or without public IPv4)
Private IPv6 (with or without public IPv6)
Update the Azure state machine driver to handle all of the possible
variants, and pick the best SKU/allocation method for the
circumstances.
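An illustrative selection helper (not the driver's actual logic), assuming the general Azure rule that Standard SKU public IPs must use static allocation while Basic SKU IPv4 addresses may be dynamic:

```python
# Pick a public IP SKU/allocation method for a given address family.
# This is a hypothetical sketch of the kind of decision involved.
def choose_public_ip(version, use_standard_sku):
    if use_standard_sku or version == 6:
        return {"sku": "Standard", "allocation": "Static",
                "version": "IPv%d" % version}
    return {"sku": "Basic", "allocation": "Dynamic", "version": "IPv4"}
```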
Change-Id: Ia81edd5ccb8ac7b8f9e87cb6ce0a890748a80210
This fixes libffi bindep installation on Ubuntu Focal
The Python 3.6 tox tests are switched back to bionic, as Focal nodes
don't have Python 3.6.
Additionally, we squashed the following change into this to unblock
the gate:
Remove nodepool-functional-openstack
This test installs devstack and then nodepool on a bionic host (in
contrast to the -containers variant that builds a container from the
Dockerfile and installs/runs that).
Firstly, devstack support for Bionic is going away soon so we have to
update this. We don't really need to test if we run on top of a plain
Bionic/Focal host. We have tox jobs testing various Python versions
for compatibility, so running here isn't providing any extra
coverage. DIB can't build many things on plain Bionic/Focal due to
updates or incompatibilities in "alien" versions of RPM, Zypper,
debootstrap, etc. The container incorporates fixes as required and is
where anyone is going to put attention if there are build issues;
hence we're not testing anything useful for image building paths.
Finally we also have nodepool-zuul-functional, which brings up Zuul
and nodepool on a plain Bionic host anyway. Per the prior reasons,
that covers basically the same thing this job provides.
openstacksdk is using this on older branches, but is switched to using
the container job in the dependent changes.
Depends-On: https://review.opendev.org/c/openstack/openstacksdk/+/788414
Depends-On: https://review.opendev.org/c/openstack/openstacksdk/+/788416
Depends-On: https://review.opendev.org/c/openstack/openstacksdk/+/788418
Depends-On: https://review.opendev.org/c/openstack/openstacksdk/+/788420
Depends-On: https://review.opendev.org/c/openstack/diskimage-builder/+/788404
(was: Change-Id: I87318e9101b982f3cafcf82439fdcb68767b602b)
Change-Id: Ifc74e6958f64be70386cdb3e05768d94db75c3bb
The static provider will create nodes in state "building" in case a
label is currently not available. When a freed-up node can be assigned
to a waiting node, we must use the request priority to decide which
node to update.
Previously, nodepool ordered the waiting nodes by creation time. This can
lead to node requests being fulfilled in the wrong order in certain
cases.
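A sketch of the priority-based selection (hypothetical structure, assuming request ids sort lexicographically such that lower values mean higher priority):

```python
# Pick the waiting "building" node whose request has the highest
# priority, rather than the oldest node, when a static node frees up.
def pick_waiting_node(waiting_nodes):
    # waiting_nodes: list of (request_id, node) tuples for nodes that
    # are waiting for this label.
    return min(waiting_nodes, key=lambda item: item[0])[1]
```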
Change-Id: Iae4091b1055d6bb0933f51ce1bbf860e62843206
Configuration has all moved to containers.conf; write the cgroup
option into that. Also disable log messages trying to go to systemd,
which puts out warnings about the journal socket not existing.
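The relevant containers.conf settings might look like the following sketch (key names per the containers.conf documentation; values here are illustrative):

```ini
# Illustrative /etc/containers/containers.conf fragment.
[engine]
# Run without systemd-managed cgroups inside the container.
cgroup_manager = "cgroupfs"
# Log events to a file instead of the (absent) journald socket.
events_logger = "file"
```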
Change-Id: Ia4d31d826daf6f9b43757b8b4ae446092afd42c8
It seems some packages that are really quite important are only
"Recommends" dependencies, which causes failures when the dib
containerfile element tries to start podman for extracting base
images. Add --install-recommends.
Since the podman things are getting a little complex now, consolidate
them into one section for clarity.
Change-Id: Ie77ee0a0c5318d8c12eb1b0e68b3b6fa8358ece0
If the kernel in the container doesn't support this option, it causes
podman to fail to start when using the containerfile dib element.
Disable the metacopy option for compatibility.
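As a sketch, the change amounts to dropping metacopy from the overlay mount options (the path and surrounding options are illustrative; the real file carries more settings):

```ini
# Illustrative /etc/containers/storage.conf fragment.
[storage.options.overlay]
# Previously something like: mountopt = "nodev,metacopy=on"
# Drop metacopy so podman starts on kernels without metacopy support:
mountopt = "nodev"
```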
Change-Id: I168bd1a50b6b20da051b00c3e88daedb5ed6e5e9
This installs podman inside the nodepool container, which is used by
the dependent change in DIB to extract initial chroot environments
from upstream containers. This eliminates the need to run non-native
tools on build hosts (rpm/zypper on Ubuntu, etc.).
As noted in the config, podman defaults to assuming systemd is
installed and using various systemd interfaces.
Additionally, we map a volume into the container which allows
nested podman to do what it needs to do.
Needed-By: https://review.opendev.org/700083
Change-Id: I6722aa2b32db57e099dae4417955a8a2cd28847e
If the job stops before the required directories here are created we
get a failure copying, which stops us collecting things that may be
useful for diagnosing the error. Put the copying in a block that
ignores errors.
Also copy /var/log/syslog -- I think this came in originally
from something running on CentOS which doesn't have this, but now we
run on Ubuntu hosts and syslog has interesting things for diagnosing
errors.
Change-Id: Iaca4801a652ef4a67772c804271ea5c1db377051
The latest docker-ce release does not start properly if it is installed
after installing and removing docker.io on Ubuntu. It is likely that the
docker.io package is leaking something across to the docker-ce install
that causes this problem. We simplify things by relying on ensure-docker
to install docker for us instead. This avoids the install, uninstall,
and broken install process.
Change-Id: I9e08dbf1ee3e6e146fb9ee6ba3435b3048096f5b
Change I670d4bfd7d35ce1109b3ee9b3342fb45ee283a79 added the
quota['compute'] property to ZK nodes for Zuul to keep track of.
Instead of accessing the property directly, this adds a
get_resources() function to QuotaInformation which returns it, and
makes a point to note that the property is used so that we don't
modify it in incompatible ways in future.
This is a no-op refactor, but helpful in a follow-on change that also
adds this field to leaked nodes.
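A minimal sketch of the accessor (simplified; the real QuotaInformation tracks more than this):

```python
# Callers use get_resources() instead of reaching into the quota dict
# directly, making the Zuul-facing structure an explicit interface.
class QuotaInformation:
    def __init__(self, cores=0, ram=0, instances=0):
        self.quota = {"compute": {"cores": cores, "ram": ram,
                                  "instances": instances}}

    def get_resources(self):
        # Note: this return value is stored on ZK nodes and consumed by
        # Zuul; keep its structure backwards compatible.
        return self.quota["compute"]
```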
Change-Id: Id78b059cf2121e01e4cd444f6ad3834373cf7fb6
Change If21a10c56f43a121d30aa802f2c89d31df97f121 modified nodepool to
not use the inbuilt TaskManager but use openstacksdk's task handling
instead.
The statsd arguments added here don't actually do anything and are
ignored; an openstack.Connection() object doesn't set up the stats
configuration. Things are somewhat working because of the
STATSD_<HOST|PORT> environment variables -- openstacksdk notices these
and turns on stats reporting. However, it uses the default prefix
('openstack.api') which is a regression over the previous behaviour of
logging operations on a per-cloud basis.
I have proposed the dependent change that will allow setting the
prefix for stats in the "metric" section of each cloud in the
openstacksdk config file. This will allow users to return to the
previous behaviour by setting each cloud with an individual prefix in
the cloud configuration (or, indeed, keep the current behaviour by not
setting that). So along with removing the ineffective arguments, I've
updated the relevant documentation and added a release note detailing
this.
Depends-On: https://review.opendev.org/c/openstack/openstacksdk/+/786814
Change-Id: I30e57084489d822dd6152d3e5712e3cd201372ae
Admins need to know why requests were declined, and major decisions
should be logged at info level.
Note that the complementary "Accepting node request" may be okay at
debug level as there is a more informative "Assigning node request"
message at info.
Change-Id: I67ac8c75e5581cc4da77cba42a98c5c3785640ff
Remember how Ib36483bbed95a04fb6a0e656b1890138c8002203 had some fixes
for ARM64 building? ... well this has more to fix the bootloader
installation :)
Change-Id: I4ce8f4af8c7cdd4610a723ccd8379982c9ad4cc3
This adds a simple load testing script that basically maintains a
stable number of pending node requests. Combined with max-ready age
this can be easily used for load testing instance creations and
deletions.
Change-Id: I2f754e88fdc541914f929511c713a43eb910a344
The config objects in the Azure driver have drifted a bit. This
updates them to match the actual used configuration. It also
reorganizes them to be a little easier to maintain by moving the
initializers into the individual objects.
Finally, the verbose __eq__ methods are removed in favor of a
simpler __eq__ method in the superclass.
Since the OpenStack, k8s, and OpenShift drivers call super() in their
__eq__ methods, they need to be updated at the same time.
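The superclass equality check can be sketched as follows (hypothetical class names; attribute dicts are compared instead of hand-written per-field comparisons):

```python
# One generic __eq__ in the config base class replaces the verbose
# per-class implementations.
class ConfigValue:
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        return False

class ProviderConfig(ConfigValue):
    def __init__(self, name, region):
        self.name = name
        self.region = region
```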
This also corrects an unrelated error with a misnamed parameter
in the fake k8s used in the k8s tests.
Change-Id: Id6971ca002879d3fb056fedc7e4ca6ec35dd7434
It can take up to 30 minutes after creation for a disk to appear
in the list of disks returned by the API. This means that if we
rely on querying the API to find the disk to delete when deleting
a VM, we will not see it if the VM was used for less than 30m. In
that case, we would always rely on the leaked resource cleanup to
delete the disk, and disks would stay in use longer than necessary.
To correct this, save the os disk name when we start deleting the
VM, and use that to delete the disk.
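A rough sketch of the delete path (a hypothetical adapter interface, not the driver's actual code):

```python
# Record the OS disk name from the VM record at delete time, instead of
# listing disks afterwards (the disk list can lag ~30 minutes behind
# creation), then delete the disk directly by name.
class VmDeleter:
    def __init__(self, cloud):
        self.cloud = cloud  # assumed adapter with get/delete calls

    def delete_vm(self, vm_name):
        # Fetch the disk name before the VM record disappears.
        vm = self.cloud.get_vm(vm_name)
        disk_name = vm["os_disk"]
        self.cloud.delete_vm(vm_name)
        # No list operation involved; works even for short-lived VMs.
        self.cloud.delete_disk(disk_name)
```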
Change-Id: I97756165c426371833c6d014125fce341dad960e