62 Commits

Author SHA1 Message Date
Mark Goddard
760c2b796d CI: use retries for control host bootstrap in seed VM jobs
All instances of 'kayobe control host bootstrap' in the development
scripts use a helper function, except for during seed_hypervisor_deploy.
The helper adds a retry mechanism to combat flakiness often seen during
Ansible Galaxy installs.

This change fixes the issue.

TrivialFix

Change-Id: I954cb604a18874744b3673ebf2e2c29caa18ce8f
2021-04-12 15:58:10 +00:00
Mark Goddard
3084cf67ce CentOS Stream 8: Use /usr/bin/which instead of bash function
A bug has been introduced to the which package in CentOS Stream 8 which
causes it to fail when used with the following bash options:

set -u
set -o pipefail

Then, when running which we see the following output:

environment: line 1: _declare: unbound variable

As found by Pierre, this seems to be caused by the implementation of
which as a bash function which references an unbound variable
(_declare). It's tracked in Fedora by
https://bugzilla.redhat.com/show_bug.cgi?id=1944877#.

This change works around the issue by using the /usr/bin/which binary.
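
A minimal sketch of the workaround, assuming a script that runs with
`set -u`. The `find_tool` helper is hypothetical and is not taken from the
Kayobe scripts. It calls the binary by absolute path so that any `which`
shell function is bypassed, and falls back to the `command -v` builtin when
the binary is absent (the fallback is an assumption, not part of this
change).

```shell
set -u

# Hypothetical helper: resolve a command path without invoking a `which`
# shell function that may reference unbound variables.
find_tool() {
    if [ -x /usr/bin/which ]; then
        /usr/bin/which "$1"
    else
        command -v "$1"
    fi
}

find_tool sh
```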

Co-Authored-By: Pierre Riteau <pierre@stackhpc.com>

Change-Id: I468d4e0460c13791b9f01d5854ef45472528c6fe
Story: 2008795
Task: 42215
2021-04-06 12:29:33 +01:00
Mark Goddard
df00ba22e7 CI: increase Ansible Galaxy retries & add delay
We still see flakiness when downloading content from Ansible Galaxy,
often HTTP 520. This change increases the retries from 3 to 10, and adds
a 5 second delay between attempts.
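
The retry behaviour described above could be sketched as follows. The
`retry` function and its variable names are hypothetical, as is the
requirements file path in the usage comment. Only the values of 10 attempts
and a 5 second delay come from this change.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 otherwise.
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_i=1
    while [ "$retry_i" -le "$retry_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $retry_i of $retry_attempts failed: $*" >&2
        # Do not sleep after the final attempt.
        if [ "$retry_i" -lt "$retry_attempts" ]; then
            sleep "$retry_delay"
        fi
        retry_i=$((retry_i + 1))
    done
    return 1
}

# Usage (not run here; the requirements file path is an assumption):
# retry 10 5 ansible-galaxy install -r requirements.yml
```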

Change-Id: I0c46e5fcc6979027dc6f1bc5cc49e923a205f654
Related: https://github.com/ansible/galaxy/issues/2429
2021-03-26 17:34:40 +00:00
Mark Goddard
775620733b CI: Fix IP address detection in baremetal compute test
The 'openstack server show <server> -f value -c addresses' command
previously had output like this:

    <network name>=<IP>

Now it shows Python-style output like this:

    {'<network name>': ['IP']}

This broke the parsing of the command output when determining which IP
address to use to access a bare metal instance via SSH.

This change fixes the issue by querying the server's port in Neutron,
and using the fixed IP address.
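
To illustrate why the format change breaks parsing, here is a sketch that
assumes the old parser split the output on `=`. The actual parsing code is
not shown in this message, so `parse_ip` is hypothetical, and the network
name and address are made-up sample values.

```shell
# Hypothetical sample values in the old and new output formats.
old_output='provision-net=192.168.33.5'
new_output="{'provision-net': ['192.168.33.5']}"

# Parser written for the old "<network name>=<IP>" format.
parse_ip() {
    printf '%s\n' "$1" | cut -d= -f2
}

parse_ip "$old_output"   # prints 192.168.33.5
parse_ip "$new_output"   # no '=' present, so cut prints the whole line
```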

Change-Id: I55b5f185fb7136d3c6fa565aa46598f21c94eb43
2021-03-23 12:03:20 +00:00
Zuul
3d7f15faa9 Merge "CI: display VM console log on ping or SSH failure" 2021-03-10 14:08:25 +00:00
Mark Goddard
9f41cefc15 CI: add Ubuntu overcloud deploy job
* Use source images
* Need to specify bash for &> syntax

Issues worked around:

* Manually configuring bridge via ip commands makes ifup fail to bring
  up the link. Adds a kayobe-network-bootstrap Zuul CI role that adds
  persistent configuration for the all-in-one network.

* bridge not active after interfaces role bounce. Added a pause, similar
  to https://github.com/michaelrigart/ansible-role-interfaces/pull/31

* Installing the docker python module for the kolla user fails with:
  WARNING: The repository located at mirror-int.ord.rax.opendev.org is
  not a trusted or secure host and is being ignored
  ERROR: No matching distribution found for docker===4.4.0
  Worked around by adding a trusted host for the PyPI mirror.

* Tenks fails to create block devices - missing qemu-img (in qemu-utils)

* Tenks qemu emulator is different on Ubuntu

Remaining issues:

* Bare metal testing is unreliable on Ubuntu - some jobs see IPMI
  failures such as the following:

    ipmitool chassis bootdev pxe

    Error setting Chassis Boot Parameter 5\nError setting Chassis Boot
    Parameter 0\n

  Bare metal testing is disabled on Ubuntu for now.

Depends-On: https://review.opendev.org/766984
Depends-On: https://review.opendev.org/766958

Story: 2004960
Task: 29393

Change-Id: I1985efae7c18f55c3ff7c27c17d6242523904f3e
2021-03-01 17:57:51 +00:00
Mark Goddard
f9d9afcfba CI: Fix overcloud and seed VM jobs on vexxhost clouds
This partially reverts commit 47bbb96b29ab30764d6220cdd43e63a1d2072533
which triggered a retry on vexxhost clouds.

The issue was introduced in Ie8fd965165e8d347d27528a2c16d0647e412ccdc,
which applied some fixes for CentOS 8.3, and inadvertently removed
the Tenks variable that forces the use of qemu for 'bare metal' VMs.
This led to autodetection of KVM, which does not work well when nested
in all CI cloud providers.

This change fixes the issue by forcing the use of qemu for the overcloud
once more. It also adds a similar option for the seed VM job.

Change-Id: I6bc8da2b8da903e09b97df8cd95c68a562c11db9
2021-02-26 11:11:18 +00:00
Mark Goddard
25ae0be2f9 CI: display VM console log on ping or SSH failure
Also increase attempts to 12, in line with Kolla Ansible CI.

Change-Id: I81cabf27f44af3c8135efe8e427db1ffee5f0091
2021-02-24 11:54:49 +00:00
Pierre Riteau
c84a9757dd Test building seed deployment images in the seed job
This requires stackhpc.os-images v1.10.0 or newer, for compatibility
with CentOS 8 when SELinux is enabled: we disable SELinux, but without
rebooting it stays enabled.

This Ansible role was updated to v1.10.2 in master and stable/victoria
by I5efdbd52556721914fe69d7c6ba454b2c721b643, for another reason.
Remember to bump the requirement when backporting to earlier releases.

It also needs changes in the way we interact with Bifrost to avoid using
the env-vars file which has been removed. This is implemented by change
I25078e69acdf41a4ef9957f99fe5047de54b778d.

Finally, it requires building seed deployment images only after
deploying Bifrost, because the task copying images onto the seed expects
/etc/kolla/bifrost to exist.

We also copy log files to identify issues when the job fails.

Change-Id: I4719b4d397c01b35c78cb84c6d686dd27742d1c0
2021-02-05 11:50:15 +01:00
Mark Goddard
4398856ec8 Fixes for CentOS 8.3
* Bump stackhpc.libvirt-host to v1.7.1. On seed-hypervisors installed
  using CentOS 8.2 or earlier, interaction with libvirt may fail due to
  libgcrypt being incompatible. See
  https://github.com/stackhpc/ansible-role-libvirt-host/issues/42

* Bump MichaelRigart.interfaces to v1.9.2. The CentOS 8.3 cloud image
  includes an ifcfg-ens3-1 file. See
  https://github.com/michaelrigart/ansible-role-interfaces/pull/93

* Previously a second libvirt daemon was installed by Tenks on the host,
  however changes in libvirt 6.0.0 to separate libvirtd into multiple
  daemons do not allow for customisation of the PID files used by the
  new daemons. This leads to a conflict between the container and host
  daemons. Update the Tenks config to use the containerised Nova libvirt
  daemon. This depends on a change to the stackhpc.libvirt-host role:
  https://github.com/stackhpc/ansible-role-libvirt-host/pull/44

* Not CentOS 8.3 related, but tox jobs are now failing on python
  dependencies. Remove upper limits from docker and paramiko.

* Not CentOS 8.3 related, but Bifrost has enabled authentication by
  default. We are not ready to support this, so override it.

Story: 2008429
Task: 41378

Change-Id: Ie8fd965165e8d347d27528a2c16d0647e412ccdc
2020-12-16 11:04:48 +00:00
Mark Goddard
6a4e7c4e91 dev: fix test scripts when ironic is disabled
While we always test baremetal compute in CI, development environments
may not. Given that Ironic is now disabled by default, we should make
this work out of the box.

Story: 2008207
Task: 41003

Change-Id: Id3128380f5ff74d24265f6b2132c6d7992bf00ba
2020-10-02 14:24:30 +00:00
Mark Goddard
081222753c CI: Add a CentOS 8 overcloud job with TLS enabled
Change-Id: I5fc49fb734d0fe94f5f75c66eb4c1a935774ef30
2020-10-01 09:49:21 +00:00
Zuul
e0491a1d0a Merge "CI: Update IPA images during upgrade" 2020-06-17 19:08:41 +00:00
Zuul
79f9a1cc25 Merge "IPA: Switch to IPA builder and CentOS 8" 2020-06-17 19:03:01 +00:00
Mark Goddard
c16597aa2d Add seed VM provisioning CI job
Adds the kayobe-seed-vm-centos8 CI job to configure the Zuul VM as a
seed hypervisor, and use nested virt to provision a seed VM.  This
ensures that the seed hypervisor code paths are tested.

The job uses a Cirros image for the seed VM rather than the usual CentOS
cloud image. This is to reduce bandwidth required to download the image.
It does mean that the resulting seed VM cannot be used as a seed, but
nested virt would make this slow and unreliable anyway. Cirros does not
load cdrom drivers by default, so we add the configdrive as a disk
rather than a cdrom device.

Depends-On: https://review.opendev.org/617161

Change-Id: I2268a1ddf9a2870c713f32a40689e1686365aabd
Story: 2001655
Task: 6683
2020-06-16 17:19:47 +01:00
Mark Goddard
3d9c586134 CI: Update IPA images during upgrade
This ensures we are using the appropriate IPA images in the upgraded
environment.

Change-Id: I4a72d9ae49ad41716522c3074c16d8ca23c3ff94
2020-06-12 16:25:11 +00:00
Mark Goddard
20fb05bfb4 IPA: Switch to IPA builder and CentOS 8
Switches to use the IPA builder project for building IPA images.

Switches the IPA images used by default to CentOS 8 based image.

Changes the file extension of the IPA kernel image from vmlinuz to
kernel.

Story: 2007070
Task: 37953

Change-Id: I82fc455f41f48dacb453e135870dd776895d7c99
Story: 2006574
Task: 39485
2020-06-12 17:24:31 +01:00
Mark Goddard
b9d76f6ef5 Remove support for CentOS 7 and Python 2
* Always use Python 3
* Drop code paths for CentOS 7
* Drop support for Yum
* Remove support for host NTP daemon, always use chrony
* Switch references from 'yum_install_epel' to 'dnf_install_epel'
* Remove overcloud host image workaround for tagged VLAN admin network
* Remove the kayobe.utils.yum_install function, which is unused

Change-Id: I368f6edafed9779658798fc342116b4c1b3ffd48
Story: 2006574
Task: 39481
2020-05-28 10:25:51 +01:00
Zuul
558276a8a6 Merge "CI: Add overcloud host configure job" 2020-04-24 00:03:24 +00:00
Zuul
7932314e54 Merge "Use upper constraints when installing Tenks" 2020-04-22 20:11:14 +00:00
Zuul
6b19b817cf Merge "CI: Test SSH connectivity to deployed instances" 2020-04-22 00:28:44 +00:00
Pierre Riteau
27779992b1 Use upper constraints when installing Tenks
Backport: train, stein, rocky

This fixes issues seen with a-universe-from-nothing using stable/train.

Change-Id: Ib477de5f3af2e4c182d0c2999c274dbb5553531c
Story: 2007572
Task: 39469
2020-04-19 15:30:36 +02:00
Mark Goddard
92a437f63c CI: Add overcloud host configure job
Tests various non-default configuration options:

* Custom users
* Network interfaces, VLANs, bridges, bonds
* Software RAID
* LVM & docker devicemapper
* timezone
* Package mirrors
* yum-cron / DNF automatic

This improved test coverage allows us to be more confident about these
features working on CentOS 8.

Change-Id: I36148e4356deb7d5ec00d8d3ebeb2d3932ff4f94
Story: 2006574
Task: 38938
2020-04-16 15:44:49 +00:00
Pierre Riteau
ef33e6ecb7 Install python-openstackclient using upper constraints
Detect current branch from .gitreview and use upper constraints to
install python-openstackclient, to guarantee compatibility with the
Python version in use.
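
A sketch of the branch detection, assuming that `.gitreview` carries a
`defaultbranch=` line on stable branches and omits it on master. The
`upper_constraints_url` helper is hypothetical, and the use of the
releases.openstack.org constraints URL is an assumption about how the
constraints file is located.

```shell
# Hypothetical helper: print the upper constraints URL for the branch
# named in a .gitreview file, defaulting to master when no branch is set.
upper_constraints_url() {
    uc_branch=$(sed -n 's/^defaultbranch=//p' "$1" | head -n 1)
    # Strip the "stable/" prefix, e.g. stable/train becomes train.
    uc_series=${uc_branch#stable/}
    echo "https://releases.openstack.org/constraints/upper/${uc_series:-master}"
}

# Usage (not run here):
# pip install -c "$(upper_constraints_url .gitreview)" python-openstackclient
```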

Change-Id: Ie44508fe3d3b08190afa5a43748e43548a63ff82
2020-04-02 16:58:50 +02:00
Pierre Riteau
15109ccb54 Make Kayobe code compatible with Python 3
Co-Authored-By: Mark Goddard <mark@stackhpc.com>

Change-Id: I2a7a82d7f576739c5516a0072f953712ffa5c233
Story: 2004959
Task: 29392
2020-02-27 11:10:29 +00:00
Mark Goddard
02327b1c54 CI: Test SSH connectivity to deployed instances
Add testing of the dataplane to the overcloud jobs. To support both
baremetal provisioning and VMs with VXLAN tenant networks, we use the
provision-net as our external network to which floating IPs are
attached.

We depend on a backport of parts of this patch to allow testing of SSH
connectivity after upgrades.

Depends-On: https://review.opendev.org/708915/
Depends-On: https://review.opendev.org/709145/

Change-Id: I6316e8959cff987e4e97280889e1038a9537ed32
2020-02-25 14:12:05 +00:00
Andreas Jaeger
030ede06e8 Fix error logging of dev/functions
In case of failures in kayobe-overcloud-centos, the error message fails
with:
kayobe/dev/functions: line 569: LOGDIR: unbound variable

Example:
https://zuul.opendev.org/t/openstack/build/ce1fadc3ee6d4842a599da57a670cc18

This can be reproduced with:
set -eu

if [[ -n ${LOGDIR} ]]; then
    echo "LOGDIR set"
else
    echo "else"
fi

Fix the error reporting by assigning an empty string to LOGDIR by
default.
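
A minimal sketch of such a default, assuming the `${VAR:-}` expansion is
used. The exact form of the assignment in dev/functions is not shown here.

```shell
set -eu

# Default LOGDIR to an empty string if the caller has not set it, so that
# later references do not trip `set -u`.
LOGDIR=${LOGDIR:-}

if [ -n "${LOGDIR}" ]; then
    echo "LOGDIR set"
else
    echo "LOGDIR not set"
fi
```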

Change-Id: Ieef73950f89e4dfb727ddc59ef2750d9b81f3c58
2020-02-13 15:32:49 +00:00
Mark Goddard
c619579fc1 CI: Remove workaround for upgrading from Rocky
We no longer need to cd to the kayobe repo path, since all tested
versions of kayobe support pip installation.

Change-Id: Id7b574cbc6bb5195eee16cfc8443075712ce3254
2019-11-26 14:51:25 +00:00
Will Szumski
cc71fd03f9 CI: Don't set cpu mode
Not setting cpu-mode will reduce the number of TCG emulated features.
Hopefully this will make CI more reliable.

Change-Id: I24a8832c02db6ba019ab8f5c2b9d7216a9b7d213
2019-10-21 10:28:07 +00:00
Will Szumski
a3d3649c5e Install libffi headers
This is to resolve the following issue in CI:

    c/_cffi_backend.c:15:17: fatal error: ffi.h: No such file or directory
     #include <ffi.h>
                     ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /home/zuul/kayobe-venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-PV3WhJ/cffi/setup.py'"'"'; __file__='"'"'/tmp/pip-install-PV3WhJ/cffi/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ZvlZVY/install-record.txt --single-version-externally-managed --compile --install-headers /home/zuul/kayobe-venv/include/site/python2.7/cffi Check the logs for full command output.

Change-Id: I1bd724be7dc77058870f37cb1c9404472fa466ca
see: https://zuul.opendev.org/t/openstack/build/c20a316a699b4073abf75960634ebfcd
2019-10-15 13:35:15 +01:00
Mark Goddard
4b180502a5 Fix kayobe-overcloud-centos CI job flakiness
Docker CE has added a default DROP policy to the FORWARD chain.  When
nova-compute runs on the controller, kolla ansible sets the
net.bridge.bridge-nf-call-iptables sysctl to 1, which causes iptables to
process frames forwarded across bridges.

Currently, the kayobe-overcloud-centos job is failing quite frequently
with timeouts when deploying bare metal compute. Experimentation with
iptables hasn't revealed why this only happens sometimes, or exactly
what traffic is being blocked, but opening up the firewall does seem to
fix the issue. We won't see this in production since control and compute
services are on separate hosts.

This change updates the iptables configuration used in CI to forward all
frames on the bridge, breth1.

Change-Id: If96437b73b9b5c58600ba1b004f53ee0c1f14398
Story: 2006534
Task: 36590
2019-09-17 16:42:09 +01:00
Zuul
6b3462d384 Merge "Add alternative tenks deploy and teardown entrypoints." 2019-08-15 15:35:27 +00:00
Mark Goddard
0707e2c53d CI: Don't create an external network in init-runonce
This network conflicts with the ironic provision-net since both are flat
networks on physnet1. We don't need the external network in our CI, so
don't create it.

Change-Id: Id17d4ea00ceb684b34bf12e911758d504c520de0
Depends-On: https://review.opendev.org/#/c/670129/
2019-07-11 06:58:35 +00:00
Mark Goddard
7ed7b27bc4 Add retries to ansible galaxy install for all envs
Galaxy install often fails in CI, scuppering an otherwise good run.

Change-Id: I3d02afe33cdf32b1d285bd4bdc4a8074883113ca
2019-07-04 13:02:04 +00:00
Isaac Prior
0d598bf01d Add alternative tenks deploy and teardown entrypoints.
Allows users to explicitly specify which type of tenks
deployment they wish to create / destroy.
Preserves existing behaviour with defaults.
Modifies Zuul tests to use new tenks-deploy entrypoints.

Change-Id: I9aafed8481fd7564c0fc0abe5f6b21eceb824d75
2019-06-06 14:03:58 +01:00
Mark Goddard
c2a35ce211 Remove inspector_manage_firewall variable
This is supported in kolla-ansible via the ironic_inspector_pxe_filter
variable, which can be added to globals.yml. The default value for that
variable changed in the Stein release from 'iptables' to 'dnsmasq',
since the iptables filter does not work with Docker CE [1].

This change removes the inspector_manage_firewall variable.

This change also adds an iptables rule in CI tests to allow DHCP packets
to be forwarded, to ensure bare metal servers can be deployed.

[1] https://bugs.launchpad.net/kolla-ansible/+bug/1823044

Depends-On: https://review.openstack.org/649673
Change-Id: Idac6777b4d97fbd17698fc2086ceb068d7b2e326
Related-Bug: #1823044
2019-04-09 13:53:59 +01:00
Mark Goddard
08bb1441eb Prevent use of KVM for Tenks VMs in CI
Currently nested virtualisation under KVM does not seem to be working in
CI. This breaks the 'bare metal' deployment testing using Tenks, which
led us to disable it in 749ef8243e9ae855cf8ceb54dc3f88c6c1b2fea0.

This commit forces Tenks to use QEMU for its VMs, allowing us to revert
commit 749ef8243e9ae855cf8ceb54dc3f88c6c1b2fea0.

Change-Id: Id382c218f3b37979341f0d96718a6011a1d9da37
Story: 2005316
Task: 30223
2019-03-29 15:26:04 +00:00
Mark Goddard
d7ae9f2df1 Don't cd to /tmp in environment-setup.sh
This script is used by developers to activate the kayobe virtual
environment and source the configuration's kayobe-env file. A cd to /tmp
is an unexpected outcome of running the script.

To test the location-independent installation, remove the chdir from the
zuul job tasks that execute kayobe commands.

Change-Id: I59194952901fa648382489f48dc7aafb03d3a682
Story: 2004252
Task: 29347
2019-02-05 16:49:49 +00:00
Zuul
29c0ad98c0 Merge "Update development scripts for control plane deployment" 2019-02-05 12:59:15 +00:00
Mark Goddard
ab205197b5 Update development scripts for control plane deployment
This adds support for deploying a virtualised control plane via Tenks, using
the Kayobe development scripts tenks-deploy.sh and tenks-teardown.sh.

Change-Id: I752455af9eb44cdb0f9921fd0c876fc2dfb50a5c
2019-02-05 09:36:35 +00:00
Mark Goddard
fb70e99a4b Support including and excluding files from config save
Currently in the upgrade job we are seeing the OOM killer kick in during
the 'overcloud service configuration save' command. Ansible is quite
inefficient when copying large files around, so excluding the large IPA
images should relieve some memory pressure.

Change-Id: I3a230b0a699154606ca8faa00a85d45ae815c599
Story: 2004704
Task: 28733
2019-02-04 09:21:07 +00:00
Will Szumski
84172bfbe0 Support complete installation of Kayobe as a python package
This adds the ansible playbooks required by kayobe to the manifest by
using the data_files option in setuptools. When using pip to install
kayobe into a virtualenv, these files will be placed in
<venv>/kayobe/share/.

In an editable install, e.g using `pip install -e .`, data_files are not
installed into the virtualenv. Instead, we must follow the egg-link file
to find out the actual location.

Story: 2004252
Task: 27787
Change-Id: Ibef040eceb547476007f83c0d5dcdb2bc6986d1e
2019-02-01 12:55:27 +00:00
Mark Goddard
7593a8b925 Test upgrading seed services in CI
Adds the kayobe-seed-upgrade-centos job, which performs an upgrade of
the seed services from the previous release to the current release.

Change-Id: Ia3eb39cf81cb3618fd94c4456bd576b52098c946
Story: 2004308
Task: 27873
2018-12-21 15:21:47 +00:00
Will Miller
d0e9c50fd2 Add tenks-deploy.sh dev script
tenks-deploy.sh deploys a minimal virtualized baremetal test cluster
locally. This change also adds it to the overcloud-base CI job. To make the new
CI job work, we need to configure the firewall on the test machine to
allow the baremetal machines to communicate with the openstack services.

Change-Id: I7487a2606cf0bac71c5c63db41b2b518a6f6398b
Depends-On: https://review.openstack.org/#/c/615939
Depends-On: https://review.openstack.org/#/c/618003
Story: 2004297
Task: 27850
2018-11-20 18:53:09 +00:00
Mark Goddard
935d3cef6a Update dependencies to Rocky
Use stable/rocky branch of:

* Kolla
* Kolla ansible
* Bifrost
* IPA
* OpenStack services
* Requirements

Also updates Kolla Ansible inventory template.

The seed deploy job has been made non-voting and non-gating, because we
are waiting for bifrost change https://review.openstack.org/#/c/618740
to merge, be released, and for the kolla bifrost image to use the new
package.

Change-Id: Id5e7fdbd196f96e1e75ffc68bc93aab18fa38aa7
Story: 2001864
Task: 27798
Depends-On: I58e4f951d4a3dd89e0784fd82d8a62dbba374f79
2018-11-19 14:37:33 +00:00
Mark Goddard
6266312fa1 Test upgrades in CI
There is currently no coverage of upgrades in CI, which leaves us open
to regression in this infrequently tested but crucial area. This change
adds the required scripts and Zuul configuration.

A control plane is first deployed using the Kayobe stable/pike branch
and associated default configuration. The control plane is tested by
deploying then deleting a server instance. An upgrade to Queens is
performed, using the Kayobe master branch, or code in review if
applicable. The upgraded control plane is tested by deploying then
deleting a second server instance.

A workaround was required to restart the nova_compute service after the
upgrade, since the SIGHUP sent to it by Kolla Ansible during upgrade
appeared to be putting it into a degraded state.

A future improvement to this test could be to leave a server instance
running during the upgrade.

Change-Id: I0e595524e39d1131fe3ec6aaf2aeec3ff3d6a536
Story: 2003472
Task: 24732
2018-11-05 12:02:31 +00:00
Mark Goddard
7fe53c3d16 Check nova VM status in CI
Previously we were not checking the status of created VMs in CI, meaning
that VM creation failures would be silently ignored. This change checks
the status, requiring it to be ACTIVE for a pass.
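
The status check could be sketched as a polling loop. The
`wait_for_active` helper, its attempt and interval arguments, and the
server name in the usage comment are all hypothetical. Treating ERROR as an
immediate failure is also an assumption.

```shell
# Hypothetical helper: poll a command that prints a server status until it
# reports ACTIVE. Fails on ERROR or when the attempts run out.
wait_for_active() {
    wait_attempts=$1
    wait_interval=$2
    shift 2
    wait_status=""
    wait_i=1
    while [ "$wait_i" -le "$wait_attempts" ]; do
        wait_status=$("$@")
        case "$wait_status" in
            ACTIVE) return 0 ;;
            ERROR) echo "Server entered ERROR state" >&2; return 1 ;;
        esac
        sleep "$wait_interval"
        wait_i=$((wait_i + 1))
    done
    echo "Timed out; last status: $wait_status" >&2
    return 1
}

# Usage (not run here; the server name is a placeholder):
# wait_for_active 30 10 openstack server show demo1 -f value -c status
```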

Change-Id: Ia559c81b4944c2c6c7dedfca0c6c570db390704b
2018-09-25 19:36:29 +01:00
Doug Szumski
6c2e68a545 Support configuring tunnel network
Support configuring a separate tunnel network for tenant
overlay network traffic.

Change-Id: I74274823d6fe3a42aabcca00c8cd20e1abb3d219
Story: 2003054
Task: 23091
2018-07-20 13:57:03 +01:00
Will Miller
2d5fd703a0 Reconcile all 'Ansible control host' references
Ensure all references to the Ansible control host are worded as such, to
ensure consistency and avoid potential confusion with the OpenStack
controllers.

Change-Id: Id92e537ccbfdd55287b8eae296f649640c70ce17
2018-07-11 17:19:18 +01:00
Mark Goddard
0ec7edffa7 Test nova server (VM) boot in overcloud job
We use the demo script from kolla ansible, init-runonce, to create
resources in the control plane to make it ready for booting a VM. We
then create a nova server, and wait for it to become active. We do not
currently test that the VM boots successfully by accessing it via SSH.

Change-Id: I61be554366565decd9f4ff7805a3969aa37da4b9
2018-05-10 18:39:07 +01:00