This change adds the lxc.haltsignal option to the container config, which
ensures containers are stopped gracefully but quickly.
Presently, when a container is restarted it can hang for 20-30 seconds
because the default stop signal is SIGPWR.
While the hang when stopping a container is not 100% reproducible in all
environments, it can be seen when simply executing `lxc-stop`. If the
user were to stream the container journal while stopping the container,
it would be seen that the container hangs when trying to shut down some
systemd services. If the `lxc-stop` command is executed a second time, the
container is stopped more forcibly with SIGRTMIN+3. This change uses
an example stop signal from the lxc documentation [0], which is a
real-time signal of the form SIGRTMIN+n. More on the signal used can
be found here [1].
[0] http://manpages.ubuntu.com/manpages/xenial/en/man5/lxc.container.conf.5.html
[1] http://manpages.ubuntu.com/manpages/xenial/en/man7/signal.7.html
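As a rough illustration of how a name such as SIGRTMIN+n maps to a signal
number, the sketch below resolves signal names with Python's standard
`signal` module. The helper name `resolve_signal` is hypothetical, the
offset 3 is only the value mentioned above for the second `lxc-stop`, and
the code assumes a Linux host where SIGPWR and SIGRTMIN are defined.

```python
import signal


def resolve_signal(name: str) -> int:
    """Resolve names such as 'SIGPWR' or 'SIGRTMIN+3' to a signal number.

    Hypothetical helper for illustration; assumes a Linux host.
    """
    base, sep, offset = name.partition("+")
    number = int(getattr(signal, base))
    if sep:
        # Real-time signals are expressed as an offset from SIGRTMIN.
        number += int(offset)
    return number


print(resolve_signal("SIGPWR"), resolve_signal("SIGRTMIN+3"))
```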
Change-Id: I01e82eabf17d2ac5a89c13ef56616fd1fe0607dd
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
When following the example playbook for the role, it is possible for a
"properties is undefined" error to be raised. Because default() filters
don't work with undefined dictionary keys, ensure a default dictionary
for properties exists.
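The same idea can be shown with a plain Python analogy (not the role's
actual Jinja2 code): a fallback on the inner key does not help when the
parent dictionary is missing, so the parent gets a default first. The
function name `lookup` and the key `is_metal` are hypothetical and used
only for illustration.

```python
def lookup(container: dict, key: str, fallback):
    """Read a key from container['properties'], tolerating a missing parent."""
    # Ensure a default (empty) dictionary exists before reading a key from it.
    properties = container.get("properties", {})
    return properties.get(key, fallback)


# Without the default dictionary, container["properties"][key] would raise
# KeyError here because "properties" is not defined at all.
print(lookup({}, "is_metal", False))
print(lookup({"properties": {"is_metal": True}}, "is_metal", False))
```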
Change-Id: Iee2e992efe8ee801506e5de622bd90ac3915a33c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
In order to allow the use of the environment variable which informs
Ansible which user executed the playbook, we pass the USER env var
into the environment that tox builds.
Change-Id: I5ba653fceec3db1073ab639d835f6a250b11a4e6
Implements: blueprint python-build-install-simplification
Signed-off-by: Jesse Pretorius <jesse.pretorius@rackspace.co.uk>
This patch changes the flush routes handler to flush the entire
configuration from the interface. This is needed because
systemd-networkd does not restore the routes of non-DHCP interfaces
when flushing routes and restarting systemd-networkd.
Change-Id: I17748b0dd2307fd9bee705140c67883140090298
Signed-off-by: Major Hayden <major@mhtx.net>
Patch I0d83fd4895d4c5beaf5a84a239c1a1ed71521dee dropped the ARP=yes
option for networkd because it's not supported by old systemd releases.
This, however, brings back a problem where the kernel's default
arp_notify sysctl setting may not be correct for our use case.
Containers are created with random MAC addresses, so we need to ensure
that ARP entries are populated correctly when a container is restarted.
Instead of having to implement some sort of new workaround on the host,
it's probably better to create all containers with fixed MAC addresses from
now on.
Change-Id: I8ad390fc3ce27756f26c57c92aaa3adc8e506a17
We should use domain names for the external network testing task in
order to verify not only that the default gateway works properly but also
that our DNS is able to resolve hostnames.
Change-Id: I3aebcf1dff8268e4dbaebae8fb598ee27e3f481d
Depends-On: I316c3851f40f08d272b7bb5f7165e010e3a95c3a
Depends-On: Ied7632037f737c3f32c34dac70531065c54496c9
Depends-On: I14f8373897da28dea2ea03500c2be46c5b40d51c
Depends-On: I0d83fd4895d4c5beaf5a84a239c1a1ed71521dee
The ARP option was added in systemd-232. As such, current stable
distributions may not support it, so drop the option and let the kernel
decide what to do with ARP. Fixes the following warning:
[/etc/systemd/network/eth0.network:14] Unknown lvalue 'ARP' in section 'Link'
Link: https://github.com/systemd/systemd/pull/3854
Link: 99d2baa2ca
Depends-On: I14f8373897da28dea2ea03500c2be46c5b40d51c
Change-Id: I0d83fd4895d4c5beaf5a84a239c1a1ed71521dee
The UseDNS option requires the systemd-resolved service, so set this
option based on the lxc_container_enable_resolved variable.
Change-Id: I5b7c3f01534f5ccbaf76aced673aefc6ec7fcf6e
When using a static route we need to set a route metric to ensure the
priority of the routes being passed in. This change ensures we maintain
our expected interface and functionality should any static routes be
passed into the container.
Before the implementation of networkd, EIN would amend the main table
with the defined routes in the order they were written. However,
systemd-networkd inserts the defined routes at the top of the default
table which can cause confusion and conflict. This change simply adds
a route metric to all defined routes and increments the metric integer
based on the list index which explicitly ensures all defined routes
are prioritized in the order in which they were written.
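A minimal sketch of the indexing idea, not the role's actual template:
each route in the list gets a metric equal to a base value plus its list
index, so earlier entries receive a lower metric and are preferred. The
function name `render_routes`, the dictionary keys `destination` and
`gateway`, and the base metric of 20 are assumptions made for the example.

```python
def render_routes(routes, base_metric=20):
    """Render systemd-networkd [Route] sections with index-based metrics."""
    sections = []
    for index, route in enumerate(routes):
        sections.append(
            "[Route]\n"
            f"Destination={route['destination']}\n"
            f"Gateway={route['gateway']}\n"
            # The metric grows with the list index, preserving written order.
            f"Metric={base_metric + index}\n"
        )
    return "\n".join(sections)


example = [
    {"destination": "10.1.0.0/24", "gateway": "10.0.3.1"},
    {"destination": "10.2.0.0/24", "gateway": "10.0.3.2"},
]
print(render_routes(example))
```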
Change-Id: I13768580fbd926033fde4a74cbbf90b9eda24658
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
openSUSE and CentOS have been voting jobs for a while so we should
start testing all the scenarios on them. The only job that hasn't
been added is the ZFS one since there is no such package on openSUSE
or CentOS.
Change-Id: Icde5ed7a4e6be8ac19412f15b84febf2096ba404
The ansible_distribution variable is causing some trouble since it can
contain spaces etc. As such, we can simply use the ansible_pkg_mgr
fact to figure out the name of the package we want to install.
Change-Id: Ic92eb1f9030df2883b049b9868e031ff4f0d42f2
Unify container network interfaces using systemd-networkd for Ubuntu,
CentOS, and openSUSE. This change allows the role to use a single way to
configure container networks.
Care has been taken to ensure we're able to cleanly upgrade to the new
capabilities within existing environments without breaking any feature
compatibility or causing any container restarts.
It's also worth noting that all of the pre/post networking up/down
script options have been converted to systemd "oneshot" services. This
retains the ability to run ad hoc scripts post network availability
while also opening up this capability, which used to be Ubuntu only,
to all of our supported operating systems.
> Our usage of `lxc-attach` was removed in favor of `nsenter` to fix an
issue where multiple `lxc-attach` commands issued to a single physical
host could result in a hang.
> Scripts that were being generated inline have been placed into
template files. This solves a long-standing memory consumption issue
when creating lots of containers. The old shell tasks will now be
executed from a generated script. While this should also help with
debugging, the main driver is to ensure better system stability.
> A lot of cleanup has been done throughout the task files and
templates. In the process of updating the role to use unified
networking, a lot of duplicate tasks, scripts, and processes have
been consolidated.
> Handlers have been added for network connection wait conditions and
for various service restarts.
> The OSA plugins have been added to this role as a dependency. We
rely on the connection plugins throughout the stack; however, we were
doing a lot of workarounds to cater to the possibility of a deployer
running this role without them. This change simply adds the plugins
as a known dependency, which allows for a more streamlined setup.
Change-Id: I5d3ddcfa11d575648a69a04f2fb30236c2c89da3
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The systemd machine-id needs to be unique on all network attached
devices. This change ensures that when a container comes online, a
unique machine-id is generated if one was not already present. When
the machine-id is created for the first time the container will restart
so the new ID can take effect.
More information on the machine-id can be found here:
https://www.freedesktop.org/software/systemd/man/machine-id.html
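The "generate only if missing" behaviour can be sketched as follows. This
is an illustration under assumptions, not the role's actual task: the
function name `ensure_machine_id` is hypothetical, the ID is produced with
Python's `uuid` module rather than any systemd tool, and the restart step
is left to the caller, which can act on the returned flag.

```python
import uuid
from pathlib import Path


def ensure_machine_id(path: Path) -> bool:
    """Write a new machine-id if the file is missing or empty.

    Returns True when an ID was created (the caller would then restart
    the container) and False when an existing ID was left untouched.
    """
    if path.exists() and path.read_text().strip():
        return False
    # A machine-id is 32 lowercase hexadecimal characters plus a newline.
    path.write_text(uuid.uuid4().hex + "\n")
    return True
```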
Change-Id: Ib25aeeecf1e6001e6c6b1a7d6b6d50eca7ab45fa
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Zuul no longer requires the project-name for in-repo configuration.
Omitting it makes forking or renaming projects easier.
Change-Id: Ie0a2f156de3c136439ac4dc5e28b16ed5509288c
The overlayfs backing store doesn't play well with the unconfined
profile and many tools (e.g. ping, traceroute) are failing to work
with the following error:
ping: error while loading shared libraries: libcap.so.2: cannot stat
shared object: Permission denied
As such, let's switch to the lxc-openstack profile if overlayfs is used
as the backing store.
Change-Id: Ibe1149ee4fedd2b3d487887e504c500c96165467
Related-Bug: #1612412
The handler would try to stop a container before restarting it; however,
if the container was already stopped, the handler would fail instead of
simply moving on to the next task. This change makes the "stop" portion
of the task detect the return status code of "2" when restarting the
container. If the return code is "2", we know that the container is
already stopped and that no change has occurred.
For the sake of consistency and to ensure the greatest chance for
success the test task that stops a container has also been given the
same setup.
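The return code handling described above can be sketched as a small
mapping from the stop command's return code to task results. The function
name `evaluate_stop` is hypothetical, and treating a return code of 0 as
"stopped, changed" is an assumption made for the example; only the meaning
of return code 2 comes from the description above.

```python
def evaluate_stop(rc: int) -> dict:
    """Map a stop command's return code to Ansible-style task results."""
    return {
        # Return code 2 means the container was already stopped: not a failure.
        "failed": rc not in (0, 2),
        # Only a successful stop (assumed rc 0) counts as a change.
        "changed": rc == 0,
    }


print(evaluate_stop(0), evaluate_stop(2), evaluate_stop(1))
```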
Change-Id: Ia4856f36b2d106d987e3c774f31493e25a23d4b5
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
With the merge of https://review.openstack.org/520177 in the
tests repo some ansible-lint failures which previously were
not being picked up are now detected.
This adds the appropriate skip tags to the tasks so that they
are not evaluated by ansible-lint.
Change-Id: Ia91f73d4f17e94a150c93c75c618778c25823d0d
Release notes are version independent, so remove version/release
values. We've found that projects now require the service package
to be installed in order to build release notes, and this is entirely
due to the current convention of pulling in the version information.
Release notes should not need installation in order to build, so this
unnecessary version setting needs to be removed.
This is needed for new release notes publishing, see
I56909152975f731a9d2c21b2825b972195e48ee8 and the discussion starting at
http://lists.openstack.org/pipermail/openstack-dev/2017-November/124480.html
Change-Id: I035c3f5c0d4f63d24e015c74a0d25979553e920a
The block/rescue we were using in the MAC generation task did not work
as expected. Because we use an iterator on the task and we can't iterate
over the entire block, the task would fail when MAC address lookups
within a running container didn't already exist but needed to be
created. This resulted in the task failing for a single host, and that
host being removed from the inventory, instead of running a rescue for
only the missing network.
The use of block/rescue has been removed. If the feature to loop over a
block is ever implemented [0] we can revisit how this action is done.
[0] - https://github.com/ansible/ansible/issues/13262
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Change-Id: Ie4bc3b130874047a5cbd36b98ed86a731ae5c317
The container create role will fail when adding a new network
to an existing set of containers. The role attempts to read the hwaddr
file prior to its existence due to our use of local_facts. This change
simplifies how the MAC addresses are generated by using a block / rescue
which will simply consume the known MAC address from the container when
available or fall back to the generation tasks as needed.
While working on this issue I noticed that we were still using a
pre-Ansible 2.0 failure notice which is no longer relevant as we've been
using Ansible 2.x for quite a while now. The old assertion has been
removed.
A new assertion has been added to notify the deployer that the
physical_host variable must be set to use this role.
Change-Id: Ic2800f1c17d10180e4e9a7be7f9b435ff8cc5487
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Initially the intent for adding this was to better test
any patches for roles together before they merge, but it
has had the unintended side-effect of causing patches to
take much longer to merge (because they all get lined up
in a single queue, rather than independent queues) and
a lot more infra resources are used (because a patch
that fails at the top of the queue will result in all
subsequent patches restarting all their tests).
As discussed in the channel, we'd prefer to revert
to the previous independent queue method of testing. It
has served us well.
Change-Id: If392e0c3ff723db7ef4631a62ff03728fb09c680
This patch fixes an issue with the test package name.
Depends-On: I913284b0dc0165e102d4016760947223fb129a92
Change-Id: I84752bd83d76b7d5e7ac38a5c2b8a81d75d5ceb7
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Currently the linters test is in the project-config
repository, but those are meant to be used for standard
jobs which do not require any repository other than the
one given. Our lint tests use the 'openstack-ansible-tests'
repository, so we should rather use our own job definition.
Change-Id: Iaac6d522220481b2f69e82a0ac0892666c3eb9ad
Depends-On: I0391ec310c4eede436011a48490e3c524c8ddf4d