This role introduces quite a bit of state change within the host
it's deployed on. After the run we should force a regather of facts to
ensure we have the most up-to-date information before running any other
playbooks/roles on the host.
Change-Id: I05d71964f96a8e025aa0f89f37f8dcb2a705a2e5
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change implements the machinectl quota system and qgroups when
they're enabled and available. This change is being implemented to
resolve an issue where machinectl based containers using a loopback file
system spam DMESG with the following:
* BTRFS error (device loop0): could not find root $INT
While various upstream sources say this error is benign[0], it raises
an inconsistency flag within the host system and is speculatively the
cause of our inconsistent read-only/Full-FS issues we've seen in the
integrated gate. Once the qgroups are properly set up the system will
remove the inconsistency flag and the message spam will stop.
* BTRFS info (device loop0): qgroup scan completed (inconsistency flag cleared)
To resolve this issue the quota system is being enabled by default
within the "lxc_host" role. This change essentially acknowledges
the built-in quota system and when enabled provides for the ability
to set / define specific quota (qgroup) options as necessary. While
many deployers may never use these options or this tooling, the role
will now properly set everything up should it ever be needed.
[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1651435
Closes-Bug: #1753790
Depends-On: I34a41ac8a9fe4419254284c83f4600efee274c04
Change-Id: Ica79472568799098ebf83c6cefc585f117975f37
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
machine_id is not registered until further down in the file, so
this will fail with "The error was: |changed expects a dictionary".
We don't see the failure in our gates because the two preceding
conditions: not ((default_configuration_container | changed) or
(bind_configuration_container | changed)
are always true, so the machine_id test is never used.
In an existing environment where the container is being updated
from an old configuration to the new networkd installation, it is
very possible that default_configuration_container and
bind_configuration_container are not changed, so the machine_id
var is checked for changed state. At that point ansible fails
because the var is undefined.
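
A minimal Python sketch of why the gate never hit this, assuming the
conditional short-circuits on `or` the way Python does. The `changed`
helper and `evaluate_condition` are hypothetical stand-ins, not the
Ansible implementation or the role's actual task:

```python
def changed(result):
    """Hypothetical stand-in for the `changed` test: it needs a dict."""
    if not isinstance(result, dict):
        raise TypeError("|changed expects a dictionary")
    return bool(result.get("changed", False))


def evaluate_condition(default_cfg, bind_cfg, machine_id):
    # `or` stops at the first true operand, so machine_id is only
    # inspected when both earlier results report no change.
    return not (changed(default_cfg) or changed(bind_cfg) or changed(machine_id))


# Fresh container (what the gate exercises): an earlier result changed,
# so the unregistered machine_id (None here) is never evaluated.
fresh = evaluate_condition({"changed": True}, {"changed": False}, None)

# Existing container: nothing earlier changed, so machine_id is evaluated
# and the undefined value triggers the error.
try:
    evaluate_condition({"changed": False}, {"changed": False}, None)
    upgrade_error = None
except TypeError as exc:
    upgrade_error = str(exc)
```
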
Change-Id: I0b95c6c5d0f52344d476e52219c1ce31edcf65da
Now that run_tests.sh handles the tests repo clone, we can
remove the use of the older tests-repo-clone.sh script.
Change-Id: Iead678057f3888fe7aaddce6685865f4fcdfed53
The container and host can link journals giving operators the ability to
log stream and check on the health of a system without needing to log in
(attach) to the container. This change implements journal linking for
LXC containers following the reference systemd specification.
Reference implementation:
https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#--link-journal=
Change-Id: Id68cf39a77b5dd9c13c010829b47cd7a414378bc
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The variable `lxc_user_defined_container` has been added which allows a
deployer to define the container variable file in use for a given
container type.
Depends-On: https://review.openstack.org/554383
Change-Id: Ia1373bfa916b4add49a8444d2e4553f898650328
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Allow the role to collect facts for the physical host if missing,
since the role has a hard dependency on checking the physical host's
kernel version.
In the OSA container create playbook[1], facts are collected only
if the physical host itself is included in the playbook scope. When
a '--limit containername' parameter is used, no physical host facts
are collected and the role fails with:
The conditional check 'hostvars[physical_host]['ansible_kernel'] |
version_compare('3.18.0-0-generic', '<')' failed. The error was:
Version comparison: 'dict object' has no attribute 'ansible_kernel'
Change-Id: Id84aefed6c0129909cb6153258863564c7cc914a
This change sets the hostname of containers using the hostnamectl
command, which has several enhancements over the legacy method. By using
hostnamectl the command will validate the hostname for correctness,
ensuring the container hostnames conform to the RFC.
The old methods have been removed and the command has been made part of
the handlers and will be run after the activation of dbus.
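
As a rough illustration of the kind of hostname check involved, the
sketch below validates a name against the RFC 1123 label rules. Which
RFC and which exact rules hostnamectl applies are assumptions here;
this is not hostnamectl's own logic:

```python
import re

# One DNS label: 1-63 chars, letters/digits/hyphens, no leading or
# trailing hyphen (RFC 1123 relaxes RFC 952 to allow a leading digit).
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label passes the label rules."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

A name containing an underscore or an empty label would be rejected by
this sketch, which is the class of mistake the validation is meant to
catch before the container comes up with a bad hostname.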
Change-Id: I158a5deb0685d2dcd436d7dd92caecb9966a025e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
With the implementation of networkd the ENI scripts and config files for
the default interfaces shipped with the lxc container images we use are
no longer useful. These old files can cause conflicts in networking
should the old scripts and networkd get confused, especially when it
comes to an interface that is set up for DHCP. This change simply defines
the default interfaces for both suse and ubuntu and ensures they're
deleted.
The interface flush handler has been set to `failed_when: false` because on
initial container create the eth0 device may not exist until
systemd-networkd is restarted for the first time.
Change-Id: I70abb5ec4226a81a065e495e19f5e7e0c569e1b0
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change adds the lxc.haltsignal option to the container config which
ensures containers are stopped gracefully but quickly.
Presently, when a container is restarted it can hang for 20 - 30 seconds,
which is due to the fact that the default stop signal is SIGPWR.
While the hang when stopping a container is not 100% reproducible in all
environments it can be seen when simply executing `lxc-stop`. If the
user were to stream the container journal while stopping the container
it would be seen that the container hangs when trying to shut down some
systemd services. If the `lxc-stop` command is executed a second time the
container is stopped more forcibly with SIGRTMIN+3. This change is using
an example stop signal from the lxc documentation [0] which is
implementing a Real-time signal, SIGRTMIN+n. More on the signal used can
be found here [1].
[0] http://manpages.ubuntu.com/manpages/xenial/en/man5/lxc.container.conf.5.html
[1] http://manpages.ubuntu.com/manpages/xenial/en/man7/signal.7.html
Change-Id: I01e82eabf17d2ac5a89c13ef56616fd1fe0607dd
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
When following the example playbook for the role, it is possible for
a "properties undefined" error to be raised. Because default() filters
don't work with undefined dictionary keys, ensure a default dictionary
for properties exists.
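
The same pitfall exists in plain Python, which may help illustrate the
fix: defaulting a key is not enough when the parent mapping itself is
missing. The names used here (`container_vars`, `service_name`) are
hypothetical and not taken from the role:

```python
def get_property_unsafe(container_vars, key, fallback):
    # Fails with KeyError when "properties" was never defined, even
    # though the inner lookup has a fallback.
    return container_vars["properties"].get(key, fallback)


def get_property(container_vars, key, fallback):
    # Default the whole `properties` mapping first, then the key.
    properties = container_vars.get("properties") or {}
    return properties.get(key, fallback)


try:
    get_property_unsafe({}, "service_name", "none")
    unsafe_failed = False
except KeyError:
    unsafe_failed = True

safe_value = get_property({}, "service_name", "none")
```
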
Change-Id: Iee2e992efe8ee801506e5de622bd90ac3915a33c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This change adds an auto mount entry into config which ensures containers
have access to the cgroups, even if they're read only. Without this
change containers see a notable slowdown and repeating message regarding
a failure when resetting the device list. This option has no effect and
is not needed on newer kernels (4.15+) as cgroup namespaces and device
access is inherent to the creation of a container namespace.
> Example Error: http://paste.openstack.org/show/702764
While this change is introducing new config into the container it is not
forcing a container restart. This approach has been taken to ensure
we're correcting the issue on greenfield deployments but not impacting
running ones.
Change-Id: I31b1b5a044687f52b1c54049ba03c65ecda34b51
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
In order to allow the use of the environment variable which informs
Ansible which user executed the playbook, we pass the USER env var
into the environment that tox builds.
Change-Id: I5ba653fceec3db1073ab639d835f6a250b11a4e6
Implements: blueprint python-build-install-simplification
Signed-off-by: Jesse Pretorius <jesse.pretorius@rackspace.co.uk>
This patch changes the flush routes handler to flush the entire
interface config from the interface. This is needed because
systemd-networkd does not restore the route of non-DHCP interfaces
when flushing routes and restarting systemd-networkd.
Change-Id: I17748b0dd2307fd9bee705140c67883140090298
Signed-off-by: Major Hayden <major@mhtx.net>
Patch I0d83fd4895d4c5beaf5a84a239c1a1ed71521dee dropped the ARP=yes
option for networkd because it's not supported by old systemd releases.
This however brings back a problem where the kernel's default
arp_notify sysctl option may not be set correctly for our use case.
Containers are created with random MAC addresses so we need to ensure
that ARP entries are populated correctly when a container is restarted.
Instead of having to implement some sort of a new workaround on the host,
it's probably better to create all containers with fixed MAC addresses from
now on.
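
One way a fixed address could be derived is sketched below. How the
role actually generates the address is not described in this message;
the hashing scheme, the function name, and the `00:16:3e` prefix are
assumptions for illustration only:

```python
import hashlib


def fixed_mac(container_name: str, interface: str = "eth0",
              prefix: str = "00:16:3e") -> str:
    """Derive a stable MAC address from the container and interface name."""
    digest = hashlib.sha256(f"{container_name}-{interface}".encode()).digest()
    # Use three digest bytes for the device-specific half of the address.
    suffix = ":".join(f"{byte:02x}" for byte in digest[:3])
    return f"{prefix}:{suffix}"
```

Because the result depends only on the names passed in, the container
would keep the same address across restarts, so neighbours' ARP entries
stay valid without relying on arp_notify.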
Change-Id: I8ad390fc3ce27756f26c57c92aaa3adc8e506a17
We should use domain names for the external network testing task in
order to verify not only that the default gateway works properly but also
that our DNS is able to resolve hostnames.
Change-Id: I3aebcf1dff8268e4dbaebae8fb598ee27e3f481d
Depends-On: I316c3851f40f08d272b7bb5f7165e010e3a95c3a
Depends-On: Ied7632037f737c3f32c34dac70531065c54496c9
Depends-On: I14f8373897da28dea2ea03500c2be46c5b40d51c
Depends-On: I0d83fd4895d4c5beaf5a84a239c1a1ed71521dee
The ARP option has been added in systemd-232. As such, current stable
distributions may not support it so drop the option and let the kernel
decide what to do with ARP. Fixes the following warning:
[/etc/systemd/network/eth0.network:14] Unknown lvalue 'ARP' in section 'Link'
Link: https://github.com/systemd/systemd/pull/3854
Link: 99d2baa2ca
Depends-On: I14f8373897da28dea2ea03500c2be46c5b40d51c
Change-Id: I0d83fd4895d4c5beaf5a84a239c1a1ed71521dee
The UseDNS option requires the systemd-resolved service so set this
option based on the lxc_container_enable_resolved variable.
Change-Id: I5b7c3f01534f5ccbaf76aced673aefc6ec7fcf6e
When using a static route we need to set a route metric to ensure the
priority of the routes being passed in. This change ensures we maintain
our expected interface and functionality should any static routes be
passed into the container.
Before the implementation of networkd, ENI would amend the main table
with the defined routes in the order they were written. However
systemd-networkd inserts the defined routes at the top of the default
table which can cause confusion and conflict. This change simply adds
a route metric to all defined routes and increments the metric integer
based on the list index which explicitly ensures all defined routes
are prioritized in the order in which they were written.
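
The indexing scheme described above can be sketched as follows. This is
not the role's template; the route dictionary keys (`destination`,
`gateway`) and the `base_metric` starting value are assumptions, while
`Destination=`, `Gateway=` and `Metric=` are standard keys of the
systemd-networkd `[Route]` section:

```python
def render_routes(routes, base_metric=1):
    """Render systemd-networkd [Route] sections, one per defined route."""
    sections = []
    for index, route in enumerate(routes):
        # The metric grows with list position, so routes written earlier
        # get a lower metric and therefore a higher priority.
        metric = base_metric + index
        sections.append(
            "[Route]\n"
            f"Destination={route['destination']}\n"
            f"Gateway={route['gateway']}\n"
            f"Metric={metric}\n"
        )
    return "\n".join(sections)


example = render_routes([
    {"destination": "10.1.0.0/24", "gateway": "10.0.0.1"},
    {"destination": "10.2.0.0/24", "gateway": "10.0.0.2"},
])
```
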
Change-Id: I13768580fbd926033fde4a74cbbf90b9eda24658
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
openSUSE and CentOS have been voting jobs for a while so we should
start testing all the scenarios on them. The only job that hasn't
been added is the ZFS one since there is no such package on openSUSE
or CentOS.
Change-Id: Icde5ed7a4e6be8ac19412f15b84febf2096ba404
The ansible_distribution variable is causing some trouble since it can
contain spaces etc. As such, we can simply use the ansible_pkg_mgr
fact to figure out the name of the package we want to install.
Change-Id: Ic92eb1f9030df2883b049b9868e031ff4f0d42f2
Unify container network interfaces using Systemd Networkd for ubuntu,
centos, and openSUSE. This change allows the role to use a single way to
configure container networks.
Care has been taken to ensure we're able to cleanly upgrade to the new
capabilities within existing environments without breaking any feature
compatibility or causing any container restarts.
It's also worth noting that all of the pre/post networking up/down
script options have been converted to systemd "oneshot" services. This
retains the ability to run adhoc scripts post network availability
while also opening up this capability, which used to be ubuntu only,
to all of our supported operating systems.
> Our usage of `lxc-attach` was removed in favor of `nsenter` to fix an
issue where multiple `lxc-attach` commands issued to a single physical
host could result in a hang.
> Scripts that were being generated inline have been placed into
template files. This solves a long standing memory consumption issue
when creating lots of containers. The old shell tasks will now be
executed from a generated script. While this should also help with
debugging, the main driver is to ensure better system stability.
> A lot of cleanup has been done throughout the task files and
templates. In the process of updating the role to use unified
networking a lot of duplicate tasks, scripts, and processes have been
consolidated.
> Handlers have been added for network connection wait conditions and
to various service restarts.
> The OSA plugins have been added to this role as a dependency. We
rely on the connection plugins throughout the stack however we were
doing a lot of workarounds to cater to the possibility of a deployer
running this role without them. This change simply adds the plugins
as a known dependency which allows for a more streamlined setup.
Change-Id: I5d3ddcfa11d575648a69a04f2fb30236c2c89da3
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The systemd machine-id needs to be unique on all network attached
devices. This change ensures that when a container comes online, a
unique machine-id is generated if one was not already present. When
the machine-id is created for the first time the container will restart
so the new ID can take effect.
More information on the machine-id can be found here:
https://www.freedesktop.org/software/systemd/man/machine-id.html
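
The "generate only if missing, restart only if generated" behaviour can
be sketched like this. The role's own task is not shown in this
message, so the function name, the use of `uuid4`, and the returned
`created` flag are assumptions for illustration:

```python
import os
import uuid


def ensure_machine_id(path):
    """Write a new machine-id to `path` only if none is present.

    Returns (machine_id, created); `created` signals that a restart
    would be needed for the new ID to take effect.
    """
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read().strip()
        if existing:
            return existing, False
    # machine-id(5) describes the ID as 32 lowercase hexadecimal characters.
    machine_id = uuid.uuid4().hex
    with open(path, "w") as handle:
        handle.write(machine_id + "\n")
    return machine_id, True
```

Calling the function a second time on the same path returns the
existing ID with `created` set to False, so a restart would only be
triggered on first creation.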
Change-Id: Ib25aeeecf1e6001e6c6b1a7d6b6d50eca7ab45fa
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Zuul no longer requires the project-name for in-repo configuration.
Omitting it makes forking or renaming projects easier.
Change-Id: Ie0a2f156de3c136439ac4dc5e28b16ed5509288c