With I57047682cfa82ba6ca4affff54fab5216e9ba51c Heat has added
a new template version for wallaby. This allows us to use the
2-argument variant of the ``if`` function, which allows for
e.g. conditional definition of resource properties and helps
clean up templates. If only two arguments are passed to the ``if``
function, the entire enclosing item is removed when the condition
is false.
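The two-argument behaviour can be pictured with a toy model. This is only a sketch of the semantics described above, not Heat's implementation; the `resolve` helper, the `use_key` condition and the sample properties are hypothetical.

```python
# Toy model of the two-argument ``if`` semantics; NOT Heat's code.
_REMOVED = object()  # sentinel marking an item that must be dropped


def resolve(node, conditions):
    """Recursively evaluate ``if`` functions in a template fragment."""
    if isinstance(node, dict):
        if set(node) == {"if"}:
            args = node["if"]
            cond = conditions[args[0]]
            if len(args) == 2:
                # Two arguments: no "else" value, so drop the item when false.
                return resolve(args[1], conditions) if cond else _REMOVED
            # Three arguments: classic if/else.
            return resolve(args[1] if cond else args[2], conditions)
        resolved = {k: resolve(v, conditions) for k, v in node.items()}
        return {k: v for k, v in resolved.items() if v is not _REMOVED}
    if isinstance(node, list):
        resolved = [resolve(v, conditions) for v in node]
        return [v for v in resolved if v is not _REMOVED]
    return node


# Hypothetical resource properties with one conditional entry.
properties = {
    "name": "server",
    "key_name": {"if": ["use_key", "mykey"]},
}
print(resolve(properties, {"use_key": False}))
```

With `use_key` false the `key_name` entry disappears entirely instead of being set to an empty value, which is what removes the need for a third "else" argument.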
Change-Id: I25f981b60c6a66b39919adc38c02a051b6c51269
This changes all these parameters as heat would correctly
parse all values. It also drops all the yaql shenanigans
used for handling them and in heat conditions.
Also fixes wrong usage of non-existent NeutronWrapperDebug
parameter in ovn-metadata-container-puppet.yaml.
We had converted all ``Debug`` parameters to boolean with
Ib6c3969d4dd75d5fb2cc274266c060acff8d5571.
Change-Id: Ia2bffffde34aa248a4cc60c3895464f1f9d1ded2
- removes duplicate keys from yaml files by assuming that the last
one was the desired one (matches current loader behavior)
- prevents regressions by activating the yaml lint rule that detects them
(yaml skip was silencing all yaml checks, so the long list seen
is in fact shorter than just 'yaml')
- includes sorting of some of the keys, which was needed in order to spot
the duplicates.
Change-Id: Idf5c0041a0c6d3ed7d5d49fb68be856719916663
We do not need to add an if: internal_tls_enabled in a number of
ansible tasks. enable_internal_tls is already defined as an ansible
fact in common/deploy-steps.j2:
enable_internal_tls: {get_param: EnableInternalTLS}
So when the service uses the enable_internal_tls condition and it points
to the EnableInternalTLS param, we can just use the ansible fact
directly. Note that if the enable_internal_tls condition points to
something other than the plain EnableInternalTLS param, we must not
do this cleanup.
Change-Id: Idb07cbc8fc3a4d73ff52c54d869310fd6c49b502
The ML2/OVN driver supports AZ-aware routing scheduling in
stable/train and newer versions.
This patch also removes the check from the DHCP agent container because
although not a typical deployment, the Neutron DHCP agent can be
deployed with OVN for special cases such as baremetal provisioning where
the agent would serve DHCP to the baremetal instances.
Change-Id: I8941c4d9a8e68eb775c910495de4aff9fbc67206
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
This is using linux-system-roles.certificate ansible role,
which replaces puppet-certmonger for submitting certificate
requests to certmonger. Each service is configured through
its heat template.
Partial-Implements: blueprint ansible-certmonger
Depends-On: https://review.rdoproject.org/r/31713
Change-Id: Ib868465c20d97c62cbcb214bfc62d949bd6efc62
This was mainly there as a legacy interface which was
for internal use. Now that we pull the passwords from
the existing environment and don't use it, we can drop
this.
Reduces the number of heat resources.
Change-Id: If83d0f3d72a229d737a45b2fd37507dc11a04649
Add the ability to specify the private key size
used when creating the certificate. The default value
stays the same as before: 2048 bits.
The key_size value can also be overridden
per service.
Depends-on: I4da96f2164cf1d136f9471f1d6251bdd8cfd2d0b
Change-Id: Ic2edabb7f1bd0caf4a5550d03f60fab7c8354d65
Fix the tasks that remove the temporary namespace when running in check
mode.
Checking that the rc variable is actually defined.
Change-Id: I1f0512532f564d58343440bd0a6594da9609b65d
Now that the FFU process relies on the upgrade_tasks and deployment
tasks there is no need to keep the old fast_forward_upgrade_tasks.
This patch removes all the fast_forward_upgrade_tasks sections from
the services, as well as from the common structures.
Change-Id: I39b8a846145fdc2fb3d0f6853df541c773ee455e
When running Ansible in check mode (aka dry run), some tasks need some
changes, especially around variables, to make sure they are actually
defined.
Change-Id: I337aa287f1c88a0e2707b441fc6b19b997d52385
These tasks really should be managed a single time against the host
rather than at deployment time.
Change-Id: I535d8360493267d50196aebb6365124b67e9ba78
Related-Bug: #1883609
This reverts commit 1517df0fc30b7b10263aa96fe48978d7bf17a0fe.
We reverted the sidecar wrappers, we don't need this anymore.
Change-Id: Ia69b7e489db9b26db852083bf5991b64df5b80a5
We've found that the systemd sidecars tend to drop events when spawning
multiple processes at once. Rather than continue to try and patch it, we
need to go back to the drawing board. This change reverts the various
patches that were related to the systemd side car code.
Revert "Use exec when spawning any neutron sidecar container"
This reverts commit 5b799136facc15d4e69bcede52b60d39a4a02464.
Revert "Remove neutron wrappers usage"
This reverts commit f4f3045c413e7da083dbd8495ef758c2ac86870d.
Revert "Use a systemd service to handle sidecar containers"
This reverts commit 2dc7066b050ecf22dc9e5909061272ffe765ebfc.
Change-Id: I8b9578b7c7d6bd23f0b677f64afae7be76ddcadf
The lock used in the wrapper is under /var/lock in the container which
is not shared with the host so the sync script never waits for the
wrapper to be done. Moving the lock file to a path on a shared mount in
the container seems to solve that particular race.
Partial-bug: #1874470
Change-Id: Iaa3a19bc47241e6eb686d65c1a198ec69505398e
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
Similarly to dnsmasq [0], other processes can receive SIGHUP. This is
allowed by rootwrap filters for all processes [1], and I found some
examples when running neutron-tempest-plugin tests checking
l3-agent.log files:
Running command (rootwrap daemon): ['radvd-kill', 'HUP', '712810'] execute_rootwrap_daemon /usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py:103
Running command (rootwrap daemon): ['keepalived-kill', 'HUP', '402009'] execute_rootwrap_daemon /usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py:103
To prevent additional similar issues, apply a similar fix for these
sidecar containers too.
Related-Bug: #1867192
[0] I1af2ecd9e3996de4f43224f66a8bdb81eab07022
[1] https://opendev.org/openstack/neutron/src/branch/master/etc/neutron/rootwrap.d
Change-Id: I31237d21527a2909a1669cb6c80cc0fa9be798a6
With I2feb9e81bc40e44cb2c7a2972366fa4b16590227, we don't need to
set wrapper parameters as everything is deployed by Ansible.
Change-Id: Ie03450aa0796614a686f6c390c9b0088fcf591f0
Blueprint: safe-side-containers
We see some deployment failures where the overcloud is unable to PXE/DHCP boot during the initial bits of the deployments. The following errors are seen in neutron dhcp logs:
2020-03-11 17:58:33.737 54481 DEBUG neutron.agent.dhcp.agent [req-6caace19-095f-4115-be85-644f7a8baa7f - - - - -] Resync event has been scheduled _periodic_resync_helper /usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:277
2020-03-11 17:58:33.737 54481 DEBUG neutron.common.utils [req-6caace19-095f-4115-be85-644f7a8baa7f - - - - -] Calling throttled function clear wrapper /usr/lib/python3.6/site-packages/neutron/common/utils.py:110
2020-03-11 17:58:33.738 54481 DEBUG neutron.agent.dhcp.agent [req-6caace19-095f-4115-be85-644f7a8baa7f - - - - -] resync (a187b137-b68c-476e-bd37-39253158e762): [ProcessExecutionError("Exit code: 125; Stdin: ; Stdout: ; Stderr: + exec\n+ trap 'exec 2>&4 1>&3' 0 1 2 3\n+ exec\n",)] _periodic_resync_helper /usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:294
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent [-] Unable to reload_allocations dhcp for a187b137-b68c-476e-bd37-39253158e762.: neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: ; Stderr: + exec
+ trap 'exec 2>&4 1>&3' 0 1 2 3
+ exec
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 160, in call_driver
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/dhcp.py", line 528, in reload_allocations
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent self._spawn_or_reload_process(reload_with_HUP=True)
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/dhcp.py", line 470, in _spawn_or_reload_process
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent pm.enable(reload_cfg=reload_with_HUP, ensure_active=True)
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 92, in enable
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent self.reload_cfg()
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 100, in reload_cfg
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent self.disable('HUP')
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 113, in disable
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent utils.execute(cmd, run_as_root=self.run_as_root)
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent returncode=returncode)
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: ; Stderr: + exec
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent + trap 'exec 2>&4 1>&3' 0 1 2 3
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent + exec
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent
2020-03-11 17:58:33.738 54481 ERROR neutron.agent.dhcp.agent
2020-03-11 17:58:33.740 54481 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/external/pids/a187b137-b68c-476e-bd37-39253158e762.pid.haproxy get_value_from_file /usr/lib/pyt
The issue is that the dhcp side containers are spawned with the following processes:
|-conmon(375906)-+-dumb-init(375918)---bash(375935)---dnsmasq(375938)
| `-{conmon}(375908)
Now when neutron wants to send a SIGHUP to the dnsmasq it actually invokes the following command:
nsenter --net=/run/netns/qdhcp-e11ac152-745f-4292-b423-d282fbf97f13 --preserve-credentials -m -t 1 podman kill --signal HUP 0b7371f6de52cfe377858...
The problem is that podman kill will send the signal to "dumb-init
--single-child" (pid1 for this container) which will then forward it only to
bash, which will cause dnsmasq to be terminated and will eventually be later
respawned with a different pid (stored in
/var/lib/neutron/dhcp/86884abc-f7d7-4118-923f-38b247fee8e9/pid).
So if multiple ports are created concurrently this is racy and one of them will fail to reload dnsmasq with the error above, because one process might use a pid file that is no longer valid.
TLDR: this all works if SIGHUP to the dnsmasq process does not change pids under the hood all of a sudden.
Otherwise a SIGHUP used to reload dnsmasq will be sent to the bash
parent process of dnsmasq, which will then terminate dnsmasq and break
all the things.
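One common way to keep the PID stable is to have the wrapper shell `exec` the target process, so that no intermediate bash remains between dumb-init and dnsmasq. Whether this is exactly what this change does is an assumption here; the snippet below only demonstrates the underlying property, namely that `exec` replaces the shell without changing its PID.

```python
import subprocess

# The outer shell prints its PID, then exec replaces it with a new shell
# that prints its own PID. Because exec does not fork, both PIDs match.
script = "echo $$; exec sh -c 'echo $$'"
out = subprocess.run(
    ["sh", "-c", script], capture_output=True, text=True, check=True
)
pids = out.stdout.split()
print(pids[0] == pids[1])
```

A signal sent to that PID then reaches the final process directly rather than a parent shell.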
Tested this on three runs and did not experience the issue any longer
Co-Authored-By: Bernard Cafarelli <bcafarel@redhat.com>
Co-Authored-By: Slawomir Kaplonski <skaplons@redhat.com>
Closes-Bug: #1867192
Change-Id: I1af2ecd9e3996de4f43224f66a8bdb81eab07022
- deploy-steps-tasks-step-1.yaml: Do not ignore errors when dealing
with check-mode directories. The file module is resilient enough to
not fail if the path is already absent.
- deploy-steps-tasks.yaml: Replace ignore_errors by another condition,
"not ansible_check_mode"; this task is not needed in check mode.
- generate-config-tasks.yaml: Replace ignore_errors by another
condition, "not ansible_check_mode"; this task is not needed in check mode.
- Neutron wrappers: use fail_key: False instead of ignore_errors: True
if a key can't be found in /etc/passwd.
- All services with service checks: Replace "ignore_errors: true" by
"failed_when: false". Since we don't care about whether or not the
task returns 0, let's just make the task never fail. It will only
improve UX when crawling logs; no more failures will be shown for
these tasks.
- Same as above for cibadmin commands, cluster resources show
commands and keepalived container restart command; and all other uses
of the shell, command or yum modules where we just don't care about their
potential failures.
- Aodh/Gnocchi: Add pipefail so the task isn't supposed to fail
- tripleo-packages-baremetal-puppet and undercloud-upgrade: check shell
rc instead of "succeeded", since the task will always succeed.
Change-Id: I0c44db40e1b9a935e7dde115bb0c9affa15c42bf
While they are, at SELinux level, exactly the same (one is an alias to
the other), the "container_file_t" name is easier to understand (and
shorter to write).
A second pass in a couple of days or weeks will be needed in order to
change files that were merged after this first pass.
Change-Id: Ib4b3e65dbaeb5894403301251866b9817240a9d5
The next iteration of fast-forward-upgrade will be
from queens through to train, so we update the names
accordingly.
Change-Id: Ia6d73c33774218b70c1ed7fa9eaad882fde2eefe
Ansible has decided that roles with hyphens in them are no longer supported
by not including support for them in collections. This change renames all
the roles we use to the new role name.
Depends-On: Ie899714aca49781ccd240bb259901d76f177d2ae
Change-Id: I4d41b2678a0f340792dd5c601342541ade771c26
Signed-off-by: Kevin Carter <kecarter@redhat.com>
Make sure we depend on a systemd service by having the .service in the
service name that we depend on.
Otherwise it leads to errors in /var/log/messages:
Failed to add dependency on openvswitch, ignoring: Invalid argument
Change-Id: I35230c6dfd8bc7ea2c45f7d2e1e5b5f4316a9375
When podman parses such a volume map it removes the trailing slash
automatically and shows the volumes w/o slash on inspection.
When comparing configurations this shows up as a difference and
it breaks idempotency of containers, causing them to be recreated.
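The mismatch can be illustrated with a small comparison. This is a sketch only: `normalize_volume` is a hypothetical helper, the sample paths are made up, and it merely mimics the slash stripping described above rather than podman's actual parsing.

```python
def normalize_volume(spec):
    """Strip trailing slashes from the host and container paths of a volume map."""
    parts = spec.split(":")
    # The first two fields are paths; an optional third holds options ("ro", "z").
    for i in range(min(2, len(parts))):
        parts[i] = parts[i].rstrip("/") or "/"
    return ":".join(parts)


configured = "/var/lib/neutron/:/var/lib/neutron/:z"
inspected = "/var/lib/neutron:/var/lib/neutron:z"

# A plain string comparison sees a change and would trigger a recreate.
print(configured == inspected)
# After normalization both sides agree.
print(normalize_volume(configured) == inspected)
```

Writing the volume maps without the trailing slash in the first place avoids the need for any such normalization.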
Change-Id: Ifdebecc8c7975b6f5cfefb14b0133be247b7abf0
This reverts commit af80a0d914d9663079ad30c7dcdf73e1060c33e7.
Reason: the added SELinux rule actually allows openvswitch to write in
container_file_t - not the contrary. We therefore still need the ":z" flag.
A possible follow-up would be to drop the "shared" flag (useless) and
remove the duplicated mount.
Change-Id: Idc8813792b5c6d4d4226491f81de2965beeaadbe
This change converts our firewall deployment practice to use
the tripleo-ansible firewall role. This change creates a new
"firewall_rules" object which is queried using YAQL from the
"FirewallRules" resource.
A new parameter has been added allowing users to input
additional firewall rules as needed. The new parameter is
`ExtraFirewallRules` and will be merged on top of the YAQL
interface.
Depends-On: Ie5d0f51d7efccd112847d3f1edf5fd9cdb1edeed
Change-Id: I1be209a04f599d1d018e730c92f1fc8dd9bf884b
Signed-off-by: Kevin Carter <kecarter@redhat.com>
This change switches the neutron dhcp, l3 and ovn containers to use
ansible on the host to write out systemd & service scripts that can be
used to trigger side car containers to be launched from within the
target containers.
Change-Id: I2feb9e81bc40e44cb2c7a2972366fa4b16590227
Blueprint: safe-side-containers
Depends-On: https://review.opendev.org/693442
When upgrading from Rocky to Stein we moved also from using the docker
container engine into Podman. To ensure that every single docker container
was removed after the upgrade a post_upgrade task was added which made
use of the tripleo-docker-rm role that removed the container. In this cycle,
from Stein to Train both the Undercloud and Overcloud work with Podman, so
there is no need to remove any docker container anymore.
This patch removes all the tripleo-docker-rm post-upgrade tasks, and in those
services which only included a single task, the post-upgrade-tasks section
is also erased.
Change-Id: I5c9ab55ec6ff332056a426a76e150ea3c9063c6e
Moving all the container environments from lists to dicts, so they can
be consumed later by the podman_container ansible module which uses
dict.
Using a dict is also easier to parse, since it doesn't involve "=" for
each item in the environment to export.
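The conversion itself is mechanical, as the sketch below suggests. The helper name and the sample variables are illustrative and not taken from the templates.

```python
def env_list_to_dict(env):
    """Convert ["KEY=value", ...] into {"KEY": "value", ...}."""
    result = {}
    for item in env:
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = item.partition("=")
        result[key] = value
    return result


old_style = ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS", "EXAMPLE_OPTS=a=b"]
print(env_list_to_dict(old_style))
```

The dict form makes lookups and overrides by key trivial, where the list form needs string splitting first.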
Change-Id: I894f339cdf03bc2a93c588f826f738b0b851a3ad
Depends-On: I98c75e03d78885173d829fa850f35c52c625e6bb
Use the ipversion parameter for firewall rules to contain
rule creation in either iptables or ip6tables. Add rules
in ironic-inspector and neutron deployment template to
add rules for DHCPv6 in ip6tables.
DHCPv6 relay and DHCPv6 server both use port 547, so 547
needs to be open for both INPUT and OUTPUT.
Related-bug: #1845153
Depends-On: Id872c55cfc6b958fef3ccda2d923f821a1fe6a13
Depends-On: I8b453f7c13c2015aa208ed1bddcdca246cdca58d
Change-Id: If91b883459488856ae54e3ca0d0fb97d4d248f97
This patch removes fluentd composable service in favor of rsyslog composable service
and modifies *LoggingSource configuration accordingly.
Change-Id: I1e12470b4eea86d8b7a971875d28a2a5e50d5e07
The tripleo-docker-rm role has been replaced by tripleo-container-rm [0].
This role will identify the docker engine via the container_cli variable
and perform a deletion of that container. However, these tasks inside the
post_upgrade_tasks section were meant to remove the old docker containers
after upgrading from rocky to stein, in which podman starts to be the
container engine by default.
For that reason, we need to ensure that the container engine in which the
containers are removed is docker, as otherwise we will be removing the
podman container and the deployment steps will fail.
Closes-Bug: #1836531
[0] - 2135446a35
Depends-On: https://review.opendev.org/#/c/671698/
Change-Id: Ib139a1d77f71fc32a49c9878d1b4a6d07564e9dc
Neutron introduced "kill script" support for its agents, allowing to do
more than a simple "kill <pid>".
This patch intends to activate this new feature, allowing to avoid
dangling containers with failed exit state.
It supports the "HUP" and "9" signal - first one invokes the "kill
--signal HUP" command from the container_cli, while the second one will
stop and delete the container.
Other signals will return an error, since they aren't known.
The kill-script also supports the global Debug flag for a more verbose
output.
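The signal handling described above boils down to a small dispatch. The sketch below models it in Python for illustration only; the real kill script is not reproduced here, the function name is hypothetical, and mapping "stop and delete" to the `stop` and `rm` subcommands is an assumption.

```python
def kill_script_commands(signal, container_cli, container_name):
    """Return the CLI invocations for a given signal (illustrative only)."""
    if signal == "HUP":
        # Reload: forward SIGHUP to the container.
        return [[container_cli, "kill", "--signal", "HUP", container_name]]
    if signal == "9":
        # Assumption: "stop and delete" maps to the CLI's stop and rm subcommands.
        return [
            [container_cli, "stop", container_name],
            [container_cli, "rm", container_name],
        ]
    # Any other signal is unknown and reported as an error.
    raise ValueError(f"Unknown signal: {signal}")


print(kill_script_commands("HUP", "podman", "example-sidecar"))
```

Passing the CLI as a parameter mirrors the use of container_cli, so the same logic covers both podman and docker setups.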
This patch also adds a soon to be deprecated parameter
DockerAdditionalSockets in order to make the change compatible with
setups still using Docker (HA deploy on Centos-7 and RHEL-7 for
example).
For more information about Neutron new kill script feature, please have
a look at this change: I29dfbedfb7167982323dcff1c4554ee780cc48db
Depends-On: https://review.opendev.org/661760
Change-Id: Iafa57b462f5ee205345a8d6e6d460ab68f312099
NeutronMechanismDrivers is a list, so we should check whether
it contains 'ovn'.
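The difference between comparing the whole value and testing membership can be shown in a few lines. This is a Python illustration of the idea, not the template condition itself; `uses_ovn` and the sample driver list are hypothetical.

```python
def uses_ovn(mechanism_drivers):
    """Return True when 'ovn' is one of the configured drivers."""
    return "ovn" in mechanism_drivers


drivers = ["openvswitch", "ovn"]

# Comparing the whole list misses OVN as soon as other drivers are listed.
print(drivers == ["ovn"])
# A membership test looks at each element instead.
print(uses_ovn(drivers))
```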
Change-Id: I2bd9d7150c1f08f078f1a3a709138fbe3e66d365
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
Allow Neutron to be configured for use of AZs w/o additional t-h-t
services introduced.
Limitations for the used NeutronMechanismDrivers:
* Right now OVN doesn't support AZ-aware routing scheduling (later in
the Train cycle the OVN ml2 driver will be extended to support it).
* Nor are Neutron agents normally deployed for OVN.
* We do allow the L3 agent to take AZ configs regardless of
the used NeutronMechanismDrivers.
* But we take the safe path for the DHCP agent and prohibit AZ
configuration for it in the OVN case.
So effectively nothing applies there for Neutron AZs and OVN,
as it makes little to no sense to do that yet.
Related blueprint split-controlplane-templates
Change-Id: I0d97b004c4f162fdefc97a7b603c0136686fa21c
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
This converts all Docker*Image parameter variants into
Container*Image variants.
The commit was autogenerated with the following shell commands:
for file in $(grep -lr Docker.*Image --include \*.yaml --exclude-dir releasenotes); do
sed -e "s|Docker\([^ ]*Image\)|Container\1|g" -i $file
done
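For readers less familiar with sed, the same substitution can be expressed with Python's `re` module. This is an equivalent sketch rather than what was actually run, and the sample parameter name is only an example input.

```python
import re


def rename_image_params(text):
    """Rename Docker*Image to Container*Image, line by line like sed."""
    pattern = re.compile(r"Docker([^ ]*Image)")
    return "".join(
        pattern.sub(r"Container\1", line)
        for line in text.splitlines(keepends=True)
    )


sample = "DockerNeutronApiImage: {get_param: DockerNeutronApiImage}\n"
print(rename_image_params(sample), end="")
```

The `[^ ]*` part keeps the match within a single space-delimited token, so unrelated words later on the line are left alone.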
Change-Id: Iab06efa5616975b99aa5772a65b415629f8d7882
Depends-On: I7d62a3424ccb7b01dc101329018ebda896ea8ff3
Depends-On: Ib1dc0c08ce7971a03639acc42b1e738d93a52f98