This change adds a task to make sure the
output of the useradd module is consistent when the
HOME dir for a given user is created.
In particular, this task ensures the HOME dir has the
right owner/group associated with the user that has been
created.
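The check the task performs can be sketched in Python (hypothetical
helper for illustration, not the actual Ansible task):

```python
import os
import pwd

def home_ownership_ok(user_name, home_path):
    """Return True when home_path is owned by user_name's uid/gid.
    Illustrative sketch of what the new task verifies."""
    entry = pwd.getpwnam(user_name)   # resolve the created user's uid/gid
    st = os.stat(home_path)           # inspect the HOME directory
    return st.st_uid == entry.pw_uid and st.st_gid == entry.pw_gid
```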
Closes-Bug: #1917856
Change-Id: I64846594123b9d5f333082b3f7714186713caffb
Adds a module tripleo_service_vip which manages
a neutron API port resource for service virtual IPs
(redis and ovn_dbs) when the neutron service is available.
When the service network is 'ctlplane' the module does
a find_port for the controlplane_virtual_ip so that the
ctlplane VIP is used in this case.
When the neutron service is not available, the module
looks for a pre-defined IP address in the fixed_ips
option. If present, this address is used; if not,
an exception is raised.
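The branching described above can be sketched as (hypothetical Python,
not the module's real API):

```python
def resolve_service_vip(service_network, neutron_available, fixed_ips):
    """Sketch of how the module picks the service VIP source."""
    if service_network == 'ctlplane':
        # find_port for controlplane_virtual_ip: reuse the ctlplane VIP
        return ('find_port', 'controlplane_virtual_ip')
    if neutron_available:
        # manage a neutron API port resource for the service VIP
        return ('neutron_port', service_network)
    # no neutron: fall back to a pre-defined address in fixed_ips
    for entry in fixed_ips or []:
        if entry.get('ip_address'):
            return ('fixed_ip', entry['ip_address'])
    raise ValueError('neutron unavailable and no fixed_ips address given')
```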
The module updates the 'service_vip_vars.yaml' file in
the playbook directory so that include_vars can be used
to load the variables.
The change also updates action/tripleo_all_nodes_data.py
and tripleo_hieradata templates to use the new variable
to source the service virtual IPs for redis and ovn_dbs
services.
Related: blueprint network-data-v2-ports
Depends-On: https://review.opendev.org/777252
Change-Id: I6b2ae7388f8af15f2fd3dcbc5e671c169700fff6
ceph-ansible still requires that all the facts be collected by default
and that fact variable injection is enabled. We want to turn that off
for tripleo, but in the meantime we need to force it back on when
running ansible for ceph.
Change-Id: I607c29c45148b57dee34741397cf7a16ced8ef78
Related-Bug: #1915761
Related-Bug: #1917621
When running fetch with become, the slurp module will also
be used to fetch the contents of the file for determining
the remote checksum. This effectively doubles the transfer
size [0] and shows up as a MemoryError when the file size
is large enough.
In TripleO this is problematic in large & old deployments
when transferring the /var/lib/mysql folder.
This patch switches to using rsync directly between the src
and dst hosts to transfer the data. This is advantageous not
only for solving the above-mentioned bug, but is also faster.
A simpler implementation using synchronize was attempted [1],
but there were issues with the mistral container which
prevented that approach from being successful.
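A minimal sketch of the direct transfer, assuming illustrative rsync
flags (the exact options used by the patch may differ):

```python
def rsync_pull_cmd(src_host, src_path, dest_path):
    """Build an rsync command that copies directly from the source host,
    avoiding fetch's slurp-based double transfer. Flags are illustrative."""
    return ['rsync', '-a', '--delete',
            '%s:%s' % (src_host, src_path), dest_path]
```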
[0] https://docs.ansible.com/ansible/latest/collections/ansible/builtin/fetch_module.html#notes
[1] https://review.opendev.org/c/openstack/tripleo-ansible/+/776565/11
Closes-Bug: #1908425
Closes-Bug: rhbz#1904681
Closes-Bug: rhbz#1916162
Depends-On: https://review.opendev.org/778456
Change-Id: Ifc03f9eb1cb4ca3faec194569f4cb2dace93323f
When Ceph is deployed by TripleO but no services using
it are explicitly enabled in TripleO, an empty pool list
is generated (for the same reason), and the osd caps
result in an empty string, leaving the cluster in an
unhealthy status.
This patch introduces two new tasks to selectively create
osd caps only when profiles are defined.
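The guard can be sketched as (hedged Python, assuming a cephx-style
'profile rbd pool=&lt;name&gt;' cap format; the real change is implemented as
Ansible tasks):

```python
def osd_caps_for_pools(pools):
    """Build an OSD cap string only when there are pools to cover;
    returning None lets the caller skip cap creation entirely."""
    if not pools:
        return None  # avoid the empty cap string that broke cluster health
    return ', '.join('profile rbd pool=%s' % pool for pool in pools)
```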
Closes-Bug: #1917440
Change-Id: Id165c19121c9036a33b10d6b6d51f3bdc528307b
A change in the desired public key was not being reflected in the
Overcloud on stack update/upgrade. TripleO should replace the keypair
even when the desired public key changed, i.e. its fingerprint does not
match the one (possibly) already existing in the overcloud. We should
compare fingerprints and replace (delete and create, no keypair update
option) when they mismatch.
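The comparison can be sketched in Python (illustrative helpers; the
role's actual task and variable names may differ):

```python
import base64
import hashlib

def ssh_md5_fingerprint(pubkey_line):
    """MD5 fingerprint (aa:bb:...) of an OpenSSH public key line."""
    blob = base64.b64decode(pubkey_line.strip().split()[1])
    digest = hashlib.md5(blob).hexdigest()
    return ':'.join(digest[i:i + 2] for i in range(0, len(digest), 2))

def keypair_needs_replacement(desired_pubkey, existing_fingerprint):
    # Nova offers no keypair update, so a mismatch means delete + create.
    return ssh_md5_fingerprint(desired_pubkey) != existing_fingerprint
```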
Closes-Bug: #1861031
Change-Id: I953c35c9ec24844598108bc173e84868393a98aa
The current driver is podman which creates problems when we
try to use the synchronize module for the file copy between
hosts. To make room for this option, we switch to using the
delegated driver and two inventory hosts (both localhost)
instead, which is more efficient, more portable and opens up more
options.
We also add 'any_errors_fatal: true' to the play
arguments to ensure that any error stops the whole test.
The converge playbook is targeted at localhost, rather than
all instances, because that is the way that the role being
tested is used in TripleO.
Related-Bug: #1908425
Related-Bug: rhbz#1904681
Related-Bug: rhbz#1916162
Change-Id: I4d5049ef863c5685b1d817a865a8a44c4429480c
Since we're still provisioning the Ceph cluster at step2 we need
to call the cephadm playbook the same way as ceph-ansible.
The purpose of this role is to be able to run the cephadm playbook
using external_deploy_steps_tasks.
The actions implemented in this role are:
1. prepare: build a cephadm dir within the config-download dump
2. enable_ceph_admin_user via cli-enable-ssh-admin.yaml playbook
3. translate the THT parameters and make them available to the role
4. call the ansible playbook that runs cephadm and applies the spec
Change-Id: If066dd19f1e9c75fd6581fddb5b55cb37eb57809
The task is called PauseXXXX but it really does the opposite;
this change sets the name properly to unPause.
Change-Id: Ice9482c635d104c30f1f44e92475dc517a7fd527
In order to support ANSIBLE_INJECT_FACT_VARS=False we have to use ansible_facts
instead of ansible_* vars. This change switches our distribution and
hostname related items to use ansible_facts instead.
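As an illustration of the renaming convention this change applies (e.g.
`{{ ansible_distribution }}` becomes `{{ ansible_facts['distribution'] }}`):

```python
def to_facts_lookup(legacy_var):
    """Map an injected fact var name to its ansible_facts key,
    e.g. 'ansible_distribution' -> "ansible_facts['distribution']"."""
    prefix = 'ansible_'
    if not legacy_var.startswith(prefix):
        raise ValueError('not an injected fact var: %s' % legacy_var)
    return "ansible_facts[%r]" % legacy_var[len(prefix):]
```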
Change-Id: Id01e754f0cf9f6e98c02f45a4011f3d6f59f80a1
Related-Bug: #1915761
This role is provided as part of the implementation of the
tripleo-ceph spec. It is an Ansible wrapper to call the
cephadm Ceph tool and it contains the Ansible modules
ceph_key and ceph_pool from ceph-ansible for managing
cephx keys and ceph pools.
Implements: blueprint tripleo-ceph
Change-Id: I60d6857b888ef97242c4f4bbf20fbc62de5ef29f
Extend FRR configuration to set source IPv6 address, similar to
Change-Id I43852cb3570b8cb12a35f4bc641a42ddfd8ad7f1 for IPv4.
Change-Id: I0b4e3762aea3e25398e82be9f9be3adcc38ee685
We've seen that a large number of facts for hosts has a direct impact on
task execution as part of the deployment. This change reduces the
amount of data that we are collecting when we use facts and leverages
more targeted methods to collect the required information.
Change-Id: I49e6ca02c2b4791641fb27ebf258ef6c9d52dd9e
Related-Bug: #1915761
It was earlier possible to override network config
for undercloud with a custom config using
``net_config_override`` in undercloud.conf or with
``UndercloudNetConfigOverride`` parameter. Though it's
now possible to change ``UndercloudNetworkConfigTemplate``
parameter to override the default config, we probably
still need to support ``net_config_override`` and
``UndercloudNetConfigOverride`` for backward compatibility.
Closes-Bug: #1915585
Depends-On: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/775471
Change-Id: Ied2caf9cd5c1b14d30d0badb4e949f620263c96e
FRR supports enforcing Generalized TTL Security Mechanism (GTSM) where
only neighbors that are the specified number of hops away will be
allowed to become neighbors.
This patch adds a new option to set the number of hops allowed,
defaulting to 1 for strict security out of the box. Setting the value
to zero or less will disable GTSM.
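The templating logic can be sketched as (the `neighbor ... ttl-security
hops` syntax is standard FRR bgpd; variable names are illustrative):

```python
def gtsm_config_line(neighbor_ip, hops=1):
    """Render the FRR ttl-security statement; zero or less disables
    GTSM by emitting nothing."""
    if hops <= 0:
        return None  # GTSM disabled
    return 'neighbor %s ttl-security hops %d' % (neighbor_ip, hops)
```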
Change-Id: I1166f22fef8e3f6b825343b4e2792ce9cfb10547
Starting with podman 2.X the default pids-limit has been halved from
4096 to 2048:
$ rpm -q podman && podman run --rm -it --net=host --name 'pids' edecd409281d sh -c 'cat /sys/fs/cgroup/pids/pids.max'
podman-2.2.1-3.module+el8.3.1+9392+c5f6d096.x86_64
2048
With podman-1.6.4 the global default pid-limits was hardcoded to 4096
and we had no way to tweak it.
With podman 2.X it is possible to override this with the pids_limit
setting inside the [containers] section of the
/etc/containers/containers.conf ini-file:
$ cat /etc/containers/containers.conf
[containers]
pids_limit=6666
$ podman run --rm -it --net=host --name 'pids' edecd409281d sh -c 'cat /sys/fs/cgroup/pids/pids.max'
6666
By adding this we keep the older 4096 default so we do not regress and
at the same time we allow an operator to override this globally.
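For illustration, reading the override back (containers.conf is TOML,
but this simple key also parses with a plain INI reader):

```python
import configparser

def read_pids_limit(conf_text, default=4096):
    """Return pids_limit from a containers.conf snippet, falling back
    to the 4096 default this change preserves."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint('containers', 'pids_limit', fallback=default)
```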
Related-Bug: #1915122
Change-Id: Id5d5fb9d20c0295763c78171190b9eda13508617
This also removes the ``tripleo_reset_params`` module
that was used to reset parameters in a plan, and changes the
``tripleo_derived_parameters`` role to no longer try to update the
plan with derived parameters.
Change-Id: I9b087452ef56b9ff53d08406158d8e1e5a3328f0
While we need to start the legacy network config on boot, we don't need to
try and start it on task invocation. Starting the legacy network on an
already running host will result in failure because ports are unable to
be rebound. This change removes the start action and just makes sure that
the legacy network service is enabled.
Change-Id: I88cd4fd907262d6a1bedbe7b76bc025eeb45a837
Signed-off-by: Kevin Carter <kecarter@redhat.com>
Currently users have to pass in a dictionary of HOSTNAME:SOURCE_IPV4 of
all nodes in the deployment. An example is as follows from THT:
FrrBgpIpv4LoopbackMap:
  ctrl-1-0: 99.99.1.1
  ctrl-2-0: 99.99.2.1
  ctrl-3-0: 99.99.3.1
  cmp-1-0: 99.99.1.2
  cmp-2-0: 99.99.2.2
  cmp-3-0: 99.99.3.2
  cmp-1-1: 99.99.1.3
  cmp-2-1: 99.99.2.3
  cmp-3-1: 99.99.3.3
This is rather time consuming, prone to typos and requires updating at
node scale up/down. It would be much easier if users could just pass in
the network and have tripleo_frr get the IP from the given network.
Snip from tripleo-ansible-inventory.yaml:
ControllerRack1:
  hosts:
    ctrl-1-0: {ansible_host: 192.168.1.101, canonical_hostname: ctrl-1-0.bgp.ftw,
      main_network_hostname: ctrl-1-0.mainnetwork.bgp.ftw, main_network_ip: 99.99.1.1,
      [...]
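The intended derivation can be sketched as (illustrative Python; the
real change lives in the tripleo_frr role):

```python
def loopback_map_from_inventory(groups, ip_key='main_network_ip'):
    """Derive the HOSTNAME:SOURCE_IPV4 map from inventory hostvars
    instead of hand-maintaining FrrBgpIpv4LoopbackMap."""
    result = {}
    for group in groups.values():
        for host, hostvars in group.get('hosts', {}).items():
            if ip_key in hostvars:
                result[host] = hostvars[ip_key]
    return result
```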
Change-Id: I43852cb3570b8cb12a35f4bc641a42ddfd8ad7f1
By default, the ReaR rpm installs a cronjob in /etc/cron.d/rear with
a default schedule that runs a backup job at
1:30AM every day. During this automatic backup, the services are not
appropriately paused. This causes the created backup to contain invalid
data.
As the most recent backup overwrites the older ones, this deletes
the "good" backup taken at operator's discretion with a new one
that cannot be used to restore the control plane state.
This change aims to optionally disable this default rpm behaviour.
Closes-Bug: #1912764
Change-Id: I2582c1ba74ae115a94ecb4524ba34e79ea5b43e8
* Hosts with an rhsm subscription can interfere with local testing
* /etc/rhsm-host is symlinked to rhsm config mounted from the host
* This disables it to prevent host repos from mixing with test repos
Change-Id: Ie08c7196dbc1af7429f6904c84b300d4ce08d9a2
When a user creates a HA load balancer in Octavia, Octavia creates
server groups as part of the load balancer resources. However because
the default quotas related to server group are very low and we have all
load balancer resources in the common service project, users can create
only a very limited number of HA load balancers by default.
This patch disables the quota limits of the server-group-members and
server-groups of the service project, so that HA load balancer creation
is not blocked by these quotas.
Closes-Bug: #1914018
Change-Id: I0048fec8c1e19bd20b1edcd23f2490456fe1cd12
Satellite server does not have an actual namespace in the url so the
container location is just host/container:tag. We need to be able to
properly set the headers on blobs without namespaces.
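The namespace-less case can be sketched as (hypothetical parser, not
the actual code):

```python
def split_image_ref(ref):
    """Split 'host[/namespace]/container:tag'. Satellite locations carry
    no namespace component."""
    host, _, remainder = ref.partition('/')
    parts = remainder.rsplit('/', 1)
    if len(parts) == 1:
        return host, None, parts[0]      # host/container:tag (Satellite)
    return host, parts[0], parts[1]      # host/namespace/container:tag
```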
Change-Id: Ia6728d68305ba3c662e99ea067b4d4feef9eeea0
Related-Bug: #1913782
This allows the master playbook used for update to set
tripleo_redhat_enforce to false on a per-role basis in Red Hat
environments.
The default in defaults/main.yml is now "true" so that it keeps its
behavior of being run by default if nothing is changed in the role
definition.
We then avoid running it on platforms other than Red Hat by adding an
explicit test in the tasks/main.yml file.
Overall the behavior is as follows:
| Red Hat Env | tripleo_enforced     | Test run |
|-------------+----------------------+----------|
| True        | Unset                | Yes      |
| True        | Set to true in role  | Yes      |
| True        | Set to false in role | No       |
| False       | Doesn't matter       | No       |
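The table above amounts to the following predicate (illustrative
sketch):

```python
def should_run_enforce(is_redhat, tripleo_enforced=None):
    """Run the enforce tasks only on Red Hat, and only when
    tripleo_enforced is unset or explicitly true (see table above)."""
    if not is_redhat:
        return False  # never run on non Red Hat platforms
    return tripleo_enforced is None or bool(tripleo_enforced)
```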
Change-Id: I6268a01d16f8288bf862003d19184fc93b88282a
Partial-Bug: #1912512
If firewalld on the NFS server does not open the required ports, ReaR
cannot correctly mount the NFS server while performing the backup and/or
restore, and subsequently the action fails and the openstack-ansible
playbook stops running.
This change checks whether the server chosen to be NFS server has
firewalld running and, if it is, requires the operator to declare the
firewalld zone where the ports must be opened.
Closes-Bug: #1912366
Change-Id: Ic6816fa647653baf8297dc62cdd99ee522b86535
Do not assume we will always have hostvars[<node>]['storage_ip'].
Instead use the service_net_map, found in global_vars.yaml of
config-download. Within this directory, if ceph_mon is on the list
tripleo_enabled_services, then there will be a service_net_map
containing the ceph_mon_network. As per tripleo_common/inventory.py,
this network name, whatever it is composed to, will have an '_ip'
appended to it which will map to the correct IP address. Without
network isolation ceph_mon_network will default to ctlplane. With
network isolation it will default to storage, but it could also
be composed to anything, so we can use this method to pick up
whatever it is.
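The lookup described above can be sketched as (illustrative helper; the
real data comes from config-download's global_vars.yaml and hostvars):

```python
def mon_ip_for_host(node_hostvars, service_net_map):
    """Resolve the Ceph mon IP via service_net_map instead of assuming
    a storage_ip key exists."""
    # ceph_mon_network defaults to 'ctlplane' without network isolation
    network = service_net_map.get('ceph_mon_network', 'ctlplane')
    return node_hostvars[network + '_ip']  # e.g. storage_ip or ctlplane_ip
```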
Closes-Bug: #1912218
Change-Id: I7c1052b1c27ea91c5f97f59ec80c906d60d5f13e
Given how config-download runs in the main branch it's no longer
necessary to use become when creating the work directories for
ceph-ansible to be executed or when running the tripleo_ceph_client
role. Using become introduces the bug this change resolves. Also,
as we are not using become we won't set the owner of the directory.
Instead we will use the default owner of whoever created the directory.
Change-Id: I65cd66ed5c94b548b775b9b4829717c202837d7e
Closes-Bug: #1912103
When privileged mode is set, don't add any capabilities as they
are included.
Use 1.6.4 podman because 2.0.5 rootless doesn't work with
systemd [1]
Disable Selinux on host.
[1] https://github.com/containers/podman/issues/8965
Closes-Bug: #1910970
Change-Id: I73ac1c405e8a3539937a5578bb003cba0b935d94
We currently forcefully install pacemaker + pcs in a number of upgrade
tasks. This is suboptimal because on certain nodes pcs/pacemaker repos
are not enabled. Let's let puppet pull in those packages normally.
Tested this during a Queens -> Train FFU successfully on a composable
roles environment.
Closes-Bug: #1911684
Change-Id: I70d8bebf0d6cbaeff3f108c5da10b6bfbdff8ccf
In order to launch the container and connect via networking, we need
selinux disabled for a rootless container to still work. Let's move the
selinux disabling to first rather than later.
Change-Id: I345e8b8547b81e5791656d0fca6e90b1de48fdac
Add boolean option to distribute the private key which is
created by the cli-enable-ssh-admin.yaml playbook and update
the tripleo_create_admin role to distribute the private key
when it is true.
This option defaults to false as we normally don't want to
do this. However, cephadm needs a private key on all nodes
with the OS::TripleO::Services::CephMgr service in order to
manage a Ceph cluster. This option will likely only be used
for the ceph-admin user which is similar to but not the same
as the tripleo-admin user.
Also, remove old reference to Mistral in task name.
Implements: blueprint tripleo-ceph
Change-Id: I69c74c1869aa0f54c1695fd53098df7e78f64247
Add DCN map variable which can override Ceph Mon IPs, FSID, Name
and keys list. This variable may be used to populate the fetch dir
with more than one set of keys and conf files per Ceph cluster
before the keys/conf files are synchronized. The user may then
iterate through a list of such maps and include the role
for each of those maps.
Co-Authored-By: Francesco Pantano <fpantano@redhat.com>
Implements: blueprint tripleo-ceph-client
Change-Id: I938ab604859fda88f3491399444841a3a373d162
The tripleo_ceph_client role is supposed to replace the ceph-ansible
client and work for both cephadm and ceph-ansible based deployments.
The purpose of this role is to work with both internal and external
ceph, processing the input provided, generating the Ceph clients
(Nova, Cinder, Glance, Manila) configuration (keys and ceph.conf)
and pushing the generated files to the 'clients' group provided by the
TripleO inventory.
Implements: blueprint tripleo-ceph-client
Change-Id: Ia60bc6d5d1a04bd560f2fcb05a4b64078015ae9d
When the LVM filter is enabled, it should also run when the allow list
is empty.
Change-Id: I88ec250cb3e29c08ce7e7f5e02b5e7e48f997fea
Closes-Bug: #1907452
Due to the use of a folding block operator instead of the literal block
operator, the check for existing namespaces does not work correctly and
namespaces get created on subsequent deploy runs even if they already
exist. Now namespaces won't get created if they are already there.
Change-Id: I7ada7a7b78b7930a68d0204e217e3640c3dd5c73
cloud-init creates /etc/NetworkManager/conf.d/99-cloud-init.conf,
for NetworkManager to not update resolv.conf. However, subsequent
update of NetworkManager seems to remove that file as there is no
config(noreplace) directive for it. So after an update/upgrade of
overcloud nodes, reboot would cleanup the dns entries.
cloud-init does configuration only once, during the first instance
boot, so this change ensures resolv.conf is not touched during
subsequent reboots.
Change-Id: I989e3f1d14fd33e97933032111cf48166dc5f50c
When enabling the OctaviaForwardAllLogs option, the amphora-agent
forwards all the logs to the rsyslog container. So rsyslog should accept
every incoming message from the amphora and record it in
octavia-amphora.log (this configuration is similar to the rsyslog
configuration in the Octavia devstack plugin).
Change-Id: I63d64ebe7ea2f4cd8eeb09f256870dfd2e4c1e92
Closes-Bug: #1906414
When we delete and recreate a stack multiple times, it can lead to
outdated facts and cause tripleo-kernel to disable nic1 when it
shouldn't. This happens when we have a bond/team on nic1 at some point
during the deployment.
Closes-Bug: #1906082
Signed-off-by: David Vallee Delisle <dvd@redhat.com>
Change-Id: I808daf58b606c717ab1bbae7d3d869d5baa67352
With the Heat multi-nic example templates the
loop.index was used to set the nicX for each
interface, and ordered iteration was done over
all networks in network_data.yaml. The ctlplane
was always nic1, then nic2 was mapped to the
first network in the network_data.yaml definition.
When converting multi-nic templates to ansible
the ordering contract between network_data.yaml
and the nicX for each network was broken because
iteration happens only over networks associated
with the role.
This change restores the ordering contract in
the multi-nic templates by iterating over the
'networks_all' group_var which holds all enabled
networks in the order of appearance in
network_data.yaml.
Depends-On: https://review.opendev.org/763497
Closes-Bug: #1904894
Change-Id: I9d2767d7ce4f24645684fde6044c38a4b920dbb1
This is a follow up to https://review.opendev.org/763302
replacing the use of role_networks_lower in all the
templates.
Change-Id: Ia55b1de0a1a8c03b33a40f8c524c2a3b0f45f1b8
Related-Bug: #1904809
The 'networks_lower' group_var carries a mapping of
network.name to network.name_lower for all networks.
The external_bridge interface on DVR compute nodes
that do not have the 'External' network associated
with the role still need to be able to lookup the
name_lower for the 'External' network.
The lookup via role_networks_lower fails since the
'External' network isn't associated.
Also if the 'External' network is associated with the
role set the address and routes.
For the multiple_nic_vlans add address and routes and
the vlan member interface on the bridge.
Note, also remove duplicate name entry on the 'Tenant'
network section.
Depends-On: https://review.opendev.org/763301
Closes-Bug: #1904809
Change-Id: I19a011bfbbcdbacbe257625310490077755a8b70
Currently ovs_dpdk molecule jobs are failing because the openvswitch
package is not available.
Using test_deps_setup_tripleo for the test_deps role fixes the issue.
Also fixed the package name for openvswitch and updated the job timing.
Switch to the ubi:8.2 base image in the molecule job to fix the package
conflict issue caused by ubi:8.3.
Closes-Bug: #1905683
Closes-Bug: #1905687
Change-Id: Ie77be94849bba28f322a09d254be650a9ec687f5
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>