Currently in puppet-tripleo for the HA container we hardcode the following:
    options => "--user=root --log-driver=journald -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS${tls_priorities_real}",
Since at least podman had some changes in terms of supported driver
backends (and bugs) it's best if we make this configurable. While we're
at it we should also switch to k8s-file as a driver when podman is being
used which is what all other containers are using. When docker is the
default container_cli we will stick to journald as usual.
Tested this on a Train environment and successfully verified that
we still see the correct logs in /var/log/containers/.../...
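The driver selection described above can be sketched as follows. This is only an illustration in Python, not the Puppet code from the patch; the function name build_container_options and its parameters are hypothetical.

```python
def build_container_options(container_cli, log_driver=None, tls_priorities=""):
    """Build the option string for the HA container (illustrative only)."""
    if log_driver is None:
        # Assumed default, following the text above:
        # k8s-file when podman is used, journald when docker is used.
        log_driver = "k8s-file" if container_cli == "podman" else "journald"
    return (
        f"--user=root --log-driver={log_driver} "
        f"-e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS{tls_priorities}"
    )
```

Passing log_driver explicitly overrides the default, which is the configurability this change is after.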
Change-Id: I5b1483826f816d11a064a937d59f9a8f468315a5
Closes-Bug: #1853517
Currently there are cases where collectd tries to connect to QDR
on a different interface than the one QDR is listening on.
This patch makes sure that those services are always in sync. It also
changes the interior configuration to work based on IPs instead
of hostnames.
Change-Id: Ia865bef9daf5b7e92b1b8c3712113416c9c8c176
We need to set ceph_grafana_vip on a network that is routable
but is not the public network where the OpenStack APIs are exposed,
because the service is *not* public.
Change-Id: I7d636c4513317162ec4b49aa12d88a959bf5c537
With I2feb9e81bc40e44cb2c7a2972366fa4b16590227, we no longer need the
wrappers managed by Puppet; everything is deployed by Ansible.
Blueprint: safe-side-containers
Depends-On: I2feb9e81bc40e44cb2c7a2972366fa4b16590227
Change-Id: I890fff9c7ead7e72fd4fe3a58b4ffce2e315b916
Currently, when it is not defined, the config contains sslProfile: undef, which
makes interior QDR communication malfunction.
Change-Id: I62edac42204b28d9a81789723b331c79aa3358a6
This commit improves the way stonith levels are set up and their
resiliency against redeployments by introducing a stonith_levels
custom fact that collects the current stonith levels defined for
the specific server, so we can compare against the desired number
of levels defined in hiera.
If these do not match (for example if there are additional levels
that are no longer necessary), the clean up step also introduced
by this commit takes care of deleting the ones no longer necessary.
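The comparison between the collected fact and the desired number of levels can be sketched as below. This is a minimal Python illustration, assuming levels are numbered from 1 upwards; the function name levels_to_remove is hypothetical and the actual fact and clean up step in the patch may work differently.

```python
def levels_to_remove(current_levels, desired_count):
    """Return the stonith level numbers that exceed the desired count.

    current_levels: level numbers currently defined for the server
                    (what the stonith_levels fact would report).
    desired_count:  number of levels defined in hiera.
    """
    # Assumption: levels are numbered 1..N, so anything above
    # desired_count is no longer necessary.
    return sorted(level for level in current_levels if level > desired_count)
```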
Change-Id: Ifae73ac2bf4481d0a11e89c0ea0916e85dd2db1d
pci_alias was removed from nova::api with
c3e5c7480f03949a824165349642b59a6077ec5d
We need to include ::nova::pci for the nova api service to have
pci/aliases configured in nova api.
Closes-Bug: #1849797
Change-Id: I5258028ff636e8a6287468499dd6974f6c7f6f6f
This patch prepares the ground for using the latest OVN. OVN has been split from
openvswitch and now has its own code repo. After the split, OVN has its
own run dir (/var/run/ovn), db dir (/etc/ovn/), log dir (/var/log/ovn)
and data dir (/usr/share/ovn/scripts).
With this patch, running either the older version (2.11) or the new
version (2.12) is supported without any issues. The host directories are mounted
accordingly so that there is no impact when OVN is updated.
Change-Id: I5d778cbeb2863ec0fe649799863752e8eb16492f
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Add file to the reno documentation build to show release notes for
stable/train.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/train.
Change-Id: I4ea46c9bb7b55ef6382cf9d52038e7d26d66e6eb
Sem-Ver: feature
Under pressure, the default monitor timeout value of 20 seconds is not
enough to prevent unnecessary failovers of the ovn-dbs pacemaker resource.
Spawning a few VMs at the same time could lead to unnecessary
movements of the master DB, then re-connections of ovn-controllers (slaves are
read-only), further peaks of load on the DBs, and in the end to a
snowball effect. This value is now configurable via dbs_timeout in
tripleo::profile::pacemaker::ovn_dbs_bundle and is set to 60s by default.
Change-Id: Ib95c6b7614631eed264d42e6cf61672b705e7893
Signed-off-by: Kamil Sambor <ksambor@redhat.com>
Setting cache/enabled=False by default has degraded performance.
It also disabled the keystone local cache.
This reverts commit 469d432195d1f5b5e15ce72ce1624d4ed4447e4e.
Depends-On: https://review.opendev.org/#/c/688770/
Closes-Bug: #1847585
Change-Id: I2af70755746f3fc3eb10eba2188ad2772704d988
Currently when adding some tuning options via hiera, galera won't start because
overriding even a single mysql option will reset the whole key in the hash. So
for example, when adding:
  tripleo::profile::base::database::mysql::mysql_server_options:
    mysqld:
      # MySQL InnoDB equally divided in 1GB instances
      innodb_buffer_pool_instances: 2
      # Query network write timeout raised to 120 seconds
      net_write_timeout: 120
      # Query network read timeout raised to 120 seconds
      net_read_timeout: 120
      # MySQL connection timeout set to 8 hours
      connect_timeout: 28800
Things will break because all the wsrep options that are set normally will be
overridden and galera will refuse to start.
Tested by passing the above hiera keys and observing the deploy complete
successfully and the settings correctly applied to galera/mysql on the overcloud.
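The difference between replacing a whole key and merging into it can be illustrated with a small Python sketch. It is not the Puppet code from the patch; deep_merge is a hypothetical helper and the default values below are placeholders chosen for the example.

```python
import copy


def deep_merge(defaults, overrides):
    """Recursively merge overrides into a copy of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are hashes: merge them instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults standing in for the options that are set normally.
defaults = {"mysqld": {"wsrep_on": "ON", "net_write_timeout": 60}}
overrides = {"mysqld": {"net_write_timeout": 120}}

# A shallow merge replaces the whole "mysqld" key and loses wsrep_on.
shallow = {**defaults, **overrides}

# A deep merge keeps wsrep_on and only changes the overridden option.
merged = deep_merge(defaults, overrides)
```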
Change-Id: I30f03bc8eb81db0243c137d4af08924adeebc951
Closes-Bug: #1848060
We are transitioning from an array to a hash for the container
environment of each container:
I894f339cdf03bc2a93c588f826f738b0b851a3ad
This is mainly to make it consumable by Ansible later, where the
podman_container module needs a dict instead of a list.
This patch just changes the default and adds support for a Hash
in addition to a List, which remains supported.
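Accepting both forms can be sketched as follows. This is a Python illustration under the assumption that list entries look like KEY=VALUE; the function name normalize_environment is hypothetical and this is not the code from the patch.

```python
def normalize_environment(environment):
    """Return the container environment as a dict.

    Accepts either a dict (new form) or a list of "KEY=VALUE"
    strings (old form).
    """
    if isinstance(environment, dict):
        return dict(environment)
    result = {}
    for entry in environment:
        # Split on the first "=" only, so values may contain "=".
        key, _, value = entry.partition("=")
        result[key] = value
    return result
```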
Change-Id: I4e53a4a3464940660473bcbe74e30507a69a4019
Currently both nova evacuate and fence compute in the Instance HA
setup of tripleo use the keystone admin user in order to query nova,
evacuate instances, disable/enable the nova-compute service and
call the nova force-down API.
With this patch we introduce the keystone_tenant parameter which is
needed when moving to the nova service user as it is different than
keystone_admin in that case.
Tested as follows:
1. Deployed a normal unpatched OSP13 with IHA
2. Run a redeploy with the following addition:
   parameter_defaults:
     ExtraConfig:
       tripleo::profile::base::pacemaker::instance_ha::keystone_password: "%{hiera('nova::keystone::authtoken::password')}"
       tripleo::profile::base::pacemaker::instance_ha::keystone_admin: 'nova'
       tripleo::profile::base::pacemaker::instance_ha::keystone_tenant: 'service'
3. Observe the following:
3.1. Both the fence_compute and nova evacuate resources have updated attributes
3.2. IHA still works correctly
Change-Id: If6b19ad05e0f91425f93a1c123947e92cf2ba949
When doing an upgrade to TLS Everywhere, vnc.crt is not always created
by the time the getcert command exits (even though it is run with the
-w flag). Puppet then ignores the instruction to change the file
permissions, resulting in an error at a later stage, when podman
tries to mount the file onto a container.
Change-Id: I0e0009d57cd1c90f8ae28a2cfc9337ecf8c75112
When the Ironic Conductor class is called, it expects the
PXE directories to exist. That is only the case for step 4.
However, the conductor class is also invoked
for step 3 and the db sync case. For that case, also include
the missing ironic::pxe class to ensure the PXE directories
are created.
Closes-Bug: #1845222
Change-Id: I394f56ba9b213c75378bdf21999d23509632523c
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
These changes update the router mode to edge or interior
depending on the node type. The mesh topology is
formed on controller nodes and the remaining nodes
connect to this topology.
Co-Authored-By: Martin Magr <mmagr@redhat.com>
Change-Id: I195dbf70f490984cedb32d48fe09f4562cbf94fa
Add a configurable delay to Nova Evacuate calls
In case /var/lib/nova/instances resides on NFS we have seen migrations
failing with 'Failed to get "write" lock - Is another process using the
image' errors.
This has been tracked down to grace/lease timeouts not having expired
before attempting the migration/evacuate, so in these cases it might be
desirable to delay the nova evacuate call to give the storage time to
release the locks.
Related resource-agents change: https://review.opendev.org/#/c/684777/
Change-Id: I5ec6a5b0c66579e068e811f49aae10a5f406158a
Resolves: rhbz#1740069
Tested by adding the following hiera key to the deployment:
  parameter_defaults:
    ExtraConfig:
      tripleo::profile::base::pacemaker::resource_op_defaults:
        timeout_test:
          name: 'timeout'
          value: '60s'
And correctly obtained:
  [root@controller-0 tripleo]# pcs resource op defaults
  timeout: 60
This allows an operator to raise global timeouts (which might be needed
in case podman performance turns out to be problematic).
Depends-On: https://review.opendev.org/664606
Change-Id: Ifa75eb9274705ea4b1c530b22659e4e106681250
Depending on the podman version, "json-file" is set to noop and makes
podman crash (true for at least podman 1.4.1), while newer versions
re-add json-file as an alias to k8s-file (true since 1.4.3).
Ensuring we're using k8s-file will prevent issues depending on the
podman version.
Relates to https://bugzilla.redhat.com/show_bug.cgi?id=1754416
Closes-Bug: #1844856
Change-Id: I70eba8af06741ed81173689a03c4867421917cd6