Now that pacemaker resources are managed by puppet on
the host [1], the galera container gets restarted
before the puppet run that manages the password update.
To make sure that any DB root password update gets
reflected in the running galera container, add a
deploy task that syncs up .my.cnf if required.
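One way such a "sync only if required" check could work is sketched below in Python. It is an illustration, not the actual deploy task: it assumes the root credentials live in a [client] section with a password key, and the helper name sync_my_cnf is hypothetical.

```python
import configparser
import io


def sync_my_cnf(current_text, new_password):
    """Return (updated .my.cnf text, whether a change was needed).

    Hypothetical helper: assumes the root password is stored under
    [client] password=..., as in a typical root .my.cnf.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(current_text)
    if not cfg.has_section("client"):
        cfg.add_section("client")
    if cfg.get("client", "password", fallback=None) == new_password:
        # Already in sync: leave the file content untouched.
        return current_text, False
    cfg.set("client", "password", new_password)
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue(), True
```

Returning the "changed" flag lets the caller skip rewriting the file inside the running container when the password did not change.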
Closes-Bug: #1960332
[1] Ie14819b66cecdb5a9cc6299b68a0cc70a7aa3370
Change-Id: I60f73939dac03b14389f37e5ffc67de5d929ee52
In current CentOS 9 Stream, the redis configuration files are placed in
/etc/redis. Following that, the puppet-redis module has been updated to
use the new path by [1].
This change ensures that the old path is consistently used in both
CentOS 8 and CentOS 9, as we require a hard-coded path to start up
redis in standalone mode (without pacemaker). The path can be updated
later, for example when we drop support for CentOS 8.
[1] https://github.com/voxpupuli/puppet-redis/pull/434
Closes-Bug: #1960620
Change-Id: Ibf7c2f8d01ddf73c92d183e51c378a0770002e72
puppet-redis supports [1,2] changing PAM limits for the
file descriptors consumed by Redis. However, puppet fails
to run systemd reload inside container-puppet-redis.
We have to keep Exec resources (for the old pcs setup and for
manipulating the Redis config file), so silence only the Exec
resource that calls systemd reload.
Closes-Bug: #1960134
[1] https://github.com/voxpupuli/puppet-redis/issues/130
[2] https://github.com/voxpupuli/puppet-redis/pull/192
Change-Id: If3569c08edb4e820a4db11ac755f23de71059a10
We may want to be able to specify different containers at a role level.
This requires switching the container image parameters to be role
specific, to allow for role based overrides.
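A minimal Python sketch of how a role based override could take precedence over the global value. It assumes the usual TripleO convention of a <RoleName>Parameters map (for example ComputeParameters); the function name and the image values are illustrative only.

```python
def resolve_image(role, key, params):
    """Return the image for `key`, preferring the role-specific value.

    Assumes role overrides live in params[f"{role}Parameters"], and
    falls back to the global parameter of the same name.
    """
    role_params = params.get(f"{role}Parameters", {})
    return role_params.get(key, params[key])
```

For example, with ContainerNovaComputeImage set globally and overridden only in ComputeParameters, Compute nodes get the override while every other role keeps the global image.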
Change-Id: I4090e889a32abd51e7c11139737a7a18e27d18e7
This new linter ensures we don't have any trailing "/" in the container
volume definitions.
Such trailing "/" may create issues with the containers, for instance
for specific mounts such as "/dev" [1].
This patch also takes the opportunity to fix those trailing "/" in the
affected files, in order to start on a clean basis.
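The core check of such a linter can be sketched in a few lines of Python. This is not the linter added by this patch; it assumes volumes use the "host_path:container_path[:options]" form and treats the bare root path "/" as valid.

```python
from __future__ import annotations


def trailing_slash_errors(volumes: list[str]) -> list[str]:
    """Return the volume entries whose host or container path ends in "/"."""
    errors = []
    for volume in volumes:
        # Only the first two fields are paths; the rest are mount options
        # such as "z" or "ro".
        paths = volume.split(":")[:2]
        if any(len(path) > 1 and path.endswith("/") for path in paths):
            errors.append(volume)
    return errors
```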
[1] https://launchpad.net/bugs/1950176
Change-Id: If951f9643d67574c1225301aab7c9e4b0d316b7f
Related-Bug: #1950176
This parameter enables the systemd drop-in for mariadb.service, but
the service has not been used since all services were containerized.
Change-Id: I88e5f7c13861729c464a8ba88b5ca2c090597d93
During updates/upgrades, installing mariadb-server on the
host may impact the behaviour of containerized mysql.
During an FFU, if mariadb-server is upgraded, it may happen
that the rpm scriptlets fail to start the mariadb server and
leave a crash log behind (tc.log). This prevents the
regular online upgrade from happening.
During an upgrade, mariadb-server used to be force-upgraded
in the mysql service for historical reasons. This is no longer
necessary since mysql is containerized, and it can trigger
the same crash as explained above.
During a minor update, the same reasoning can apply if
a RHEL channel ships a new mariadb-server rpm, as the scriptlets
will probably leave a crash log behind as well.
Make sure mariadb-server is never installed, while
keeping mariadb CLI if already present on the host, to
avoid operational impacts.
Change-Id: Ib669bb4a5fcbb493d6d5edb5999bd1d87418558b
Closes-Bug: #1946742
In ansible, using true/false for boolean values, instead of yes/no,
is considered a best practice and is enforced by ansible-lint with
the "truthy value should be one of false, true (truthy)" rule.
This change replaces usage of yes/no with true/false to follow that
practice.
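For illustration only, here is a rough line-based Python sketch of the yes/no to true/false rewrite. It is not how this change was produced, and a real tool should rely on a YAML parser; it only rewrites unquoted mapping values that are exactly yes or no, preserving indentation and trailing comments.

```python
import re

# key: yes|no (any case), keeping indentation and an optional trailing comment.
_TRUTHY = re.compile(r"^(\s*[^#\s][^:#]*:\s+)(yes|no)(\s*(#.*)?)$", re.IGNORECASE)


def fix_truthy(line):
    """Rewrite `key: yes/no` to `key: true/false`; leave other lines alone."""
    match = _TRUTHY.match(line)
    if not match:
        return line
    value = "true" if match.group(2).lower() == "yes" else "false"
    return f"{match.group(1)}{value}{match.group(3)}"
```

Quoted values such as 'yes' are left untouched on purpose, since they are strings rather than booleans.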
Change-Id: I3313278f1ef6cbee0f906aca0a77bde1a3c53784
MariaDB 10.5 creates a user mysql@localhost when the
db is initialized. When bootstrapping the db on the
undercloud, do not try to recreate this user if it
already exists.
Change-Id: Ifde49bfb958252cba3d57a7d9e4f51690fec2ec2
Closes-Bug: #1942772
Configure tripleo_transfer's flags to mimic the way a galera SST
transfers the entire database between two nodes: in-place copy,
no delta-transfer, no compression, selective file copy.
Tested by running an FFU from Queens to Train on a composable
HA overcloud.
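The four properties above map naturally onto rsync options. The Python sketch below builds such a command line, assuming for illustration that the transfer is done with rsync; the function name and the include/exclude patterns are placeholders, not the role's actual variables.

```python
def build_rsync_args(src, dest, include, exclude):
    """Build an rsync argv that copies like a galera SST would."""
    args = [
        "rsync",
        "--archive",     # preserve permissions, ownership, timestamps
        "--inplace",     # write directly into destination files
        "--whole-file",  # disable the delta-transfer algorithm
    ]
    # No "--compress": skip compression entirely.
    # rsync applies the first matching filter rule, so includes go first.
    args += [f"--include={pattern}" for pattern in include]
    args += [f"--exclude={pattern}" for pattern in exclude]
    args += [src, dest]
    return args
```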
Change-Id: I557962e77d6558281603b40d0adf01b912e8c2f9
Closes-Bug: #1925260
This simplifies the ServiceNetMap/VipSubnetMap interfaces
to use the parameter merge strategy and removes the *Defaults
interfaces.
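To show what a merge strategy buys over the old *Defaults interfaces, here is a simplified Python model of combining parameter_defaults from several environment files. It assumes a shallow per-key "merge" for map parameters, similar in spirit to Heat's parameter_merge_strategies, and is not Heat's implementation; the ServiceNetMap keys are only examples.

```python
def merge_parameters(environments, strategies):
    """Combine parameters from environment files in order.

    Parameters whose strategy is "merge" have their map values combined
    key by key (later files win per key); all other parameters are
    overwritten by the last file that sets them.
    """
    result = {}
    for env in environments:
        for name, value in env.items():
            previous = result.get(name)
            if (strategies.get(name) == "merge"
                    and isinstance(previous, dict) and isinstance(value, dict)):
                result[name] = {**previous, **value}
            else:
                result[name] = value
    return result
```

With "merge", an operator environment only needs to list the ServiceNetMap entries it changes instead of restating the full map.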
Change-Id: Ic73628a596e9051b5c02435b712643f9ef7425e3
- tripleo::profile::base::database::mysql::client_bind_address
definition was removed. This parameter looks like an artifact
from the Newton release, when the separate class for client configuration,
tripleo::profile::base::database::mysql::client, was introduced.
- tripleo::profile::pacemaker::database::redis_bundle::control_port
parameter was renamed to
tripleo::profile::pacemaker::database::redis_bundle::redis_docker_control_port
- tripleo::haproxy::panko definition was removed: the related
puppet-tripleo definition was removed during the Ussuri release
I3ef5c1433691dd31b619e0fdbd5ec433a181ec03
- tripleo::haproxy::ec2_api and tripleo::haproxy::ec2_api_metadata
parameters were removed: the related puppet-tripleo definitions
were removed during the Victoria release
I9ce13aefb82cbcada5466cd3dddf851cfc51bacc
- multiple tripleo::profile::pacemaker::rabbitmq_bundle::control_port
parameter definitions were renamed to
tripleo::profile::pacemaker::rabbitmq_bundle::rabbitmq_docker_control_port
Partial-Bug: #1916386
Change-Id: Icfb9c5c37283f1c94cd870305aa51c9605b901b3
With this change a Heat resource is no longer used to
create an undercloud neutron API port resource for the
redis and ovn_dbs service virtual IPs. Instead an
external deploy task at step 0 in the individual service
template uses the "tripleo_service_vip" ansible module
to manage a neutron API port resource for each service.
The interfaces to control the IP address and service
network (RedisVirtualFixedIPs, OVNDBsVirtualFixedIPs
and ServiceNetMap) remain the same.
It is also possible to include the 'use_neutron' boolean
in the FixedIPs parameter to instruct the ansible module
not to create a neutron API resource, and simply "echo"
the ip_address given in the FixedIPs parameter. For
example:
RedisVirtualFixedIPs:
  - ip_address: 1.0.0.5
    use_neutron: false
Alternatively the fixed-ips can be set using the
'ServiceVips' parameter, like this:
ServiceVips:
  redis: 1.0.0.5
  ovn_dbs: 1.0.0.6
NOTE: If the neutron service is not available the
tripleo_service_vip ansible module will "echo"
the IP provided in %service%VirtualFixedIPs.
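The decision the module makes can be summarized with a small Python sketch. It only models the behaviour described above and is not the tripleo_service_vip code: create_port is a hypothetical callback standing in for the neutron API call, and fixed_ips mirrors a parameter such as RedisVirtualFixedIPs.

```python
def resolve_service_vip(fixed_ips, neutron_available, create_port):
    """Return the VIP for a service, modelling the behaviour described above.

    fixed_ips: list of dicts like [{"ip_address": "1.0.0.5", "use_neutron": False}]
    create_port: hypothetical callback that creates a neutron port and
    returns its IP address.
    """
    entry = fixed_ips[0] if fixed_ips else {}
    if not entry.get("use_neutron", True) or not neutron_available:
        # "echo" the configured address; raises KeyError if none was given.
        return entry["ip_address"]
    return create_port(entry.get("ip_address"))
```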
Related: blueprint network-data-v2-ports
Depends-On: https://review.opendev.org/777307
Depends-On: https://review.opendev.org/779883
Change-Id: I4794418546363888e7a555a16b45b7a4417f1ef8
With I57047682cfa82ba6ca4affff54fab5216e9ba51c, Heat has added
a new template version for wallaby. This allows us to use the
2-argument variant of the ``if`` function, which enables
e.g. conditional definition of resource properties and helps
clean up templates. If only two arguments are passed to the ``if``
function, the entire enclosing item is removed when the condition
is false.
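The semantics of the 2-argument form can be modelled in Python as below. This is a toy resolver over nested dicts and lists, not Heat's implementation: conditions are looked up by name only, and the tls_enabled condition used in the example is hypothetical.

```python
_REMOVE = object()  # marker for "drop the enclosing item"


def resolve_if(value, conditions):
    """Resolve {"if": [cond, a]} and {"if": [cond, a, b]} in a nested structure."""
    if isinstance(value, dict):
        if set(value) == {"if"}:
            args = value["if"]
            if conditions[args[0]]:
                return resolve_if(args[1], conditions)
            if len(args) == 3:
                return resolve_if(args[2], conditions)
            return _REMOVE  # 2-argument form with a false condition
        resolved = {}
        for key, item in value.items():
            result = resolve_if(item, conditions)
            if result is not _REMOVE:
                resolved[key] = result
        return resolved
    if isinstance(value, list):
        results = (resolve_if(item, conditions) for item in value)
        return [r for r in results if r is not _REMOVE]
    return value
```

A property written as {"if": ["tls_enabled", true]} therefore disappears completely when the condition is false, instead of needing a placeholder else-branch.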
Change-Id: I25f981b60c6a66b39919adc38c02a051b6c51269
Until now, we only relied on the ":z" flag in order to set the container
volumes label to container_file_t.
While it works fine, it has multiple issues:
- if an operator runs restorecon, it might break the container service
- if an SELinux related package is updated, it might reset the label,
and break the container service
- it requires a container stop&start to reset the label to the expected
value
- in case of a deep tree or a huge number of files, relabelling takes time
This change ensures the system sets the expected context on the specific
locations, instead of following the content of selinux-policy-targeted
rulesets.
It has an equivalent for some locations in the tripleo-ansible repository:
https://review.opendev.org/c/openstack/tripleo-ansible/+/782393
Note about swift locations:
Since openstack-selinux already sets fcontext rules for at least one
swift location, we can't override it here. The following
openstack-selinux patch is being pushed in order to work around this
specific case:
https://github.com/redhat-openstack/openstack-selinux/pull/73
Change-Id: Icb7f58004e281b42141c70a9a4895905dc32b45d
Resolves: rhbz#1941922
We do not need to add an if: internal_tls_enabled in a number of
ansible tasks. enable_internal_tls is already defined as an ansible
fact in common/deploy-steps.j2:
enable_internal_tls: {get_param: EnableInternalTLS}
So when the service uses the enable_internal_tls condition and it points
to the EnableInternalTLS param, we can just use the ansible fact
directly. Note that if the enable_internal_tls condition points to
something other than the plain EnableInternalTLS parameter, we cannot
do this cleanup.
Change-Id: Idb07cbc8fc3a4d73ff52c54d869310fd6c49b502
This uses the linux-system-roles.certificate ansible role,
which replaces puppet-certmonger for submitting certificate
requests to certmonger. Each service is configured through
its heat template.
Partial-Implements: blueprint ansible-certmonger
Depends-On: https://review.rdoproject.org/r/31713
Change-Id: Ib868465c20d97c62cbcb214bfc62d949bd6efc62
When we merged the pcs-host patches, we erroneously also removed
the sidecar container that does the TLS stunneling for redis.
This container is needed to allow the redis master to stream
replication to its slaves via TLS.
Tested this and we now correctly get the working container and cluster
state:
[root@controller-0 ~]# podman ps -a |grep redis
4182a78811a2 undercloud-0.ctlplane.redhat.local:8787/openstack-redis:16.2_20210218.1-hotfixupdate2 /bin/bash /usr/lo... 3 minutes ago Up 3 minutes ago redis-bundle-podman-0
604a086bb53c undercloud-0.ctlplane.redhat.local:8787/openstack-redis:16.2_20210218.1-hotfixupdate2 kolla_start 8 minutes ago Up 8 minutes ago redis_tls_proxy
[root@controller-0 ~]# pcs status |grep redis
* GuestOnline: [ galera-bundle-0@database-1 galera-bundle-1@database-2 galera-bundle-2@database-0 ovn-dbs-bundle-0@controller-0 ovn-dbs-bundle-1@controller-1 ovn-dbs-bundle-2@controller-2 rabbitmq-bundle-0@messaging-0 rabbitmq-bundle-1@messaging-1 rabbitmq-bundle-2@messaging-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]
* Container bundle set: redis-bundle [cluster.common.tag/openstack-redis:pcmklatest]:
* redis-bundle-0 (ocf::heartbeat:redis): Master controller-0
* redis-bundle-1 (ocf::heartbeat:redis): Slave controller-1
* redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2
We also move the redis_tls_proxy from step_2/start_order: 3 to step_1
since it actually makes sense to have it run before we start the
redis pcmk bundle at step 2 (i.e. so the slave replica can work right
away from the start).
Closes-Bug: #1916873
Change-Id: I44df0ee32e5c35b87f74bdb75dcb384496dfb6ab
In order to set ANSIBLE_INJECT_FACT_VARS=False, we have to use ansible_facts
instead of ansible_* vars. This change switches our distribution and
hostname related items to use ansible_facts instead.
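The renaming follows a simple rule: the injected variable ansible_<name> corresponds to ansible_facts['<name>']. A small Python sketch of that mapping is shown below as an aid for reviewing such changes; the helper is hypothetical. Note that connection or magic variables such as ansible_host are not facts and must not be rewritten this way.

```python
def fact_lookup(var):
    """Map an injected fact variable name to its ansible_facts lookup."""
    prefix = "ansible_"
    if not var.startswith(prefix):
        raise ValueError(f"not an injected fact variable: {var}")
    return f"ansible_facts['{var[len(prefix):]}']"
```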
Change-Id: I49a2c42dcbb74671834f312798367f411c819813
Related-Bug: #1915761
When a tripleo major upgrade or FFU causes an update of mariadb
to a new major version (e.g. 10.1 -> 10.3), some internal DB
tables must be upgraded (myisam tables), and sometimes the
existing user tables may be migrated to new mariadb defaults.
Move the db-specific upgrade steps into a dedicated script and
make sure that it is called at the right time while upgrading
the undercloud and/or the overcloud.
Closes-Bug: #1913438
Change-Id: I92353622994b28c895d95bdcbe348a73b6c6bb99
MySQL / MariaDB's default size of 128M is insufficient for
virtually every deployment, and this variable is typically
better off in the 16-32G range, assuming available memory.
Change-Id: Ie274854a73bf6fe3d99e8621175f81c69c9a878a
This was mainly a legacy interface for internal use.
Now that we pull the passwords from the existing
environment and don't use it, we can drop this.
This reduces the number of heat resources.
Change-Id: If83d0f3d72a229d737a45b2fd37507dc11a04649
Add the ability to specify the private key size
used when creating the certificate. The default
value stays the same as before, 2048 bits.
The key_size value can also be overridden
per service.
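A minimal Python sketch of the default-plus-override lookup, assuming per-service overrides are available as a simple mapping from service name to key size in bits. The mapping and function name are hypothetical; only the 2048-bit default comes from this change.

```python
DEFAULT_KEY_SIZE = 2048  # same default as before this change


def effective_key_size(service, overrides, default=DEFAULT_KEY_SIZE):
    """Return the private key size (in bits) used for a service's certificate."""
    size = overrides.get(service, default)
    if not isinstance(size, int) or size <= 0:
        raise ValueError(f"invalid key size for {service}: {size!r}")
    return size
```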
Depends-on: I4da96f2164cf1d136f9471f1d6251bdd8cfd2d0b
Change-Id: Ic2edabb7f1bd0caf4a5550d03f60fab7c8354d65
Currently galera and ovn require a coordinated restart across
the controller nodes when certmonger determines the certificate
for a node has expired and it needs to regenerate it.
But right now, when the tripleo certmonger puppet module is
called to assert the state of the certificates, it ends up
regenerating a new certificate unconditionally. So galera and
ovn get restarted on stack update, even when there is no need to.
To mitigate these unnecessary restarts, disable the post-action
for now until we fix the behaviour of tripleo's certmonger puppet
module. This has the side effect that services won't get restarted
automatically if no stack update takes place until the certificate
expiration date is reached.
Related-Bug: #1906505
Change-Id: I17f1364932e43b8487515084e41b525e186888db
Following change Iaced2ba676a4e4f651c67da082797cc1c1ffccd1, this patch
adds a new task for the update/upgrades steps in order to ensure we're
in a clean state, with consistent names.
It also takes the opportunity to chase down newly added /var/run
mentions.
Change-Id: I9f069332254d057f80e3d25e9f8b734f8a592810
Pulling images over the internet is not a very stable operation,
as there can be a lot of issues (DNS, HTTP rate limits, route changes or
interface restarts...). This patch retries "pull" 3 times to mitigate
possible networking issues.
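The retry logic can be sketched in Python as below. This is not the code of this patch: pull is a hypothetical callable standing in for the actual image pull, the sketch makes at most three attempts in total, and only connection-type errors (OSError and subclasses) are retried.

```python
import time


def pull_with_retries(pull, image, attempts=3, delay=5.0, sleep=time.sleep):
    """Call pull(image), retrying on network errors; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return pull(image)
        except OSError:  # ConnectionError, timeouts, DNS failures, ...
            if attempt == attempts:
                raise
            sleep(delay)  # give the network a moment before retrying
```

Injecting sleep keeps the helper easy to test without real delays.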
Change-Id: I03643576c9f8444d6db36364a73bccce244c8446
Closes-Bug: 1899057
With [1] we added the ability to use a fixed, static prefix for
the container image names that pacemaker uses to set up the HA
resources. This name is just an alias to the real tripleo
image that is being used in the control plane, and it allows
updating the namespace part of the image during a minor update
without disrupting service.
Add a new Heat parameter to use a completely static container
image name, to allow arbitrary name changes during a minor update,
e.g. registryA/namespaceA/imgA:tagA to registryB/namespaceB/imgB:tagB.
By default, this new parameter is disabled.
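The difference between the prefix-based alias and a fully static name can be sketched in Python. The cluster.common.tag prefix and pcmklatest tag come from the pcs output quoted elsewhere in this log; the function and its static_name argument are hypothetical and only model the naming rule, not the actual Heat parameter.

```python
def pcmk_image_name(image, prefix="cluster.common.tag", static_name=None):
    """Return the image name pacemaker references for an HA bundle."""
    if static_name:
        # Fully static: independent of registry, namespace, name and tag.
        return static_name
    last = image.rsplit("/", 1)[-1]  # drop registry and namespace
    name = last.split("@", 1)[0].split(":", 1)[0]  # drop digest and tag
    return f"{prefix}/{name}:pcmklatest"
```

With the prefix scheme, imgA to imgB still changes the alias (and therefore the pacemaker resource); with a static name it does not.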
[1] Id369154d147cd5cf0a6f997bf806084fc7580e01
Change-Id: I124c1e4dbcc7a8ed38079411f41a8f31c8f62284
The mysql database is created by the mysql_bootstrap container,
which lets Kolla run mysqld_safe temporarily, and then
lets TripleO run it for additional setup.
Before running the second temporary mysqld server, make
sure that the mysqld_safe script started by Kolla is
always stopped, to avoid any race condition that would
cause the second mysqld_safe server to be killed by the
Kolla one.
Change-Id: Id7cf45fb95d3c8a2c5519b1a13a5651cf414a115
Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Closes-Bug: #1896009
This is needed because the Mysql_ providers will prefetch
the mysql users if facter finds the '/bin/mysql' executable
on the system, and we do not want to run any mysql task on the host
directly.
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Ic6c65e6849368185177aeaa31d50f52761225f62
Related-Bug: #1863442
This implements the creation of the redis bundle on the host.
The testing protocol used is documented in the depends-on.
The full rationale is contained in the LP bug.
The reason for adding a post_update task is that during a minor update
the deployment tasks are not run during the node update procedure but
only during the final converge. So we run the role again there to make
sure that any config change triggers a restart during the minor
update, so that the disruption stays local to the single node being
updated. If we did not do this, a final converge could potentially
trigger a global restart of HA bundles, which would be quite disruptive.
Related-Bug: #1863442
Depends-On: Iaa7e89f0d25221c2a6ef0b81eb88a6f496f01696
Change-Id: I5ce8367363d535b71b01395b0bef4cf17c8935b5
This implements the creation of the haproxy bundle on the host.
The testing protocol used is documented in the depends-on.
The reason for adding a post_update task is that during a minor update
the deployment tasks are not run during the node update procedure but
only during the final converge. So we run the role again there to make
sure that any config change triggers a restart during the minor
update, so that the disruption stays local to the single node being
updated. If we did not do this, a final converge could potentially
trigger a global restart of HA bundles, which would be quite disruptive.
NB: in this case we keep the container init_bundle (renamed to
wait_bundle) around and just use it to wait for galera to be up.
Depends-On: Iaa7e89f0d25221c2a6ef0b81eb88a6f496f01696
Change-Id: Ie14819b66cecdb5a9cc6299b68a0cc70a7aa3370
Related-Bug: #1863442