162 Commits

Author SHA1 Message Date
Zuul
66e4308536 Merge "Sync updated DB root password in running container" 2022-02-25 01:32:29 +00:00
Damien Ciabrini
7f8876ce7c Sync updated DB root password in running container
Now that pacemaker resources are managed by puppet on
the host [1], the galera container gets restarted
before the puppet run that manages password update.
To make sure that any DB root password update gets
reflected in the running galera container, add a
deploy task that syncs up .my.cnf if required.

Closes-Bug: #1960332

[1] Ie14819b66cecdb5a9cc6299b68a0cc70a7aa3370

Change-Id: I60f73939dac03b14389f37e5ffc67de5d929ee52
2022-02-23 09:56:38 +01:00
Zuul
33121f95c8 Merge "Fix Redis config generation when fd limit changes" 2022-02-16 00:22:34 +00:00
Takashi Kajinami
e93b454d68 Redis: Hard-code paths of configuration files
In current CentOS 9 Stream, configuration files of redis are placed in
/etc/redis. Following that, the puppet-redis module has been updated to
use the new path by [1].

This change ensures that the old path is consistently used in both
CentOS 8 and CentOS 9 as we require a hard-coded path to start up
redis in standalone mode (without pacemaker). The path can be updated
later, for example when we drop support for CentOS 8.

[1] https://github.com/voxpupuli/puppet-redis/pull/434

Closes-Bug: #1960620
Change-Id: Ibf7c2f8d01ddf73c92d183e51c378a0770002e72
2022-02-14 21:11:25 +09:00
Damien Ciabrini
85ccef2923 Fix Redis config generation when fd limit changes
puppet-redis supports [1,2] changing PAM limits for the
file descriptors consumed by Redis. However puppet fails
to run systemd reload inside container-puppet-redis.

We have to keep Exec resources (for old pcs setup and for
manipulating the Redis config file), so silence only the Exec
resource that calls systemd reload.

Closes-Bug: #1960134

[1] https://github.com/voxpupuli/puppet-redis/issues/130
[2] https://github.com/voxpupuli/puppet-redis/pull/192

Change-Id: If3569c08edb4e820a4db11ac755f23de71059a10
2022-02-09 16:51:28 +01:00
Alex Schultz
ebab335f38 Role specific container support
We may want to be able to specify different containers at a role level.
This requires switching the container image parameters to be role
specific to allow for role based overrides.

Change-Id: I4090e889a32abd51e7c11139737a7a18e27d18e7
2022-01-21 14:18:02 -07:00
Takashi Kajinami
4c12069389 Remove hieradata for Redis Sentinel
... because usage of Sentinel was already removed

Depends-on: https://review.opendev.org/806762
Change-Id: I9edcab007e56f09c3087f20dd4e431995a23ba93
2021-12-16 16:24:45 +00:00
Cédric Jeanneret
7a99ae23e3 Introduce a new linter for yaml-validate, and correct issues
This new linter ensures we don't have any trailing "/" in the container
volume definitions.

Those trailing "/" may create issues with the containers, for instance
for specific mounts such as "/dev"[1].

This patch also takes the opportunity to fix those trailing "/" for the
affected files, in order to start on a clean basis.

[1] https://launchpad.net/bugs/1950176

Change-Id: If951f9643d67574c1225301aab7c9e4b0d316b7f
Related-Bug: #1950176
2021-12-01 09:43:25 +01:00
Takashi Kajinami
368102b149 Deprecate ineffective MysqlIncreaseFileLimit
This parameter enables the systemd drop-in for mariadb.service but
the service has never been used since all services were containerized.

Change-Id: I88e5f7c13861729c464a8ba88b5ca2c090597d93
2021-11-15 10:03:39 +00:00
Zuul
bd83eb9267 Merge "Remove old non-ha container removal tasks" 2021-11-04 17:51:18 +00:00
Michele Baldessari
52ed0f05b8 Remove old non-ha container removal tasks
Those were needed when we switched to HA by default via
https://review.opendev.org/c/openstack/tripleo-heat-templates/+/359060

That happened during ussuri, so we can drop these tasks.

Change-Id: Ibb68ca300fd8ce4c7a8830bd2bab7c9ce5182b33
2021-10-29 08:22:50 +02:00
Damien Ciabrini
d33865cded Remove mariadb-server packages from the host
During updates/upgrades, installing mariadb-server on the
host may impact the behaviour of containerized mysql.

During a FFU if mariadb-server is upgraded, it may happen
that the rpm scriptlets fail to start mariadb server and
leave a crash log behind (tc.log). This prevents the
regular online upgrade from happening.

During an upgrade, mariadb-server used to be force-upgraded
in the mysql service for historical reasons. It's not
necessary since mysql is containerized and can trigger
the same crash as explained above.

During a minor update, the same reasoning can apply if
RHEL channel ships a new mariadb-server rpm as scriptlets
will probably leave a crash behind as well.

Make sure mariadb-server is never installed, while
keeping mariadb CLI if already present on the host, to
avoid operational impacts.

Change-Id: Ib669bb4a5fcbb493d6d5edb5999bd1d87418558b
Closes-Bug: #1946742
2021-10-28 08:14:06 +02:00
Takashi Kajinami
76adfd4202 Use true/false for boolean values
In ansible, usage of true/false for boolean values, instead of yes/no,
is considered as a best practise and is enforced by ansible-lint with
the "truthy value should be one of false, true (truthy)" rule.

This change replaces usage of yes/no by true/false to follow that
practise.
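
A minimal sketch of the replacement described above. The task and the
path are illustrative only and are not taken from the change itself:

```yaml
# Illustrative task: the module and path are assumptions for the example.
- name: Ensure an example directory exists
  become: true            # previously written as "become: yes"
  ansible.builtin.file:
    path: /var/lib/example
    state: directory
    recurse: false        # previously written as "recurse: no"
```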

Change-Id: I3313278f1ef6cbee0f906aca0a77bde1a3c53784
2021-10-12 09:35:38 +09:00
Damien Ciabrini
f7121a8465 Do not create mysql user if it already exists
Mariadb 10.5 creates a user mysql@localhost when the
db is initialized. When bootstrapping the db on the
undercloud, do not try to recreate this user if it
already exists.

Change-Id: Ifde49bfb958252cba3d57a7d9e4f51690fec2ec2
Closes-Bug: #1942772
2021-09-06 14:44:59 +02:00
Damien Ciabrini
5ef7f9330a FFU: change transfer parameters for database resync
Configure tripleo_transfer's flags to mimic the way a galera SST
transfers the entire database between two nodes: in-place copy,
no delta-transfer, no compression, selective file copy.

Tested by running an FFU from Queens to Train on a composable
HA overcloud.

Change-Id: I557962e77d6558281603b40d0adf01b912e8c2f9
Closes-Bug: #1925260
2021-07-06 12:15:52 +02:00
Zuul
d5a9b5bb00 Merge "Remove or fix outdated/incorrect tripleo hieradata definitions" 2021-06-28 22:27:51 +00:00
Sagi Shnaidman
019419463f Use community.general ansible collection instead of modules
Replace module calls by community.general ansible collection calls.
Change-Id: Ie96b3d35cea61370b1f98d7e060d696c4807c6b7
2021-06-10 15:17:08 +03:00
Sagi Shnaidman
e40a346d70 Use collection FQCN for podman modules
Replace modules for containers.podman and openstack.cloud
Change-Id: Ia7478fc82ce532bf60a07cba395c5652a6200a8d
2021-05-26 17:50:08 +03:00
ramishra
b253d564f7 Use server side env merging for ServiceNetMap/VipSubnetMap
This simplifies the ServiceNetMap/VipSubnetMap interfaces
to use parameter merge strategy and removes the *Defaults
interfaces.

Change-Id: Ic73628a596e9051b5c02435b712643f9ef7425e3
2021-05-19 10:16:58 +05:30
Alexey Stupnikov
ab9f26b305 Remove or fix outdated/incorrect tripleo hieradata definitions
- tripleo::profile::base::database::mysql::client_bind_address
  definition was removed. This parameter looks like an artifact
  from Newton release when separate class for client configuration
  tripleo::profile::base::database::mysql::client was introduced.
- tripleo::profile::pacemaker::database::redis_bundle::control_port
  parameter was renamed to
  tripleo::profile::pacemaker::database::redis_bundle::redis_docker_control_port
- tripleo::haproxy::panko definition was removed: related
  puppet-tripleo definition was removed during Ussuri release
  I3ef5c1433691dd31b619e0fdbd5ec433a181ec03
- tripleo::haproxy::ec2_api and tripleo::haproxy::ec2_api_metadata
  parameters were removed: related puppet-tripleo definitions
  were removed during Victoria release
  I9ce13aefb82cbcada5466cd3dddf851cfc51bacc
- multiple tripleo::profile::pacemaker::rabbitmq_bundle::control_port
  parameter definitions were renamed to
  tripleo::profile::pacemaker::rabbitmq_bundle::rabbitmq_docker_control_port

Partial-Bug: #1916386
Change-Id: Icfb9c5c37283f1c94cd870305aa51c9605b901b3
2021-05-13 12:44:56 +02:00
Zuul
0dc522bbc7 Merge "Refactor Service VIPs redis and ovn_dbs" 2021-04-19 20:31:59 +00:00
Harald Jensås
23cdf4dd17 Refactor Service VIPs redis and ovn_dbs
With this change a Heat resource is no longer used to
create an undercloud neutron API port resource for the
redis and ovn_dbs service virtual IPs. Instead an
external deploy task at step 0 in the individual service
template uses the "tripleo_service_vip" ansible module
to manage a neutron API port resource for each service.

The interfaces to control the IP address and service
network (RedisVirtualFixedIPs, OVNDBsVirtualFixedIPs
and ServiceNetMap) remain the same.

It is also possible to include the 'use_neutron' boolean
in the FixedIPs parameter to instruct the ansible module
not to create a neutron API resource, and simply "echo"
the ip_address given in the FixedIPs parameter. For
example:
  RedisVirtualFixedIPs:
    - ip_address: 1.0.0.5
      use_neutron: false

Alternatively the fixed-ips can be set using the
'ServiceVips' parameter, like this:

 ServiceVips:
   redis: 1.0.0.5
   ovn_dbs: 1.0.0.6

NOTE: If the neutron service is not available the
      tripleo_service_vip ansible module will "echo"
      the IP provided in %service%VirtualFixedIPs.

Related: blueprint network-data-v2-ports
Depends-On: https://review.opendev.org/777307
Depends-On: https://review.opendev.org/779883
Change-Id: I4794418546363888e7a555a16b45b7a4417f1ef8
2021-04-14 10:22:59 +02:00
ramishra
acdddec6d5 Simplify database service templates
Removes unnecessary conditions and intrinsic functions.

Change-Id: Id7d3ec5dd801a897ce7d4d4325dcd17635ebd726
2021-04-14 10:25:57 +05:30
Zuul
f8676c05f1 Merge "Ensure SELinux context persist across restorecon and reboot" 2021-04-07 03:59:53 +00:00
Zuul
824ec8b5ad Merge "Simplify internal_tls_enabled conditions" 2021-04-03 13:20:28 +00:00
ramishra
c9991c2e31 Use 'wallaby' heat_template_version
With I57047682cfa82ba6ca4affff54fab5216e9ba51c Heat has added
a new template version for wallaby. This allows us to use the
2-argument variant of the ``if`` function, which makes it possible
to define e.g. resource properties conditionally and helps
clean up templates. If only two arguments are passed to the ``if``
function, the entire enclosing item is removed when the condition
is false.
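
A minimal sketch of the 2-argument ``if``, assuming a condition built
from EnableInternalTLS. The output name, keys and certificate path are
hypothetical and only illustrate the behaviour described above:

```yaml
heat_template_version: wallaby

parameters:
  EnableInternalTLS:
    type: boolean
    default: false

conditions:
  internal_tls_enabled: {get_param: EnableInternalTLS}

outputs:
  example_config:            # hypothetical output name
    value:
      bind_port: 3306
      # Only two arguments: when the condition is false, the whole
      # tls_cert_path key is dropped from the map.
      tls_cert_path:
        if:
          - internal_tls_enabled
          - /etc/pki/tls/certs/example.crt   # hypothetical path
```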

Change-Id: I25f981b60c6a66b39919adc38c02a051b6c51269
2021-03-31 17:35:12 +05:30
Cédric Jeanneret
d77fe55516 Ensure SELinux context persist across restorecon and reboot
Until now, we only relied on the ":z" flag in order to set container
volumes label to container_file_t.
While it works fine, it has multiple issues:
- if an operator runs a restorecon, it might break the container service
- if an SELinux related package is updated, it might reset the label,
  and break the container service
- it requires a container stop&start to reset the label to the expected
  value
- in case of a deep tree or a huge number of files, relabelling takes time

This change ensures the system sets the expected context on the specific
locations, instead of following the content of selinux-policy-targeted
rulesets.
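
As a rough illustration of setting a persistent context, assuming the
community.general.sefcontext module is used (the target path is an
example, not necessarily one handled by this change):

```yaml
# Records an fcontext rule so that restorecon and policy updates keep
# the container_file_t label on the example location.
- name: Set persistent SELinux context for a container volume
  community.general.sefcontext:
    target: "/var/lib/example(/.*)?"   # illustrative location
    setype: container_file_t
    state: present
```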

It has an equivalent for some locations in tripleo-ansible repository:
https://review.opendev.org/c/openstack/tripleo-ansible/+/782393

Note about swift locations:
Since openstack-selinux already sets fcontext rules for at least one
swift location, we can't override it here. The following
openstack-selinux patch is being pushed in order to work around this
specific case:
https://github.com/redhat-openstack/openstack-selinux/pull/73

Change-Id: Icb7f58004e281b42141c70a9a4895905dc32b45d
Resolves: rhbz#1941922
2021-03-30 08:11:59 +02:00
Michele Baldessari
5e4c17acfb Simplify internal_tls_enabled conditions
We do not need to add an if: internal_tls_enabled in a number of
ansible tasks. enable_internal_tls is already defined as an ansible
fact in common/deploy-steps.j2:
enable_internal_tls: {get_param: EnableInternalTLS}

So when the service uses the enable_internal_tls condition and it points
to the EnableInternalTLS param, we can just use the ansible fact
directly. Note that if the enable_internal_tls condition points to
something other than the mere EnableInternalTLS we may not do this
cleanup.

Change-Id: Idb07cbc8fc3a4d73ff52c54d869310fd6c49b502
2021-03-27 13:42:35 +01:00
Carlos Goncalves
6e7e0ab48e Remove obsoleted generate_service_certificates
Remove traces of generate_service_certificates. It was removed during
Pike release cycle [1].

[1] https://review.opendev.org/c/openstack/puppet-tripleo/+/444891

Change-Id: Ib203b52547433ff73141df66641528c389b50361
2021-03-16 19:50:14 +01:00
Grzegorz Grasza
e329ca915e Generate certificates using ansible role
This is using linux-system-roles.certificate ansible role,
which replaces puppet-certmonger for submitting certificate
requests to certmonger. Each service is configured through
its heat template.

Partial-Implements: blueprint ansible-certmonger
Depends-On: https://review.rdoproject.org/r/31713
Change-Id: Ib868465c20d97c62cbcb214bfc62d949bd6efc62
2021-03-10 16:28:22 +01:00
Zuul
2e231cf7ab Merge "Upgrade mariadb storage during upgrade tasks" 2021-02-26 02:11:10 +00:00
Michele Baldessari
84c85aaff3 Fix redis_tls_proxy
Since we merged the pcs-host patches we erroneously also removed
the sidecar container that does the tls stunneling for redis.
This is needed to allow the redis master to stream the replications to
its slaves via TLS.

Tested this and we now correctly get the working container and cluster
state:
[root@controller-0 ~]# podman ps -a |grep redis
4182a78811a2  undercloud-0.ctlplane.redhat.local:8787/openstack-redis:16.2_20210218.1-hotfixupdate2       /bin/bash /usr/lo...  3 minutes ago   Up 3 minutes ago                 redis-bundle-podman-0
604a086bb53c  undercloud-0.ctlplane.redhat.local:8787/openstack-redis:16.2_20210218.1-hotfixupdate2       kolla_start           8 minutes ago   Up 8 minutes ago                 redis_tls_proxy
[root@controller-0 ~]# pcs status |grep redis
  * GuestOnline: [ galera-bundle-0@database-1 galera-bundle-1@database-2 galera-bundle-2@database-0 ovn-dbs-bundle-0@controller-0 ovn-dbs-bundle-1@controller-1 ovn-dbs-bundle-2@controller-2 rabbitmq-bundle-0@messaging-0 rabbitmq-bundle-1@messaging-1 rabbitmq-bundle-2@messaging-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]
  * Container bundle set: redis-bundle [cluster.common.tag/openstack-redis:pcmklatest]:
    * redis-bundle-0    (ocf::heartbeat:redis):  Master controller-0
    * redis-bundle-1    (ocf::heartbeat:redis):  Slave controller-1
    * redis-bundle-2    (ocf::heartbeat:redis):  Slave controller-2

We also move the redis_tls_proxy from step_2/start_order: 3 to step_1
since it actually makes sense to have it run before we start the
redis pcmk bundle at step 2 (i.e. so the slave replica can work right
away from the start).

Closes-Bug: #1916873

Change-Id: I44df0ee32e5c35b87f74bdb75dcb384496dfb6ab
2021-02-25 11:41:36 +01:00
Alex Schultz
8d1fc85744 Use ansible_facts instead
In order to set ANSIBLE_INJECT_FACT_VARS=False we have to use ansible_facts
instead of ansible_* vars. This change switches our distribution and
hostname related items to use ansible_facts instead.
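
A minimal sketch of the switch, using a debug task purely for
illustration:

```yaml
# With fact injection disabled, ansible_distribution and
# ansible_hostname are no longer defined as top-level variables;
# the same values are read from the ansible_facts dictionary.
- name: Show distribution and hostname
  ansible.builtin.debug:
    msg: "{{ ansible_facts['distribution'] }} on {{ ansible_facts['hostname'] }}"
```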

Change-Id: I49a2c42dcbb74671834f312798367f411c819813
Related-Bug: #1915761
2021-02-22 17:57:17 +00:00
Zuul
ced3eb989b Merge "Default all innodb_buffer_pool_size to 1G" 2021-02-17 15:50:15 +00:00
Damien Ciabrini
712cfcc71b Upgrade mariadb storage during upgrade tasks
When a tripleo major upgrade or FFU causes an update of mariadb
to a new major version (e.g. 10.1 -> 10.3), some internal DB
tables must be upgraded (myisam tables), and sometimes the
existing user tables may be migrated to new mariadb defaults.

Move the db-specific upgrade steps into a dedicated script and
make sure that it is called at the right time while upgrading
the undercloud and/or the overcloud.

Closes-Bug: #1913438

Change-Id: I92353622994b28c895d95bdcbe348a73b6c6bb99
2021-02-16 09:08:40 +01:00
Mike Bayer
823c5b48d1 Default all innodb_buffer_pool_size to 1G
MySQL / MariaDB's default size of 127M is essentially
universally insufficient, and this variable is typically
better off in the 16-32G range assuming available memory.

Change-Id: Ie274854a73bf6fe3d99e8621175f81c69c9a878a
2021-02-12 09:32:01 -05:00
ramishra
7f195ff9a8 Remove DefaultPasswords interface
This was mainly there as a legacy interface which was
for internal use. Now that we pull the passwords from
the existing environment and don't use it, we can drop
this.

Reduces a number of heat resources.

Change-Id: If83d0f3d72a229d737a45b2fd37507dc11a04649
2021-02-12 11:38:44 +05:30
Raildo
9760977529 Adding key_size option on the certificate creation
Add the ability to specify the private key size
used when creating the certificate. The default
value stays the same as before: 2048 bits.
The key_size value can also be overridden
per service.
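
A sketch of what a request with an explicit key size could look like,
assuming the linux-system-roles.certificate role and its
certificate_requests variable. The certificate name and DNS name are
hypothetical:

```yaml
- name: Request a certificate with an explicit private key size
  ansible.builtin.include_role:
    name: linux-system-roles.certificate
  vars:
    certificate_requests:
      - name: example-service        # hypothetical certificate name
        dns: example.internal        # hypothetical DNS name
        ca: ipa
        key_size: 2048               # default mentioned above
```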

Depends-on: I4da96f2164cf1d136f9471f1d6251bdd8cfd2d0b
Change-Id: Ic2edabb7f1bd0caf4a5550d03f60fab7c8354d65
2020-12-17 20:22:52 -03:00
Damien Ciabrini
8b16911cc2 Revert rolling certificate updates for HA services
Currently galera and ovn require a coordinated restart across
the controller nodes when certmonger determines the certificate
for a node has expired and it needs to regenerate it.

But right now, when the tripleo certmonger puppet module is
called to assert the state of the certificates, it ends up
regenerating a new certificate unconditionally. So galera and
ovn get restarted on stack update, even when there is no need to.

To mitigate these unnecessary restarts, disable the post-action
for now until we fix the behaviour of tripleo's certmonger puppet
module. This has the side effect that services won't get restarted
automatically if no stack update takes place until the certificate
expiration date is reached.

Related-Bug: #1906505

Change-Id: I17f1364932e43b8487515084e41b525e186888db
2020-12-02 12:45:53 +01:00
Zuul
9dc74f9219 Merge "Ensure name consistency for tmpfiles.d configurations" 2020-10-15 22:54:35 +00:00
Zuul
837f8cbb06 Merge "HA: option to use static pacemaker image name" 2020-10-14 18:20:25 +00:00
Cédric Jeanneret
6ec3578a0c Ensure name consistency for tmpfiles.d configurations
Following change Iaced2ba676a4e4f651c67da082797cc1c1ffccd1, this patch
adds a new task for the update/upgrades steps in order to ensure we're
in a clean state, with consistent names.

It also takes the opportunity to chase down newly added /var/run
mentions.

Change-Id: I9f069332254d057f80e3d25e9f8b734f8a592810
2020-10-13 11:32:52 +02:00
Sergii Golovatiuk
cd6dc467ce Retry container pull 3 times
Pulling images over the internet is not considered a very stable operation,
as many things can go wrong (DNS, HTTP rate limit, route change or
interface restart ...). This patch retries "pull" 3 times to mitigate
possible networking issues.
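
One way to express such a retry in an ansible task is sketched below.
This is an assumption about the general pattern, not the code of this
patch, and example_image is a hypothetical variable:

```yaml
- name: Pull the container image, retrying on transient failures
  ansible.builtin.command: podman pull "{{ example_image }}"
  register: pull_result
  until: pull_result.rc == 0   # stop retrying once the pull succeeds
  retries: 3
  delay: 5                     # seconds between attempts (illustrative)
```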

Change-Id: I03643576c9f8444d6db36364a73bccce244c8446
Closes-Bug: 1899057
2020-10-08 18:48:12 +02:00
Damien Ciabrini
cdad826ad4 HA: option to use static pacemaker image name
With [1] we added the ability to use a fixed, static prefix for
the container image names that pacemaker uses to set up the HA
resources. This name is just an alias to the real tripleo
image that is being used in the control plane, and it allows
updating the namespace part of the image during a minor update
without disrupting service.

Add a new Heat parameter to use a completely static container
image name, to allow arbitrary name change during minor update,
e.g. registryA/namespaceA/imgA:tagA to registryB/namespaceB/imgB:tagB
By default, this new parameter is disabled.

[1] Id369154d147cd5cf0a6f997bf806084fc7580e01

Change-Id: I124c1e4dbcc7a8ed38079411f41a8f31c8f62284
2020-10-07 08:55:10 +02:00
Zuul
4437dbd323 Merge "Use consistent naming for MysqlInnodbBufferPoolSize" 2020-09-22 04:48:28 +00:00
Luke Short
8e84bb3782 Use consistent naming for MysqlInnodbBufferPoolSize
Before: MySQLInnodbBufferPoolSize
After: MysqlInnodbBufferPoolSize

Change-Id: I4e4d4ded28846a68b2ac6f993d9adaca11bf18ce
Related-Change-Id: I59e74a76d8467bd49c95da5031a23cda0cc6f52d
Signed-off-by: Luke Short <ekultails@gmail.com>
2020-09-21 12:17:31 -04:00
Damien Ciabrini
e8ddc606b2 Remove race during mysql database creation
The mysql database is created by the mysql_bootstrap container,
which lets Kolla run mysqld_safe temporarily, and then
lets TripleO run it for additional setup.

Before running the second temporary mysqld server, make
sure that the mysqld_safe script started by Kolla is
always stopped, to avoid any race condition that would
cause the second mysqld_safe server to be killed by the
Kolla one.

Change-Id: Id7cf45fb95d3c8a2c5519b1a13a5651cf414a115
Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Closes-Bug: #1896009
2020-09-17 16:09:50 +02:00
Michele Baldessari
4b0cc36efe Make sure we noop the Mysql_ providers
This is needed because the Mysql_ providers will prefetch
the mysql users if facter finds the '/bin/mysql' executable
on the system and we do not want to run any mysql task on the host
directly.

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>

Change-Id: Ic6c65e6849368185177aeaa31d50f52761225f62
Related-Bug: #1863442
2020-09-11 05:05:27 +00:00
Michele Baldessari
83d1cb5852 pcs commands on host: redis bundle
This implements the creation of the redis bundle on the host.
The testing protocol used is documented in the depends-on.

The full rationale is contained in the LP bug.

The reason for adding a post_update task is that during a minor update
the deployment tasks are not run during the node update procedure but
only during the final converge. So we run the role again there to make
sure that any config change will trigger a restart during the minor
update, so the disruption is only local to the single node being
updated. If we did not do this a final converge could potentially
trigger a global restart of HA bundles which would be quite disruptive.

Related-Bug: #1863442

Depends-On: Iaa7e89f0d25221c2a6ef0b81eb88a6f496f01696

Change-Id: I5ce8367363d535b71b01395b0bef4cf17c8935b5
2020-09-03 13:01:30 +00:00
Michele Baldessari
da3d5e8056 pcs commands on host: mysql
This implements the creation of the haproxy bundle on the host.
The testing protocol used is documented in the depends-on.

The reason for adding a post_update task is that during a minor update
the deployment tasks are not run during the node update procedure but
only during the final converge. So we run the role again there to make
sure that any config change will trigger a restart during the minor
update, so the disruption is only local to the single node being
updated. If we did not do this a final converge could potentially
trigger a global restart of HA bundles which would be quite disruptive.

NB: in this case we keep the container init_bundle (renamed to
wait_bundle) around just to use it to wait for galera to be up.

Depends-On: Iaa7e89f0d25221c2a6ef0b81eb88a6f496f01696

Change-Id: Ie14819b66cecdb5a9cc6299b68a0cc70a7aa3370
Related-Bug: #1863442
2020-09-01 17:04:06 +00:00