Add guide for galera configuration

We found a few interesting settings for databases in an IRC discussion.
This is the information we got out of it.

Change-Id: I7c16b32f6019bd5e2ee1343accd2d681d6f5adef
Felix Huettner 2022-10-20 15:10:35 +02:00 committed by Dr. Jens Harbott
parent 8674137fcd
commit d09670a9c7
2 changed files with 153 additions and 0 deletions


@@ -0,0 +1,152 @@
========
Database
========

Introduction
------------

The general recommendation is to run a galera cluster (using MySQL or MariaDB) for your database.
Its active-active replication allows for a fast and easy failover.
Note that OpenStack does not play well with writing to multiple galera nodes in parallel,
see the config recommendation below.

------------------------------
One database to rule them all?
------------------------------

You can consider deploying the database in two ways:

* one galera cluster for each OpenStack service
* one single big galera cluster for all OpenStack services

The recommendation is to split your database into separate galera clusters per service for multiple reasons:

* Reduce the impact when a galera cluster is down
* Allow interventions on a smaller part of the infrastructure

Also, there is no benefit to colocating multiple services on the same galera cluster.

Config recommendation
---------------------

This section is split into three parts:

* the configuration for galera itself
* the configuration for the reverse proxy in front of galera
* the configuration for the OpenStack services

--------------------
Galera configuration
--------------------

All of these settings need to be consistent on all nodes of the galera cluster.
Note that this guide does not include the general requirements to get the galera cluster set up in the first place.
For this please see https://mariadb.com/kb/en/getting-started-with-mariadb-galera-cluster/
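
For orientation only, a cluster set up following that guide typically ends up with at least the following wsrep settings; this is a rough sketch, and the provider path, cluster name and node list below are placeholders for your environment:

.. code-block:: console

   [galera]
   wsrep_on=ON
   # provider path, cluster name and node list are examples; adjust to your environment
   wsrep_provider=/usr/lib/galera/libgalera_smm.so
   wsrep_cluster_name=openstack
   wsrep_cluster_address=gcomm://server-1.with.the.fqdn,server-2.with.the.fqdn,server-3.with.the.fqdn
   binlog_format=ROW
   default_storage_engine=InnoDB
   innodb_autoinc_lock_mode=2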

General health configs
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   [mysqld]
   max_connections=4000
   max_statement_time=20

In order to ensure your cluster runs smoothly, we recommend limiting the number of connections and the time statements are allowed to run.
The value for ``max_connections`` should be set based on actual tests (testing with a lot of idle connections is fine).
The value of ``20`` seconds for ``max_statement_time`` is enough for all normal use-cases we know of.
You might only run into issues with regular Nova cleanup jobs if they do not run often enough.
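
To get a feeling for a sensible value for ``max_connections``, you can for example check how many connections your cluster actually handles today; a rough sketch of the relevant status counters:

.. code-block:: console

   MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'Threads_connected';
   MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'Max_used_connections';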

Replication stability
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   [galera]
   wsrep_provider_options=gcomm.thread_prio=rr:2;gcs.fc_limit=160;gcs.fc_factor=0.8;gcache.size=2G

When you have a large number of connections to your galera cluster, these connections might starve the galera replication thread.
If the replication thread does not get enough CPU time, the galera cluster will lose its members and break.
The ``gcomm.thread_prio`` option above therefore gives the replication thread realtime (round-robin) scheduling on the kernel side.
If you run galera as a non-privileged user (as you hopefully do), galera will need ``CAP_SYS_NICE`` in order to be allowed to change the priority.
If you run inside a container environment, you might need to set ``kernel.sched_rt_runtime_us=-1`` (although that is suboptimal).
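
One possible way to grant that capability when the database runs under systemd is a unit drop-in; the unit name and file path below are only examples and depend on your distribution, and the sysctl at the end is only needed in the container case mentioned above:

.. code-block:: console

   # e.g. /etc/systemd/system/mariadb.service.d/realtime.conf
   [Service]
   AmbientCapabilities=CAP_SYS_NICE

   # reload systemd and restart the database afterwards
   systemctl daemon-reload
   systemctl restart mariadb

   # only if required inside a container environment (suboptimal, see above)
   sysctl -w kernel.sched_rt_runtime_us=-1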

Performance
^^^^^^^^^^^

.. code-block:: console

   [mysqld]
   tmp_table_size=64M
   max_heap_table_size=64M
   optimizer_switch=derived_merge=off

The temporary table sizes and the ``derived_merge`` optimizer switch are important settings if you have a large number of Neutron RBAC rules.
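
Whether the temporary table limits are sufficient can for example be judged by how often temporary tables spill to disk; a ``Created_tmp_disk_tables`` counter that grows quickly compared to ``Created_tmp_tables`` is an indication to raise the values above:

.. code-block:: console

   MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';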

---------------------------
Reverse Proxy Configuration
---------------------------

You will need to run a reverse proxy in front of your galera cluster to ensure OpenStack only ever communicates with a single cluster node.
This is required because OpenStack cannot handle the potential consistency issues that arise when writing to different nodes in parallel.
If you choose to run haproxy for this, you can use something like the following config:

.. code-block:: console

   defaults
       timeout client 300s

   listen mysql
       bind 0.0.0.0:3306
       option mysql-check
       server server-1 server-1.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup
       server server-2 server-2.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup
       server server-3 server-3.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup

Entering all servers with ``backup`` at the end ensures that haproxy will always choose the first server unless it is offline.
You should note the ``timeout client`` setting here, as it is relevant to the OpenStack configuration.
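
Note that ``resolvers cluster`` in the example above refers to a DNS resolvers section that needs to exist elsewhere in your haproxy configuration; a minimal sketch (the nameserver addresses are placeholders) could look like this:

.. code-block:: console

   resolvers cluster
       # example nameservers; use your own DNS servers here
       nameserver dns1 10.0.0.53:53
       nameserver dns2 10.0.1.53:53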

-----------------------
OpenStack Configuration
-----------------------

Database Connection Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The database configuration is normally in the ``[database]`` section of the configuration.
You should set the following:

.. code-block:: console

   connection_recycle_time = 280
   max_pool_size = 15
   max_overflow = 25

The ``connection_recycle_time`` should be 5% to 10% smaller than the ``timeout client`` in the reverse proxy (280 seconds versus the 300 seconds configured above).
This ensures connections are recreated on the OpenStack side before the reverse proxy forcibly terminates them.
The ``max_pool_size`` and ``max_overflow`` settings define the number of connections an individual thread is allowed to have.
You will need to set this based on experience (although the above should be a good start).
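
Putting this together, the ``[database]`` section of a service such as Nova could then look like the following; the connection string with its credentials and host name is a placeholder for your environment:

.. code-block:: console

   [database]
   # example connection string; replace credentials and host name
   connection = mysql+pymysql://nova:SECRET@database.example.com/nova?charset=utf8
   connection_recycle_time = 280
   max_pool_size = 15
   max_overflow = 25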

Database cleanup
^^^^^^^^^^^^^^^^

Nova and Cinder use soft deletes inside their databases.
This means deleted entries remain in the database and just get a ``deleted`` flag set.
In order to prevent the database tables from growing forever, these deleted entries need to be removed regularly.
For this you can use:

* ``nova-manage db archive_deleted_rows`` and ``nova-manage db purge`` for Nova (https://docs.openstack.org/nova/latest/cli/nova-manage.html)
* ``cinder-manage db purge`` for Cinder (https://docs.openstack.org/cinder/latest/cli/cinder-manage.html)

If you have never run these cleanups before (or if your environment has a high amount of resources being deleted), you might run into
a timeout due to the ``max_statement_time`` on the database cluster.
To work around this the ``nova-manage`` commands support a ``--max-rows`` argument.
For Cinder you might need to run the SQL statements manually and add a ``limit 1000`` to them (the statements are part of the error output of the command).
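
A cleanup run could then, for example, look like this (check the ``--help`` output of both tools for the exact options available in your release):

.. code-block:: console

   # archive soft-deleted Nova rows in batches until everything is archived
   nova-manage db archive_deleted_rows --until-complete
   # afterwards remove the archived data from the shadow tables
   nova-manage db purge --all
   # remove Cinder entries that have been soft-deleted for more than 30 days
   cinder-manage db purge 30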


@@ -8,5 +8,6 @@ Contents:
:maxdepth: 2
configuration_guidelines
database
rabbitmq
ceph