Add guide for galera configuration

We found a few interesting settings for databases in an IRC discussion.
This is the information we got out of it.

Change-Id: I7c16b32f6019bd5e2ee1343accd2d681d6f5adef
Felix Huettner 2022-10-20 15:10:35 +02:00 committed by Dr. Jens Harbott
parent 8674137fcd
commit d09670a9c7
2 changed files with 153 additions and 0 deletions


@@ -0,0 +1,152 @@
========
Database
========

Introduction
------------

The general recommendation is to run a galera cluster (using MySQL or MariaDB) for your database.
Its active-active replication allows for a fast and easy failover.
Note that OpenStack does not play well with writing to multiple galera nodes in parallel,
see the config recommendation below.

------------------------------
One database to rule them all?
------------------------------

You can consider deploying the database in two ways:

* one galera cluster for each OpenStack service
* one single big galera cluster for all OpenStack services

The recommendation is to split your database into separate galera clusters per service for multiple reasons:

* Reduce the impact when a galera cluster is down
* Allow interventions on a smaller part of the infrastructure

Also, there is no benefit to colocating multiple services on the same galera cluster.

Config recommendation
---------------------

This section is split into three parts:

* the configuration for galera itself
* the configuration for the reverse proxy in front of galera
* the configuration for the OpenStack services

--------------------
Galera configuration
--------------------

All of these settings need to be consistent on all nodes of the galera cluster.
Note that this guide does not include the general requirements to get the galera cluster set up in the first place.
For this please see https://mariadb.com/kb/en/getting-started-with-mariadb-galera-cluster/
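
For orientation only, a cluster set up following that guide typically ends up with at least the following wsrep settings; this is a rough sketch, and the provider path, cluster name and node list below are placeholders for your environment:

.. code-block:: console

   [galera]
   wsrep_on=ON
   # provider path, cluster name and node list are examples; adjust to your environment
   wsrep_provider=/usr/lib/galera/libgalera_smm.so
   wsrep_cluster_name=openstack
   wsrep_cluster_address=gcomm://server-1.with.the.fqdn,server-2.with.the.fqdn,server-3.with.the.fqdn
   binlog_format=ROW
   default_storage_engine=InnoDB
   innodb_autoinc_lock_mode=2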

General health configs
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   [mysqld]
   max_connections=4000
   max_statement_time=20

In order to ensure your cluster runs smoothly, we recommend limiting the number of connections and the time statements are allowed to run.
The value for ``max_connections`` should be set based on actual tests (testing with a lot of idle connections is fine).
The value of ``20`` seconds for ``max_statement_time`` is enough for all normal use-cases we know of.
You might only run into issues with regular Nova cleanup jobs if they do not run often enough.
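
To get a feeling for a sensible value for ``max_connections``, you can for example check how many connections your cluster actually handles today; a rough sketch of the relevant status counters:

.. code-block:: console

   MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'Threads_connected';
   MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'Max_used_connections';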

Replication stability
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   [galera]
   wsrep_provider_options=gcomm.thread_prio=rr:2;gcs.fc_limit=160;gcs.fc_factor=0.8;gcache.size=2G

When you have a large number of connections to your galera cluster, these connections might starve the galera replication thread.
If the replication thread does not get enough CPU time, the galera cluster will lose its members and break.
The ``gcomm.thread_prio`` option above therefore gives the replication thread realtime (round-robin) scheduling on the kernel side.
If you run galera as a non-privileged user (as you hopefully do), galera will need ``CAP_SYS_NICE`` in order to be allowed to change the priority.
If you run inside a container environment, you might need to set ``kernel.sched_rt_runtime_us=-1`` (although that is suboptimal).
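
One possible way to grant that capability when the database runs under systemd is a unit drop-in; the unit name and file path below are only examples and depend on your distribution, and the sysctl at the end is only needed in the container case mentioned above:

.. code-block:: console

   # e.g. /etc/systemd/system/mariadb.service.d/realtime.conf
   [Service]
   AmbientCapabilities=CAP_SYS_NICE

   # reload systemd and restart the database afterwards
   systemctl daemon-reload
   systemctl restart mariadb

   # only if required inside a container environment (suboptimal, see above)
   sysctl -w kernel.sched_rt_runtime_us=-1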

Performance
^^^^^^^^^^^

.. code-block:: console

   [mysqld]
   tmp_table_size=64M
   max_heap_table_size=64M
   optimizer_switch=derived_merge=off

The temporary table sizes and the ``derived_merge`` optimizer switch are important settings if you have a large number of Neutron RBAC rules.
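
Whether the temporary table limits are sufficient can for example be judged by how often temporary tables spill to disk; a ``Created_tmp_disk_tables`` counter that grows quickly compared to ``Created_tmp_tables`` is an indication to raise the values above:

.. code-block:: console

   MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';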

---------------------------
Reverse Proxy Configuration
---------------------------

You will need to run a reverse proxy in front of your galera cluster to ensure OpenStack only ever communicates with a single cluster node.
This is required because OpenStack cannot handle the potential consistency issues that arise when writing to different nodes in parallel.
If you choose to run haproxy for this, you can use something like the following config:

.. code-block:: console

   defaults
       timeout client 300s

   listen mysql
       bind 0.0.0.0:3306
       option mysql-check
       server server-1 server-1.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup
       server server-2 server-2.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup
       server server-3 server-3.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup

Entering all servers with ``backup`` at the end ensures that haproxy will always choose the first server unless it is offline.
You should note the ``timeout client`` setting here, as it is relevant to the OpenStack configuration.
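
Note that ``resolvers cluster`` in the example above refers to a DNS resolvers section that needs to exist elsewhere in your haproxy configuration; a minimal sketch (the nameserver addresses are placeholders) could look like this:

.. code-block:: console

   resolvers cluster
       # example nameservers; use your own DNS servers here
       nameserver dns1 10.0.0.53:53
       nameserver dns2 10.0.1.53:53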

-----------------------
OpenStack Configuration
-----------------------

Database Connection Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The database configuration is normally in the ``[database]`` section of the configuration.
You should set the following:

.. code-block:: console

   connection_recycle_time = 280
   max_pool_size = 15
   max_overflow = 25

The ``connection_recycle_time`` should be 5% to 10% smaller than the ``timeout client`` in the reverse proxy (280 seconds versus the 300 seconds configured above).
This ensures connections are recreated on the OpenStack side before the reverse proxy forcibly terminates them.
The ``max_pool_size`` and ``max_overflow`` settings define the number of connections an individual thread is allowed to have.
You will need to set this based on experience (although the above should be a good start).
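
Putting this together, the ``[database]`` section of a service such as Nova could then look like the following; the connection string with its credentials and host name is a placeholder for your environment:

.. code-block:: console

   [database]
   # example connection string; replace credentials and host name
   connection = mysql+pymysql://nova:SECRET@database.example.com/nova?charset=utf8
   connection_recycle_time = 280
   max_pool_size = 15
   max_overflow = 25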

Database cleanup
^^^^^^^^^^^^^^^^

Nova and Cinder use soft deletes inside their databases.
This means deleted entries remain in the database and just get a ``deleted`` flag set.
In order to prevent the database tables from growing forever, these deleted entries need to be removed regularly.
For this you can use:

* ``nova-manage db archive_deleted_rows`` and ``nova-manage db purge`` for Nova (https://docs.openstack.org/nova/latest/cli/nova-manage.html)
* ``cinder-manage db purge`` for Cinder (https://docs.openstack.org/cinder/latest/cli/cinder-manage.html)

If you have never run these cleanups before (or if your environment has a high amount of resources being deleted), you might run into
a timeout due to the ``max_statement_time`` on the database cluster.
To work around this the ``nova-manage`` commands support a ``--max-rows`` argument.
For Cinder you might need to run the SQL statements manually and add a ``limit 1000`` to them (the statements are part of the error output of the command).
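
A cleanup run could then, for example, look like this (check the ``--help`` output of both tools for the exact options available in your release):

.. code-block:: console

   # archive soft-deleted Nova rows in batches until everything is archived
   nova-manage db archive_deleted_rows --until-complete
   # afterwards remove the archived data from the shadow tables
   nova-manage db purge --all
   # remove Cinder entries that have been soft-deleted for more than 30 days
   cinder-manage db purge 30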


@@ -8,5 +8,6 @@ Contents:
:maxdepth: 2
configuration_guidelines
database
rabbitmq
ceph