========
Database
========

Introduction
------------

The general recommendation is to run a galera cluster (using mysql or mariadb) for your database.
Its active-active features allow for a fast and easy failover.

Note that OpenStack does not play well with writing in parallel to multiple galera nodes;
see the config recommendation below.

------------------------------
One database to rule them all?
------------------------------

You can consider deploying the database in two ways:

* one galera cluster for each OpenStack service
* one single big galera cluster for all OpenStack services

The recommendation is to split your databases into separate galera clusters, for multiple reasons:

* Reduce the impact when a galera cluster is down
* Allow interventions on a smaller part of the infrastructure

Also, there is no benefit to colocating multiple services on the same galera cluster.

Config recommendation
---------------------

This section is split into three parts:

* the configuration for galera itself
* the configuration for the reverse proxy in front of galera
* the configuration for the OpenStack services

--------------------
Galera configuration
--------------------

All of these settings need to be consistent on all nodes of the galera cluster.

Note that this guide does not include the general requirements to get the galera cluster set up in the first place.
For this please see https://mariadb.com/kb/en/getting-started-with-mariadb-galera-cluster/

General health configs
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   [mysqld]
   max_connections=4000
   max_statement_time=20

In order to ensure your cluster runs smoothly, we recommend limiting the number of connections and the time statements are allowed to run.

The value for ``max_connections`` should be set based on actual tests (testing with a lot of idle connections is fine).

The value of ``20`` seconds for ``max_statement_time`` is enough for all normal use-cases we know of.
You might only run into issues with regular Nova cleanup jobs if they do not run often enough.
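
Before settling on a value, it can help to compare the configured limit with the peak usage the cluster has actually seen. A minimal sketch, assuming a local ``mysql`` client with sufficient privileges:

.. code-block:: console

   # compare the configured limit with the peak number of connections observed so far
   mysql -e "SHOW VARIABLES LIKE 'max_connections'; SHOW GLOBAL STATUS LIKE 'Max_used_connections';"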

Replication stability
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

   [galera]
   wsrep_provider_options=gcomm.thread_prio=rr:2;gcs.fc_limit=160;gcs.fc_factor=0.8;gcache.size=2G

When you have a large number of connections to your galera cluster, these connections might starve the galera replication thread.
If the replication thread does not get enough CPU time, the galera cluster will lose its members and break.

This setting gives the replication thread realtime scheduling on the kernel side.
If you run galera as a non-privileged user (as you hopefully do), galera will need ``CAP_SYS_NICE`` in order to be allowed to change the priority.
If you run inside a container environment, you might need to set ``kernel.sched_rt_runtime_us=-1`` (although that is suboptimal).
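
If mariadb runs under systemd, one way to grant this capability is a unit drop-in. This is a minimal sketch; the unit name ``mariadb.service`` and the drop-in path are assumptions about your deployment:

.. code-block:: console

   # /etc/systemd/system/mariadb.service.d/galera-rt.conf (hypothetical drop-in)
   [Service]
   AmbientCapabilities=CAP_SYS_NICE

After adding the drop-in, run ``systemctl daemon-reload`` and restart the database service so the capability takes effect.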

Performance
^^^^^^^^^^^

.. code-block:: console

   [mysqld]
   tmp_table_size=64M
   max_heap_table_size=64M
   optimizer_switch=derived_merge=off

The temporary table sizes and the ``derived_merge`` optimizer switch are important settings if you have a large number of Neutron RBAC rules.

---------------------------
Reverse Proxy Configuration
---------------------------

You will need to run a reverse proxy in front of your galera cluster to ensure OpenStack only ever communicates with a single cluster node for ``write`` requests.
This is required because OpenStack does not handle well the deadlocks that occur when writing to different nodes in parallel.

If you choose to run haproxy for this, you can use something like the following config:

.. code-block:: console

   defaults
       timeout client 300s

   listen db_master
       bind 0.0.0.0:3306
       balance first
       option mysql-check
       server server-1 server-1.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 1
       server server-2 server-2.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup id 2
       server server-3 server-3.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup id 3

   listen db_slave
       bind 0.0.0.0:3308
       balance roundrobin
       option mysql-check
       server server-1 server-1.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 3 backup
       server server-2 server-2.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 1 weight 10
       server server-3 server-3.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 2 weight 10

By using two blocks, we can separate the ``read`` only SQL requests (listening on port 3308 here) from the ``read/write``
requests (listening on port 3306 here) and reduce the load on the first (master) mysql backend a little.

You should note the ``timeout client`` setting here, as it is relevant to the OpenStack configuration.
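
Note that the ``resolvers cluster`` keyword on the ``server`` lines refers to a DNS resolvers section that is not shown above. A minimal sketch of such a section (the nameserver address is a placeholder for your own DNS setup):

.. code-block:: console

   resolvers cluster
       nameserver dns1 10.0.0.53:53
       resolve_retries 3
       timeout resolve 1s
       hold valid 10s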

-----------------------
OpenStack Configuration
-----------------------

Database Connection Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The database configuration normally lives in the ``[database]`` section of each service's configuration file.
You should set the following:

.. code-block:: console

   connection = mysql+pymysql://login:pass@proxy:3306/db?charset=utf8
   slave_connection = mysql+pymysql://login:pass@proxy:3308/db?charset=utf8
   connection_recycle_time = 280
   max_pool_size = 15
   max_overflow = 25

The ``connection`` is used by OpenStack services to do ``read`` and ``write`` requests.

The ``slave_connection`` is used by OpenStack services to do ``read`` only requests.

The ``connection_recycle_time`` should be 5% to 10% smaller than the ``timeout client`` in the reverse proxy (with the ``300s`` from the haproxy example above, ``280`` seconds is roughly 7% below).
This ensures connections are recreated on the OpenStack side before the reverse proxy forces them to terminate.

The ``max_pool_size`` and ``max_overflow`` define the number of connections an individual service worker is allowed to open.
You will need to tune these based on experience (although the above should be a good start).
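
As a rough, back-of-the-envelope sanity check (an illustration, not an exact formula): with ``max_pool_size = 15`` and ``max_overflow = 25``, a single worker can open up to 40 connections, so around 100 such workers across all services would already reach the ``max_connections=4000`` limit configured on the galera side.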

Database cleanup
^^^^^^^^^^^^^^^^

Nova and Cinder use soft deletes inside their databases.
This means deleted entries remain in the database and just get a ``deleted`` flag set.

In order to prevent the database tables from growing forever, these deleted entries need to be removed regularly.
For this you can use:

* ``nova-manage db archive_deleted_rows`` and ``nova-manage db purge`` for Nova (https://docs.openstack.org/nova/latest/cli/nova-manage.html)
* ``cinder-manage db purge`` for Cinder (https://docs.openstack.org/cinder/latest/cli/cinder-manage.html)

If you have never run these cleanups before (or if your environment has a high amount of resources being deleted), you might run into
a timeout due to the ``max_statement_time`` on the database cluster.
To work around this, the ``nova-manage`` commands support a ``--max_rows`` argument.
For Cinder you might need to run the SQL statements manually and add a ``limit 1000`` to them (the statements are part of the command's error output).
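
As an illustration only (the schedules, retention, and exact flags are assumptions; check the linked documentation for your release), a periodic cleanup could be driven by cron entries such as:

.. code-block:: console

   # archive soft-deleted Nova rows into the shadow tables every night
   0 2 * * * nova-manage db archive_deleted_rows --until-complete
   # purge Cinder rows deleted more than 30 days ago, once a week
   0 3 * * 0 cinder-manage db purge 30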