Add Galera documentation

Change-Id: I72222a1c20622ad0304ba2c6ab8984ac0ea01093
Proskurin Kirill
parent commit 6297e2ad73

2 changed files with 287 additions and 0 deletions:

1. doc/source/galera.rst (+286, -0)
2. doc/source/index.rst (+1, -0)

doc/source/galera.rst

@@ -0,0 +1,286 @@
.. _galera:

==================
MySQL Galera Guide
==================

This guide provides an overview of the Galera implementation in CCP.

Overview
~~~~~~~~

Galera Cluster is a synchronous multi-master database cluster, based on
synchronous replication and MySQL/InnoDB. When Galera Cluster is in use, you
can direct reads and writes to any node, and you can lose any individual node
without interruption in operations and without the need to handle complex
failover procedures.

CCP implementation details
~~~~~~~~~~~~~~~~~~~~~~~~~~

Entrypoint script
-----------------

To handle all required logic, CCP has a dedicated entrypoint script for
Galera and its side containers. Because of that, Galera pods are slightly
different from the rest of the CCP pods. For example, the Galera container
still uses the CCP global entrypoint, but it executes the Galera entrypoint,
which runs MySQL and handles all required logic, such as bootstrapping and
failure detection.

Galera pod
----------

Each Galera pod consists of three containers:

* galera
* galera-checker
* galera-haproxy

**galera** - a container which runs Galera itself.

**galera-checker** - a container with the galera-checker script. It is used
to check readiness and liveness of the Galera node.

**galera-haproxy** - a container with a haproxy instance.

.. NOTE:: More info about each container is available in the
  "Galera containers" section.
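
To confirm which containers a running Galera pod contains, you can list them
with ``kubectl``. ``POD_NAME`` is a placeholder, and the **ccp** namespace is
the one assumed throughout this guide:

::

    kubectl --namespace ccp get pod POD_NAME \
        -o jsonpath='{.spec.containers[*].name}'

The command should print ``galera galera-checker galera-haproxy``.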

Etcd usage
----------

The current implementation uses etcd to store the cluster state. The default
etcd root directory is ``/galera/k8scluster``.

Additional keys and directories are listed below, followed by an example of
the resulting layout:

* **leader** - key with the IP address of the current leader. The leader is
  just a single, arbitrarily chosen Galera node, which haproxy uses as its
  backend.
* **nodes/** - directory with the current Galera nodes. Each node key is
  named after the IP address of the node and its value is the Unix time of
  the key creation.
* **queue/** - directory with the Galera nodes currently waiting in the
  recovery queue. This is needed to ensure that all nodes are ready before
  looking for the node with the highest seqno. Each node key is named after
  the IP address of the node and its value is the Unix time of the key
  creation.
* **seqno/** - directory with the current Galera nodes' seqnos. Each node key
  is named after the IP address of the node and its value is the seqno of the
  node's data.
* **state** - key with the current cluster state. It can be "STEADY",
  "BUILDING" or "RECOVERY".
* **uuid** - key with the current uuid of the Galera cluster. If a new node
  has a different uuid, this indicates a split-brain situation. Nodes with
  the wrong uuid will be destroyed.
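
For a healthy three-node cluster the resulting etcd layout might look roughly
like the listing below. This is only an illustration: the IP addresses are
made up, and the exact set of keys depends on the cluster state.

::

    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 ls -r -p /galera/k8scluster

    /galera/k8scluster/leader
    /galera/k8scluster/state
    /galera/k8scluster/uuid
    /galera/k8scluster/nodes/
    /galera/k8scluster/nodes/10.233.64.10
    /galera/k8scluster/nodes/10.233.65.12
    /galera/k8scluster/nodes/10.233.66.7
    /galera/k8scluster/queue/
    /galera/k8scluster/seqno/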

Galera containers
~~~~~~~~~~~~~~~~~

galera
------

This container runs the Galera daemon and handles all the bootstrapping,
reconnecting and recovery logic.

At the start of the container, it checks for the ``init.ok`` file in the
Galera data directory. If this file doesn't exist, it removes all files from
the data directory and runs the MySQL initialization to create the base MySQL
data files; after that it starts the mysqld daemon without networking and
sets the needed permissions for the expected users.

If the ``init.ok`` file is found, it runs ``mysqld_safe --wsrep-recover``
to recover the Galera-related information and write it to the
``grastate.dat`` file.

After that, it checks the cluster state and, depending on the current state,
it chooses the required scenario.
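
The decision flow described above can be summarized with the following shell
sketch. It is not the actual entrypoint code: the data directory path and the
exact initialization command depend on the image, so treat the names below as
illustrative.

::

    # DATADIR is a placeholder for the Galera data directory used by the image.
    if [ ! -f "$DATADIR/init.ok" ]; then
        # Fresh node: wipe the data directory and create the base MySQL files.
        rm -rf "$DATADIR"/*
        mysqld --initialize-insecure   # exact init command depends on the MySQL version
        # Then start mysqld without networking and set up the expected users.
    else
        # Existing data: recover the Galera uuid/seqno into grastate.dat.
        mysqld_safe --wsrep-recover
    fi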

galera-checker
--------------

This container is used for the liveness and readiness checks of the Galera
pod.

To decide whether the Galera pod is ready, it checks the following things
(an example of inspecting them manually follows the list):

#. wsrep_local_state_comment = "Synced"
#. wsrep_evs_state = "OPERATIONAL"
#. wsrep_connected = "ON"
#. wsrep_ready = "ON"
#. wsrep_cluster_state_uuid = uuid in the etcd
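
You can inspect the same status variables manually from inside the ``galera``
container, assuming the MySQL client can connect with the configured
credentials, and compare the uuid with the one stored in etcd:

::

    kubectl --namespace ccp exec POD_NAME -c galera -- \
        mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep%'"
    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 get /galera/k8scluster/uuid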

To decide whether the Galera pod is alive, it checks the following things:

#. If the current cluster state is not "STEADY" - it skips the liveness
   check.
#. If it detects that an SST sync is in progress - it skips the liveness
   check.
#. If it detects that there is no MySQL pid file yet - it skips the liveness
   check.
#. If the node's "wsrep_cluster_state_uuid" differs from the one in etcd - it
   kills the Galera container, since it's a "split brain" situation.
#. If "wsrep_local_state_comment" is "Joined", and the previous state was
   "Joined" too - it kills the Galera container, since it can't finish
   joining the cluster for some reason.
#. If it catches any exception during the checks - it kills the Galera
   container.

If all checks pass, the Galera pod is considered alive.

galera-haproxy
--------------

This container runs the haproxy daemon, which is used to send all traffic to
a single Galera pod.

This is needed to avoid deadlocks and stale reads. A "leader" is chosen out
of all available Galera pods, and once the leader is chosen, all haproxy
instances update their configuration with the new leader.
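
To see which node is currently acting as the leader, you can read the etcd
key that the haproxy containers rely on (the ``leader`` key described in the
"Etcd usage" section):

::

    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 get /galera/k8scluster/leader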

Supported scenarios
~~~~~~~~~~~~~~~~~~~

Initial bootstrap
-----------------

In this scenario, there is no working Galera cluster yet. Each node tries to
get the lock in etcd; the first one that acquires it starts the cluster
bootstrapping. After it's done, the next node gets the lock and connects to
the existing cluster.

.. NOTE:: During the bootstrap the state of the cluster will be "BUILDING".
  It will be changed to "STEADY" after the last node connects.
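
During a fresh deployment you can follow this transition by checking the pods
and re-reading the ``state`` key until it reports "STEADY", for example:

::

    kubectl --namespace ccp get pods | grep galera
    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 get /galera/k8scluster/state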

Re-connecting to the existing cluster
-------------------------------------

In this scenario, a Galera cluster is already available. In most cases this
will be a node re-connecting after some failure, such as a node reboot. Each
node tries to get the lock in etcd and, once it acquires the lock, connects
to the existing cluster.

.. NOTE:: During this scenario the state of the cluster will be "STEADY".

Recovery
--------

This scenario can be triggered in two ways:

* The operator manually sets the cluster state in etcd to "RECOVERY".
* A new node does a few checks before bootstrapping; if it finds that the
  cluster state is "STEADY" but there are zero nodes in the cluster, it
  assumes that the cluster has been destroyed somehow and recovery needs to
  be run. In that case, it sets the state to "RECOVERY" and starts the
  recovery scenario.

During the recovery scenario cluster bootstrapping is different from the
"Initial bootstrap". In this scenario, each node looks at its "seqno", which
is basically the number of registered transactions. The node with the highest
seqno bootstraps the cluster and the other nodes join it, so in the end we
have the latest data that was available before the cluster destruction.
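
The seqno each node reports comes from its ``grastate.dat`` file in the MySQL
data directory. A typical file looks like the example below; the uuid, the
seqno and the ``/var/lib/mysql`` path are illustrative and depend on the
image configuration:

::

    kubectl --namespace ccp exec POD_NAME -c galera -- cat /var/lib/mysql/grastate.dat

    # GALERA saved state
    version: 2.1
    uuid:    f7a8d2c4-1b2e-11e7-8f3a-aabbccddeeff
    seqno:   1045
    cert_index: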

.. NOTE:: During the bootstrap the state of the cluster will be "RECOVERY".
  It will be changed to "STEADY" after the last node connects.

There is an option to manually choose the node to recover data from. For
details please see the "Force bootstrap" section in "Advanced features".

Advanced features
~~~~~~~~~~~~~~~~~

Cluster size
------------

By default, the Galera cluster size is 3 nodes. This is optimal for most
cases. If you want to change it to some custom number, you need to override
the **cluster_size** variable in the **percona** tree, for example:

::

    configs:
      percona:
        cluster_size: 5

.. NOTE:: The cluster size should be an odd number. A cluster with more than
  5 nodes will lead to high latency for write operations.

Force bootstrap
---------------

Sometimes operators may want to manually specify the Galera node that
recovery should be done from. In that case, you need to override the
**force_bootstrap** variable in the **percona** tree, for example:

::

    configs:
      percona:
        force_bootstrap:
          enabled: true
          node: NODE_NAME

**NODE_NAME** should be the name of the k8s node which will run the Galera
node with the required data.
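
To find the k8s node names that can be used as **NODE_NAME**, list the nodes
with ``kubectl``:

::

    kubectl get nodes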

Troubleshooting
~~~~~~~~~~~~~~~

Galera operation requires some advanced knowledge of MySQL and of general
clustering concepts. In most cases, we expect that Galera will "self-heal"
itself, in the worst case via a restart, a full resync and a reconnection to
the cluster.

Our readiness and liveness scripts should cover this and not allow a
misconfigured or non-operational node to receive production traffic.

Yet it's possible that some failure scenarios are not covered, and to fix
them some manual actions could be required.

Check the logs
--------------

Each container of the Galera pod writes detailed logs to stdout. You can read
them via ``kubectl logs POD_NAME -c CONT_NAME``. Make sure you check both the
``galera`` container logs and the ``galera-checker`` ones.

Additionally, you should check the MySQL logs in
``/var/log/ccp/mysql/mysql.log``.
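
For example, assuming the **ccp** namespace:

::

    kubectl --namespace ccp logs POD_NAME -c galera
    kubectl --namespace ccp logs POD_NAME -c galera-checker
    kubectl --namespace ccp exec POD_NAME -c galera -- tail -n 100 /var/log/ccp/mysql/mysql.log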

Check the etcd state
--------------------

Galera keeps its state in etcd, and it can be useful to check what is going
on in etcd right now. Assuming that you're using the **ccp** namespace, you
can check the etcd state using these commands:

::

    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 ls -r -p --sort /galera
    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 get /galera/k8scluster/state
    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 get /galera/k8scluster/leader
    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 get /galera/k8scluster/uuid

Node restart
------------

In most cases, it should be safe to restart a single Galera node. If you need
to do it for some reason, just delete the pod via kubectl:

::

    kubectl delete pod POD_NAME
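
If you need to restart several nodes one by one, wait until the replacement
pod becomes ready before deleting the next one, for example by watching the
pods:

::

    kubectl --namespace ccp get pods -w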

Full cluster restart
--------------------

In some cases, you may need to restart the whole cluster. Make sure you have
a backup before doing this. To do this, set the cluster state to "RECOVERY":

::

    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 set /galera/k8scluster/state RECOVERY

After that, restart all Galera pods:

::

    kubectl delete pod POD1_NAME POD2_NAME POD3_NAME

Once that is done, the Galera cluster will be rebuilt and should be
operational.
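
To verify that the recovery has finished, check that all pods are ready and
that the cluster state is back to "STEADY":

::

    kubectl --namespace ccp get pods
    etcdctl --endpoints http://etcd.ccp.svc.cluster.local:2379 get /galera/k8scluster/state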

.. NOTE:: For more info about cluster recovery please refer to the
  "Supported scenarios" section.

doc/source/index.rst

@@ -25,6 +25,7 @@ Advanced topics
    :maxdepth: 1

    deploying_multiple_parallel_environments
+   galera
    ceph
    ceph_cluster
    using_calico_instead_of_ovs
