
Update docs

Update example and README docs:
* Quotes are important for 'off' as YAML treats off w/o
  quotes as a false
* Updated info about recommended cluster configuration for
  'suicide' no quorum policy.
* Updated details about 'reboot' and 'poweroff' policy values
* Provided example provision/deploy commands
* Update known issues

Change-Id: I4ce2c6641d221c8b37fe275029973b5968d27cb1
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
changes/64/171964/7
Bogdan Dobrelya 4 years ago
parent
commit
1a3849bd98

+ 57
- 6
README.md

@@ -56,14 +56,27 @@ Note that in order to build this plugin the following tools must present:
56 56
 
57 57
 * Create an HA environment and select the fencing policy (reboot, poweroff or
58 58
   disabled) at the settings tab.
59
+  Note that there is no difference between the 'reboot' and 'poweroff' policies in
60
+  this version of the plugin. Either value just enables the fencing feature,
61
+  while the 'disabled' value disables it. The policies may differ in future
62
+  versions, when creation of the YAML configuration files for
63
+  nodes is automated.
59 64
 
60 65
 * Assign roles to the nodes as always, but use Fuel CLI instead of Deploy button
61 66
   to provision all nodes in the environment. Please note, that the power management
62
-  devices should be reachable from the management network via TCP protocol.
67
+  devices should be reachable from the management network via TCP protocol:
68
+
69
+  ```
70
+  fuel --env <environment_id> node --provision --node <nodes_list>
71
+  ```
72
+
73
+  (node list should be comma-separated like 1,2,3,4)
63 74
 
64 75
 * Define YAML configuration files for controller nodes and existing power management
65 76
   (PM aka STONITH) devices. See an example in
66
-  ``deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml``.
77
+  [deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml](https://github.com/stackforge/fuel-plugin-ha-fencing/blob/master/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml).
78
+  Note that the quotes around the 'off' and 'reboot' values are important: an unquoted
78
+  ``off`` is parsed by YAML as the boolean ``false``, which is wrong.
67 80
 
68 81
   In the given example we assume 'reboot' policy, which is a hard resetting of
69 82
   the failed nodes in Pacemaker cluster. We define IPMI reset action and PSU OFF/ON
@@ -116,12 +129,19 @@ Note that in order to build this plugin the following tools must present:
116 129
 * Put created fencing configuration YAML files as ``/etc/pcs_fencing.yaml``
117 130
   for corresponding controller nodes.
118 131
 
119
-* Deploy HA environment either by CLI command or Deploy button
132
+* Deploy HA environment either by Deploy button in UI or by CLI command:
133
+
134
+  ```
135
+  fuel --env <environment_id> node --deploy --node <nodes_list>
136
+  ```
137
+
138
+  (node list should be comma-separated like 1,2,3,4)
120 139
 
121 140
 TODO(bogdando) finish the guide, add agents and devices verification commands
122 141
 
123
-Please also note that the recommended value for the ``no-quorum-policy`` cluster property
124
-should be changed manually (after deployment is done) from ignore/stopped to suicide.
142
+Please also note that for clusters containing 3,5,7 or more controllers the recommended
143
+value for the ``no-quorum-policy`` cluster property should be changed manually
144
+(after deployment is done) from ignore/stopped to suicide.
125 145
 For more information on no-quorum policy, see the [Cluster Options](http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-cluster-options.html)
126 146
 section in the official Pacemaker documentation. You can set this property by the command
127 147
 ```
@@ -184,7 +204,7 @@ Plugin :: Fuel version
184 204
 Known Issues
185 205
 ------------
186 206
 
187
-[LP1411603](https://bugs.launchpad.net/fuel/+bug/1411603)
207
+### Concurrent nodes deployment issue [LP1411603](https://bugs.launchpad.net/fuel/+bug/1411603)
188 208
 
189 209
 After the deployment is finished, please make sure all of the controller nodes have
190 210
 corresponding ``stonith__*`` primitives and the stonith verification command gives
@@ -208,6 +228,37 @@ one "allow" location shown by the ref command.
208 228
 If some of the controller nodes does not have corresponding stonith primitives
209 229
 or locations for them, please follow the workaround provided at the LP bug.
210 230
 
231
+### Timer expired responses
232
+
233
+It is also possible for fencing actions to time out with errors like:
234
+
235
+```
236
+error: remote_op_done: Operation reboot of node-8 by node-7 for
237
+crmd.7932@node-7.d3cb0ebd: Timer expired
238
+```
239
+
240
+or some nodes configured with 'reboot' policy may enter the reboot loop caused by
241
+the fencing action.
242
+
243
+All of this means that the given values for timeouts should be verified and adjusted
244
+as appropriate.
245
+
246
+### Node gets stuck in pending state after being powered on
247
+
248
+There is a known bug in pacemaker 1.1.10 that occurs when the fenced node comes back too fast
249
+(see this [mail thread](http://oss.clusterlabs.org/pipermail/pacemaker/2014-April/021564.html) for details):
250
+
251
+Essentially the node is returning "too fast" (specifically, before the fencing
252
+notification arrives) causing pacemaker to forget the node is up and healthy.
253
+The fix for this is https://github.com/beekhof/pacemaker/commit/e777b17 and is
254
+present in 1.1.11
255
+
256
+As a workaround, do not bring the failed node back within a few minutes after
257
+it has been STONITHed. If it still gets stuck in the pending state, restart its
258
+corosync service. If the corosync service hangs on stop and has to be killed and
259
+restarted, do it quickly; otherwise another STONITH action triggered by the dead
260
+corosync process will arrive.
261
+
211 262
 Release Notes
212 263
 -------------
213 264
 

+ 3
- 3
deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml

@@ -41,9 +41,9 @@ fence_primitives:
41 41
       auth: password
42 42
       power_wait: '15'
43 43
       delay: '300'
44
-      action: reboot
45
-      pcmk_reboot_action: reboot
46
-      pcmk_off_action: reboot
44
+      action: 'reboot'
45
+      pcmk_reboot_action: 'reboot'
46
+      pcmk_off_action: 'reboot'
47 47
       pcmk_host_list: node-10.test.local
48 48
   psu_off:
49 49
     agent_type: fence_apc_snmp

+ 3
- 3
deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing_virsh.yaml

@@ -37,7 +37,7 @@ fence_primitives:
37 37
       login_timeout: '5'
38 38
       secure: true
39 39
       delay: '300'
40
-      action: reboot
41
-      pcmk_reboot_action: reboot
42
-      pcmk_off_action: reboot
40
+      action: 'reboot'
41
+      pcmk_reboot_action: 'reboot'
42
+      pcmk_off_action: 'reboot'
43 43
       pcmk_host_map: 'node-7:env60_slave-07'
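
The quoting change in both example files matters because an unquoted ``off`` is read by YAML 1.1 parsers as the boolean ``false`` rather than the string the fencing agent expects. The following is a minimal sketch, not part of the plugin, of how a configuration file could be checked for such values before it is copied to a controller. The function name `find_unquoted_bools` and the sample file contents are hypothetical, and the pattern is a simple line-based approximation of the YAML 1.1 boolean words rather than a real YAML parser.

```shell
#!/bin/sh
# Flag values that YAML 1.1 would parse as booleans unless quoted.
# Matching is case-insensitive, which is slightly broader than the spec.
find_unquoted_bools() {
  grep -niE ':[[:space:]]+(y|yes|n|no|true|false|on|off)[[:space:]]*(#.*)?$' "$1"
}

# Hypothetical sample standing in for /etc/pcs_fencing.yaml.
sample=$(mktemp)
cat > "$sample" <<'EOF'
action: off
pcmk_off_action: 'off'
pcmk_reboot_action: 'reboot'
secure: true
EOF

# Prints the lines to review, prefixed with their line numbers.
find_unquoted_bools "$sample" || echo "no unquoted boolean-like values"
rm -f "$sample"
```

For the sample above, the check reports line 1 (``action: off``) and line 4 (``secure: true``). The second hit is an intended boolean, so the output is a list of lines to review by hand, not a list of errors.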
