
Move specs: airship-in-a-bottle to airship-specs

Moves the two blueprint/spec documents that existed in
airship-in-a-bottle to airship-specs. The implemented spec was not
reformatted to the spec template. The other spec (in the approved folder)
was minimally updated to the spec template.

Change-Id: I7468579e2fa3077ee1144e5294eba97d8e4ced05
Bryan Strassner, 8 months ago
Parent commit: bfbfd56c81

specs/approved/workflow_node-teardown.rst (new file, +620 −0)

..
      Copyright 2018 AT&T Intellectual Property.
      All Rights Reserved.

      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

.. index::
   single: Teardown node
   single: workflow;redeploy_server
   single: Drydock
   single: Promenade
   single: Shipyard


.. _node-teardown:

=====================
Airship Node Teardown
=====================

Shipyard is the entry point for Airship actions, including the redeployment of
a server. The first part of redeploying a server is the graceful teardown of
the software running on the server; Kubernetes and etcd are of particular
concern. It is the duty of Shipyard to orchestrate the teardown of the server,
followed by steps to deploy the desired new configuration. This design covers
only the first portion: node teardown.


Links
=====

None

Problem description
===================

When redeploying a physical host (server) using the Airship Platform,
it is necessary to trigger a sequence of steps to prevent undesired behaviors
when the server is redeployed. This blueprint intends to document the
interaction that must occur between Airship components to tear down a server.

Impacted components
===================

- Drydock
- Promenade
- Shipyard

Proposed change
===============

Shipyard Node Teardown Process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. (Existing) Shipyard receives a request to redeploy_server, specifying a
   target server.
#. (Existing) Shipyard performs preflight, design reference lookup, and
   validation steps.
#. (New) Shipyard invokes Promenade to decommission a node.
#. (New) Shipyard invokes Drydock to destroy the node, setting a node
   filter to restrict the operation to a single server.
#. (New) Shipyard invokes Promenade to remove the node from the Kubernetes
   cluster.

Assumption:
node_id is the hostname of the server, and is also the identifier that both
Drydock and Promenade use to identify the appropriate parts - hosts and k8s
nodes. This convention is set by the join script produced by Promenade.
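
To make the ordering concrete, the following is a minimal sketch of the new
step sequence; the function signature and the callables passed into it are
illustrative assumptions, not existing Shipyard code (the real workflow is an
Airflow DAG).

.. code:: python

  from typing import Callable


  def teardown_node(node_id: str,
                    design_ref: str,
                    decommission: Callable[[str, str], None],
                    destroy: Callable[[str, str], None],
                    remove_from_cluster: Callable[[str], None]) -> None:
      """Run the three new teardown steps in the required order."""
      # Promenade: drain, clear labels, remove etcd, shut down the kubelet
      decommission(node_id, design_ref)
      # Drydock: wipe storage, power off, remove the node from the provisioner
      destroy(node_id, design_ref)
      # Promenade: delete the node object from the Kubernetes cluster
      remove_from_cluster(node_id)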

Drydock Destroy Node
--------------------
The API/interface for destroying a node already exists; the implementation
within Drydock needs to be developed. This interface will need to accept both
the specified node_id and the design_id to retrieve from Deckhand.

Using the provided node_id (hardware node) and the design_id, Drydock will
reset the hardware to a re-provisionable state.

By default, all local storage should be wiped (per datacenter policy for
wiping before re-use).

An option to wipe only the OS disk should be supported, such that other local
storage is left intact and could be remounted without data loss, e.g.
``--preserve-local-storage``.

The target node should be shut down.

The target node should be removed from the provisioner (e.g. MaaS).
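
A sketch of how such a request might be submitted is shown below, assuming the
asynchronous task API pattern Drydock already uses for other actions; the
endpoint path, action name, option key, and node filter shape are assumptions
for illustration.

.. code:: python

  import requests

  DRYDOCK_URL = "http://drydock-api:9000/api/v1.0"  # assumed service endpoint


  def destroy_node(node_id: str, design_ref: str, token: str,
                   preserve_local_storage: bool = False) -> str:
      """Create a Drydock task to tear down a single baremetal node."""
      task = {
          "action": "destroy_nodes",                 # assumed action name
          "design_ref": design_ref,                  # deckhand+https://... ref
          "node_filter": {"node_names": [node_id]},  # restrict to one server
          "options": {
              # wipe only the OS disk, leaving other local storage intact
              "preserve_local_storage": preserve_local_storage,
          },
      }
      resp = requests.post(DRYDOCK_URL + "/tasks", json=task,
                           headers={"X-Auth-Token": token})
      resp.raise_for_status()
      # Drydock responds asynchronously; callers poll the returned task.
      return resp.json()["task_id"]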

Responses
~~~~~~~~~
The responses from this functionality should follow the pattern set by prepare
nodes and other Drydock functionality. The Drydock status responses used for
all async invocations will be utilized for this functionality.

Promenade Decommission Node
---------------------------
Performs steps that will result in the specified node being cleanly
disassociated from Kubernetes, and ready for the server to be destroyed.
Users of the decommission node API should be aware of the long timeout values
that may occur while waiting for Promenade to complete the appropriate steps.
At this time, Promenade is a stateless service and doesn't use any database
storage. As such, requests to Promenade are synchronous.

.. code:: json

  POST /nodes/{node_id}/decommission

  {
    rel : "design",
    href: "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
    type: "application/x-yaml"
  }

The design reference body is the design indicated when the redeploy_server
action is invoked through Shipyard.

Query Parameters:

-  drain-node-timeout: A whole number timeout in seconds to be used for the
   drain node step (default: none). If no value is provided, the drain node
   step will use its default.
-  drain-node-grace-period: A whole number in seconds indicating the
   grace period that will be provided to the drain node step (default: none).
   If no value is specified, the drain node step will use its default.
-  clear-labels-timeout: A whole number timeout in seconds to be used for the
   clear labels step (default: none). If no value is specified, clear labels
   will use its own default.
-  remove-etcd-timeout: A whole number timeout in seconds to be used for the
   remove etcd from nodes step (default: none). If no value is specified,
   remove-etcd will use its own default.
-  etcd-ready-timeout: A whole number in seconds indicating how long the
   decommission node request should allow for etcd clusters to become stable
   (default: 600).
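
For illustration, a caller such as the Shipyard workflow might invoke this
endpoint as sketched below; the Promenade service URL and authentication
header are assumptions, while the path, body, and query parameters follow the
description above.

.. code:: python

  import requests

  PROMENADE_URL = "http://promenade-api:80/api/v1.0"  # assumed service endpoint


  def decommission_node(node_id: str, deckhand_url: str, revision_id: int,
                        token: str) -> dict:
      """Ask Promenade to cleanly disassociate a node from Kubernetes."""
      design_ref = {
          "rel": "design",
          "href": ("deckhand+https://{}/revisions/{}/rendered-documents"
                   .format(deckhand_url, revision_id)),
          "type": "application/x-yaml",
      }
      resp = requests.post(
          "{}/nodes/{}/decommission".format(PROMENADE_URL, node_id),
          json=design_ref,
          params={"drain-node-timeout": 3600, "etcd-ready-timeout": 600},
          headers={"X-Auth-Token": token},
          timeout=7200)  # synchronous API; allow for long-running steps
      resp.raise_for_status()
      return resp.json()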

Process
~~~~~~~
Acting upon the node specified by the invocation and the design reference
details:

#. Drain the Kubernetes node.
#. Clear the Kubernetes labels on the node.
#. Remove etcd nodes from their clusters (if impacted).

   - If the node being decommissioned contains etcd nodes, Promenade will
     attempt to gracefully have those nodes leave the etcd cluster.

#. Ensure that etcd cluster(s) are in a stable state.

   - Polls for status every 30 seconds, until the etcd-ready-timeout expires
     or the cluster meets the defined minimum functionality for the site.
   - A new document type, promenade/EtcdClusters/v1, will specify details
     about the etcd clusters deployed in the site, including identifiers,
     credentials, and thresholds for minimum functionality.
   - This process should ignore the node being torn down in any calculation
     of health.

#. Shut down the kubelet.

   - If this is not possible because the node is in a state of disarray such
     that it cannot schedule the daemonset to run, this step may fail, but
     should not hold up the process, as the Drydock dismantling of the node
     will shut the kubelet down.

Responses
~~~~~~~~~
All responses will be in the form of the Airship Status response.

-  Success: Code: 200, reason: Success

   Indicates that all steps were successful.

-  Failure: Code: 404, reason: NotFound

   Indicates that the target node is not discoverable by Promenade.

-  Failure: Code: 500, reason: DisassociateStepFailure

   The details section should detail the successes and failures further. Any
   4xx series errors from the individual steps would manifest as a 500 here.

Promenade Drain Node
--------------------
Drains the Kubernetes node for the target node. This ensures that the node is
no longer the target of any pod scheduling, and evicts or deletes the running
pods. In the case of nodes running DaemonSet-managed pods, or pods that would
prevent a drain from occurring, Promenade may be required to provide the
``ignore-daemonsets`` option or ``force`` option to attempt to drain the node
as fully as possible.

By default, the drain node will utilize a grace period for pods of 1800
seconds and a total timeout of 3600 seconds (1 hour). Clients of this
functionality should be prepared for a long timeout.

.. code:: json

  POST /nodes/{node_id}/drain

Query Parameters:

-  timeout: a whole number in seconds (default = 3600). This value is the total
   timeout for the kubectl drain command.
-  grace-period: a whole number in seconds (default = 1800). This value is the
   grace period used by kubectl drain. The grace period must be less than the
   timeout.

.. note::

   This POST has no message body.

Example command used for drain (reference only)::

  kubectl drain --force --timeout 3600s --grace-period 1800 --ignore-daemonsets --delete-local-data n1

See also:
https://git.openstack.org/cgit/openstack/airship-promenade/tree/promenade/templates/roles/common/usr/local/bin/promenade-teardown

Responses
~~~~~~~~~
All responses will be in the form of the Airship Status response.

-  Success: Code: 200, reason: Success

   Indicates that the drain node has successfully concluded, and that no pods
   are currently running.

-  Failure: Status response, code: 400, reason: BadRequest

   A request was made with parameters that cannot work - e.g. grace-period is
   set to a value larger than the timeout value.

-  Failure: Status response, code: 404, reason: NotFound

   The specified node is not discoverable by Promenade.

-  Failure: Status response, code: 500, reason: DrainNodeError

   There was a processing exception raised while trying to drain a node. The
   details section should indicate the underlying cause if it can be
   determined.

Promenade Clear Labels
----------------------
Removes the labels that have been added to the target Kubernetes node.

.. code:: json

  POST /nodes/{node_id}/clear-labels

Query Parameters:

-  timeout: A whole number in seconds allowed for the pods to settle/move
   following removal of labels. (Default = 1800)

.. note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

-  Success: Code: 200, reason: Success

   All labels have been removed from the specified Kubernetes node.

-  Failure: Code: 404, reason: NotFound

   The specified node is not discoverable by Promenade.

-  Failure: Code: 500, reason: ClearLabelsError

   There was a failure to clear labels that prevented completion. The details
   section should provide more information about the cause of this failure.

Promenade Remove etcd Node
--------------------------
Checks if the specified node contains any etcd nodes. If so, this API will
trigger those etcd nodes to leave the associated etcd cluster::

  POST /nodes/{node_id}/remove-etcd

  {
    rel : "design",
    href: "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
    type: "application/x-yaml"
  }

Query Parameters:

-  timeout: A whole number in seconds allowed for the removal of etcd nodes
   from the target node. (Default = 1800)

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

-  Success: Code: 200, reason: Success

   All etcd nodes have been removed from the specified node.

-  Failure: Code: 404, reason: NotFound

   The specified node is not discoverable by Promenade.

-  Failure: Code: 500, reason: RemoveEtcdError

   There was a failure to remove etcd from the target node that prevented
   completion within the specified timeout, or etcd prevented removal of
   the node because it would result in the cluster being broken. The details
   section should provide more information about the cause of this failure.


Promenade Check etcd
--------------------
Retrieves the current interpreted state of etcd::

  GET /etcd-cluster-health-statuses?design_ref={the design ref}

The design_ref parameter is required for appropriate operation, and is in
the same format as used for the join-scripts API.

Query Parameters:

-  design_ref: (Required) the design reference to be used to discover etcd
   instances.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

-  Success: Code: 200, reason: Success

   The status of each etcd in the site will be returned in the details section.
   Valid values for status are: Healthy, Unhealthy

https://github.com/att-comdev/ucp-integration/blob/master/docs/source/api-conventions.rst#status-responses

.. code:: json

  { "...": "... standard status response ...",
    "details": {
      "errorCount": {{n}},
      "messageList": [
        { "message": "Healthy",
          "error": false,
          "kind": "HealthMessage",
          "name": "{{the name of the etcd service}}"
        },
        { "message": "Unhealthy",
          "error": false,
          "kind": "HealthMessage",
          "name": "{{the name of the etcd service}}"
        },
        { "message": "Unable to access Etcd",
          "error": true,
          "kind": "HealthMessage",
          "name": "{{the name of the etcd service}}"
        }
      ]
    }
    ...
  }

-  Failure: Code: 400, reason: MissingDesignRef

   Returned if the design_ref parameter is not specified.

-  Failure: Code: 404, reason: NotFound

   Returned if the specified etcd could not be located.

-  Failure: Code: 500, reason: EtcdNotAccessible

   Returned if the specified etcd responded with an invalid health response
   (not just simply unhealthy - that's a 200).
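
The decommission flow polls this endpoint while waiting for etcd to stabilize.
A sketch of that polling loop follows; the Promenade URL and the idea of a
required-services list derived from promenade/EtcdClusters/v1 are assumptions
for illustration.

.. code:: python

  import time

  import requests

  PROMENADE_URL = "http://promenade-api:80/api/v1.0"  # assumed service endpoint


  def wait_for_etcd_stability(design_ref, required_services,
                              etcd_ready_timeout=600):
      """Poll etcd health every 30 seconds until stable or timed out."""
      deadline = time.time() + etcd_ready_timeout
      while time.time() < deadline:
          resp = requests.get(
              PROMENADE_URL + "/etcd-cluster-health-statuses",
              params={"design_ref": design_ref})
          resp.raise_for_status()
          messages = resp.json()["details"]["messageList"]
          healthy = {m["name"] for m in messages
                     if not m["error"] and m["message"] == "Healthy"}
          # Stable when every required etcd service reports Healthy.
          if all(svc in healthy for svc in required_services):
              return True
          time.sleep(30)  # poll interval from the process description above
      return False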


Promenade Shutdown Kubelet
--------------------------
Shuts down the kubelet on the specified node. This is accomplished by Promenade
setting the label ``promenade-decomission: enabled`` on the node, which will
trigger a newly developed daemonset to run something like
``systemctl disable kubelet && systemctl stop kubelet``.
This daemonset will effectively sit dormant until nodes have the appropriate
label added, and then perform the kubelet teardown.
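
The labeling mechanism itself can be illustrated with the official Kubernetes
Python client; this sketch only demonstrates the trigger and is not
Promenade's actual implementation.

.. code:: python

  from kubernetes import client, config


  def label_node_for_kubelet_shutdown(node_name):
      """Apply the label that the teardown daemonset watches for."""
      config.load_incluster_config()  # or load_kube_config() outside a pod
      core_v1 = client.CoreV1Api()
      patch = {"metadata": {"labels": {"promenade-decomission": "enabled"}}}
      core_v1.patch_node(node_name, patch)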

.. code:: json

  POST /nodes/{node_id}/shutdown-kubelet

.. note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

-  Success: Code: 200, reason: Success

   The kubelet has been successfully shut down.

-  Failure: Code: 404, reason: NotFound

   The specified node is not discoverable by Promenade.

-  Failure: Code: 500, reason: ShutdownKubeletError

   The specified node's kubelet failed to shut down. The details section of
   the status response should contain reasonable information about the source
   of this failure.

Promenade Delete Node from Cluster
----------------------------------
Updates the Kubernetes cluster, removing the specified node. Promenade should
check that the node is drained/cordoned and has no labels other than
``promenade-decomission: enabled``. If either of these checks fails, the API
should respond with a 409 Conflict response.

.. code:: json

  POST /nodes/{node_id}/remove-from-cluster

.. note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

-  Success: Code: 200, reason: Success

   The specified node has been removed from the Kubernetes cluster.

-  Failure: Code: 404, reason: NotFound

   The specified node is not discoverable by Promenade.

-  Failure: Code: 409, reason: Conflict

   The specified node cannot be deleted due to checks that the node is
   drained/cordoned and has no labels (other than possibly
   ``promenade-decomission: enabled``).

-  Failure: Code: 500, reason: DeleteNodeError

   The specified node cannot be removed from the cluster due to an error from
   Kubernetes. The details section of the status response should contain more
   information about the failure.


Shipyard Tag Releases
---------------------
Shipyard will need to mark Deckhand revisions with tags when there are
successful deploy_site or update_site actions, to be able to determine the
last known good design. This is related to Shipyard issue 16, which has the
same need.

.. note::

   Repeated from https://github.com/att-comdev/shipyard/issues/16

   When multiple configdocs commits have been done since the last deployment,
   there is no ready means to determine what's being done to the site. Shipyard
   should reject deploy site or update site requests that have had multiple
   commits since the last site true-up action. An option to override this guard
   should be allowed for the actions in the form of a parameter to the action.

   The configdocs API should provide a way to see what's been changed since the
   last site true-up, not just the last commit of configdocs. This might be
   accommodated by new Deckhand tags like the 'commit' tag, but for
   'site true-up' or similar applied by the deploy and update site commands.

The design for issue 16 includes the bare-minimum marking of Deckhand
revisions. This design is as follows:

Scenario
~~~~~~~~
Multiple commits occur between site actions (deploy_site, update_site) - those
actions that attempt to bring a site into compliance with a site design.
When this occurs, the current ability to see only what has changed between the
committed and buffer versions (configdocs diff) is insufficient for
investigating what has changed since the last successful (or unsuccessful)
site action.
To accommodate this, Shipyard needs several enhancements.

Enhancements
~~~~~~~~~~~~

#. Deckhand revision tags for site actions

   Using the tagging facility provided by Deckhand, Shipyard will tag the end
   of site actions.

   Upon successful completion of a site action, tag the revision being used
   with the tag site-action-success, and a body of dag_id:<dag_id>.

   Upon unsuccessful completion of a site action, tag the revision being used
   with the tag site-action-failure, and a body of dag_id:<dag_id>.

   The completion tags should only be applied upon failure if the site action
   gets past document validation successfully (i.e. gets to the point where it
   can start making changes via the other UCP components).

   This could result in a single revision having both site-action-success and
   site-action-failure if a later re-invocation of a site action is successful.

#. Check for intermediate committed revisions

   Upon running a site action, before tagging the revision with the site action
   tag(s), the DAG needs to check whether there are committed revisions that
   do not have an associated site-action tag. If there are any committed
   revisions since the last site action, other than the current revision being
   used, then the action should not be allowed to proceed (stop before
   triggering validations). For the calculation of intermediate committed
   revisions, assume revision 0 if there are no revisions with a site-action
   tag (null case). A sketch of this check follows the Enhancements list.

   If the action is invoked with a parameter of
   allow-intermediate-commits=true, then this check should log that the
   intermediate committed revisions check is being skipped and not take any
   other action.

#. Support action parameter of allow-intermediate-commits=true|false

   In the CLI for create action, the --param option supports adding parameters
   to actions. The parameters passed should be relayed by the CLI to the API
   and ultimately to the invocation of the DAG. The DAG, as noted above, will
   check for the presence of allow-intermediate-commits=true. This behavior
   needs to be covered by tests.

#. Shipyard needs to support retrieving configdocs and rendered documents for
   the last successful site action, and the last site action (successful or
   not successful):

   - ``--successful-site-action``
   - ``--last-site-action``

   These options would be mutually exclusive of ``--buffer`` or ``--committed``.

#. Shipyard diff (shipyard get configdocs)

   Needs to support an option to diff the buffer vs. the last successful site
   action and the last site action (successful or not successful).

   Currently there are no options to select which versions to diff (always
   buffer vs. committed). The following should be supported::

     --base-version=committed | successful-site-action | last-site-action (Default = committed)
     --diff-version=buffer | committed | successful-site-action | last-site-action (Default = buffer)

   Equivalent query parameters need to be implemented in the API.
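
The intermediate-commit check from item 2 can be sketched as follows, using a
simplified view of Deckhand revisions as (revision_id, tags) pairs; the data
structures are assumptions for illustration, not Shipyard's actual code.

.. code:: python

  SITE_ACTION_TAGS = {"site-action-success", "site-action-failure"}


  def intermediate_committed_revisions(revisions, target_revision):
      """Committed revisions newer than the last site action, excluding the
      revision currently being used."""
      last_site_action = 0  # assume revision 0 when no site-action tag exists
      for rev_id, tags in revisions:
          if tags & SITE_ACTION_TAGS:
              last_site_action = max(last_site_action, rev_id)
      return [rev_id for rev_id, tags in revisions
              if "committed" in tags
              and last_site_action < rev_id < target_revision]


  def guard_site_action(revisions, target_revision,
                        allow_intermediate_commits=False):
      """Stop the action when unexamined intermediate commits exist."""
      if allow_intermediate_commits:
          print("Skipping intermediate committed revisions check")
          return
      if intermediate_committed_revisions(revisions, target_revision):
          raise RuntimeError("Intermediate committed revisions exist; "
                             "the site action will not proceed")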

Because the implementation of this design will result in the tagging of
successful site actions, Shipyard will be able to determine the correct
revision to use while attempting to tear down a node.

If the request to tear down a node indicates a revision that doesn't exist,
the command to do so (e.g. redeploy_server) should not continue, but rather
fail due to a missing precondition.

The invocation of the Promenade and Drydock steps in this design will utilize
the appropriate tag based on the request (default is successful-site-action)
to determine the revision of the Deckhand documents used as the design-ref.

Shipyard redeploy_server Action
-------------------------------
The redeploy_server action currently accepts a target node. Additional
supported parameters are needed:

#. ``preserve-local-storage=true``, which will instruct Drydock to wipe only
   the OS drive; any other local storage will not be wiped. This would allow
   for the drives to be remounted to the server upon re-provisioning. The
   default behavior is that local storage is not preserved.

#. ``target-revision=committed | successful-site-action | last-site-action``,
   which indicates which revision of the design will be used as the reference
   for what should be re-provisioned after the teardown. The default is
   successful-site-action, which is the closest representation of the
   last-known-good state.

These should be accepted as parameters to the action API/CLI and modify the
behavior of the redeploy_server DAG.
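
For illustration, an invocation using the existing ``--param`` mechanism might
look like the following; the parameter name used to identify the target server
is an assumption here, as it is defined outside this spec::

  shipyard create action redeploy_server \
      --param="target_nodes=node01" \
      --param="preserve-local-storage=true" \
      --param="target-revision=last-site-action"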

Security impact
---------------

None. This change introduces no new security concerns outside of established
patterns for RBAC controls around API endpoints.

Performance impact
------------------

As this is an on-demand action, there is no expected performance impact to
existing processes, although tearing down a host may result in temporarily
degraded service capacity when workloads need to be moved to different hosts,
or more simply in reduced capacity.

Alternatives
------------

N/A

Implementation
==============

None at this time.

Dependencies
============

None.


References
==========

None

specs/implemented/deployment-grouping-baremetal.rst (new file, +569 −0)

..
      Copyright 2018 AT&T Intellectual Property.
      All Rights Reserved.

      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

.. index::
   single: Deployment grouping
   single: workflow
   single: Shipyard
   single: Drydock

.. _deployment-grouping-baremetal:

=======================================
Deployment Grouping for Baremetal Nodes
=======================================
One of the primary functionalities of the Undercloud Platform is the deployment
of baremetal nodes as part of site deployment and upgrade. This blueprint aims
to define how deployment strategies can be applied to the workflow during these
actions.

.. note::

  This document has been moved from the airship-in-a-bottle project, and was
  previously implemented. The format of this document diverges from the
  standard template for airship-specs.

Overview
--------
When Shipyard is invoked for a deploy_site or update_site action, there are
three primary stages:

1. Preparation and Validation
2. Baremetal and Network Deployment
3. Software Deployment

During the Baremetal and Network Deployment stage, the deploy_site or
update_site workflow (and perhaps other workflows in the future) invokes
Drydock to verify the site, prepare the site, prepare the nodes, and deploy the
nodes. Each of these steps is described in the `Drydock Orchestrator Readme`_.

.. _Drydock Orchestrator Readme: https://git.openstack.org/cgit/openstack/airship-drydock/plain/drydock_provisioner/orchestrator/readme.md

The prepare nodes and deploy nodes steps each involve intensive and potentially
time consuming operations on the target nodes, orchestrated by Drydock and
MAAS. These steps need to be approached and managed such that grouping,
ordering, and criticality of success of nodes can be managed in support of
fault tolerant site deployments and updates.

For the purposes of this document, `phase of deployment` refers to the prepare
nodes and deploy nodes steps of the Baremetal and Network Deployment stage.

Some factors that inform this solution:

1. Limits to the amount of parallelization that can occur due to a centralized
   MAAS system.
2. Faults in the hardware, preventing operational nodes.
3. Miswiring or misconfiguration of network hardware.
4. Incorrect site design causing a mismatch against the hardware.
5. Criticality of particular nodes to the realization of the site design.
6. Desired configurability within the framework of the UCP declarative site
   design.
7. Improved visibility into the current state of node deployment.
8. A desire to begin the deployment of nodes before the finish of the
   preparation of nodes -- i.e. start deploying nodes as soon as they are ready
   to be deployed. Note: This design will not achieve new forms of
   task parallelization within Drydock; this is recognized as a desired
   functionality.

Solution
--------
Updates supporting this solution will require changes to Shipyard for the
changed workflows, and to Drydock for the desired node targeting and for
retrieval of diagnostic and result information.

.. index::
   single: Shipyard Documents; DeploymentStrategy

Deployment Strategy Document (Shipyard)
---------------------------------------
To accommodate the needed changes, this design introduces a new
DeploymentStrategy document into the site design to be read and utilized
by the workflows for update_site and deploy_site.

Groups
~~~~~~
Groups are named sets of nodes that will be deployed together. The fields of a
group are:

name
  Required. The identifying name of the group.

critical
  Required. Indicates if this group is required to continue to additional
  phases of deployment.

depends_on
  Required, may be an empty list. Group names that must be successful before
  this group can be processed.

selectors
  Required, may be an empty list. A list of identifying information to indicate
  the nodes that are members of this group.

success_criteria
  Optional. Criteria that must evaluate to be true before a group is considered
  successfully complete with a phase of deployment.

Criticality
'''''''''''
- Field: critical
- Valid values: true | false

Each group is required to indicate true or false for the ``critical`` field.
This drives the behavior after the deployment of baremetal nodes. If any
groups that are marked as ``critical: true`` fail to meet that group's success
criteria, the workflow should halt after the deployment of baremetal nodes. A
group that cannot be processed due to a parent dependency failing will be
considered failed, regardless of the success criteria.

Dependencies
''''''''''''
- Field: depends_on
- Valid values: [] or a list of group names

Each group specifies a list of depends_on groups, or an empty list. All
identified groups must complete successfully for the phase of deployment before
the current group is allowed to be processed by the current phase.

- A failure (based on success criteria) of a group prevents any groups
  dependent upon the failed group from being attempted.
- Circular dependencies will be rejected as invalid during document validation.
- There is no guarantee of ordering among groups that have their dependencies
  met. Any group that is ready for deployment based on declared dependencies
  will execute. Execution of groups is serialized - two groups will not deploy
  at the same time.

Selectors
'''''''''
- Field: selectors
- Valid values: [] or a list of selectors

The list of selectors indicates the nodes that will be included in a group.
Each selector has four available filtering values: node_names, node_tags,
node_labels, and rack_names. Each selector is an intersection of these
criteria, while the list of selectors is a union of the individual selectors.

- Omitting a criterion from a selector, or using an empty list, means that
  criterion is ignored.
- Having a completely empty list of selectors, or a selector that has no
  criteria specified, indicates ALL nodes.
- A collection of selectors that results in no nodes being identified will be
  processed as if 100% of nodes successfully deployed (avoiding division by
  zero), but would fail the minimum or maximum nodes criteria (still counts as
  0 nodes).
- There is no validation against the same node being in multiple groups;
  however, the workflow will not resubmit nodes that have already completed or
  failed in this deployment to Drydock twice, since it keeps track of each node
  uniquely. The success or failure of those nodes excluded from submission to
  Drydock will still be used for the success criteria calculation.

E.g.::

  selectors:
    - node_names:
        - node01
        - node02
      rack_names:
        - rack01
      node_tags:
        - control
    - node_names:
        - node04
      node_labels:
        - ucp_control_plane: enabled

Will indicate (not really SQL, just for illustration)::

    SELECT nodes
    WHERE node_name in ('node01', 'node02')
          AND rack_name in ('rack01')
          AND node_tags in ('control')
    UNION
    SELECT nodes
    WHERE node_name in ('node04')
          AND node_label in ('ucp_control_plane: enabled')
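
The same intersection/union semantics can be sketched in Python; the node
structure used here (flat ``tags`` and ``labels`` lists) is a simplified
assumption for illustration and is not the Drydock data model.

.. code:: python

  def _criterion_ok(wanted, present):
      """An omitted or empty criterion is ignored; otherwise require a match."""
      if not wanted:
          return True
      return any(value in present for value in wanted)


  def node_matches_selector(node, selector):
      """Intersection: every specified criterion must match the node."""
      return (_criterion_ok(selector.get("node_names"), [node["name"]])
              and _criterion_ok(selector.get("rack_names"), [node["rack"]])
              and _criterion_ok(selector.get("node_tags"), node["tags"])
              and _criterion_ok(selector.get("node_labels"), node["labels"]))


  def select_nodes(nodes, selectors):
      """Union: a node is in the group if any selector matches it; an empty
      selector list indicates ALL nodes."""
      if not selectors:
          return {node["name"] for node in nodes}
      return {node["name"] for node in nodes
              if any(node_matches_selector(node, s) for s in selectors)}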

Success Criteria
''''''''''''''''
- Field: success_criteria
- Valid values: for possible values, see below

Each group optionally contains success criteria, which are used to indicate
whether the deployment of that group is successful. The values that may be
specified are:

percent_successful_nodes
  The calculated success rate of nodes completing the deployment phase.

  E.g.: 75 would mean that 3 of 4 nodes must complete the phase successfully.

  This is useful for groups that have larger numbers of nodes, and do not
  have critical minimums or are not sensitive to an arbitrary number of nodes
  not working.

minimum_successful_nodes
  An integer indicating how many nodes must complete the phase to be considered
  successful.

maximum_failed_nodes
  An integer indicating a number of nodes that are allowed to have failed the
  deployment phase and still have that group considered successful.

When no criteria are specified, no checks are done; processing continues as if
nothing is wrong.

When more than one criterion is specified, each is evaluated separately - if
any fail, the group is considered failed.
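
A sketch of this evaluation, using the full set of nodes identified by a
group's selectors, follows; the function is illustrative and not the actual
Shipyard implementation.

.. code:: python

  def criteria_met(success_criteria, total_nodes, successful_nodes):
      """Each specified criterion is checked independently; any failure fails
      the group. No criteria means no checks."""
      if not success_criteria:
          return True
      failed_nodes = total_nodes - successful_nodes
      # Treat an empty node set as 100% successful to avoid division by zero.
      percent = 100.0 if total_nodes == 0 else (
          successful_nodes * 100.0 / total_nodes)
      checks = []
      if "percent_successful_nodes" in success_criteria:
          checks.append(percent >= success_criteria["percent_successful_nodes"])
      if "minimum_successful_nodes" in success_criteria:
          checks.append(
              successful_nodes >= success_criteria["minimum_successful_nodes"])
      if "maximum_failed_nodes" in success_criteria:
          checks.append(
              failed_nodes <= success_criteria["maximum_failed_nodes"])
      return all(checks)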


Example Deployment Strategy Document
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This example shows a deployment strategy with 5 groups: control-nodes,
compute-nodes-1, compute-nodes-2, monitoring-nodes, and ntp-node.

::

  ---
  schema: shipyard/DeploymentStrategy/v1
  metadata:
    schema: metadata/Document/v1
    name: deployment-strategy
    layeringDefinition:
        abstract: false
        layer: global
    storagePolicy: cleartext
  data:
    groups:
      - name: control-nodes
        critical: true
        depends_on:
          - ntp-node
        selectors:
          - node_names: []
            node_labels: []
            node_tags:
              - control
            rack_names:
              - rack03
        success_criteria:
          percent_successful_nodes: 90
          minimum_successful_nodes: 3
          maximum_failed_nodes: 1
      - name: compute-nodes-1
        critical: false
        depends_on:
          - control-nodes
        selectors:
          - node_names: []
            node_labels: []
            rack_names:
              - rack01
            node_tags:
              - compute
        success_criteria:
          percent_successful_nodes: 50
      - name: compute-nodes-2
        critical: false
        depends_on:
          - control-nodes
        selectors:
          - node_names: []
            node_labels: []
            rack_names:
              - rack02
            node_tags:
              - compute
        success_criteria:
          percent_successful_nodes: 50
      - name: monitoring-nodes
        critical: false
        depends_on: []
        selectors:
          - node_names: []
            node_labels: []
            node_tags:
              - monitoring
            rack_names:
              - rack03
              - rack02
              - rack01
      - name: ntp-node
        critical: true
        depends_on: []
        selectors:
          - node_names:
              - ntp01
            node_labels: []
            node_tags: []
            rack_names: []
        success_criteria:
          minimum_successful_nodes: 1

The ordering of groups, as defined by the dependencies (``depends_on``
fields)::

   __________     __________________
  | ntp-node |   | monitoring-nodes |
   ----------     ------------------
       |
   ____V__________
  | control-nodes |
   ---------------
       |_________________________
           |                     |
     ______V__________     ______V__________
    | compute-nodes-1 |   | compute-nodes-2 |
     -----------------     -----------------

Given this, the order of execution could be:

- ntp-node > monitoring-nodes > control-nodes > compute-nodes-1 > compute-nodes-2
- ntp-node > control-nodes > compute-nodes-2 > compute-nodes-1 > monitoring-nodes
- monitoring-nodes > ntp-node > control-nodes > compute-nodes-1 > compute-nodes-2
- and many more ... the only guarantee is that ntp-node will run some time
  before control-nodes, which will run some time before both of the
  compute-nodes groups. Monitoring-nodes can run at any time.

Also of note are the various combinations of selectors and the varied use of
success criteria.

Deployment Configuration Document (Shipyard)
--------------------------------------------
The existing deployment-configuration document used by the workflows will also
be modified to use its existing deployment_strategy field to provide the name
of the deployment-strategy document that will be used.

The default value for the name of the DeploymentStrategy document will be
``deployment-strategy``.

Drydock Changes
---------------

API and CLI
~~~~~~~~~~~
- A new API needs to be provided that accepts a node filter (i.e. selector,
  above) and returns a list of node names that result from analysis of the
  design. Input to this API will also need to include a design reference.

- Drydock needs to provide a "tree" output of tasks rooted at the requested
  parent task. This will provide the needed success/failure status for nodes
  that have been prepared/deployed.

Documentation
~~~~~~~~~~~~~
Drydock documentation will be updated to match the introduction of new APIs.


Shipyard Changes
----------------

API and CLI
~~~~~~~~~~~
- The commit configdocs API will need to be enhanced to look up the
  DeploymentStrategy by using the DeploymentConfiguration.
- The DeploymentStrategy document will need to be validated to ensure there are
  no circular dependencies in the groups' declared dependencies (perhaps
  NetworkX_).
- A new API endpoint (and matching CLI) is desired to retrieve the status of
  nodes as known to Drydock/MAAS and their MAAS status. The existing node list
  API in Drydock provides a JSON output that can be utilized for this purpose.

Workflow
~~~~~~~~
The deploy_site and update_site workflows will be modified to utilize the
DeploymentStrategy.

- The deployment configuration step will be enhanced to also read the
  deployment strategy and pass the information on a new xcom for use by the
  baremetal nodes step (see below).
- The prepare nodes and deploy nodes steps will be combined to perform both as
  part of the resolution of an overall ``baremetal nodes`` step.
  The baremetal nodes step will introduce functionality that reads in the
  deployment strategy (from the prior xcom), and can orchestrate the calls to
  Drydock to enact the grouping, ordering, and success evaluation.
  Note that Drydock will serialize tasks; there is no parallelization of
  prepare/deploy at this time.

Needed Functionality
''''''''''''''''''''

- function to formulate the ordered groups based on dependencies (perhaps
  NetworkX_); a sketch follows this list
- function to evaluate success/failure against the success criteria for a group
  based on the result list of succeeded or failed nodes.
- function to mark groups as success or failure (including failed due to
  dependency failure), as well as keep track of the (if any) successful and
  failed nodes.
- function to get a group that is ready to execute, or 'Done' when all groups
  are either complete or failed.
- function to formulate the node filter for Drydock based on a group's
  selectors.
- function to orchestrate processing groups, moving to the next group (or being
  done) when a prior group completes or fails.
- function to summarize the success/failed nodes for a group (primarily for
  reporting to the logs at this time).
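
A sketch of the ordering and readiness functions using NetworkX_, as the spec
suggests, is shown below; the group data structure is simplified for
illustration and is not the actual Shipyard implementation.

.. code:: python

  import networkx as nx


  def ordered_groups(groups):
      """Return group names in an order that satisfies depends_on; raises if
      there is a circular dependency (which document validation rejects)."""
      graph = nx.DiGraph()
      graph.add_nodes_from(groups)
      for name, group in groups.items():
          for parent in group.get("depends_on", []):
              graph.add_edge(parent, name)
      return list(nx.topological_sort(graph))


  def ready_groups(groups, succeeded, failed):
      """Groups whose dependencies have all succeeded and which have not yet
      been processed; groups behind a failed parent are never returned."""
      done = succeeded | failed
      return [name for name, group in groups.items()
              if name not in done
              and all(dep in succeeded for dep in group.get("depends_on", []))]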

Process
'''''''
The baremetal nodes step (preparation and deployment of nodes) will proceed as
follows:

1. Each group's selector will be sent to Drydock to determine the list of
   nodes that are a part of that group.

   - An overall status will be kept for each unique node (not started |
     prepared | success | failure).
   - When sending a task to Drydock for processing, the nodes associated with
     that group will be sent as a simple ``node_name`` node filter. This
     allows the list to exclude nodes whose status is not appropriate for the
     task being performed.

     - prepare nodes valid status: not started
     - deploy nodes valid status: prepared

2. In a processing loop, groups that are ready to be processed based on their
   dependencies (and the success criteria of groups they are dependent upon)
   will be selected for processing until there are no more groups that can be
   processed. The processing will consist of preparing and then deploying the
   group.

   - The selected group will be prepared and then deployed before selecting
     another group for processing.
   - Any nodes that failed as part of that group will be excluded from
     subsequent preparation or deployment during this deployment.

     - Excluding nodes that are already processed addresses groups that have
       overlapping lists of nodes due to the groups' selectors, and prevents
       sending them to Drydock for re-processing.
     - Evaluation of the success criteria will use the full set of nodes
       identified by the selector. This means that if a node was previously
       successfully deployed, that same node will count as "successful" when
       evaluating the success criteria.

   - The success criteria will be evaluated after the group's prepare step and
     the deploy step. A failure to meet the success criteria in a prepare step
     will cause the deploy step for that group to be skipped (and marked as
     failed).
   - Any nodes that fail during the prepare step will not be used in the
     corresponding deploy step.
   - Upon completion (success, partial success, or failure) of a prepare step,
     the nodes that were sent for preparation will be marked in the unique list
     of nodes (above) with their appropriate status: prepared or failure.
   - Upon completion of a group's deployment step, the nodes' status will be
     updated to their current status: success or failure.

3. Before the end of the baremetal nodes step, following all eligible group
   processing, a report will be logged to indicate the success/failure of
   groups and the status of the individual nodes. Note that it is possible for
   individual nodes to be left in the ``not started`` state if they were only
   part of groups that were never allowed to process due to dependencies and
   success criteria.

4. At the end of the baremetal nodes step, any group that is marked as
   critical and has failed due to timeout, dependency failure, or success
   criteria failure will trigger an Airflow Exception, resulting in a failed
   deployment.

Notes:

- The timeout values specified for the prepare nodes and deploy nodes steps
  will be used to put bounds on the individual calls to Drydock. A failure
  based on these values will be treated as a failure for the group; we need to
  be vigilant about whether this leads to indeterminate node states that
  interfere with further processing (e.g. the call timed out, but the
  requested work still continued to completion).

Example Processing
''''''''''''''''''
Using the deployment strategy defined in the above example, the following is
an example of how it may process::

  Start
  |
  | prepare ntp-node           <SUCCESS>
  | deploy ntp-node            <SUCCESS>
  V
  | prepare control-nodes      <SUCCESS>
  | deploy control-nodes       <SUCCESS>
  V
  | prepare monitoring-nodes   <SUCCESS>
  | deploy monitoring-nodes    <SUCCESS>
  V
  | prepare compute-nodes-2    <SUCCESS>
  | deploy compute-nodes-2     <SUCCESS>
  V
  | prepare compute-nodes-1    <SUCCESS>
  | deploy compute-nodes-1     <SUCCESS>
  |
  Finish (success)

If there were a failure in preparing the ntp-node, the following would be the
result::

  Start
  |
  | prepare ntp-node           <FAILED>
  | deploy ntp-node            <FAILED, due to prepare failure>
  V
  | prepare control-nodes      <FAILED, due to dependency>
  | deploy control-nodes       <FAILED, due to dependency>
  V
  | prepare monitoring-nodes   <SUCCESS>
  | deploy monitoring-nodes    <SUCCESS>
  V
  | prepare compute-nodes-2    <FAILED, due to dependency>
  | deploy compute-nodes-2     <FAILED, due to dependency>
  V
  | prepare compute-nodes-1    <FAILED, due to dependency>
  | deploy compute-nodes-1     <FAILED, due to dependency>
  |
  Finish (failed due to critical group failed)

If a failure occurred during the deploy of compute-nodes-2, the following would
result::

  Start
  |
  | prepare ntp-node           <SUCCESS>
  | deploy ntp-node            <SUCCESS>
  V
  | prepare control-nodes      <SUCCESS>
  | deploy control-nodes       <SUCCESS>
  V
  | prepare monitoring-nodes   <SUCCESS>
  | deploy monitoring-nodes    <SUCCESS>
  V
  | prepare compute-nodes-2    <SUCCESS>
  | deploy compute-nodes-2     <FAILED>
  V
  | prepare compute-nodes-1    <SUCCESS>
  | deploy compute-nodes-1     <SUCCESS>
  |
  Finish (success with some nodes/groups failed)

Schemas
~~~~~~~
A new schema will need to be provided by Shipyard to validate the
DeploymentStrategy document.

Documentation
~~~~~~~~~~~~~
The Shipyard action documentation will need to include details defining the
DeploymentStrategy document (mostly as defined here), as well as the update to
the DeploymentConfiguration document to contain the name of the
DeploymentStrategy document.


.. _NetworkX: https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.algorithms.dag.topological_sort.html
