
Adding spec: Ansible bootstrap deployment

Proposing specification on how the bootstrap and configuration of the
initial host can be orchestrated by an Ansible playbook.

Story: 2004695

Change-Id: I895768eae975f2b6a880e82db2c0d9e452f8099c
Signed-off-by: Tee Ngo <tee.ngo@windriver.com>
Tee Ngo 3 months ago
parent
commit
a76f381204

specs/2019.03/approved/deployment-improvements-2004695-ansible-bootstrap-deployment.rst (+508, -0)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode


============================
Ansible Bootstrap Deployment
============================

Storyboard: https://storyboard.openstack.org/#!/story/2004695.

This spec describes the initial phase of the StarlingX deployment improvement
effort.

Problem description
===================

The primary controller is currently configured using the ``config_controller``
Python script, which can only be executed on the controller console. The script
requires input for many networking aspects upfront in order to run both the
bootstrap operations and the host configuration to completion. Over time, the
script logic has grown overly complex to accommodate a plethora of host
configuration scenarios, which has also increased the configuration time.

Furthermore, once all required input configuration parameters have been
successfully validated, the script runs all of its steps. If the script fails
due to a software issue or a configuration mistake, a reinstall is required.
It is not possible for the user to apply a software patch and/or rerun the
script to apply updated configurations.

Use Cases
=========

* As a developer/tester/operator, I need the ability to configure the
  controller remotely.
* As a developer/tester/operator, I need the ability to modify and
  reapply configurations during the initial host configuration.
* As a developer/tester/operator, I need the ability to automate the
  initial host deployment and build out my system from there.
* As a developer in the StarlingX community, I would like to streamline
  the initial host configuration using an industry-adopted tool to enable
  automation and to promote process/code visibility and customization.

Proposed change
===============

Existing workflow with config_controller (high level)
-----------------------------------------------------

**Config_controller:**

1. Create bootstrap hiera config
2. Apply bootstrap puppet manifest
3. Persist local configuration
4. Populate initial system inventory
5. Create system hiera config
6. Apply controller puppet manifest
7. Finalize controller configuration
8. Activate all services

**Host-configuration:**

   Manual or scripted configurations required for unlock.

**Host-unlock:**

1. Apply controller puppet manifest (and worker, storage puppet manifests
   for All-in-one)
2. Activate all services

Proposed workflow with Ansible Playbook (high level)
----------------------------------------------------

The bootstrap and configuration of the initial host will be orchestrated
by an Ansible Playbook [1]_.

**Playbook:**

1. Apply bootstrap puppet manifest
2. Populate system configuration (with defaults and user-supplied config)
3. Bring up Kubernetes master node and essential services

**Host-configuration:**

   Manual or scripted configurations required for unlock.

**Host-unlock:**

1. Apply controller puppet manifest (and worker, storage puppet manifests
   for All-in-one)
2. Activate all services

After phase #2 of the playbook, the host configuration will resemble
All-in-one simplex (i.e. defaulting to the loopback interface) until the host
is unlocked for the first time. Interface configuration is deferred to ensure
that the network connection is not interrupted while the playbook is being
*played*. Interface reconfiguration will only take effect on unlock.
Previously, this reconfiguration occurred as part of the controller manifest
apply, a step that has been eliminated from the bootstrap phase.

Scope of the new workflow
-------------------------

The new workflow will cover the **initial config** for all supported system
configurations in a containerized platform.

Bootstrap playbook roles and tasks (high level)
-----------------------------------------------

Below is a list of major roles and tasks. The names are deliberately long
to make them self-explanatory for review purposes. They can be renamed to
be more terse, since role variables should be prefixed with the role name.
During implementation, some roles and tasks will likely be decomposed or
combined.

Role: validate-config-input
   * Task: validate-config
Role: prepare-environment-for-execution
   * Task: validate-environment
   * Task: set-environment-variables
Role: cleanup-environment-after-execution
   * Task: unset-environment-variables
   * Task: remove-temp-files
Role: store-admin-password
   * Task: validate-password
   * Task: store-password
Role: apply-bootstrap-manifest
   * Task: generate-bootstrap-data
   * Task: apply-manifest
Role: populate-initial-config
   * Task: persist-keyring
   * Task: set-permanent-puppet-workdir
   * Task: set-permanent-pxe-configdir
   * Task: set-postgres-config-for-mate
   * Task: process-branding-and-banner
   * Task: populate-system-config
   * Task: populate-load-config
   * Task: populate-network-config
   * Task: populate-controller-config
   * Task: create-loopback-interface
   * Task: update-local-dns
   * Task: update-platform-config-file
   * Task: add-dns-server
Role: bring-up-kubernetes-master-and-dependent-services
   * Task: bring-up-kubernetes-master
   * Task: bring-up-tiller
   * Task: bring-up-fault-management
   * Task: bring-up-maintenance
   * Task: bring-up-vim

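The sketch below shows one way a role could chain the tasks listed above: the ``tasks/main.yml`` of the ``apply-bootstrap-manifest`` role includes one task file per task. It is a minimal sketch, and the task file names are hypothetical. They simply mirror the task names in the list.

```yaml
---
# roles/apply-bootstrap-manifest/tasks/main.yml (hypothetical sketch)
# Each task from the role list lives in its own file under tasks/ and is
# included in order, so tasks can later be split or merged independently.
- name: Generate bootstrap data
  include_tasks: generate-bootstrap-data.yml

- name: Apply bootstrap manifest
  include_tasks: apply-manifest.yml
```

Keeping one task per file leaves each ``main.yml`` short. It also makes it cheap to decompose or combine tasks during implementation, as anticipated above.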

Playbook directory layout
-------------------------

The directory layout of the playbook initially could be as follows::

    bootstrap.yml

    roles/
      validate-config-input/
        tasks/
          main.yml
        handlers/
          main.yml
        files/
          <scripts, files>
        vars/
          main.yml
        defaults/
          main.yml
        meta/
          main.yml

      prepare-environment-for-execution/

      cleanup-environment-after-execution/

      store-admin-password/

      apply-bootstrap-manifest/

      populate-initial-config/

      bring-up-kubernetes-master-and-dependent-services/

Playbook pre_tasks and post_tasks
---------------------------------

The ``pre_tasks`` and ``post_tasks`` can be as simple as marking the start and
end of the playbook execution.

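The following is a minimal sketch of what ``bootstrap.yml`` could look like. It assumes the ``bootstrap`` inventory group shown later in this spec and the roles listed above. The ``debug`` tasks stand in for the start and end markers. The role order is an assumption: ``cleanup-environment-after-execution`` is placed last because it removes temporary files.

```yaml
---
# bootstrap.yml (hypothetical sketch)
- hosts: bootstrap
  gather_facts: true   # needed for ansible_date_time below

  pre_tasks:
    - name: Mark the start of the bootstrap playbook
      debug:
        msg: "Bootstrap playbook started at {{ ansible_date_time.iso8601 }}"

  roles:
    - validate-config-input
    - prepare-environment-for-execution
    - store-admin-password
    - apply-bootstrap-manifest
    - populate-initial-config
    - bring-up-kubernetes-master-and-dependent-services
    - cleanup-environment-after-execution

  post_tasks:
    - name: Mark the end of the bootstrap playbook
      debug:
        msg: "Bootstrap playbook completed"
```

``ansible_date_time`` is captured once, when facts are gathered. For that reason, the end marker prints a fixed message instead of a timestamp.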

Running ``bootstrap playbook``
------------------------------

The playbook is invoked as follows::

    ansible-playbook bootstrap.yml -u <named-account-with-sudo-privileges>
        [-K -i <config-input-file> -e <list-of-variable-value-pairs-to-overwrite>
        --ask-vault-pass]

The playbook should be run using the wrsroot account. However, it can be run
using another account with sudo privileges, provided that the account has
already been set up beforehand. Many playbook tasks must be run as root.
The ``-K`` option prompts for the privilege escalation password.

Overwriting playbook defaults
-----------------------------

The ``bootstrap playbook`` will come with default variables and the Ansible
hosts file ``/etc/ansible/hosts.yml``. These defaults and the content of the
hosts file are meant for running the playbook locally and bootstrapping the
initial controller for All-in-one simplex in VirtualBox. In practice, some of
these defaults will need to be overwritten with user-supplied values.

Variables that usually require overwriting are:

* host IP (for running the playbook remotely)
* system properties
* Management, OAM, PXE, and cluster subnets
* Default DNS server

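The defaults for these variables could live in a role's ``defaults/main.yml``. In Ansible, role defaults have the lowest precedence of any variable source, so every overwrite method described below can replace them. In this minimal sketch, only ``system_mode`` and ``dns_server`` are variable names taken from this spec. The subnet variable names and all values are hypothetical placeholders for an All-in-one simplex setup.

```yaml
---
# roles/<role>/defaults/main.yml (hypothetical sketch)
# Only system_mode and dns_server are names used in this spec; the
# subnet variable names and every value below are placeholders.
system_mode: simplex
dns_server: 8.8.8.8

# Placeholder subnets; replace with site-specific values.
management_subnet: 192.168.204.0/24
external_oam_subnet: 10.10.10.0/24
pxeboot_subnet: 169.254.202.0/24
cluster_host_subnet: 192.168.206.0/24
```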

There are various ways to overwrite variables in an Ansible Playbook.

**Overwrite with configuration input file**

One simple and clean option is to supply a configuration input file with the
``-i`` command line option. The content of the provided configuration input
file must be in YAML format.

The default hosts (Ansible inventory) file will have the following entries:

.. code-block:: yaml

   bootstrap:
     hosts:
       local:
         ansible_connection: local

     vars:
       ansible_user: wrsroot
       ansible_become: true

To overwrite the bootstrap host for remote execution and/or the user, specify
them in the custom configuration input file:

.. code-block:: yaml

   bootstrap:
     hosts:
       remote:
         ansible_host: '128.224.150.83'
         ansible_connection: ssh

     vars:
       ansible_user: wrsroot
       ansible_become: true

To overwrite the role default variables, one option is to add the list of
overwritten variables under the ``vars`` section of the configuration input
file:

.. code-block:: yaml

   bootstrap:
     vars:
       system_mode: duplex-direct
       dns_server: 8.8.8.8

**Overwrite with role vars**

Another option to overwrite role defaults is to replace the ``main.yml`` file
under the ``vars`` directory of the corresponding role(s) with custom one(s)
before running the playbook. Role variables take precedence over variables
set in the inventory, so this method overrides the configuration input file
method above.

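A custom ``vars/main.yml`` for a role might look like the following minimal sketch. The choice of role and the values are hypothetical. ``system_mode`` and ``dns_server`` are the variable names used in this spec's configuration input example.

```yaml
---
# roles/populate-initial-config/vars/main.yml (hypothetical custom copy)
# Role vars outrank inventory variables, so these values win over the
# same names set in the configuration input file.
system_mode: duplex
dns_server: 8.8.4.4
```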

**Overwrite with extra vars**

The ``-e`` command line option, which has the highest precedence, can also be
used to overwrite defaults. However, this method can be cumbersome if many
defaults need overwriting and the playbook is run manually.

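When many values must be supplied, ``-e`` also accepts a YAML file prefixed with ``@``, for example ``-e @bootstrap-overrides.yml``. This keeps the highest-precedence overrides in one reviewable file. The file name and values below are hypothetical.

```yaml
---
# bootstrap-overrides.yml (hypothetical name), passed as:
#   ansible-playbook bootstrap.yml -e @bootstrap-overrides.yml
# Extra vars have the highest precedence in Ansible, so these values
# override role defaults, role vars and the configuration input file.
system_mode: duplex-direct
dns_server: 8.8.8.8
```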

The list of role defaults as well as the preferred method to overwrite
these defaults will be documented after the playbook has been developed.

Overwriting sensitive variables
-------------------------------

The admin password is a sensitive variable that usually needs to be
overwritten. To ensure sensitive information is encrypted, sensitive
variables and their values are copied to a vault file, which is secured using
the ``ansible-vault encrypt`` command. The corresponding defaults will need to
be mapped to the variables in the vault file using Jinja2 syntax.

The ``--ask-vault-pass`` or ``--vault-password-file`` command line argument
must be supplied when running the playbook with an encrypted vault file.

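The mapping described above could look like the following minimal sketch. The role, the file name ``secrets.yml`` and the variable names ``admin_password`` and ``vault_admin_password`` are all hypothetical.

```yaml
---
# roles/store-admin-password/defaults/main.yml (hypothetical sketch)
# The default only references a variable defined in a separate vault
# file, for example secrets.yml containing:
#   vault_admin_password: <admin-password>
# which is encrypted beforehand with:
#   ansible-vault encrypt secrets.yml
admin_password: "{{ vault_admin_password }}"
```

The vault file still has to be loaded into the play, for example through ``vars_files`` in ``bootstrap.yml``. Ansible can then decrypt it using ``--ask-vault-pass`` or ``--vault-password-file``. Prefixing vaulted variables with ``vault_`` is a common Ansible convention that makes it obvious where each value is defined.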

For development/test purposes, these variables can simply be overwritten
using the ``-e`` command line option.

Validating configuration parameters
-----------------------------------

The ``config_controller`` script has extensive logic to validate config
parameters in the user input file, which could be leveraged in the
``validate-config-input`` role of the ``bootstrap playbook``.

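As a rough illustration of how such checks could be expressed in the ``validate-config-input`` role, this sketch uses Ansible's ``assert`` module. The list of allowed ``system_mode`` values is an assumption. The ``ipaddr`` filter requires the Python ``netaddr`` package on the machine running Ansible. The real rules would be ported from ``config_controller``.

```yaml
---
# roles/validate-config-input/tasks/main.yml (hypothetical sketch)
# The allowed system_mode values are assumed, not taken from this spec.
- name: Validate the system mode
  assert:
    that:
      - system_mode in ['simplex', 'duplex', 'duplex-direct']
    msg: "Unsupported system_mode: {{ system_mode }}"

# ipaddr returns False for anything that is not a valid IP address.
- name: Validate the DNS server address
  assert:
    that:
      - dns_server | ipaddr
    msg: "dns_server is not a valid IP address: {{ dns_server }}"
```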

Config_controller script changes
--------------------------------

Currently, this complex script has multiple uses: a) perform the initial
configuration required mainly to bring up the controller services,
b) back up the system configuration, c) restore the system configuration from
a backup file, d) clone the image, and e) restore the system from a clone.

The proposed Ansible bootstrap deployment will replace the initial system
configuration aspect of the script. The script will continue to be used for
the other operations. The relevant code will be removed from the script once
the implementation of the playbook is complete.

Puppet changes
--------------

The initial ``bootstrap playbook`` will leverage the existing Puppet
``bootstrap.pp`` manifest to bring up the following services, which will be
used by the playbook for the remaining tasks:

**Required services to bring up Kubernetes master:**

* docker
* etcd

**Required services for host unlock:**

* fm
* mtcAgent
* nfv-vim

The Puppet ``.pp`` files, and in some cases ``.py`` files, related to these
services and to Kubernetes will require updates.

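Before bringing up the Kubernetes master, the playbook could confirm that the services it depends on are running. The ``service_facts`` module (available since Ansible 2.5) supports this. It is a minimal sketch: the placement of the tasks and the systemd unit names ``docker.service`` and ``etcd.service`` are assumptions.

```yaml
---
# Hypothetical tasks, e.g. at the start of the
# bring-up-kubernetes-master-and-dependent-services role.
# service_facts only reports service state; it does not start anything.
- name: Gather the state of system services
  service_facts:

- name: Verify that the services needed by the Kubernetes master are running
  assert:
    that:
      - item in ansible_facts.services
      - ansible_facts.services[item].state == 'running'
    msg: "{{ item }} is not running"
  loop:
    - docker.service
    - etcd.service
```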

Sysinv changes
--------------

Traditionally, the ``config_controller`` script is provided with all
required parameters, either interactively or via a config file, to perform
both bootstrap operations and host configuration. Beyond this point,
networking and storage provisioning using system commands is subject to
certain restrictions because the controller manifest has already been applied.

With the Ansible bootstrap deployment method, some system commands will
require changes to support manual configuration adjustments and replays of
the ``bootstrap playbook``. The ``cgtsclient`` will also need minor
modification to avoid requesting the smapi endpoint, which is not yet
available at this early stage.

Maintenance changes
-------------------

Some minor changes to the maintenance code will be required for the
maintenance client and agent to operate properly during the bootstrap phase.

Packaging of ``bootstrap playbook`` in the ISO and SDK
------------------------------------------------------

The playbook will be packaged in the ISO as well as the SDK to allow
both local and remote execution.

Alternatives
============

Additional host configuration roles to support the initial host unlock
were considered. However, these would add much of the complex modeling of
input configuration (i.e. more upfront planning) to the initial deployment
step.

Data model impact
=================

No impact to existing system inventory data model.

REST API impact
===============

At this time, no REST API impact is anticipated.

Security impact
===============

The proposal makes use of Ansible Playbook, a well-adopted multi-node
configuration and deployment orchestration tool, owing in part to Ansible's
secure architecture and design.

The scope of the proposed ``bootstrap playbook`` is limited to bringing the
initial controller to a state where it can be unlocked and where other
Kubernetes nodes on an internal cluster network, if configured, can join.

When executed remotely, the playbook can only be run over SSH using a named
account with sudo privileges. Ansible vault will be used to store
secrets/private information where applicable. As such, no additional security
impact is introduced.

Other end user impact
=====================

The user will be expected to interact with the feature using the
ansible-playbook [2]_ and ansible-vault [3]_ commands. The bootstrap deployment
method will give the user more flexibility to customize and automate
the deployment.

Once the initial controller is ready to accept system commands and the
Kubernetes master is up, the user can:

* perform minimum host configurations and unlock the host
* join other Kubernetes nodes and perform more extensive custom
  configurations before the unlock

The playbook can be replayed to update system properties and general
networking information. It cannot be replayed after the host is unlocked.

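One way to refuse a replay after the first unlock is a guard in the playbook's ``pre_tasks``. This is a minimal sketch under the assumption that the platform records the unlock in a marker file. ``unlock_marker_file`` is a hypothetical variable, not an existing StarlingX setting.

```yaml
---
# Hypothetical pre_tasks entries for bootstrap.yml. unlock_marker_file is a
# placeholder for whatever flag the platform records on first unlock.
- name: Check whether the host has already been unlocked
  stat:
    path: "{{ unlock_marker_file }}"
  register: unlock_marker

- name: Refuse to replay the bootstrap playbook after the first unlock
  fail:
    msg: "Host has already been unlocked; the bootstrap playbook cannot be replayed."
  when: unlock_marker.stat.exists
```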

Performance Impact
==================

Ansible execution overhead is unknown at this time. However, as the
controller manifest application and service activation steps are deferred
until host unlock, bringing the controller to the unlock-ready state should
be significantly faster than with the traditional method.

Other deployer impact
=====================

None

Developer impact
================

See Other end user impact.

Developers can extend the ``bootstrap playbook`` with custom host
configuration role(s) or another playbook to suit their specific needs.

Upgrade impact
==============

None, as this is the initial release of bootstrap deployment using
Ansible Playbook.

Implementation
==============

Assignee(s)
-----------

Primary assignee:

* Tee Ngo (teewrs)

Other contributors:

* Eric McDonald (emacdona)

Repos Impacted
--------------

* stx-config
* stx-metal
* stx-root
* stx-docs

Work Items
----------

* Modify maintenance to enable maintenance operations during the bootstrap
  phase.
* Modify sysinv and cgtsclient to be more flexible with configuration
  updates during bootstrap deployment using either system commands or APIs.
* Modify puppet classes and python scripts to allow launching a limited
  number of services required for bootstrap operations and initial host
  unlock.
* Create a ``bootstrap`` Playbook to bring up Kubernetes master node and
  configure the primary controller based on default and user-supplied config
  parameters.
* Package the Playbook as part of the ISO and SDK to allow both on-premises
  and remote execution.
* Make other necessary changes to support primary controller configuration
  using either the playbook or the traditional ``config_controller`` until the
  transition is complete. This includes lab setup tool changes.

Dependencies
============

* ``config_controller`` script
* Ansible [4]_
* Containerized OpenStack-based deployment

Testing
=======

This story changes the way a StarlingX system is deployed, specifically
how the primary controller is configured, which will require changes to the
existing automated installation and lab setup tools.

The system deployment tests will be limited to All-in-one simplex,
All-in-one duplex, and Standard configurations. Deployment tests for
Region and Distributed Cloud configurations are deferred until support
for these configurations in a containerized OpenStack-based platform is
available. At that point, either the ``bootstrap playbook`` will be
extended with additional roles or new playbook(s) will be added to process
the steps in ``config_region`` and ``config_subcloud``. This will be
documented either in a later version of this spec or in a separate spec.

Documentation Impact
====================

This story affects the StarlingX installation and configuration
documentation. Specific details of the documentation changes will be
addressed once the implementation is complete.

References
==========

.. [1]  https://docs.ansible.com/ansible/2.7/user_guide/playbooks.html
.. [2]  https://docs.ansible.com/ansible/2.7/cli/ansible-playbook.html
.. [3]  https://docs.ansible.com/ansible/2.7/cli/ansible-vault.html
.. [4]  https://docs.ansible.com/ansible/2.7/index.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - TBD
     - Introduced
