
Create Spec: StarlingX - Distributed Cloud - Synchronized Keystone

As agreed upon within Edge-Computing meetings, this
specification proposes an additional Identity solution for the
Edge Reference Architecture, i.e. a 'Synchronized Keystone'
solution.  This solution addresses Edge-Computing Use Cases
where full autonomy is required in the event of network
connectivity loss, but without the overhead of running an
Identity Provider (IDP) presence at each Edge Cloud site.

Change-Id: Ie60c324e01c23b262336ce24c481e359c5bd61d7
Signed-off-by: Greg Waines <greg.waines@windriver.com>
Gerrit change: changes/53/619053/5
Commit: 1ed103250b

1 changed file with 545 additions and 0 deletions:
specs/2019.03/approved/distcloud-2002842-synchronizedKeystone.rst

..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode

..
  Many thanks to the OpenStack Nova team for the Example Spec that formed the
  basis for this document.

=========================================
Distributed Cloud - Synchronized Keystone
=========================================

| Storyboard:  https://storyboard.openstack.org/#!/story/2002842
| ( Distributed Cloud Keystone Scalability )
|

The OpenStack Edge-Computing group has defined an Edge Reference Architecture.
For Identity Management, it uses Federated Keystone to manage Identity across
all Edge Clouds.  If 'full autonomy' is required at Edge Clouds, this requires
a Distributed Identity Provider Solution with an Identity Provider (IDP)
presence at every Edge Cloud.

The Federated Keystone solution makes sense where:

* integration with an existing IDP infrastructure is already required,
* large deployments would benefit from a distributed IDP solution,
* partial autonomy is acceptable in the presence of edge cloud isolation, or
* the cost of hosting an IDP presence at every Edge Cloud is acceptable in
  order to achieve full autonomy.

The OpenStack Edge-Computing group recognizes that there is no
'one-size-fits-all' architecture for the Edge.  As agreed upon within
the OpenStack Edge-Computing meetings, this specification proposes an
additional Identity solution for the Edge Reference Architecture, i.e. a
'Synchronized Keystone' solution.  In the Synchronized Keystone solution,
a Synchronization Framework synchronizes the Identity Resources of a Central
Cloud to all of the Edge Clouds.

Synchronized Keystone provides an Identity solution for the edge where:

* a simpler, standalone Identity solution can be used for the edge cloud
  deployments, and
* the edge cloud sites are compute-power-limited deployments, e.g. small
  All-In-One (AIO) simplex / duplex servers, where the cost of hosting
  an IDP presence in support of full autonomy is too high.

Problem description
===================

In a distributed edge cloud environment, with 100s or 1000s of edge cloud
sites, centralized orchestration of cloud services across all the edge
cloud sites is imperative for operational usability.  This specification
deals specifically with the centralized orchestration of the Identity Cloud
Service across all the edge cloud sites.

For the Identity Cloud Service in a distributed edge cloud environment, it is
desirable to support the same set of Users and Projects across all edge
clouds, i.e. at any edge cloud, a user should be able to log in with the same
User name and Project name, using the same authentication credentials, and
get the same authorization capabilities and roles.

Note that for some use cases, network connectivity between the edge cloud and
the central cloud is not reliable.  The Identity Cloud Service at the edge
cloud must be fully autonomous in the event of network connectivity loss to
the central cloud, i.e. both Service Users and Tenant Users must
continue to be able to authenticate and be authorized when the edge cloud is
isolated from the central cloud.

This specification also enables an optimization for orchestration scalability
in the distributed edge cloud environment.  The orchestration of services
across all edge clouds requires authentication, typically of the same user,
across 100s or 1000s of edge clouds.  With the Identity Service's Users and
Projects synchronized across all edge clouds, additionally synchronizing the
Fernet Keys across all edge clouds means that an authenticated Fernet Token
generated at the Central Cloud can be used at any or all edge clouds,
reducing the 100s or 1000s of authentication operations to a single
authentication.
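
The sketch below gives a rough illustration of why synchronized keys make a centrally issued token usable everywhere.
It substitutes HMAC-signed JSON from the Python standard library for Fernet.
It is NOT the Fernet token format: real Keystone Fernet tokens use the ``cryptography`` package's Fernet implementation (AES-128-CBC plus HMAC-SHA256).
It assumes Keystone's key repository convention: index 0 is the staged key, the highest index is the primary key, and every key in the repository is tried during validation.
The function names and in-memory repositories are hypothetical.

.. code-block:: python

   import base64
   import hashlib
   import hmac
   import json


   def issue_token(key_repo, user_id, project_id):
       """Sign a token payload with the primary (highest-index) key."""
       primary_key = key_repo[max(key_repo)]
       payload = json.dumps(
           {"user_id": user_id, "project_id": project_id}, sort_keys=True
       ).encode()
       signature = hmac.new(primary_key, payload, hashlib.sha256).digest()
       # urlsafe base64 never emits ".", so it is safe as a separator.
       return (base64.urlsafe_b64encode(payload) + b"."
               + base64.urlsafe_b64encode(signature))


   def validate_token(key_repo, token):
       """Return the payload if any key in the repo verifies the token."""
       encoded_payload, encoded_signature = token.split(b".")
       payload = base64.urlsafe_b64decode(encoded_payload)
       signature = base64.urlsafe_b64decode(encoded_signature)
       for key in key_repo.values():
           expected = hmac.new(key, payload, hashlib.sha256).digest()
           if hmac.compare_digest(expected, signature):
               return json.loads(payload)
       return None  # no matching key: the token is rejected


   # Index 0 is the staged key; the highest index is the primary key.
   central_repo = {0: b"staged-key", 1: b"secondary-key", 2: b"primary-key"}
   synchronized_edge_repo = dict(central_repo)   # after Fernet Key sync
   isolated_edge_repo = {0: b"other-staged", 1: b"other-primary"}

   token = issue_token(central_repo, "user-id-1", "project-id-1")
   print(validate_token(synchronized_edge_repo, token))  # the payload dict
   print(validate_token(isolated_edge_repo, token))      # None

Shared keys alone are not sufficient.
In Keystone, the user and project IDs carried in a token must also resolve on the validating cloud, so those IDs have to be identical on every cloud.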

Use Cases
=========

The requirement for common Identity Users and Projects across all edge clouds
applies to all Edge Computing Use Cases.

The Use Cases that require full autonomy of edge clouds (in the event of edge
cloud isolation) are Use Cases where:

* There are both

  * remote physical users (at a central cloud site) and
  * local physical users (at edge cloud sites).

* All 'userids' are centrally managed for security reasons,
* At the edge cloud site,

  * when connectivity to the central cloud is lost,

    * local edge users must be able to manage their edge cloud and the
      workloads on the edge cloud,
    * ... using their normal userid credentials.

Examples of such Use Cases are:

* Management of Retail Chains (e.g. Walmart)
* Large Hospital Campus
* Large Control Plant

These are also Use Cases where the simplicity of a standalone Identity solution
for the edge would be desirable.

Background
==========

The Distributed Cloud (DC) sub-project within StarlingX already supports a
Synchronization Framework, which is used to synchronize Nova, Neutron, Cinder
and StarlingX resources from the Central Cloud to all of the Edge Clouds.

This Synchronization Framework provides:

* Synchronization Request Management

  * Managing Synchronization Request Message Queues per Edge Cloud,
  * With retry on failure.

* The Overall Synchronization Audit Sequencing,
* Connectivity Status tracking for Edge Clouds, and
* Synchronization Status tracking for Edge Clouds.
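
The following is a minimal sketch of the per-Edge-Cloud request queue with retry on failure described above.
It assumes a caller-supplied ``send`` callable and a fixed attempt limit.
The class, its parameters, and the hand-off of exhausted requests to the audit are hypothetical, not the StarlingX implementation.

.. code-block:: python

   from collections import defaultdict, deque


   class SyncRequestQueues:
       """Hypothetical per-Edge-Cloud sync request queues with retry."""

       def __init__(self, max_attempts=3):
           self.max_attempts = max_attempts
           # subcloud -> deque of (request, attempts so far)
           self.queues = defaultdict(deque)
           # subcloud -> requests given up on, left for the audit to repair
           self.failed = defaultdict(list)

       def enqueue(self, subcloud, request):
           self.queues[subcloud].append((request, 0))

       def process(self, subcloud, send):
           """Drain one subcloud's queue in order.

           send(subcloud, request) returns True on success.  On failure the
           request goes back to the head of the queue and draining stops, so
           a later pass retries it; after max_attempts failures it is parked
           in self.failed.  Returns True if the queue was fully drained.
           """
           queue = self.queues[subcloud]
           while queue:
               request, attempts = queue.popleft()
               if send(subcloud, request):
                   continue
               attempts += 1
               if attempts >= self.max_attempts:
                   self.failed[subcloud].append(request)
                   continue
               queue.appendleft((request, attempts))
               return False
           return True

Stopping at the first failure keeps the requests for one Edge Cloud in order.
This matters when a later request (e.g. a role assignment) depends on an earlier one (e.g. the user it refers to).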

For the existing framework, each Service being synchronized implements the
following within the Synchronization Framework:

* an API Proxy

  * for intercepting Service API calls in order to trigger immediate
    synchronization to Edge Clouds,

* a DC Orchestration Module

  * for the Service-specific details of building and auditing Service API
    Requests,
  * for managing the mapping of resources in each subcloud to the canonical
    resource in the central cloud, and
  * (in future) for dealing with any API / Schema differences between the
    Central Cloud and Edge Clouds (e.g. in Software Upgrade scenarios).
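
The API Proxy role can be sketched as a wrapper around the real service.
The wrapper forwards each request and, only for successful write requests, enqueues a synchronization request for every Edge Cloud.
This is a hypothetical illustration: ``forward``, ``enqueue``, and the request dictionary layout are assumptions.

.. code-block:: python

   MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


   def make_api_proxy(forward, subclouds, enqueue):
       """Wrap a service API so that successful writes trigger a sync.

       forward(method, path, body) -> dict with an integer "status"
       enqueue(subcloud, sync_request) -> None
       """
       def proxy(method, path, body=None):
           response = forward(method, path, body)
           succeeded = 200 <= response["status"] < 300
           if method in MUTATING_METHODS and succeeded:
               for subcloud in subclouds:
                   enqueue(subcloud,
                           {"method": method, "path": path, "body": body})
           return response
       return proxy

Read-only calls pass straight through, so only state changes generate synchronization traffic.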

Currently, the existing Synchronization Framework supports REST API-based
synchronization of a Service's resources.

For OpenStack Keystone, a REST API-based synchronization approach will not
work, since not all details of Keystone resources are exposed through
Keystone's REST APIs, e.g.:

* User-IDs and Project-IDs can NOT be set on POST (they must be synchronized
  so that Fernet Tokens can be used on any/all edge clouds), and
* Revocation events, generated internally by Keystone to track events that
  affect token validity, are NOT exposed via the Keystone REST API.

Proposed change
===============

Synchronization Framework Support for Keystone DB-based Synchronization
-----------------------------------------------------------------------

This specification proposes enhancing the StarlingX Distributed Cloud
Synchronization Framework to support DB-based synchronization of a Service's
resources.

That is, the existing Synchronization Framework is used in order to leverage
its existing retry mechanisms, audit mechanisms, sync status tracking, etc.,
but in this case the Service Module within the 'DC Orchestration Engine'
would synchronize DB Records by:

* directly querying/setting the Service's DB, and
* using a new (admin-only) StarlingX DC DB SYNC Service and its REST API
  on the StarlingX Edge Cloud, which exposes the DB operations remotely
  for synchronization purposes.

The Service's API Proxy triggers an immediate DB sync of the affected row(s)
of the Service's DB table(s) in response to a particular API request, while
the Synchronization Framework's Audit Mechanism (by default every 10 minutes)
deals with non-API events, unexpected events and/or errors to ensure that the
required DB Table(s) are in sync.
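
The audit half of this design can be sketched as a comparison of central and subcloud rows keyed by primary key.
The comparison produces the POST (create) or PUT (update) DB SYNC requests listed under REST API impact.
This is a hypothetical sketch that assumes rows are plain dictionaries.
Rows that exist only on the subcloud are not handled, since this specification defines no DELETE operation.

.. code-block:: python

   def audit_table(central_rows, subcloud_rows):
       """Return the DB SYNC requests needed to bring a subcloud in sync.

       Both arguments map a row's primary key to a dict of column values,
       e.g. from the central DB query and from a GET on the subcloud's
       DB SYNC API.
       """
       actions = []
       for row_id in sorted(central_rows):
           row = central_rows[row_id]
           if row_id not in subcloud_rows:
               actions.append(("POST", row_id, row))   # missing on subcloud
           elif subcloud_rows[row_id] != row:
               actions.append(("PUT", row_id, row))    # present but differs
       return actions

For example, a user renamed on the Central Cloud yields a PUT.
A user that was never created on the subcloud, e.g. because the subcloud was isolated when the API call happened, yields a POST.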

The following Keystone resources will be synchronized with this method:
Users, Passwords, Projects, Roles, Role Assignments and Token Revocation
Events.

Synchronization of Fernet Keys
------------------------------

This specification also proposes enhancing the StarlingX Distributed Cloud
Synchronization Framework to support API-based synchronization of the Fernet
Key Repo.

New REST APIs for bulk syncing of the Fernet Key Repo, updating the Fernet
Key Repo (on rotation of keys) and auditing the Fernet Key Repo will be
added to the STX-CONFIG service.

The Synchronization Framework will be extended to support Fernet Key Repo
synchronization through the STX-CONFIG service, by adding a Fernet Key
Manager to the STX-CONFIG DC Orchestration Module to manage the Fernet Key
Repo synchronization messaging done by the Synchronization Framework.
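
The following is a sketch of the Fernet Key Manager's audit decision.
It assumes the GET ``/v1/fernet_repo`` response is a list of entries with hypothetical ``id`` (key index) and ``key`` fields; this specification does not define the payload layout.
It also assumes that an empty subcloud repo is repaired with the bulk-distribution POST and a mismatched one with the PUT.

.. code-block:: python

   def _as_repo(keys):
       """Index a list of {"id": ..., "key": ...} entries by key index."""
       return {entry["id"]: entry["key"] for entry in keys}


   def audit_fernet_repo(central_keys, subcloud_keys):
       """Return (method, keys) for /v1/fernet_repo, or None if in sync."""
       if not subcloud_keys:
           return ("POST", central_keys)   # empty repo: bulk distribute
       if _as_repo(central_keys) != _as_repo(subcloud_keys):
           return ("PUT", central_keys)    # mismatch: push central keys
       return None                         # repos already match

Comparing by key index rather than list order avoids spurious mismatches when a subcloud returns its keys in a different order.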

Alternatives
============

An alternative solution considered for synchronizing Keystone was to use the
built-in DB synchronization of the open-source DBs used within StarlingX for
the OpenStack Service DBs, i.e. the built-in DB Synchronization capabilities
of MariaDB or PostgreSQL, both of which support replication of DB Tables from
a single R/W Master to multiple Read-Only Slaves.

However, the built-in DB synchronization solutions of MariaDB and PostgreSQL
do NOT support handling different DB Schemas in the Central Cloud and Edge
Clouds, which is required for Software Upgrade scenarios, or even just for a
heterogeneous mix of OpenStack-versioned edge clouds.

Data model impact
=================

There are no DB Model changes required to any Services.

REST API impact
===============

Synchronization Framework Support for Keystone DB-based Synchronization
-----------------------------------------------------------------------

The following REST APIs will be added to the STX-DISTCLOUD service to support
DB-based synchronization of Services between the Central Cloud and the
Edge Clouds:

NOTE: These are public REST APIs in the sense that the Central Cloud
will use these REST APIs to synchronize data to the Edge Clouds.  HOWEVER,
these REST APIs are NOT intended to be used by an end user.

* GET /v1.0/identity/users

  * Description:  DB SYNC List all identity users
  * Normal Response Codes:  200
  * Error Response Codes:  computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Response Parameters:

    * < all users in the Keystone User DB Table >

      * < all the attributes of the Keystone User DB Table >

* GET /v1.0/identity/users/<UUID>

  * Description:  DB SYNC Get specific identity user
  * Normal Response Codes:  200
  * Error Response Codes:  computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Response Parameters:

    * < all the attributes of the Keystone User DB Table >

* POST /v1.0/identity/users

  * Description:  DB SYNC create identity user (and password)
  * Normal Response Codes:  201
  * Error Response Codes:  computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Request Parameters:

    * < all the attributes of the Keystone User DB Table >

* PUT /v1.0/identity/users/<UUID>

  * Description:  DB SYNC update identity user (and password)
  * Normal Response Codes:  202
  * Error Response Codes:  computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Request Parameters:

    * < all the attributes of the Keystone User DB Table >


... and similarly for the other Keystone DB Resources:

* GET /v1.0/identity/projects
* GET /v1.0/identity/projects/<UUID>
* POST /v1.0/identity/projects
* PUT /v1.0/identity/projects/<UUID>

|

* GET /v1.0/identity/assignments
* GET /v1.0/identity/assignments/<UUID>
* POST /v1.0/identity/assignments
* PUT /v1.0/identity/assignments/<UUID>

|

* GET /v1.0/identity/token-revocation-events
* GET /v1.0/identity/token-revocation-events/<UUID>
* POST /v1.0/identity/token-revocation-events
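
The dbsync client work item (wrapping the DB SYNC APIs in Python functions) might look roughly like the following sketch.
It only builds ``urllib.request.Request`` objects and does not send them.
Three things are assumptions: the base URL, the ``id`` field used in the path, and an ``X-Auth-Token`` header carrying the Central Cloud token.
The restriction of token revocation events to POST follows the endpoint list above.

.. code-block:: python

   import json
   import urllib.request

   # Resources listed in this spec; token revocation events have no PUT.
   SYNC_RESOURCES = {"users", "projects", "assignments",
                     "token-revocation-events"}


   def build_sync_request(base_url, token, resource, row, exists):
       """Build (but do not send) a DB SYNC POST or PUT for one row."""
       if resource not in SYNC_RESOURCES:
           raise ValueError(f"unknown resource: {resource}")
       if exists:
           if resource == "token-revocation-events":
               raise ValueError("token revocation events cannot be updated")
           url = f"{base_url}/v1.0/identity/{resource}/{row['id']}"
           method = "PUT"
       else:
           url = f"{base_url}/v1.0/identity/{resource}"
           method = "POST"
       return urllib.request.Request(
           url,
           data=json.dumps(row).encode("utf-8"),
           headers={"Content-Type": "application/json",
                    "X-Auth-Token": token},
           method=method,
       )

Sending the request (e.g. with ``urllib.request.urlopen``) and mapping error responses onto the framework's retry logic are left out of the sketch.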

Synchronization of Fernet Keys
------------------------------

The following REST APIs will be added to the STX-CONFIG service to support
synchronization of the Fernet Key Repo between the Central Cloud and the
Edge Clouds:

NOTE: These are public REST APIs in the sense that the Central Cloud
will use these REST APIs to synchronize data to the Edge Clouds.  HOWEVER,
these REST APIs are NOT intended to be used by an end user.

* POST /v1/fernet_repo

  * Description:  Distribute fernet repo
  * Normal Response Codes:  201
  * Error Response Codes:  computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Request Parameters:

    * Content-Type application/json

      * Style: Plain
      * Type: Xsd:String
      * Description: The list of Fernet Keys.

* PUT /v1/fernet_repo

  * Description:  Update fernet repo with keys
  * Normal Response Codes:  202
  * Error Response Codes:  computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Request Parameters:

    * Content-Type application/json

      * Style: Plain
      * Type: Xsd:String
      * Description: The list of Fernet Keys.

* GET /v1/fernet_repo

  * Description:  List contents of fernet_repo (the keys)
  * Normal Response Codes:  200
  * Error Response Codes:  computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Response Parameters:

    * Fernet_keys

      * Style: Plain
      * Type: Xsd:List
      * Description: The list of fernet keys

Security impact
===============

This work only impacts security in a Distributed Cloud environment.

In a Distributed Cloud environment, this work directly manipulates Identity
data by synchronizing selected Keystone resources and Fernet Keys between
the Central Cloud and the Edge Clouds.

The only external impact is that, in a Distributed Cloud environment,
a Token created on any Cloud (Central or Edge) can be used on any or
all Clouds (Central or Edge).

Other end user impact
=====================

This work only impacts end users in a Distributed Cloud environment.

In a Distributed Cloud environment, a user can interact indirectly with this
feature when using ANY OpenStack Service API across Edge Clouds, by
leveraging the fact that a Token created on the Central Cloud can be
used on any or all Edge Clouds.

In a Distributed Cloud environment, in an edge cloud network isolation
scenario, an end user local to the edge site can now log in / authenticate
with their normal userid and credentials and manage their workloads.

Performance Impact
==================

This work only impacts performance in a Distributed Cloud environment.

Overall, there is a reduced amount of synchronization messaging between
the Central Cloud and the Edge Clouds in a Distributed Cloud environment.

Logically, more data is being synchronized, i.e. Fernet Keys and selected
Keystone DB Resources, in addition to the existing selected STX, Nova,
Neutron and Cinder resources.  However, the ability to use a single Token,
generated on the Central Cloud, for ALL Edge Cloud synchronization messages
drastically reduces the Synchronization Framework messaging.
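
A back-of-the-envelope count makes the saving concrete.
It assumes (this specification does not state it) that, before this work, each of :math:`N` Edge Clouds needed its own authentication request per audit cycle in addition to its :math:`k` synchronization requests.
After this work, one token obtained from the Central Cloud is reused for all of them:

.. math::

   M_{\text{before}} = N\,(k + 1)

   M_{\text{after}} = N\,k + 1

   M_{\text{before}} - M_{\text{after}} = N - 1

For example, with :math:`N = 1000` Edge Clouds this saves 999 authentication requests per cycle, independent of :math:`k`.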

Other deployer impact
=====================

There are no deployer impacts with this work.

Developer impact
=================

In a Distributed Cloud environment, developers implementing new services
that orchestrate across all Edge Clouds should leverage the fact that
a Token created on the Central Cloud can be used on ANY / ALL Edge Clouds,
in order to reduce their messaging impact on the system.
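
The following is a minimal sketch of that pattern.
The token is obtained once from the Central Cloud and reused for every Edge Cloud request until it is close to expiry.
It assumes a hypothetical ``authenticate`` callable that returns a token and its lifetime in seconds.
The class, the refresh margin, and ``orchestrate`` are illustrative names only.

.. code-block:: python

   import time


   class CentralTokenCache:
       """Reuse one Central Cloud token until it is close to expiry."""

       def __init__(self, authenticate, refresh_margin=60.0,
                    clock=time.monotonic):
           self._authenticate = authenticate   # -> (token, lifetime_secs)
           self._refresh_margin = refresh_margin
           self._clock = clock
           self._token = None
           self._expires_at = 0.0

       def get(self):
           now = self._clock()
           if (self._token is None
                   or now >= self._expires_at - self._refresh_margin):
               token, lifetime = self._authenticate()
               self._token, self._expires_at = token, now + lifetime
           return self._token


   def orchestrate(subclouds, token_cache, send):
       """Send one request per subcloud, all carrying the same token."""
       return {subcloud: send(subcloud, token_cache.get())
               for subcloud in subclouds}

With a one-hour token, orchestrating across 1000 subclouds within that hour costs a single authentication.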


Upgrade impact
===============

In a Distributed Cloud environment, this work has upgrade impacts when
upgrading from OpenStack Version N to OpenStack Version N+1.

This work is sensitive to any Keystone DB Model changes.  However, the
architecture of the DB-based synchronization within the StarlingX
Distributed Cloud Synchronization Framework does support managing
DB Schema differences between the Central Cloud and the Edge Clouds.
This was one of the major reasons for choosing this approach.

The plan for Software Upgrades (from one OpenStack Version to another) in
a Distributed Cloud environment is that the Central Cloud will be
upgraded first to version N+1, and then the Edge Clouds.

If the Keystone DB Schema changes between version N and version N+1,
the N+1 version of the Distributed Cloud Synchronization Framework must
implement the Keystone DB Schema conversions between N+1 and N
for all synchronization messages during the Rolling Software Upgrade
across the entire Distributed Cloud system.
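
One way to organize the N+1 to N conversions is a registry of per-table row converters.
The converters are applied to each synchronization message destined for an Edge Cloud still running version N.
This is a hypothetical sketch: the version labels and the ``example_new_column`` column are placeholders, not actual Keystone schema changes.

.. code-block:: python

   # Registry of row converters keyed by (table, from_version, to_version).
   CONVERTERS = {}


   def converter(table, from_version, to_version):
       """Decorator registering a row converter between schema versions."""
       def register(func):
           CONVERTERS[(table, from_version, to_version)] = func
           return func
       return register


   @converter("user", "N+1", "N")
   def _user_n_plus_1_to_n(row):
       row = dict(row)   # never mutate the central copy
       # Hypothetical: version N+1 added a column that version N lacks.
       row.pop("example_new_column", None)
       return row


   def convert_row(table, row, from_version, to_version):
       """Convert a row for a subcloud running a different schema version."""
       if from_version == to_version:
           return row
       func = CONVERTERS.get((table, from_version, to_version))
       if func is None:
           raise LookupError(
               f"no converter for {table}: {from_version} -> {to_version}")
       return func(row)

Failing loudly on a missing converter is the design choice here.
It prevents an N+1 row from being written unchanged into an N schema on an Edge Cloud.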

Implementation
==============

Assignee(s)
===========

Primary assignee:
  Andy Ning

Other contributors:
  Tao Liu

Repos Impacted
==============

Repositories in StarlingX that are impacted by this spec:

* stx-distcloud

Work Items
===========

Synchronization Framework Support for Keystone DB-based Synchronization
-----------------------------------------------------------------------

* Introduce a dbsync agent/API on the sub cloud, and add it to StarlingX as a
  new service,
* REST APIs between the dcorch engine and the dbsync agent (POST/PUT/GET),
* Implement a dbsync client to wrap the dbsync APIs into Python functions,
* Enhance the identity module within the dcorch engine to do DB-based
  resource synchronization,
* Enhance the identity module within the dcorch engine to do DB-based
  resource audit,
* Add new resources to be synced (token revocation events),

  * NOTE: the current code already synchronizes users, passwords, projects,
    roles and role assignments ... albeit using API-based synchronization,

* Deployment and configuration of the new StarlingX DistCloud Services,
* Unit tests.


Synchronization of Fernet Keys
------------------------------

* Add new stx-config APIs (POST) for the central cloud to distribute the
  fernet repo, including RPC between the stx-config API and conductor,
* Add new stx-config APIs (GET) for the central cloud to audit existing keys,
  including RPC between the stx-config API and conductor,
* Add new stx-config APIs (PUT) for the central cloud to update the repo with
  keys, including RPC between the stx-config API and conductor,
* Enable stx-config to safely retrieve and update fernet keys internally,
* Enhance the stx-distcloud orch engine (or a cron job) to rotate keys and
  call the stx-config APIs to distribute the new keys,
* Enhance the stx-distcloud orch engine to audit fernet keys across managed
  sub clouds, and call the stx-config APIs to distribute keys if mismatches
  are found,
* Enhance dc manager to trigger key distribution when a sub cloud becomes
  managed,
* Add logic to stx-config to empty and re-set up the fernet repo locally when
  it receives an empty POST,
* stx-config/stx-metal/stx-distcloud unit tests (Tox),
* The manifests for fernet repo and key creation during deployment may not
  need any changes on either the central cloud or the sub clouds.
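
The rotate-and-distribute work item can be sketched as follows.
The sketch is modeled on the rotation behavior of ``keystone-manage fernet_rotate``:

* the staged key (index 0) is promoted to the new primary,
* a new staged key is generated, and
* the oldest secondary keys beyond ``max_active_keys`` are purged.

New keys use the same format as ``Fernet.generate_key()`` in the ``cryptography`` package: base64url-encoded 32 random bytes.
The in-memory ``{index: key}`` repo, the ``put_keys`` callable standing in for PUT ``/v1/fernet_repo``, and the key-list layout are assumptions.

.. code-block:: python

   import base64
   import os


   def rotate_keys(repo, max_active_keys=3, new_staged_key=None):
       """Keystone-style rotation of a {index: key} Fernet Key Repo.

       Index 0 is the staged key; the highest index is the primary key.
       """
       if max_active_keys < 2:
           raise ValueError("need room for a staged and a primary key")
       repo = dict(repo)   # leave the caller's repo untouched
       # Promote the staged key to be the new primary (highest index).
       repo[max(repo) + 1] = repo[0]
       # Create a new staged key that will become the next primary.
       if new_staged_key is None:
           new_staged_key = base64.urlsafe_b64encode(os.urandom(32)).decode()
       repo[0] = new_staged_key
       # Purge the oldest secondary keys beyond max_active_keys.
       while len(repo) > max_active_keys:
           del repo[min(index for index in repo if index != 0)]
       return repo


   def distribute(repo, subclouds, put_keys):
       """Push the repo to each subcloud; return those that failed."""
       keys = [{"id": index, "key": key} for index, key in sorted(repo.items())]
       return [subcloud for subcloud in subclouds
               if not put_keys(subcloud, keys)]

Subclouds returned by ``distribute`` (e.g. those that are currently offline) would be picked up by the fernet key audit once they are reachable again.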

Dependencies
============

There are no external dependencies for this work.

I.e. there are NO requirements on changes to OpenStack Keystone.

Testing
=======

Need to do explicit testing of Fernet Key synchronization and Keystone
DB Resource synchronization between the Central Cloud and Edge Clouds.

Need to do a COMPLETE regression of StarlingX Distributed Cloud (DC)
functionality.

Should qualitatively evaluate performance / messaging scalability
improvements before and after this work.

Need to do a SANITY regression of StarlingX in a NON-DC environment.

Documentation Impact
====================

Currently there is no documentation on the StarlingX Distributed Cloud
functionality.  When this documentation is created, the work of this
specification should be described at a functional level.

References
==========

None.


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - 19.03
     - Introduced
