
Create Spec: StarlingX - Distributed Cloud - Synchronized Keystone

As agreed upon within Edge-Computing meetings, this
specification proposes an additional Identity solution for the
Edge Reference Architecture; i.e. a 'Synchronized Keystone'
solution.  This solution addresses Edge-Computing Use Cases
where full autonomy is required on network connectivity loss
but without the overhead of running an Identity Provider (IDP)
presence at each Edge Cloud site.

Change-Id: Ie60c324e01c23b262336ce24c481e359c5bd61d7
Signed-off-by: Greg Waines <greg.waines@windriver.com>
changes/53/619053/5
Greg Waines 1 year ago
parent
commit
1ed103250b
1 changed file with 545 additions and 0 deletions:
specs/2019.03/approved/distcloud-2002842-synchronizedKeystone.rst

@@ -0,0 +1,545 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode

..
  Many thanks to the OpenStack Nova team for the Example Spec that formed the
  basis for this document.

=========================================
Distributed Cloud - Synchronized Keystone
=========================================

| Storyboard: https://storyboard.openstack.org/#!/story/2002842
| ( Distributed Cloud Keystone Scalability )
|

The OpenStack Edge-Computing group has defined an Edge Reference Architecture.
For Identity Management, it uses Federated Keystone to manage Identity across
all Edge Clouds. If 'full autonomy' is required at Edge Clouds, this requires
a Distributed Identity Provider Solution with an Identity Provider (IDP)
presence at every Edge Cloud.

The Federated Keystone solution makes sense where:

* integration with an existing IDP infrastructure is already required,
* a large deployment would benefit from distributed IDP solutions,
* partial autonomy is acceptable in the presence of edge cloud isolation,
  or
* the cost of hosting an IDP presence at every Edge Cloud is acceptable for
  full autonomy.

The OpenStack Edge-Computing group recognizes that there is no single
'one-size-fits-all' architecture for the Edge. As agreed upon within
the OpenStack Edge-Computing meetings, this specification proposes an
additional Identity solution for the Edge Reference Architecture; i.e. a
'Synchronized Keystone' solution. In the Synchronized Keystone solution,
a Synchronization Framework synchronizes the Identity Resources of a Central
Cloud to all of the Edge Clouds.

Synchronized Keystone provides an Identity solution for the edge where:

* a simpler standalone Identity solution can be used for the edge cloud
  deployments, and
* the edge cloud sites are compute-power-limited deployments, e.g. small
  All-In-One (AIO) simplex / duplex servers, where the cost of hosting
  an IDP presence in support of full autonomy is too high.

Problem description
===================

In a distributed edge cloud environment, with 100s or 1000s of edge cloud
sites, the centralized orchestration of cloud services across all the edge
cloud sites is imperative for operational usability. This specification
deals specifically with the centralized orchestration of the Identity Cloud
Service across all the edge cloud sites.

For the Identity Cloud Service, in a distributed edge cloud environment, it is
desired to support the same set of Users and Projects across all edge clouds.
That is, at any edge cloud, a user should be able to log in with the same User
name and Project name, using the same authentication credentials and receiving
the same authorization capabilities and roles.

Note that for some use cases, network connectivity between the edge cloud and
the central cloud is not reliable. The Identity Cloud Service at the edge
cloud must be fully autonomous in the event of network connectivity loss to
the central cloud. I.e. both Service Users as well as Tenant Users must
continue to be able to authenticate and be authorized when the edge cloud is
isolated from the central cloud.

This specification also enables an optimization for orchestration scalability
in the distributed edge cloud environment. Orchestrating services across all
edge clouds requires authentication, typically of the same user, at each of
the 100s or 1000s of edge clouds. With the Identity Service's Users and
Projects synchronized across all edge clouds, additionally synchronizing the
Fernet Keys means that a Fernet Token generated at the Central Cloud can be
used at any or all edge clouds, reducing the 100s or 1000s of authentication
operations to a single authentication.
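
The effect of sharing keys can be illustrated with a minimal sketch. It is a
simplification, not Keystone's implementation: an HMAC signature stands in for
real Fernet encryption, and the function names, key values and edge cloud
names are all hypothetical. It only shows that a token issued once at the
central cloud validates at every edge cloud holding the same key, and is
rejected where the key is out of date.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def issue_token(key: bytes, payload: dict) -> str:
    """Sign a payload with the shared key (HMAC stand-in for Fernet)."""
    body = base64.urlsafe_b64encode(
        json.dumps(payload, sort_keys=True).encode("utf-8"))
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body.decode("ascii") + "." + signature


def validate_token(key: bytes, token: str) -> Optional[dict]:
    """Return the payload if the signature matches this site's key."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(key, body.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


# One authentication at the central cloud (hypothetical key and payload).
shared_key = b"key-synchronized-from-central-cloud"
token = issue_token(shared_key, {"user_id": "u1", "project_id": "p1"})

# Edge clouds with a synchronized key accept the token; a stale key does not.
edge_keys = {"edge-1": shared_key, "edge-2": shared_key, "edge-3": b"stale-key"}
accepted = {name: validate_token(key, token) is not None
            for name, key in edge_keys.items()}
```

In this sketch the orchestrator authenticates once and reuses ``token`` for
every edge cloud, which is the scalability gain described above.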

Use Cases
=========

The requirement for common Identity Users and Projects across all edge clouds
applies to all Edge Computing Use Cases.

The Use Cases that require full autonomy of edge clouds (in the event of edge
cloud isolation) are Use Cases where:

* There are both

  * remote physical users (at a central cloud site) and
  * local physical users (at edge cloud sites).

* All 'userids' are centrally managed for security reasons.
* At the edge cloud site,

  * when connectivity to the central cloud is lost,

    * local edge users must be able to manage their edge cloud and workloads
      on the edge cloud,
    * ... using their normal userid credentials.

Examples of such Use Cases are:

* Management of Retail Chains (e.g. Walmart)
* Large Hospital Campus
* Large Control Plant

These are also Use Cases where the simplicity of a standalone Identity solution
for the edge would be desirable.

Background
==========

The Distributed Cloud (DC) sub-project within StarlingX already supports a
Synchronization Framework which is used to synchronize Nova, Neutron, Cinder
and StarlingX resources from the Central Cloud to all of the Edge Clouds.

This Synchronization Framework provides:

* Synchronization Request Management

  * Managing Synchronization Request Message Queues per Edge Cloud,
  * With retry on failure.

* The Overall Synchronization Audit Sequencing,
* Connectivity Status tracking for Edge Clouds, and
* Synchronization Status tracking for Edge Clouds.
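
The per-Edge-Cloud request queues with retry can be pictured roughly as
follows. This is an illustrative sketch only, not the framework's code: the
``SyncQueues`` class, the ``max_retries`` limit and the ``send`` callback are
assumptions introduced for the example.

```python
from collections import defaultdict, deque


class SyncQueues:
    """One FIFO queue of pending sync requests per edge cloud (hypothetical)."""

    def __init__(self, max_retries=3):
        self.queues = defaultdict(deque)
        self.max_retries = max_retries
        self.failed = []  # requests abandoned after max_retries attempts

    def enqueue(self, cloud, request):
        self.queues[cloud].append((request, 0))

    def drain(self, cloud, send):
        """Send requests in order; stop at the first failure and keep it queued."""
        pending = self.queues[cloud]
        while pending:
            request, attempts = pending[0]
            try:
                send(cloud, request)
            except ConnectionError:
                attempts += 1
                if attempts >= self.max_retries:
                    pending.popleft()
                    self.failed.append((cloud, request))
                else:
                    pending[0] = (request, attempts)
                return False  # the caller retries on a later pass
            pending.popleft()
        return True


# Usage: the first send fails (edge cloud unreachable), the retry succeeds.
calls = []


def flaky_send(cloud, request):
    calls.append(request)
    if len(calls) == 1:
        raise ConnectionError("edge cloud unreachable")


q = SyncQueues()
q.enqueue("edge-1", "create-user")
q.enqueue("edge-1", "create-project")
first = q.drain("edge-1", flaky_send)
second = q.drain("edge-1", flaky_send)
```

Stopping at the first failure keeps requests for a given edge cloud in their
original order, which matters when a later request depends on an earlier one.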

For the existing framework, each Service being synchronized implements the
following within the Synchronization Framework:

* an API Proxy

  * For intercepting Service API calls in order to trigger immediate
    synchronization to Edge Clouds,

* a DC Orchestration Module

  * For Service-specific details of Service API Request building and auditing,
  * For managing the mapping of resources in each subcloud to the canonical
    resource in the central cloud, and
  * (in future) for dealing with any API / Schema differences between Central
    Cloud and Edge Cloud (e.g. in Software Upgrade scenario).

Currently, the existing Synchronization Framework supports REST API-based
synchronization of a Service's resources.

For OpenStack Keystone, a REST API-based synchronization approach will not
work since not all details of Keystone resources are exposed through
Keystone's REST APIs, e.g.:

* User-IDs and Project-IDs can NOT be set on POST
  (they must be synchronized so that Fernet Tokens can be used on any/all
  edge clouds),
* Revocation events, generated internally by Keystone to track events that
  affect token validity, are NOT exposed via the Keystone REST API.

Proposed change
===============

Synchronization Framework Support for Keystone DB-based Synchronization
-----------------------------------------------------------------------

This specification proposes enhancing StarlingX's Distributed Cloud
Synchronization Framework to support DB-based synchronization of a Service's
resources.

I.e. use the existing Synchronization Framework in order to leverage the
existing retry mechanisms, audit mechanisms, sync status tracking, etc.,
but in this case, the Service Module within the 'DC Orchestration Engine'
would synchronize DB Records by:

* Directly querying/setting the Service's DB, and
* Using a new (admin-only) StarlingX DC DB SYNC Service and its REST API
  on the StarlingX Edge Cloud, which exposes the DB operations remotely
  for synchronization purposes.

The Service's API Proxy triggers an immediate DB sync of the affected row(s)
of the Service's DB table(s) in response to a particular API request, while
the Synchronization Framework's Audit Mechanism (by default every 10 minutes)
deals with non-API events, unexpected events and/or errors to ensure that the
required DB Table(s) are in sync.

The following Keystone resources will be synchronized with this method:
Users, Passwords, Projects, Roles, Role Assignments and Token Revocation
Events.
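
As a rough illustration of the audit step, the sketch below compares the rows
of one table in the Central Cloud with the rows reported by an Edge Cloud and
plans the requests needed to bring the Edge Cloud back in sync. The function
name, the in-memory dictionaries and the row attributes are hypothetical; only
POST and PUT are planned because those are the only write operations this
specification defines.

```python
def plan_sync(central_rows, edge_rows):
    """Return (method, row_id, row) operations that make edge match central."""
    operations = []
    for row_id, row in sorted(central_rows.items()):
        if row_id not in edge_rows:
            # Row is missing at the edge: create it with the same ID.
            operations.append(("POST", row_id, row))
        elif edge_rows[row_id] != row:
            # Row exists but differs: overwrite it with the central copy.
            operations.append(("PUT", row_id, row))
    return operations


# Hypothetical table contents, keyed by row ID.
central = {
    "u1": {"name": "alice", "enabled": True},
    "u2": {"name": "bob", "enabled": True},
}
edge = {
    "u1": {"name": "alice", "enabled": False},
}
ops = plan_sync(central, edge)
```

Here ``ops`` contains a PUT for the out-of-date row ``u1`` and a POST for the
missing row ``u2``. Because rows are copied with their IDs, the IDs stay
identical across clouds, which REST API-based creation could not guarantee.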

Synchronization of Fernet Keys
------------------------------

This specification also proposes enhancing StarlingX's Distributed Cloud
Synchronization Framework to support API-based synchronization of
the Fernet Key Repo.

New REST APIs for bulk syncing of the Fernet Key Repo, updating the Fernet
Key Repo (on rotation of keys) and auditing of the Fernet Key Repo are
added to the STX-CONFIG service.

The Synchronization Framework will be extended to support Fernet Key Repo
synchronization through the STX-CONFIG service, adding a Fernet Key Manager
to the STX-CONFIG DC Orchestration Module for managing the Fernet Key Repo
synchronization messaging done by the Synchronization Framework.
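
The Fernet Key Repo audit could look roughly like the sketch below. It is an
assumption-laden illustration: key repos are modelled as plain dictionaries,
and comparing SHA-256 digests is a choice made for this example, not something
the specification prescribes. The function names and key values are
hypothetical.

```python
import hashlib
import json


def repo_digest(keys):
    """Stable digest of a key repo, so repos can be compared cheaply."""
    canonical = json.dumps(keys, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_out_of_sync(central_keys, edge_repos):
    """Return the names of edge clouds whose key repo differs from central."""
    expected = repo_digest(central_keys)
    return sorted(name for name, keys in edge_repos.items()
                  if repo_digest(keys) != expected)


# Hypothetical repos: edge-2 holds an old key, edge-3 has an empty repo.
central_repo = {"0": "staged-key", "1": "primary-key"}
edge_repos = {
    "edge-1": {"0": "staged-key", "1": "primary-key"},
    "edge-2": {"0": "old-key"},
    "edge-3": {},
}
stale = find_out_of_sync(central_repo, edge_repos)
```

Each edge cloud named in ``stale`` would then be sent the current keys through
the STX-CONFIG Fernet Key Repo API.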

Alternatives
============

An alternative solution considered for synchronizing Keystone would be to use
the built-in DB synchronization of the open-source DBs used within StarlingX
for the OpenStack Service DBs. I.e. use the built-in DB Synchronization
capabilities of MariaDB or PostgreSQL, both of which support replication
of DB Tables from a single R/W Master to multiple ReadOnly Slaves.

However, the built-in DB synchronization solutions of MariaDB and PostgreSQL
do NOT support handling different DB Schemas in the Central Cloud and Edge
Clouds, which is required for Software Upgrade scenarios, or even
just a heterogeneous mix of openstack-versioned edge clouds.

Data model impact
=================

There are no DB Model changes required to any Services.

REST API impact
===============

Synchronization Framework Support for Keystone DB-based Synchronization
-----------------------------------------------------------------------

The following REST APIs were added to the STX-DISTCLOUD service to support
DB-based synchronization of Services between the Central Cloud and the
Edge Clouds:

NOTE: These are public REST APIs in the sense that the Central Cloud
will use these REST APIs to synchronize data to the Edge Clouds. HOWEVER
these REST APIs are NOT intended to be used by an end user.

* GET /v1.0/identity/users

  * Description: DB SYNC List all identity users
  * Normal Response Codes: 200
  * Error Response Codes: computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Response Parameters:

    * < all users of the Keystone DB Table >

      * < all the attributes of the Keystone User DB Table >

* GET /v1.0/identity/users/<UUID>

  * Description: DB SYNC Get specific identity user
  * Normal Response Codes: 200
  * Error Response Codes: computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Response Parameters:

    * < all the attributes of the Keystone User DB Table >

* POST /v1.0/identity/users

  * Description: DB SYNC create identity user (and password)
  * Normal Response Codes: 201
  * Error Response Codes: computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Request Parameters:

    * < all the attributes of the Keystone User DB Table >

* PUT /v1.0/identity/users/<UUID>

  * Description: DB SYNC update identity user (and password)
  * Normal Response Codes: 202
  * Error Response Codes: computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Request Parameters:

    * < all the attributes of the Keystone User DB Table >


... and similarly for the other Keystone DB Resources

* GET /v1.0/identity/projects
* GET /v1.0/identity/projects/<UUID>
* POST /v1.0/identity/projects
* PUT /v1.0/identity/projects/<UUID>

|

* GET /v1.0/identity/assignments
* GET /v1.0/identity/assignments/<UUID>
* POST /v1.0/identity/assignments
* PUT /v1.0/identity/assignments/<UUID>

|

* GET /v1.0/identity/token-revocation-events
* GET /v1.0/identity/token-revocation-events/<UUID>
* POST /v1.0/identity/token-revocation-events
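
A client-side call to these endpoints might be built as in the sketch below.
The endpoint paths come from the list above; everything else is an assumption
for illustration: the helper name, the base URL, the JSON body layout (the
specification only says "all the attributes of the Keystone User DB Table")
and the use of the ``X-Auth-Token`` header to carry the token. The request is
only constructed, not sent.

```python
import json
import urllib.request


def build_user_sync_request(base_url, token, user_row, update=False):
    """Build a POST (create) or PUT (update) request for one user row."""
    url = base_url + "/v1.0/identity/users"
    method = "POST"
    if update:
        url += "/" + user_row["id"]
        method = "PUT"
    return urllib.request.Request(
        url,
        data=json.dumps(user_row).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method=method,
    )


# Hypothetical edge cloud URL, token and user row.
req = build_user_sync_request(
    "http://edge-1.example",
    "example-token",
    {"id": "abc123", "name": "alice"},
    update=True,
)
```

The same pattern would apply to the projects, assignments and
token-revocation-events resources by changing the path.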

Synchronization of Fernet Keys
------------------------------

The following REST APIs were added to the STX-CONFIG service to support
synchronization of Fernet Key Repo between the Central Cloud and the
Edge Clouds:

NOTE: These are public REST APIs in the sense that the Central Cloud
will use these REST APIs to synchronize data to the Edge Clouds. HOWEVER
these REST APIs are NOT intended to be used by an end user.

* POST /v1/fernet_repo

  * Description: Distribute fernet repo
  * Normal Response Codes: 201
  * Error Response Codes: computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Request Parameters:

    * Content-Type application/json

      * Style: Plain
      * Type: Xsd:String
      * Description: The list of Fernet Keys.

* PUT /v1/fernet_repo

  * Description: Update fernet repo with keys
  * Normal Response Codes: 202
  * Error Response Codes: computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Request Parameters:

    * Content-Type application/json

      * Style: Plain
      * Type: Xsd:String
      * Description: The list of Fernet Keys.

* GET /v1/fernet_repo

  * Description: List contents of fernet_repo (the keys)
  * Normal Response Codes: 200
  * Error Response Codes: computeFault (400, 500, …),
    serviceUnavailable (503), badRequest (400), unauthorized (401),
    forbidden (403), badMethod (405), overLimit (413), badMediaType (415)
  * Response Parameters:

    * Fernet_keys

      * Style: Plain
      * Type: Xsd:List
      * Description: The list of fernet keys

Security impact
===============

This work only impacts security in a Distributed Cloud environment.

In a Distributed Cloud environment, this work directly manipulates Identity
data by synchronizing selected Keystone resources and Fernet Keys between
the Central Cloud and the Edge Clouds.

The only external impact is that in a Distributed Cloud environment,
a Token created on any Cloud (Central or Edge) can be used on any or
all Clouds (Central or Edge).

Other end user impact
=====================

This work only impacts end users in a Distributed Cloud environment.

In a Distributed Cloud environment, a user can indirectly interact with the
feature when using ANY OpenStack Service API across Edge Clouds by
leveraging the fact that a Token created on the Central Cloud can be
used on any or all Edge Clouds.

In a Distributed Cloud environment, in an edge cloud network isolation
scenario, an end user local to the edge site can now log in / authenticate
with their normal userid and credentials and manage their workloads.

Performance Impact
==================

This work only impacts performance in a Distributed Cloud environment.

Overall there is a reduced amount of synchronization messaging between
the Central Cloud and the Edge Clouds in a Distributed Cloud Environment.

Logically more data is being synchronized; i.e. Fernet Keys and selected
Keystone DB Resources, in addition to the existing selected STX, Nova,
Neutron and Cinder DB Resources. However, the ability to use a
single Token, generated on the Central Cloud, for ALL Edge Cloud
synchronization messages drastically reduces the Synchronization
Framework messaging.

Other deployer impact
=====================

There are no deployer impacts with this work.

Developer impact
=================

In a Distributed Cloud environment, developers implementing new services
that orchestrate across all Edge Clouds should leverage the fact that
a Token created on the Central Cloud can be used on ANY / ALL Edge Clouds,
in order to reduce their messaging impact on the system.


Upgrade impact
===============

In a Distributed Cloud environment, there are upgrade impacts with this work;
i.e. when upgrading from OpenStack Version N to OpenStack Version N+1.

This work is sensitive to any Keystone DB Model changes. However the
architecture of the DB-based synchronization within the StarlingX
Distributed Cloud Synchronization Framework does support the ability
to manage DB Schema changes between the Central Cloud and the Edge Cloud.
This was one of the major reasons for choosing this approach.

The plan for Software Upgrades (from one OpenStack Version to another), in
a Distributed Cloud environment, is that the Central Cloud will be
upgraded first to version N+1, and then the Edge Clouds.

If the Keystone DB Schema changes between version N and version N+1,
the N+1 version of Distributed Cloud Synchronization Framework must
implement the Keystone DB Schema conversions between N+1 and N,
for all synchronization messages during the Rolling Software Upgrade
across the entire Distributed Cloud system.
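
A schema conversion of this kind could be as simple as the sketch below, which
reshapes a version N+1 record into the columns a version N Edge Cloud still
expects. The function, the ``renamed`` mapping and all column names (including
``new_flag``) are hypothetical; real conversions depend on the actual Keystone
schema changes between releases.

```python
def convert_for_edge(record, edge_columns, renamed=None):
    """Convert a version N+1 record to the column set of a version N edge."""
    renamed = renamed or {}  # maps N+1 column name -> version N column name
    converted = {}
    for column, value in record.items():
        target = renamed.get(column, column)
        if target in edge_columns:
            converted[target] = value  # keep columns the edge understands
    return converted


# Hypothetical example: version N+1 added a column that version N lacks.
n_plus_1_user = {"id": "u1", "name": "alice", "enabled": True, "new_flag": "x"}
edge_n_columns = {"id", "name", "enabled"}
converted = convert_for_edge(n_plus_1_user, edge_n_columns)
```

In this sketch the unknown ``new_flag`` column is dropped before the record is
sent to an Edge Cloud that has not yet been upgraded.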

Implementation
==============

Assignee(s)
===========

Primary assignee:
  Andy Ning

Other contributors:
  Tao Liu

Repos Impacted
==============

Repositories in StarlingX that are impacted by this spec:

* stx-distcloud

Work Items
===========

Synchronization Framework Support for Keystone DB-based Synchronization
-----------------------------------------------------------------------

* Introduce dbsync agent/api on sub cloud, and add it to StarlingX as a new
  service,
* REST APIs between dcorch engine and dbsync agent (POST/PUT/GET),
* Implement dbsync client to wrap dbsync APIs into python functions,
* Enhance identity module within dcorch engine to do DB based resource
  synchronization,
* Enhance identity module within dcorch engine to do DB based resource audit,
* Add new resources to be synced (token revocation events),

  * NOTE: the current code already synchronizes users, passwords, projects,
    roles and role assignments, albeit using API-based synchronization,

* Deployment and configuration of new StarlingX DistCloud Services,
* Unit test.


Synchronization of Fernet Keys
------------------------------

* Add new stx-config APIs (POST) for central cloud to distribute fernet repo,
  including RPC between stx-config API and conductor,
* Add new stx-config APIs (GET) for central cloud to audit existing keys,
  including RPC between stx-config API and conductor,
* Add new stx-config APIs (PUT) for central cloud to update repo with keys,
  including RPC between stx-config API and conductor,
* Within stx-config, safely retrieve and update fernet keys,
* Enhance stx-distcloud orch engine (or cron job) to rotate keys and
  call stx-config APIs to distribute new keys,
* Enhance stx-distcloud orch engine to audit fernet keys across managed
  sub clouds, and call stx-config APIs to distribute keys if mismatches are
  found,
* Enhance dc manager to trigger key distribution when a sub cloud becomes
  managed,
* Add logic to stx-config to empty the fernet repo and set it up again locally
  when it receives an empty POST,
* stx-config/stx-metal/stx-distcloud unit test (Tox),
* The manifest for fernet repo and keys creation during deployment may not
  need any changes on either the central cloud or the sub clouds.

Dependencies
============

There are no external dependencies for this work.

I.e. there are NO requirements on changes to OpenStack Keystone.

Testing
=======

Need to do explicit testing of Fernet Token synchronization and Keystone
DB Resource synchronization between Central Cloud and Edge Clouds.

Need to do COMPLETE regression of StarlingX Distributed Cloud (DC)
functionality.

Should qualitatively evaluate performance / messaging scalability
improvements before and after this work.

Need to do a SANITY regression of StarlingX in a NON-DC environment.

Documentation Impact
====================

Currently there is no documentation on the StarlingX Distributed Cloud
functionality. When this documentation is created, the work of this
specification should be described at a functional level.

References
==========

None.


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - 19.03
     - Introduced
