Add proposal for enabling TLS on all internal communications

Change-Id: I4b9d28c1e70d2aba39432f27d550b97691493cc2
James Gibson 2021-12-23 16:39:58 +00:00 committed by Damian Dabrowski
parent 30bbdf82c9
commit 8bc6eb20bc
2 changed files with 295 additions and 0 deletions


@ -21,6 +21,15 @@ Antelope Specifications
specs/antelope/*
Zed Specifications
------------------
.. toctree::
:glob:
:maxdepth: 1
specs/zed/*
Xena Specifications
-------------------

specs/zed/internal-tls.rst Normal file

@ -0,0 +1,286 @@
Enabling TLS on Internal Communications
#######################################
:date: 2022-11-16 21:00
:tags: ssl, tls, certificates, https, security
To improve the security of OpenStack-Ansible deployments, all traffic, both
internal and external, should be encrypted. There is already
support for encrypting external traffic from all public endpoints that
reside behind haproxy, but this is not the case for all internal traffic.
Problem description
===================
This problem can broadly be split into 3 sections:
* Securing internal communications to the internal haproxy VIP
* Securing internal communications from haproxy to backends
* Securing internal communications between services such as rabbitmq, galera,
nova live migration and noVNC
Securing internal communications to the internal haproxy VIP
------------------------------------------------------------
Support for using TLS on the internal haproxy VIP is already present in the
haproxy role and is enabled for the AIO deployment, but it is not enabled for
new deployments or for upgrades of existing deployments.
There are no issues with enabling TLS on the internal haproxy VIP for new
deployments, but for existing deployments an upgrade process needs to be
implemented. An upgrade process is required because enabling TLS on the
internal haproxy VIP today would cause downtime until every client is
configured to use HTTPS instead of HTTP.
Problems to resolve:
* Haproxy configuration to allow TLS to be enabled without downtime of APIs on
existing deployments
* OpenStack-Ansible upgrade process and upgrade scripts to enable TLS without
downtime of APIs on existing deployments
Securing internal communications from haproxy to backends
---------------------------------------------------------
Securing the communications from haproxy to the service backends is as
important as securing communication to the internal haproxy VIP.
A large number of the services used with haproxy use UWSGI, so once TLS
support is added to the UWSGI role, each of these services only needs
configuration to enable TLS and a generated certificate.
For services that do not use UWSGI, such as the noVNC proxy, further
investigation is required.
As with enabling TLS on the internal haproxy VIP for new deployments, there is
no issue with enabling TLS from haproxy to backends, but an upgrade process for
existing deployments is required. An upgrade process is required because if
haproxy expects TLS backends but TLS has not yet been enabled on the service,
the connection will fail. Likewise, if TLS is enabled on the service first,
the connection will fail because haproxy is not yet configured for TLS.
Problems to resolve:
* Add TLS support to UWSGI
* Add configuration to the role of each service that uses UWSGI to enable TLS
* Add configuration to the roles of the remaining services that do not use UWSGI
* Add configuration to OpenStack-Ansible to enable TLS on backend of each
service
* OpenStack-Ansible upgrade process and upgrade scripts to enable TLS on
backends without downtime of APIs on existing deployments
Securing internal communications between services
-------------------------------------------------
Many OpenStack services communicate directly with each other and do not use
haproxy; these communications should also be secured. The work to secure these
communications is already complete and enabled in the Yoga release of
OpenStack-Ansible for the following services:
* RabbitMQ
* Galera
* Nova live migrations
* noVNC (noVNC to compute nodes).
Problems to resolve:
* Secure the following services:
- Memcached
- etcd
- OVN/OVS
* Are there any services missing from the list that do not go via haproxy that
need their communications securing?
Proposed change
===============
Enable TLS on all internal communications.
Internal communications could be encrypted using a self-signed certificate,
but as OpenStack-Ansible has support for issuing certificates from a
self-signed private certificate authority using the ansible-role-pki, this
should be used instead as it both encrypts the data and allows a client to
trust the server.
In all cases a user should be able to override the certificates issued by a
self-signed private certificate authority, allowing them to provide their own
certificate which may have been issued by a publicly trusted certificate
authority.
Alternatives
------------
None. Internal communications should be protected, and TLS is an appropriate
and widely used solution.
Playbook/Role impact
--------------------
Roles:
* Support for generating certificates using the ansible-role-pki role will be
added to each service
* Configuration to enable or disable TLS will be added
Upgrade impact
--------------
Enabling TLS could be performed during or after an upgrade.
As discussed in the problem description section, enabling TLS on the internal
haproxy VIP and service backends for existing deployments will cause downtime
if it is done during an upgrade. Downtime occurs because, for both
internal client => internal haproxy VIP (server) and
haproxy (client) => openstack service backend (server) communications, the
client and server need to be updated to use TLS at the same time.
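The timing problem can be illustrated with a small Python sketch. It is a simplified model, not part of OpenStack-Ansible: the function names, and the representation of a rollout as a list of (client scheme, schemes the server accepts) states, are assumptions made for illustration only.

```python
def connection_ok(client_scheme, server_schemes):
    """A connection works only if the server accepts the client's scheme."""
    return client_scheme in server_schemes


def has_downtime(rollout):
    """Return True if any state visited during the rollout breaks connections."""
    return any(not connection_ok(client, server) for client, server in rollout)


# Each entry is (scheme the client uses, schemes the server accepts).
# Hypothetical rollouts in which only one side is switched at a time.
server_first = [("http", {"http"}), ("http", {"https"}), ("https", {"https"})]
client_first = [("http", {"http"}), ("https", {"http"}), ("https", {"https"})]

print(has_downtime(server_first))  # True
print(has_downtime(client_first))  # True
```

Whichever side is switched first, the middle state pairs a client and a server that disagree on the scheme, which is the window of downtime described above.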
To mitigate this issue I propose an intermediate step during an upgrade, where
the haproxy frontend will accept both HTTP and HTTPS communications.
This would be achieved by adding a new TCP frontend to haproxy that accepts
both HTTP and HTTPS traffic and redirects each to the correct frontend.
OpenStack clients can then carry on using the same well-known port while
haproxy looks after directing them to the correct frontend, HTTP or HTTPS.
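This spec does not say how the TCP frontend would tell the two protocols apart. One common approach, sketched here in Python purely as an illustration, is to inspect the first bytes a client sends: a TLS connection opens with a handshake record (content type 0x16, major version 0x03) carrying a ClientHello (handshake type 0x01), whereas a plain HTTP request starts with an ASCII method name. The function name and return values below are hypothetical.

```python
def classify_first_bytes(data: bytes) -> str:
    """Guess whether a new connection speaks TLS or plain HTTP.

    Assumes `data` holds at least the first six bytes sent by the client.
    """
    looks_like_client_hello = (
        len(data) >= 6
        and data[0] == 0x16   # TLS record type: handshake
        and data[1] == 0x03   # TLS record major version
        and data[5] == 0x01   # handshake message type: ClientHello
    )
    return "https" if looks_like_client_hello else "http"


print(classify_first_bytes(bytes([0x16, 0x03, 0x01, 0x00, 0x2F, 0x01])))  # https
print(classify_first_bytes(b"GET /v3 HTTP/1.1\r\n"))  # http
```

Anything that is not recognised as a ClientHello is treated as HTTP in this sketch, so existing clients keep working on the same port. haproxy offers a comparable check in TCP mode through its ``req.ssl_hello_type`` sample fetch, although whether the implementation would use it is not stated in this spec.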
To mitigate issues with haproxy<>backend communication, I suggest implementing
the "Separated Haproxy Service Config" feature [1], which configures an
openstack service and its haproxy service in the same playbook.
The other issue to be aware of is that when a user wants to use a predefined
certificate, this certificate will be used on all VIPs, both internal and
external.
This means that if TLS is enabled on haproxy's internal VIP, internal clients
must be able to trust the presented certificate, since it is the same as the
external certificate.
This limitation does not apply to:
- certbot, which can present a separate certificate on external interfaces
- the PKI role, which installs different certificates for external and internal
VIPs by default
Security impact
---------------
This change will encrypt all internal communications, securing any sensitive
data being sent, therefore security is improved.
Performance impact
------------------
Implementing TLS on all internal communications will lead to a small increase
in the processing requirements and latency of servers and clients, but the
increased security outweighs these.
End user impact
---------------
None, if the deployment is done correctly.
Deployer impact
---------------
* Deployers will need to monitor certificate expiry dates and renew
certificates when necessary. If a certificate expires, connections between
services will be dropped.
* This change should have no impact on deployers of new deployments:
OpenStack-Ansible will create the certificates, deploy them and
configure all services to use them.
* This change will impact existing deployments, and an upgrade process will be
implemented to minimise, and if possible prevent, that impact.
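As an illustration of the expiry monitoring mentioned in the first item, the following Python sketch checks how long a certificate has left. It is not part of OpenStack-Ansible. The function names and the 30-day threshold are assumptions. The input is a ``notAfter`` string in the format returned by ``ssl.SSLSocket.getpeercert()``.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp."""
    expiry_ts = ssl.cert_time_to_seconds(not_after)
    return (expiry_ts - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when the certificate expires within `threshold_days` (assumed value)."""
    return days_until_expiry(not_after, now) <= threshold_days


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_renewal("Jan 31 00:00:00 2024 GMT", now))  # True
print(needs_renewal("Mar  1 00:00:00 2024 GMT", now))  # False
```

A real check would fetch ``notAfter`` from each service endpoint and feed the result into whatever monitoring system the deployer already uses.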
Developer impact
----------------
No impact, other than that traffic will be encrypted, meaning tools like tcpdump
may prove less useful as they will not be able to see the contents of
packets.
Dependencies
------------
None.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Damian Dabrowski
<damian@dabrowski.cloud>
Work items
----------
* Add TLS support to the UWSGI role
* Add TLS backend support to the haproxy role
* Add configuration to openstack services that use UWSGI to create a TLS
certificate and enable TLS on UWSGI
* Add configuration to remaining openstack services that do not use UWSGI to
enable TLS support
* Add configuration in OpenStack-Ansible to allow TLS for all services to be
enabled on both the server and haproxy
* Update documentation on TLS configuration options
* Add documentation for upgrade procedure
* Add a script to automate as much of the upgrade as possible
Testing
=======
These changes can be tested using the existing setup, but manual testing of
the upgrade procedure will be required to make sure it does not cause any
downtime, as the automated testing only confirms a working upgrade at the end.
Documentation impact
====================
As this change will add extra configuration options, these will need to be
documented.
The upgrade procedure for existing deployments will also have to be documented,
because if this functionality is not deployed correctly it may cause system
disruption.
References
==========
[1] https://specs.openstack.org/openstack/openstack-ansible-specs/specs/antelope/separated-haproxy-service-config.html