.. Ian Wienand 1533f03a30 Spec to retire static.openstack.org
   Change-Id: Ic5557750ee6c52def01c8d362b8d9e7563cc0f8a
   2019-10-23 06:09:44 +11:00
::

  Copyright 2019 Red Hat Inc.

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.

  http://creativecommons.org/licenses/by/3.0/legalcode

..
  This template should be in ReSTructured text. Please do not delete
  any of the sections in this template. If you have nothing to say
  for a whole section, just write: "None". For help with syntax, see
  http://sphinx-doc.org/rest.html To test out your formatting, see
  http://www.tele3.cz/jbar/rest/rest.html
===========================
Retire static.openstack.org
===========================

StoryBoard story: https://storyboard.openstack.org/#!/story/2006598

Move the services provided by ``static.openstack.org`` into less
centralised approaches more consistent with modern deployment trends.

Problem Description
===================

The ``static.openstack.org`` host is a monolithic server providing
various hosting services via a large amount of volume-attached
storage.

The immediate problem is that it is currently running Ubuntu Trusty,
which is reaching the end of its supported life.

The secondary problems are twofold:

Firstly, we would like to move the various publishing and hosting
operations from centralised volumes on a single server to our AFS
distributed file-system.

Secondly, we would like to make the hosting portion more OpenDev
compatible; this means avoiding working on legacy deployment methods
(i.e. puppet) and integrating with our general idea of a "whitebox"
service that can be used by many different projects.

Thus we propose breaking up the services it offers to utilise more
modern infrastructure alternatives and retiring the host.

Proposed Change
===============

We can break the services down as follows:

Log storage
  Legacy log storage (~14TB)

Redirects
  Apache service redirects a number of legacy URLs to new locations

Static site serving
  100GB attached partition holding various static sites (i.e. plain
  HTML publishing, no middleware, etc.)

Tarballs
  512GB partition which holds and publishes release tarballs for all
  projects.

Alternatives
------------

``apt-get dist-upgrade`` the host to a more recent distribution, fix
any puppet issues and ignore it until the next time it needs updating.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  TBD

Gerrit Topic
------------

Use Gerrit topic "static-services" for all patches related to this
spec.

.. code-block:: bash

    git-review -t static-services

Work Items
----------

Log storage
~~~~~~~~~~~

OpenDev CI logs have been moved to various object-storage backends
provided by donors. The existing logs will age out per our existing
old-log cleanup jobs.

Since logs were always ephemeral there should be no issues with old
links. For clarity we will remove (rather than redirect) the
``logs.openstack.org`` DNS entry so there is no confusion that logs
might still live there.

Work items:

* remove ``logs.openstack.org`` DNS entries after old log entries
  have cleared out

Legacy redirects
~~~~~~~~~~~~~~~~

The following do straight redirects from their configured hostnames to
``docs.openstack.org``:

* 50-cinder.openstack.org.conf
* 50-devstack.org.conf
* 50-glance.openstack.org.conf
* 50-horizon.openstack.org.conf
* 50-keystone.openstack.org.conf
* 50-nova.openstack.org.conf
* 50-swift.openstack.org.conf

The following have slightly different semantics:

* 50-ci.openstack.org.conf

  * ``/nodepool``, ``/shade``, ``/zuul``, etc. all redirect to docs; see
    https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/templates/ci.vhost.erb

* 50-qa.openstack.org.conf

  * currently redirects to the broken link
    https://docs.openstack.org/developer/qa

The following redirects to ``openstack.org``:

* 50-summit.openstack.org.conf

Clearly there is a need for a generic ability to redirect various URLs
as things change over time.
We will use a single containerised ``haproxy`` instance to handle
redirects for the OpenDev project. Although initially it will simply
be handling 302 redirects, it is imagined that future services can use
it for its availability or load-balancing services as well. Note
that ``gitea`` services also have their own load-balancer; although it
reuses all the deployment mechanisms, the production service is kept
separate to maintain isolation between probably the most important
service (code) and more informational services.

Proof-of-concept reviews are provided at:

* https://review.opendev.org/677903 : make haproxy role more generic
* https://review.opendev.org/678159 : add a service load balancer

The work items consist of:

* approval of the above reviews
* starting the production host
* iterating over the extant DNS records and pointing them to the new
  load-balancer
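
As a sketch only (the hostnames and rules here are illustrative
assumptions, not the final configuration), a minimal ``haproxy``
frontend issuing the 302 redirects might look like:

.. code-block:: none

   frontend legacy-redirects
       bind *:80
       # 302 each retired hostname to its new documentation location
       http-request redirect code 302 location https://docs.openstack.org/cinder%[path] if { hdr(host) -i cinder.openstack.org }
       http-request redirect code 302 location https://docs.openstack.org/nova%[path] if { hdr(host) -i nova.openstack.org }

Keeping the rules in one frontend means a new legacy redirect is a
one-line configuration change rather than a new Apache vhost.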

OpenDev infrastructure migration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We wish to provide new services only using our latest deployment
methods, to avoid introducing even more legacy services and to provide
a basis for the migration process to OpenDev services.

Although ``files02.openstack.org`` has an existing role as a webserver
serving content from the ``/openstack.org`` AFS mount, it is
configured using legacy puppet. Thus a new server will be provisioned
using our Ansible environment, rather than adding more hosts to the
legacy configuration.

This server should be a "whitebox" server that is capable of serving a
range of domains that OpenDev would like to serve. However, its role
will only be to serve static directories on AFS volumes. After
this process, there will be numerous examples of SSL certificate
generation, vhost configuration, AFS volume setup and publishing jobs
for any other projects to copy and implement.

Initially this server needs to serve https sites for the replacement
services; namely:

* governance.openstack.org
* specs.openstack.org
* security.openstack.org
* service-types.openstack.org
* releases.openstack.org
* tarballs.openstack.org

Currently, SSL certificates are manually provisioned and entered into
puppet secret data, from where they are deployed to the host. We wish
to use automatically renewing letsencrypt certificates per our other
infrastructure, utilising our DNS-based authentication. However,
since ``openstack.org`` remains administered by external teams in
RAX's proprietary environment, we will make an exception and set up
DNS validation records manually for these legacy sites until a full
migration of ``openstack.org`` to OpenDev infrastructure is possible.
Other domains will use OpenDev nameservers, which support automated
DNS validation renewals.

We will have the new server provisioned and ready before we begin the
steps of migrating publishing locations. This means we can debug any
setup issues outside production, and effect a zero-downtime cutover
when the sites are ready.

Work items are as follows:

* Write roles and tests to provision a new ``static01.opendev.org``
  server which will be limited to running Apache and serving AFS
  directories.

* Create the server.

* Create a CNAME ``static.opendev.org`` which will be the main service
  hostname, to provide for easier server replacement or other updates
  in the future.

* Pre-provision https certificates for the above listed services.

  * Using the RAX web interface for name services and the openstack
    infra permissions, set up
    ``_acme-challenge.<service>.openstack.org`` records as a CNAME to
    ``acme.opendev.org``.

  * Each site should have a separate certificate provisioned. The
    configuration would be something like:

    .. code-block:: yaml

       letsencrypt_certs:
         governance-openstack-org:
           - governance.openstack.org
         specs-openstack-org:
           - specs.openstack.org

    and so on for each site.

  * Debug any failures; however the theory is (taking one example):
    the existing letsencrypt roles should request a certificate for
    ``governance.openstack.org`` on ``static01.opendev.org`` and
    receive the authentication key, which is placed in a TXT record in
    ``acme.opendev.org``. The certificate creation will trigger
    a lookup of ``_acme-challenge.governance.openstack.org``, which
    will be a CNAME to ``acme.opendev.org``, which contains the
    correct TXT record. The certificate is issued on
    ``static01.opendev.org``.

* Preconfigure the vhost configuration for the above sites (using the
  prior provisioned keys for SSL).

* Confirm correct operation of the sites with dummy content.
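
The DNS layout this validation flow relies on can be sketched as
follows (record values follow the scheme described above; exact zone
contents are assumptions):

.. code-block:: none

   ; in the openstack.org zone (set manually via the RAX interface)
   _acme-challenge.governance.openstack.org.  IN  CNAME  acme.opendev.org.

   ; in the opendev.org zone, the letsencrypt roles publish the
   ; challenge token as a TXT record during each issuance/renewal
   acme.opendev.org.  IN  TXT  "<challenge-token>"

The manual step is a one-time setup per site; renewals only touch the
TXT record under ``acme.opendev.org``, which is fully automated.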

Static hosting
~~~~~~~~~~~~~~

A number of jobs publish directly to ``/srv/static`` on the server.
These are then served by Apache as static websites.

In general, we want these jobs to publish to our AFS volumes. By
publishing to AFS we remove the central point of failure of a single
server and its attached disks (mitigated by multiple AFS servers and
replicas).

The AFS volumes are then served by ``static01.opendev.org``, which has
a dedicated role as an AFS-to-HTTP bridge.
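
A vhost on the bridge then amounts to little more than pointing Apache
at the AFS path; for example (a sketch only — certificate paths and
directives are assumptions, the real configuration will come from the
Ansible roles):

.. code-block:: apache

   <VirtualHost *:443>
       ServerName governance.openstack.org
       # serve the site directly from the read-only AFS replica
       DocumentRoot /afs/openstack.org/project/governance.openstack.org

       SSLEngine on
       SSLCertificateFile /etc/letsencrypt-certs/governance.openstack.org/fullchain.pem
       SSLCertificateKeyFile /etc/letsencrypt-certs/governance.openstack.org/privkey.pem

       <Directory /afs/openstack.org/project/governance.openstack.org>
           Require all granted
       </Directory>
   </VirtualHost>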

The sites in question are:

* 50-governance.openstack.org.conf

  * https://governance.openstack.org
  * main source -> https://opendev.org/openstack/governance-website
  * published via https://opendev.org/openstack/project-config/src/branch/master/zuul.d/projects.yaml#L2298
  * aliases ``/srv/static/<election|sigs|tc|uc>``

* 50-security.openstack.org.conf

  * https://security.openstack.org
  * single repo source -> https://opendev.org/openstack/ossa
  * deployed by publish-security job -> https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L739

* 50-service-types.openstack.org.conf

  * https://service-types.openstack.org
  * single repo -> https://opendev.org/openstack/service-types-authority
  * https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L551

* 50-specs.openstack.org.conf

  * https://specs.openstack.org
  * various spec repos; published by ``openstack-spec-jobs`` to subdirectories

* 50-releases.openstack.org.conf

  * https://releases.openstack.org
  * generated by -> https://opendev.org/openstack/releases/
  * note: generates .htaccess with constraints links, used widely in pip
  * publish-tox-jobs-static : https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L685

* 50-tarballs.openstack.org.conf

  * https://tarballs.openstack.org
  * every project's release jobs

The extant AFS layout has volumes for each project. Thus we will
continue this theme and an admin will create one volume for each of
the above static sites; e.g.

* /afs/openstack.org/project/governance.openstack.org (~200MB)
* /afs/openstack.org/project/security.openstack.org (100MB)
* /afs/openstack.org/project/service-types.openstack.org (520KB)
* /afs/openstack.org/project/specs.openstack.org (currently 706MB)
* /afs/openstack.org/project/releases.openstack.org (currently 57MB)
* /afs/openstack.org/project/tarballs.openstack.org (currently 134GB)
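
Volume creation follows the usual OpenAFS pattern; a sketch for one
site (the server, partition, quota and volume-name values here are
placeholders, not decisions):

.. code-block:: shell-session

   # create the volume and mount it under the read-write path
   $ vos create -server <fileserver> -partition <partition> \
       -name project.governance -maxquota 1000000
   $ fs mkmount /afs/.openstack.org/project/governance.openstack.org \
       project.governance

   # add a read-only site and release so clients see the replica
   $ vos addsite -server <fileserver> -partition <partition> \
       -id project.governance
   $ vos release project.governance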

The work items are as follows:

* Create the volumes for each site as described above.

* Migrate the extant data to the new volumes. It is impractical to
  recreate all the sites, as that would require triggering many often
  infrequently-updated repos.

* Update publishing jobs to use AFS publishing to these new
  locations. During the transition period, we can publish to both
  locations.

* Update the site configuration on ``static01.opendev.org`` to serve
  the sites from the new locations.

* We should be able to fully test the new sites at this point with
  manual host entries. Ensure:

  * https certificates are working correctly
  * old links remain consistent

* For each site, move to production by updating the CNAME entries in
  the ``openstack.org`` domain for the main server to point to
  ``static.opendev.org`` (note: not the server directly,
  i.e. ``static01.opendev.org``, to give us flexibility in managing
  the backend service with server replacements or load-balancing in
  the future). Per prior testing, this should be transparent.

* Remove the old publishing jobs.
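
The per-site cutover is then a single record change in the
``openstack.org`` zone, along the lines of (illustrative only; the
current record layout in that zone is an assumption):

.. code-block:: none

   ; before: site served from the legacy host
   governance.openstack.org.  IN  CNAME  static.openstack.org.

   ; after: site served via the service hostname, not static01 directly
   governance.openstack.org.  IN  CNAME  static.opendev.org.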

Repositories
------------

Unlikely to require new repositories.

Servers
-------

* a new http server for serving AFS content
* a load-balancer server is suggested to host the haproxy container

DNS Entries
-----------

Quite a few DNS entries will need to be updated, as described above.

Documentation
-------------

Developers should largely not care where the results are published.
Small doc updates for any new services.

A guide to setting up jobs, host configuration, etc. for publishing
static data for other projects may be useful.

Security
--------

N/A

Testing
-------

Since all updates are replacements, we can confirm that the new sites
are operational before putting them into production. Any DNS switches
can be essentially zero impact.

Dependencies
============

N/A at this time