Rather than running a local zookeeper, just run a real zookeeper.
Also, get rid of nb01-test and just use nb04 - what could possibly
go wrong?
Dynamically write zookeeper host information to nodepool.yaml
So that we can run an actual zk using the new zk role on hosts in
the ansible inventory, we need to write out the IP addresses of the
hosts that we build in zuul. This means having the info baked into
the file in project-config isn't going to work.
We can do this in prod too; it shouldn't hurt anything.
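As a rough sketch, the templated nodepool.yaml fragment could look like this (the group name "zookeeper" and the use of ansible_host are assumptions about the inventory layout, not the actual template):

```yaml
# Hypothetical Jinja2 template fragment for nodepool.yaml; the group
# and variable names are assumptions, not the real inventory layout.
zookeeper-servers:
{% for host in groups['zookeeper'] %}
  - host: "{{ hostvars[host]['ansible_host'] }}"
    port: 2181
{% endfor %}
```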
Increase timeout for run-service-nodepool
We need to fix the playbook, but we'll do that after the puppet
removal is done.
Change-Id: Ib01d461ae2c5cec3c31ec5105a41b1a99ff9d84a
Migration plan:
* add zk* to emergency
* copy data files on each node to a safe place for DR backup
* make a json data backup: zk-shell localhost:2181 --run-once 'mirror / json://!tmp!zookeeper-backup.json/'
* manually run a modified playbook to set up the docker infra without starting containers
* rolling restart; for each node:
* stop zk
* split data and log files and move them to new locations
* remove zk packages
* start zk containers
* remove from emergency; land this change.
Change-Id: Ic06c9cf9604402aa8eb4bb79238021c14c5d9563
We want to use stop_grace_period to manage gerrit service stops. This
feature was added in docker-compose 1.10 but the distro provides 1.5.
Work around this by installing docker-compose from pypi.
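The compose file change this unblocks is roughly the following sketch (service and image names are placeholders):

```yaml
# Sketch only; service and image names are placeholders.
services:
  gerrit:
    image: opendevorg/gerrit   # placeholder image name
    stop_grace_period: 5m      # honoured by docker-compose >= 1.10
```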
This seems like a useful feature, and we want to manage docker-compose
the same way globally, so move docker-compose installation into the
install-docker role.
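A minimal sketch of the role task, assuming the standard ansible pip module and that pip3 is already present on the host:

```yaml
# Sketch; assumes pip3 is already available on the host.
- name: Install docker-compose from PyPI
  pip:
    name: docker-compose
    state: present
    executable: pip3
```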
New docker-compose has slightly different output that we must check for
in the gitea start/stop machinery. We also need to check for different
container name formatting in our test cases. We should pause here and
consider if this has any upgrade implications for our existing services.
Change-Id: Ia8249a2b84a2ef167ee4ffd66d7a7e7cff8e21fb
We use project-config for gerrit, gitea and nodepool config. That's
cool, because we can clone that from zuul too and make sure that each
prod run we do uses the contents of the patch in question.
Introduce a flag file that can be touched in /home/zuulcd that will
block zuul from running prod playbooks. By default, if the file is
there, zuul will wait for an hour before giving up.
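Sketched with ansible's wait_for module (the flag file name is an assumption, only the /home/zuulcd location comes from the text above):

```yaml
# Sketch; the flag file name is an assumption.
- name: Wait up to an hour for the disable flag to be removed
  wait_for:
    path: /home/zuulcd/DISABLE-ANSIBLE
    state: absent
    timeout: 3600
```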
Rename zuulcd to zuul
To better align prod and test, name the zuul user zuul.
Change-Id: I83c38c9c430218059579f3763e02d6b9f40c7b89
We had the clouds split from back when we used the openstack
dynamic inventory plugin. We don't use that anymore, so we don't
need these to be split. Any other usage we have directly references
a cloud.
Change-Id: I5d95bf910fb8e2cbca64f92c6ad4acd3aaeed1a3
The Airship project is continuously publishing documentation to AFS,
so serve that volume with a corresponding vhost on the static01
server. Also add it to the list of volumes for periodic vos release.
Change-Id: I718963533d9e8596d44d451b5e930314d699fa28
Depends-On: https://review.opendev.org/706599
After the big OpenDev rename, these repos got renamed again. Update the
redirects for git.airshipit.org and git.starlingx.io to point to the
current location.
Update test_static.py for this, change the test repo since
airship-in-a-bottle was first renamed to in-a-bottle and later to
airship-in-a-bottle.
Change-Id: I71b786cd528aac9ae68464618db02e22cd4c0b5b
zuul and nodepool now live in opendev; avoid double redirects and
redirect directly to the final location.
Change-Id: Ia55d76b24f07ec64cb55055955c4549f3706a95b
Currently we deploy the openstacksdk config into ~nodepool/.config on
the container, and then map this directory back to /etc/openstack in
the docker-compose file. The config file still hard-codes the
limestone.pem path to ~nodepool/.config.
Switch the nodepool-builder_opendev group to install to
/etc/openstack, and update the nodepool config file template to use
the configured directory for the .pem path.
Also update the testing paths.
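The template change amounts to something like this fragment (treating the .pem as a CA cert reference and the variable name are both assumptions):

```yaml
# Sketch; previously hard-coded under ~nodepool/.config. The
# variable name openstacksdk_config_dir is an assumption.
clouds:
  limestone:
    cacert: "{{ openstacksdk_config_dir | default('/etc/openstack') }}/limestone.pem"
```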
Story: #2007407
Task: #39015
Change-Id: I9ca77927046e2b2e3cee9a642d0bc566e3871515
We rolled out review-dev with podman and it worked fine for us. It
worked less fine for nodepool-builder, although we still might be
able to solve it. Maybe right now isn't the time to do this switch.
Gitea, gitea-lb and zuul-registry all use docker instead of podman.
The only thing running with podman right now is review-dev. We can
do a manual cleanup of podman there before running this to keep
things simple:
- stop gerrit service
- uninstall podman and podman-compose
- uninstall podman ppa config
- uninstall pip3
Then let ansible install docker and docker compose up.
Story: #2007407
Task: #39062
Change-Id: I9bf99b18559d49d11ba99a96f02a4a45a4f65a86
Currently we don't set a contact email with our accounts. This is an
optional feature, but would be helpful for things like [1] where we
would be notified of certificates affected by bugs, etc.
Setup the email address in the acme.sh config which will apply with
any new accounts created. To update all the existing hosts, we see if
the account email is added/modified in the config *and* if we have
existing account details; if so we need a manual update call.
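The update logic could be sketched as below; paths, variable names and the stat check are assumptions, though acme.sh itself does provide --update-account and --accountemail:

```yaml
# Sketch of the role logic; paths and variable names are assumptions.
- name: Set contact email in acme.sh account config
  lineinfile:
    path: /root/.acme.sh/account.conf
    regexp: '^ACCOUNT_EMAIL='
    line: "ACCOUNT_EMAIL='{{ letsencrypt_account_email }}'"
  register: account_email

- name: Check for existing account details
  stat:
    path: /root/.acme.sh/ca
  register: account_details

- name: Manually update email on existing accounts
  command: >-
    /root/.acme.sh/acme.sh --update-account
    --accountemail {{ letsencrypt_account_email }}
  when: account_email is changed and account_details.stat.exists
```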
For anyone who might be poking here, we also add a note on sharing an
account based on some broadly agreed upon discussion in IRC.
[1] https://community.letsencrypt.org/t/revoking-certain-certificates-on-march-4/114864
Change-Id: Ib4dc3e179010419a1b18f355d13b62c6cc4bc7e8
This site was never used nor published; it can be killed according to
the QA PTL.
codesearch returns no matches for it in any docs.
Keep the occurrence in manifests/static.pp; this will get deleted
as part of https://review.opendev.org/710388.
Change-Id: I3c0d3b567a3eccb959dc903f169197e4581f1e13
The content for many projects has moved but the legacy redirects were
not updated, update to current location.
Change-Id: I7030ad35378085b0c45429c272dc24f00d33b2d2
This is a slight divergence from the accepted spec, where we were
going to implement these redirects via a new haproxy instance
(I961456d44a56f2334d3c94ef27e408f27409cd65). We've decided it's
easier to keep them on static.opendev.org
The following sites are configured to redirect to whatever they are
redirecting to now on static.opendev.org:
* devstack.org
* www.devstack.org
* ci.openstack.org
* cinder.openstack.org
* glance.openstack.org
* horizon.openstack.org
* keystone.openstack.org
* nova.openstack.org
* qa.openstack.org
* summit.openstack.org
* swift.openstack.org
As a bonus, they all get a https instance too, which they didn't have
before.
testinfra coverage should be total for this change. I have created
the _acme-challenge CNAME records for all the above.
Story: #2006598
Task: #38881
Change-Id: I3f1fc108e7bb1c9500ad4d1a51df13bb4ae00cb9
When converting this from a htaccess file to run in the virtualhost
context, one instance of '^cgit' -> '^/cgit' was missed. Fix it, and
add a coverage test for it to testinfra.
Change-Id: Icc1dae6dce232e69c5cd1cf98b594f562c60d3f2
This creates the redirect sites
git.airshipit.org
git.openstack.org
git.starlingx.io
git.zuul-ci.org
The htaccess rules are put into the main configuration file to avoid
having to create a directory and manage another file. We use a macro
to duplicate the rules and retain the old semantics of the http site
redirecting directly (as opposed to doing an extra 301 to
https://git.openstack.org first). This required adding "/" to the "^"
matches as it now runs in VirtualHost context; no functional change is
intended over the old sites.
This will require _acme-challenge CNAMEs to acme.opendev.org before
being merged.
testinfra is updated to exercise some redirects matching against the
results of the extant sites.
Change-Id: Iaa9d5dc2af3f5f8abc11c2312e4308b50f5fcd2b
files.openstack.org serves a view of /afs/openstack.org/, which is the
same as static.opendev.org. Add a serveralias for it and certificate.
Make static.openstack.org be consistent with opendev by showing the
same thing.
Change-Id: I4c492e3b02554a7c736c015790bd4cd5bb435a43
This is an alternative to Iccf24a72cf82592bae8c699f9f857aa54fc74f10
which removes the 404 scraping tool. It creates a zuul user and
enables login via the system-config per-project ssh key, and then runs
the 404 scraping script against it periodically.
Change-Id: I30467d791a7877b5469b173926216615eb57d035
This creates sites to serve
developer.openstack.org
docs.openstack.org
docs.opendev.org
docs.starlingx.io
which are all just static directories underneath /afs/openstack.org/.
This is currently done by files02.openstack.org, but will be better
served in the future by consolidating in ansible configuration on
static.opendev.org.
The following dns entries need to be made before merging to ensure the
certificates are provisioned
_acme-challenge.developer.openstack.org
_acme-challenge.docs.openstack.org
_acme-challenge.docs.opendev.org
_acme-challenge.docs.starlingx.io
Once done, we can merge and then cut-over the main DNS entries as we
like.
Since there are some follow-ons, I have not removed the puppet
configuration from files02.openstack.org. I think it's best we
migrate everything away from that and remove it in one lot.
Change-Id: I459a36f823a8868e6cc09e2b0d85f2fe05d69002
This adds the site to publish from
/afs/openstack.org/project/releases.openstack.org
Change-Id: Ia91deb9a51441ac9974137ed39fc5a185689a11c
Task: #37724
Story: #2006598
If you currently hit https://static.opendev.org you get redirected to
the default site, which is just the first site in alphabetical order,
which happens to be governance.openstack.org.
Add a 00-static.opendev.org.conf file so this is the default site. It
will just serve up the top-level afs directory.
Change-Id: Icdcee962b76545c12e84d4cadb0b60a68cabe38b
This migrates the afsmon script from puppet deploying on
mirror-update.openstack.org to ansible deploying on
mirror-update.opendev.org.
There is nothing particularly special and this is just a straight
install with some minor dependencies. Since we have log publishing
running on
the opendev.org server, we publish the update logs alongside the
others.
Change-Id: Ifa3b4d59f8d0fc23a4492e50348bab30766d5779
This is a migration of the current periodic "vos release" script to
mirror-update.opendev.org.
The current script is deployed by puppet and run by a cron job on
afsdb01.dfw.openstack.org.
My initial motivation for this was wanting to better track our release
of these various volumes. With tarballs and releases moving to AFS
publishing, we are going to want to track the release process more
carefully.
Initially, I wanted to send timing statistics to graphite so we could
build a dashboard and track the release times of all volumes. Because
this requires additional libraries, and since we are deprecating
puppet, further development there is unappealing; it would better
live in ansible.
Since I6c96f89c6f113362e6085febca70d58176f678e7 we have the ability to
call "vos release" with "-localauth" permissions via ssh on
mirror-update; this avoids various timeout issues (see the changelog
comment there for more details). So we do not need to run this script
directly on the afsdb server.
We are already publishing mirror update logs from mirror-update,
and it would be good to also publish these release logs so anyone can
see if there are problems.
All this points to mirror-update.opendev.org being a good future home
for this script.
The script has been refactored some to
- have a no-op mode
- send timing stats for each volume release
- call "vos release" via the ssh mechanism we created
- use an advisory lock to avoid running over itself
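Wired up via cron, this could look roughly like the following; the schedule, script path and lock file are assumptions, with flock providing the advisory lock:

```yaml
# Sketch; schedule, script path and lock file are assumptions.
- name: Periodic vos release
  cron:
    name: vos-release
    minute: '0'
    user: root
    job: >-
      flock -n /var/run/vos-release.lock
      /usr/vos-release-env/bin/vos-release.py >> /var/log/vos-release.log 2>&1
```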
It runs from a virtualenv and its logs are published via the same
mechanism as the mirror logs (slightly misnamed now).
Note this script is currently a no-op to test the deployment, running
and log publishing. A follow-up will disable the old job and make
this active.
Change-Id: I62ae941e70c7d58e00bc663a50d52e79dfa5a684
Add these hosts to static.opendev.org, serving from AFS. Note that
tarballs.openstack.org just redirects to static.opendev.org/openstack.
This should have no effect currently, it will only become live when we
switch DNS.
For more details see the thread at:
http://lists.openstack.org/pipermail/openstack-infra/2020-January/006584.html
Change-Id: Ie56fac17ffaa91ee55be986de636485a58125a02
Add a new review-dev server on the opendev domain with LE support
enabled.
Depends-On: https://review.opendev.org/705661
Change-Id: Ie32124cd617e9986602301f230e83bb138524fdf
This runs gerrit in a container on review-dev01 using podman.
Remove an unused web_server.py file that we found from copying it
from puppet to ansible.
Change-Id: I399d3cf8471bc8063022b0db0ff81718b2ee2941
The ssh config file is /.ssh/config (not ssh_config)
We are accepting the ed25519 key, not the ecdsa key, so fix that in
the known_hosts stanza.
Change-Id: If3a42a7872f5d5e7a2bf9c3b5184fb14d43e6a1a
Currently we don't have any logs from our gitea sshd processes because
sshd logs to syslog by default and /dev/log isn't in our containers. You
can ask sshd nicely to log to stderr instead with the -e flag which
docker will pick up and store for us.
Update the sshd command to include -e then use testinfra to check we
collect logs and they are accessible from docker.
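The compose change is roughly the following sketch (service and image names are placeholders):

```yaml
# Sketch; service and image names are placeholders.
services:
  gitea-ssh:
    image: opendevorg/gitea-ssh  # placeholder
    # -D keeps sshd in the foreground; -e sends the logs to stderr,
    # where docker collects them.
    command: /usr/sbin/sshd -D -e
```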
Change-Id: Ib7d6d405554c3c30be410bc08c6fee7d4363b096
This introduces two new roles for managing the backup-server and hosts
that we wish to back up.
Firstly the "backup" role runs on hosts we wish to backup. This
generates and configures a separate ssh key for running bup and
installs the appropriate cron job to run the backup daily.
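The core of the "backup" role could be sketched as below; the key path, backup server name, backed-up paths and schedule are all assumptions:

```yaml
# Sketch; key path, remote name, paths and schedule are assumptions.
- name: Generate a dedicated ssh key for bup
  command: ssh-keygen -t ed25519 -f /root/.ssh/id_bup -N ''
  args:
    creates: /root/.ssh/id_bup

- name: Install daily bup cron job
  cron:
    name: bup-backup
    hour: '5'
    minute: '0'
    job: "bup index /etc && bup save -r backup01: -n {{ inventory_hostname }} /etc"
```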
The "backup-server" role runs on the backup server (or, indeed,
servers). It creates users for each backup host, accepts the remote
keys mentioned above and initialises bup. It is then ready to receive
backups from the remote hosts.
This eliminates a fairly long-standing requirement for manual setup of
the backup server users and keys; this section is removed from the
documentation.
testinfra coverage is added.
Change-Id: I9bf74df351e056791ed817180436617048224d2c
The option to disable installing suggested and recommended packages
has been in diskimage-builder based images for a long time [1].
However we have no setting for it in our base-server role, meaning
that when launching nodes from cloud-provider images we can be out of
sync on this option.
I6d69ac0bd2ade95fede33c5f82e7df218da9458b is an example where packages
pulled in by suggestions can fail (arguably a packaging issue, but
anyway...)
By enabling this here, we make our control plane servers homogeneous
with our diskimage-builder based testing nodes, which is better for
general sanity. Overall it gives us more control over what's
installed.
[1] https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/dpkg/pre-install.d/00-disable-apt-recommends
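In the base-server role this amounts to dropping an apt configuration snippet, something like the following (the exact filename under apt.conf.d is an assumption):

```yaml
# Sketch; the filename under apt.conf.d is an assumption.
- name: Disable installation of recommended and suggested packages
  copy:
    dest: /etc/apt/apt.conf.d/95disable-recommends
    content: |
      APT::Install-Recommends "false";
      APT::Install-Suggests "false";
```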
As I6d69ac0bd2ade95fede33c5f82e7df218da9458b showed, installing
suggested or recommended packages might result in unexpected failures.
Change-Id: Id6dcc158944a46fc0ae03b6f1ff372dacd67c2e6
Add new IP addresses to inventory for the rebuild, but don't
reactivate it in the haproxy pools yet.
Note this switches the gitea testing to use a host called gitea99 so
that it doesn't conflict with our changes of the production hosts.
Change-Id: I9779e16cca423bcf514dd3a8d9f14e91d43f1ca3