.. display in 68x24
.. display in 88x24
dissolve
images/testslide.ans
images/cursor.ans images/cursor2.ans
images/title.ans
images/redhat.ans
images/openstack.ans
"most insane CI infrastructure I've ever been a part of"
-- Alex Gaynor
"like the SpaceX of CI"
-- Emily Dunham
images/zuul.ans
images/ansible.ans
- "Speculative Future State"
- multiple repositories
- integrated deliverable
- gated commits
- open tooling
- nobody is special
- testing like deployment
- Federated
- Distributed
- Large
- Open
- Not Alone
- Hundreds of involved companies
- No 'main' company
- "Decisions are made by those who show up"
- Union of priorities/use cases
- No company can appoint people to positions in the project
- The project cannot fire anyone
- Variable background of contributors
- Heavy reliance on consensus
- There is no office
- Contributor base is global
- Multitude of contributor backgrounds
- Tooling must empower all contributors, regardless of background, skill level or cultural context
- Heavy preference for text-based communication
- Cannot assume US-centric needs or solutions
- "Accept patches from random people on the internet"
- Contributors (~2k in any given 6 month period)
- Changes
- Code Repositories (2082 as of this morning)
- 2KJPH (2,000 jobs per hour)
- Build Nodes from 16 Regions of 5 Public and 3 Private OpenStack Clouds
- Rackspace, Internap, OVH, Vexxhost, CityCloud
- Linaro (ARM), Limestone, Packethost
- 10,000 changes merged per month
- By comparison, our friends at the amazing project Ansible received 13,000 changes and had merged 8,000 of them in its first 4 years.
- Empower teams to take care of themselves (distributed)
- Efficiency gained from shared solutions (centralized)
- Zuul supports per-repo config, central config, and multiple tenants
- One Zuul install is all you need for all of BMW or ExxonMobil or IBM
- Open Source (we don't hold back Enterprise features, we don't cripple things)
- Open Design (design process open to all, decisions are not made inside company doors)
- Open Development (public source code, public code review, all code is reviewed and gated)
- Open Community (lazy consensus, democratic leadership from participants, public logged meetings in IRC, public archived mailing lists)
- Tooling is not exempt
- Fifth Open - Open Operations - all infrastructure is run in the Open, via GitOps
- Dependencies (libvirt/kvm/xen, mysql/pg, rabbit, python/javascript, ceph/gluster, ansible/salt/puppet/chef, ovs/odl)
- Adjacencies (kubernetes, ansible, terraform, opnfv, spinnaker)
- Vendors (plugins, products, services, distros)
- Code Review - nobody has direct commit/push access
- 3rd-Party CI for vendors
- Gated Commits
   Hack              Review              Test
 =========         ==========         ==========
            push              approve
       +-------------+    +-------------+
       |             |    |             |
+------+--+       +--v----+--+       +--v-------+
|         |       |          |       |          |
| $EDITOR |       |  Gerrit  |       |   Zuul   |
|         |       |          |       |          |
+------^--+       +--+----^--+       +--+-------+
       |             |    |             |
       +-------------+    +-------------+
            clone              merge
Explain patch upload: Zuul runs, and test results are displayed in Gerrit. This is all of the interface to Zuul that users need to see.
Switch to an actual Gertty screenshot.
Also show the Zuul status page.
But Zuul is doing a lot of work behind the scenes; if you look closer, this is what you see.
images/color-gertty.ans
- Has been in Production for OpenStack for Six Years
- Zuul v3 is the first release where non-OpenStack use is a first-class use case
- Zuul is now an OpenStack Foundation Pilot Project
- Zuul is in production for OpenStack (in OpenStack VMs)
Also running at:
- BMW (control plane in OpenShift)
- GoDaddy (control plane in Kubernetes)
- GoodMoney (control plane in EKS, adding GKE)
- Le Bon Coin
- Easystack
- TungstenFabric
- OpenLab
- Red Hat
- others ...
- Listens for code events
- Prepares appropriate job config and git repo states
- Allocates nodes for test jobs
- Pushes git repo states to nodes
- Runs user-defined Ansible playbooks
- Collects/reports results
- Potentially merges change
- No test automation exists or ...
- Developer runs test suite before pushing code
- Prone to developer skipping tests for "trivial" changes
- Doesn't scale organizationally
- Developers push changes directly to shared branch
- CI system runs tests from time to time - report if things still work
- "Who broke the build?"
- Leads to hacks like the nvie (git-flow) branching model
- Developers push changes directly to shared branch
- CI system is triggered by push - reports if push broke something
- Frequently batched / rolled up
- Easier to diagnose which change broke things
- Reactive - the bad changes are already in
- Changes are pushed to code review (Gerrit Change, GitHub PR, etc)
- CI system is triggered by code review change creation
- Test results inform review decisions
- Proactive - testing code before it lands
- Reviewers can get bored waiting for tests
- Only tests code as written, not potential result of merging code
- Changes are pushed to code review
- Gating system is triggered by code review approval
- Gating system merges code IFF tests pass
- Proactive - testing code before it lands
- Future state resulting from merge of code is tested
- Reviewers can fire-and-forget safely
- Zuul supports all of those modes
- Zuul users frequently combine them
- Run pre-review (check) and gating (gate) on each change
- Post-merge/post-tag for release/publication automation
- Periodic for catching bitrot
- Multiple source repositories are needed for deliverable
- Future state to be tested is the future state of all involved repos
- Get tip of each project. Merge appropriate change(s). Test.
- Changes must be serialized, otherwise state under test is invalid.
- Integrated deliverable repos share serialized queue
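The idea of a shared, serialized queue across repositories can be sketched as follows. This is only an illustration, not Zuul's implementation: `future_states`, the repo names, and the change ids are hypothetical, and every repo in the queue is assumed to have an entry in `tips`.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, str]  # (repo, change id); both are illustrative values


def future_states(tips: Dict[str, str],
                  queue: List[Change]) -> List[Dict[str, List[str]]]:
    """For each queued change, describe the state of every repo under test.

    The state of a repo is its branch tip followed by every change to that
    repo that sits at or ahead of the current position in the shared queue.
    """
    applied: Dict[str, List[str]] = {repo: [tip] for repo, tip in tips.items()}
    states = []
    for repo, change in queue:
        applied[repo] = applied[repo] + [change]
        # Snapshot the state of all repos for this queue position.
        states.append({r: list(c) for r, c in applied.items()})
    return states


TIPS = {"nova": "nova@tip", "keystone": "keystone@tip"}
QUEUE = [("nova", "n1"), ("keystone", "k1"), ("nova", "n2")]
STATES = future_states(TIPS, QUEUE)
```

The third change (`n2`) is tested with `n1` already applied to nova and `k1` already applied to keystone, which is why the changes have to be ordered in a single queue.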
- Correct parallel processing of serialized future states
- Create virtual serial queue of changes for each deliverable
- Assume each change will pass its tests
- Test successive changes with previous changes applied to starting state
(aka 'The Jim Blair Algorithm')
- If a change fails, move it aside
- Cancel all test jobs behind it in the queue
- Reparent queue items on the nearest non-failing change
- Restart tests with new state
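A minimal simulation of the steps above, under simplifying assumptions: `gate` and `passes` are hypothetical names, `passes(state)` stands in for running all test jobs against a state, and the speculative runs that Zuul performs in parallel are modelled here as a plain loop.

```python
from typing import Callable, List, Tuple


def gate(queue: List[str],
         passes: Callable[[Tuple[str, ...]], bool]) -> Tuple[List[str], List[str]]:
    """Simulate a serialized gate queue with speculative testing.

    Each change is tested on top of everything ahead of it, assuming those
    changes will pass. Returns (merged, rejected).
    """
    merged: List[str] = []
    rejected: List[str] = []
    pending = list(queue)
    while pending:
        first_failure = None
        for i in range(len(pending)):
            # Speculative future state: merged changes plus everything
            # ahead of (and including) this change in the queue.
            state = tuple(merged) + tuple(pending[: i + 1])
            if not passes(state):
                first_failure = i
                break
        if first_failure is None:
            merged.extend(pending)
            pending = []
        else:
            # Changes ahead of the failure were tested on a valid state.
            merged.extend(pending[:first_failure])
            # Move the failing change aside ...
            rejected.append(pending[first_failure])
            # ... and restart everything behind it on the new state.
            pending = pending[first_failure + 1:]
    return merged, rejected


# Hypothetical test oracle: any state containing change "B" fails.
MERGED, REJECTED = gate(["A", "B", "C", "D"], lambda state: "B" not in state)
```

In this run `A` merges, `B` is moved aside, and `C` and `D` are retested on top of `A` alone before merging.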
pan
images/zsim-00.ans
cut
images/zsim-01.ans
cut
images/zsim-02.ans
cut
images/zsim-03.ans
cut
images/zsim-04.ans
cut
images/zsim-05.ans
cut
images/zsim-06.ans
cut
images/zsim-07.ans
cut
images/zsim-08.ans
cut
images/zsim-09.ans
cut
images/zsim-10.ans
cut
images/zsim-11.ans
cut
images/zsim-12.ans
cut
images/zsim-13.ans
cut
images/zsim-14.ans
cut
images/zsim-15.ans
cut
images/zsim-16.ans
cut
images/zsim-17.ans
cut
images/zsim-18.ans
cut
images/zsim-19.ans
cut
images/zsim-20.ans
cut
images/zsim-21.ans
cut
images/zsim-22.ans
- Developers can mark changes as being dependent
- Depends-On: footer - in commit or PR
- Zuul uses depends-on when constructing virtual serial queue
- Will not merge changes in gate before depends-on changes
- Works cross-repo AND cross-source
- Circular Dependencies are not supported on purpose
- Rolling upgrades across interdependent services
- HOWEVER - many valid use cases (go/rust/c++) - support will be coming
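As a rough sketch of how Depends-On footers could drive queue ordering: the regular expression, the function names, and the `review.example.org` URLs below are assumptions for illustration, and Zuul's actual parser may differ. Cycles are rejected, matching the note above that circular dependencies are not supported.

```python
import re
from typing import Dict, List

# Hypothetical footer pattern, one "Depends-On: <url>" per line.
DEPENDS_ON = re.compile(r"^Depends-On:[ \t]*(\S+)[ \t]*$",
                        re.MULTILINE | re.IGNORECASE)


def depends_on(message: str) -> List[str]:
    """Return the URLs named in Depends-On footers of a commit or PR message."""
    return DEPENDS_ON.findall(message)


def enqueue_order(messages: Dict[str, str]) -> List[str]:
    """Order changes so that each one follows the changes it depends on.

    `messages` maps a change URL to its commit message. Raises ValueError
    if the dependencies form a cycle.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # url -> "visiting" or "done"

    def visit(url: str) -> None:
        if state.get(url) == "done":
            return
        if state.get(url) == "visiting":
            raise ValueError(f"circular dependency involving {url}")
        state[url] = "visiting"
        for dep in depends_on(messages[url]):
            if dep in messages:  # ignore dependencies outside this set
                visit(dep)
        state[url] = "done"
        order.append(url)

    for url in messages:
        visit(url)
    return order


MESSAGES = {
    "https://review.example.org/2":
        "Use new API\n\nDepends-On: https://review.example.org/1\n",
    "https://review.example.org/1": "Add new API\n",
}
ORDER = enqueue_order(MESSAGES)
```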
Zuul is a distributed system, with a distributed configuration.
- tenant:
    name: openstack
    source:
      gerrit:
        config-repos:
          - opendev/project-config
        project-repos:
          - zuul/zuul-jobs
          - zuul/zuul
          - zuul/nodepool
          - ansible/ansible
          - openstack/openstacksdk
images/startup1.ans
Ask mergers for .zuul.yaml for each branch of each repo
images/startup2.ans
Works with cross-repo dependencies
("This change depends on a change to the job definition")
Zuul is composed of several services (mostly python3)
- zuul-scheduler
- zuul-executor
- zuul-merger
- zuul-web
- zuul-fingergw
- zuul-dashboard (javascript/react)
- zuul-proxy (c++)
- nodepool-launcher
- nodepool-builder
- RDBMS
- Gearman
- Zookeeper
- A separate service that works very closely with Zuul
- Zuul requires Nodepool but Nodepool can be used independently
- Creates and destroys zero or more node resources
- Resources can include VMs, Containers, COE contexts or Bare Metal machines
- Static driver for allocating pre-existing nodes to jobs
- Optionally periodically builds images and uploads to clouds
Where build nodes should come from
- OpenStack
- Static
- Kubernetes Pod
- Kubernetes Namespace
- AWS
In work / coming soon:
- Azure
- GCE
- Written in Ansible
- Ansible is excellent at running one or more tasks in one or more places
- The answer to "how do I" is almost always "Ansible"
https://review.openstack.org/#/c/648838
http://logs.openstack.org/38/648838/7/check/zuul-build-dashboard/96253e1/npm/html/
images/questions.ans
tilt
Configuration
- job:
    name: base
    parent: null
    description: |
      The base job for Zuul.
    timeout: 1800
    nodeset:
      nodes:
        - name: primary
          label: ubuntu-xenial
    pre-run: playbooks/base/pre.yaml
    post-run:
      - playbooks/base/post-ssh.yaml
      - playbooks/base/post-logs.yaml
    secrets:
      - site_logs

- job:
    name: tox
    pre-run: playbooks/setup-tox.yaml
    run: playbooks/tox.yaml
    post-run: playbooks/fetch-tox-output.yaml
- pre-run playbooks run in order of inheritance
- run playbook of job runs
- post-run playbooks run in reverse order of inheritance
- If a pre-run playbook fails, the job is retried
- All post-run playbooks run for the levels of inheritance that the pre-run playbooks reached
For tox-py36 job
- base pre-run playbooks/base/pre.yaml
- tox pre-run playbooks/setup-tox.yaml
- tox run playbooks/tox.yaml
- tox post-run playbooks/fetch-tox-output.yaml
- base post-run playbooks/base/post-ssh.yaml
- base post-run playbooks/base/post-logs.yaml
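The ordering listed above can be reproduced with a small sketch. The `playbook_order` function and the dictionary layout are hypothetical; the job data mirrors the `base` and `tox` jobs shown earlier, with the assumptions that `tox` inherits from `base` and that `tox-py36` inherits from `tox` without adding playbooks.

```python
from typing import Dict, List, Tuple

JOBS: Dict[str, dict] = {
    "base": {
        "parent": None,
        "pre-run": "playbooks/base/pre.yaml",
        "post-run": ["playbooks/base/post-ssh.yaml",
                     "playbooks/base/post-logs.yaml"],
    },
    "tox": {
        "parent": "base",  # assumed: jobs without a parent inherit from base
        "pre-run": "playbooks/setup-tox.yaml",
        "run": "playbooks/tox.yaml",
        "post-run": "playbooks/fetch-tox-output.yaml",
    },
    "tox-py36": {"parent": "tox"},  # assumed: adds no playbooks of its own
}


def as_list(value) -> List[str]:
    """Accept a missing value, a single playbook path, or a list of paths."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def playbook_order(jobs: Dict[str, dict], name: str) -> List[Tuple[str, str, str]]:
    """Return (job, phase, playbook) tuples in execution order."""
    chain = []
    current = name
    while current is not None:
        chain.append(current)
        current = jobs[current].get("parent")
    chain.reverse()  # root (base) first, requested job last

    steps: List[Tuple[str, str, str]] = []
    # pre-run: in order of inheritance, outermost parent first.
    for job in chain:
        steps += [(job, "pre-run", p) for p in as_list(jobs[job].get("pre-run"))]
    # run: the nearest job in the chain that defines a run playbook.
    for job in reversed(chain):
        run = as_list(jobs[job].get("run"))
        if run:
            steps += [(job, "run", p) for p in run]
            break
    # post-run: reverse order of inheritance, innermost job first.
    for job in reversed(chain):
        steps += [(job, "post-run", p) for p in as_list(jobs[job].get("post-run"))]
    return steps


ORDER = playbook_order(JOBS, "tox-py36")
```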
- nodeset:
    name: ceph-cluster
    nodes:
      - name: controller
        label: centos-7
      - name: compute1
        label: fedora-28
      - name: compute2
        label: fedora-28
    groups:
      - name: ceph-osd
        nodes:
          - controller
      - name: ceph-monitor
        nodes:
          - controller
          - compute1
          - compute2
- hosts: all
  roles:
    - install-ceph

- hosts: ceph-osd
  roles:
    - start-ceph-osd

- hosts: ceph-monitor
  roles:
    - start-ceph-monitor

- hosts: all
  roles:
    - do-something-interesting
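To make the link between the nodeset groups and the playbook's `hosts:` lines concrete, here is a simplified sketch of how a host pattern might resolve to node names. `hosts_for` is a hypothetical helper that only handles `all`, a group name, or a single node name; real Ansible host patterns are richer.

```python
from typing import Dict, List

CEPH_CLUSTER: Dict[str, list] = {
    "nodes": [
        {"name": "controller", "label": "centos-7"},
        {"name": "compute1", "label": "fedora-28"},
        {"name": "compute2", "label": "fedora-28"},
    ],
    "groups": [
        {"name": "ceph-osd", "nodes": ["controller"]},
        {"name": "ceph-monitor", "nodes": ["controller", "compute1", "compute2"]},
    ],
}


def hosts_for(nodeset: Dict[str, list], pattern: str) -> List[str]:
    """Resolve a simple `hosts:` pattern against a nodeset."""
    names = [node["name"] for node in nodeset["nodes"]]
    if pattern == "all":
        return names
    for group in nodeset.get("groups", []):
        if group["name"] == pattern:
            return list(group["nodes"])
    # Fall back to a single node name; unknown patterns match nothing.
    return [pattern] if pattern in names else []


OSD_HOSTS = hosts_for(CEPH_CLUSTER, "ceph-osd")
MONITOR_HOSTS = hosts_for(CEPH_CLUSTER, "ceph-monitor")
```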
zuul@ubuntu-xenial:~$ find /home/zuul/src -mindepth 3 -maxdepth 3 -type d
/home/zuul/src/git.openstack.org/openstack-infra/shade
/home/zuul/src/git.openstack.org/openstack/keystoneauth
/home/zuul/src/git.openstack.org/openstack/os-client-config
/home/zuul/src/github.com/ansible/ansible
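The listing suggests that each repo is checked out under a path built from the source's hostname and the project name. A tiny sketch of that layout, where `source_dir` is a hypothetical helper and the root directory is taken from the listing above:

```python
import posixpath


def source_dir(canonical_hostname: str, project: str,
               root: str = "/home/zuul/src") -> str:
    """Build the checkout path for a project on a build node."""
    return posixpath.join(root, canonical_hostname, project)


SHADE_DIR = source_dir("git.openstack.org", "openstack-infra/shade")
ANSIBLE_DIR = source_dir("github.com", "ansible/ansible")
```

Including the hostname in the path keeps projects from different sources (Gerrit and GitHub here) from colliding.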
- Specify a set of jobs for each pipeline
- project:
    check:
      jobs:
        - openstack-tox-py27
        - openstack-tox-py35
        - openstack-tox-docs
    gate:
      jobs:
        - openstack-tox-py27
        - openstack-tox-py35
        - openstack-tox-docs
- project:
    check:
      jobs:
        - openstack-tox-py27
        - openstack-tox-py35
        - openstack-tox-py36:
            voting: false
        - openstack-tox-docs
    gate:
      jobs:
        - openstack-tox-py27
        - openstack-tox-py35
        - openstack-tox-docs
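The effect of `voting: false` can be sketched as follows: a non-voting job still runs and reports, but its result does not count toward the overall verdict. `buildset_passes` and the result strings are illustrative names, not Zuul's internal API.

```python
from typing import Dict, Set


def buildset_passes(results: Dict[str, str], non_voting: Set[str]) -> bool:
    """Return True if every voting job succeeded."""
    return all(result == "SUCCESS"
               for job, result in results.items()
               if job not in non_voting)


RESULTS = {
    "openstack-tox-py27": "SUCCESS",
    "openstack-tox-py35": "SUCCESS",
    "openstack-tox-py36": "FAILURE",  # non-voting, so it is ignored
    "openstack-tox-docs": "SUCCESS",
}
VERDICT = buildset_passes(RESULTS, non_voting={"openstack-tox-py36"})
```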
- project:
    check:
      jobs:
        - openstack-tox-py27
        - openstack-tox-py35
        - openstack-tox-py36:
            voting: false
        - openstack-tox-docs:
            files: '^docs/.*$'
- project:
    check:
      jobs:
        - openstack-tox-py27:
            nodeset:
              nodes:
                - name: centos-7
                  label: centos-7
        - openstack-tox-py27:
            branches: stable/newton
            nodeset:
              nodes:
                - name: ubuntu-trusty
                  label: ubuntu-trusty
        - openstack-tox-py35
        - openstack-tox-py36:
            voting: false
        - openstack-tox-docs:
            files: '^docs/.*$'
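A simplified sketch of how `files` and `branches` matchers could decide whether a job variant applies to a change. It assumes both are regular expressions, that a `files` matcher passes when any changed file matches, and that a missing matcher always passes; `variant_applies` is a hypothetical name and Zuul's real matching rules have more cases.

```python
import re
from typing import Dict, List


def as_list(value) -> List[str]:
    """Accept either a single pattern or a list of patterns."""
    return [value] if isinstance(value, str) else list(value)


def variant_applies(variant: Dict[str, object], branch: str,
                    changed_files: List[str]) -> bool:
    """Check a job variant's matchers against a change."""
    if "branches" in variant:
        if not any(re.match(pattern, branch)
                   for pattern in as_list(variant["branches"])):
            return False
    if "files" in variant:
        if not any(re.search(pattern, path)
                   for pattern in as_list(variant["files"])
                   for path in changed_files):
            return False
    return True


DOCS_JOB = {"files": "^docs/.*$"}
NEWTON_VARIANT = {"branches": "stable/newton"}

DOCS_ON_DOC_CHANGE = variant_applies(DOCS_JOB, "master", ["docs/index.rst"])
DOCS_ON_CODE_CHANGE = variant_applies(DOCS_JOB, "master", ["setup.py"])
```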
# In git.openstack.org/openstack-infra/project-config:
- project:
    name: openstack/nova
    templates:
      - openstack-tox-jobs

# In git.openstack.org/openstack/nova/.zuul.yaml:
- project:
    check:
      jobs:
        - nova-placement-functional-devstack
- project:
    release:
      jobs:
        - build-artifacts
        - upload-tarball:
            dependencies: build-artifacts
        - upload-pypi:
            dependencies: build-artifacts
        - notify-mirror:
            dependencies:
              - upload-tarball
              - upload-pypi
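The job dependencies above form a small graph. One way to see which jobs can run together is to group them into waves with the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). This is a sketch of the ordering only; `waves` is a hypothetical helper, and it does not model how Zuul actually schedules jobs.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Each job maps to the jobs it depends on, as in the release pipeline above.
DEPENDENCIES: Dict[str, List[str]] = {
    "build-artifacts": [],
    "upload-tarball": ["build-artifacts"],
    "upload-pypi": ["build-artifacts"],
    "notify-mirror": ["upload-tarball", "upload-pypi"],
}


def waves(dependencies: Dict[str, List[str]]) -> List[List[str]]:
    """Group jobs into waves; jobs in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


WAVES = waves(DEPENDENCIES)
```

Here `upload-tarball` and `upload-pypi` land in the same wave because both depend only on `build-artifacts`.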
# Changes that run through devstack-tempest are likely to have an impact on
# the devstack part of the job, so we keep devstack in the main play to
# avoid zuul retrying on legitimate failures.
- hosts: all
  roles:
    - run-devstack

# We run tests only on one node, regardless how many nodes are in the system
- hosts: tempest
  roles:
    - setup-tempest-run-dir
    - setup-tempest-data-dir
    - acl-devstack-files
    - run-tempest
If you use Ansible for deployment, your test and deployment processes and playbooks are the same
# In git.openstack.org/openstack-infra/project-config/roles/legacy-install-afs-with-puppet/tasks/main.yaml
- name: Install puppet
  shell: ./install_puppet.sh
  args:
    chdir: "{{ ansible_user_dir }}/src/git.openstack.org/openstack-infra/system-config"
  environment:
    # Skip setting up pip, our images have already done this.
    SETUP_PIP: "false"
  become: yes

- name: Copy manifest
  copy:
    src: manifest.pp
    dest: "{{ ansible_user_dir }}/manifest.pp"

- name: Run puppet
  puppet:
    manifest: "{{ ansible_user_dir }}/manifest.pp"
  become: yes
{{ zuul_url }}/{{ tenant }}/{{ project }}.pub
# In git.openstack.org/openstack/loci/.zuul.yaml:
- secret:
    name: loci_docker_login
    data:
      user: loci-username
      password: !encrypted/pkcs1-oaep
        - gUEX4eY3JAk/Xt7Evmf/hF7xr6HpNRXTibZjrKTbmI4QYHlzEBrBbHey27Pt/eYvKKeKw
          hk8MDQ4rNX7ZK1v+CKTilUfOf4AkKYbe6JFDd4z+zIZ2PAA7ZedO5FY/OnqrG7nhLvQHE
          5nQrYwmxRp4O8eU5qG1dSrM9X+bzri8UnsI7URjqmEsIvlUqtybQKB9qQXT4d6mOeaKGE
          5h6Ydkb9Zdi4Qh+GpCGDYwHZKu1mBgVK5M1G6NFMy1DYz+4NJNkTRe9J+0TmWhQ/KZSqo
          4ck0x7Tb0Nr7hQzV8SxlwkaCTLDzvbiqmsJPLmzXY2jry6QsaRCpthS01vnj47itoZ/7p
          taH9CoJ0Gl7AkaxsrDSVjWSjatTQpsy1ub2fuzWHH4ASJFCiu83Lb2xwYts++r8ZSn+mA
          hbEs0GzPI6dIWg0u7aUsRWMOB4A+6t2IOJibVYwmwkG8TjHRXxVCLH5sY+i3MR+NicR9T
          IZFdY/AyH6vt5uHLQDU35+5n91pUG3F2lyiY5aeMOvBL05p27GTMuixR5ZoHcvSoHHtCq
          7Wnk21iHqmv/UnEzqUfXZOque9YP386RBWkshrHd0x3OHUfBK/WrpivxvIGBzGwMr2qAj
          /AhJsfDXKBBbhGOGk1u5oBLjeC4SRnAcIVh1+RWzR4/cAhOuy2EcbzxaGb6VTM=
# In git.openstack.org/openstack/loci/.zuul.yaml:
- job:
    name: publish-loci-cinder
    parent: loci-cinder
    post-run: playbooks/push
    secrets:
      - loci_docker_login
# In git.openstack.org/openstack/loci/playbooks/push.yaml:
- hosts: all
  tasks:
    - include_vars: vars.yaml

    - name: Push project to DockerHub
      block:
        - command: docker login -u {{ loci_docker_login.user }} -p {{ loci_docker_login.password }}
          no_log: True
        - command: docker push openstackloci/{{ project }}:{{ branch }}-{{ item.name }}
          with_items: "{{ distros }}"
images/questions.ans
pan
Presentty