.. display in 68x24
.. display in 88x24
.. pygments yaml? (only file breaks (---) tinted)
.. slide on high level v3 changes
.. slide on nodepool
.. transition:: dissolve
:duration: 0.4
Test Slide
==========
.. hidetitle::
.. ansi:: images/testslide.ans
Preshow
=======
.. hidetitle::
.. ansi:: images/cursor.ans images/cursor2.ans
Zuul
====
.. hidetitle::
.. ansi:: images/title.ans
This Talk
=========
* In git: https://opendev.org/inaugust/inaugust.com/

  .. code:: bash

     git clone https://opendev.org/inaugust/inaugust.com
     cd inaugust.com/src/zuulv3

* Then:

  .. code:: bash

     cat overview.rst

* Or:

  .. code:: bash

     pip install presentty
     presentty overview.rst

Red Hat
=======
.. hidetitle::
.. container:: handout
I work for
.. ansi:: images/redhat.ans
Ansible
=======
.. hidetitle::
.. ansi:: images/ansible.ans
OpenDev
=======
::
"most insane CI infrastructure I've ever been a part of"
-- Alex Gaynor
"like the SpaceX of CI"
-- Emily Dunham
Zuul
====
.. hidetitle::
.. ansi:: images/zuul.ans
What Zuul Does
==============
* "Speculative Future State"
* gated changes
* one or more git repositories
* integrated deliverable
* testing like deployment
Underlying Philosophy
=====================
* All changes flow through code review
* Changes only land if they pass all tests
* End-to-end integration testing is essential
* Computers are cheaper than humans
Ramifications of Philosophy
===========================
* No direct push access for anyone
* Software should be installable from source
* Testing should be automated and repeatable
* Developers write tests with their patches
* Code always works
Getting to Gating
=================
No Tests / Manual Tests
=======================
* No test automation exists or ...
* Developer runs test suite before pushing code
* Prone to developer skipping tests for "trivial" changes
* Doesn't scale organizationally
Periodic Testing
================
* Developers push changes directly to shared branch
* CI system runs tests from time to time - reports whether things still work
* "Who broke the build?"
* Leads to hacks like the nvie branching model (git-flow)
Post-Merge Testing
==================
* Developers push changes directly to shared branch
* CI system is triggered by push - reports if push broke something
* Frequently batched / rolled up
* Easier to diagnose which change broke things
* Reactive - the bad changes are already in
Pre-Review Testing
==================
* Changes are pushed to code review (Gerrit Change, GitHub PR, etc)
* CI system is triggered by code review change creation
* Test results inform review decisions
* Proactive - testing code before it lands
* Reviewers can get bored waiting for tests
* Only tests code as written, not potential result of merging code
Gating
======
* Changes are pushed to code review
* Gating system is triggered by code review approval
* Gating system merges code IFF tests pass
* Proactive - testing code before it lands
* Future state resulting from merge of code is tested
* Reviewers can fire-and-forget safely
Mix and Match
=============
* Zuul supports all of those modes
* Zuul users frequently combine them
* Run pre-review (check) and gating (gate) on each change
* Post-merge/post-tag for release/publication automation
* Periodic for catching bitrot
Multi-repository integration
============================
* Multiple source repositories are needed for deliverable
* Future state to be tested is the future state of all involved repos
To test proposed future state
=============================
* Get tip of each project. Merge appropriate change(s). Test.
* Changes must be serialized, otherwise state under test is invalid.
* Integrated deliverable repos share serialized queue
Speculative Execution
=====================
* Correct parallel processing of serialized future states
* Create virtual serial queue of changes for each deliverable
* Assume each change will pass its tests
* Test successive changes with previous changes applied to starting state
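
A minimal sketch of the idea above, not Zuul's actual code: each queued change is tested on the branch tip plus every change ahead of it, on the assumption that those will all pass. The function name and the change ids ``A``, ``B``, ``C`` are hypothetical.

.. code:: python

   def speculative_states(tip, queue):
       """Return the state each queued change would be tested against."""
       states = []
       applied = [tip]
       for change in queue:
           # Assume everything ahead of this change passes and merges.
           applied = applied + [change]
           states.append((change, list(applied)))
       return states


   states = speculative_states("tip", ["A", "B", "C"])
   for change, state in states:
       print(change, "tested on", " + ".join(state))

Every state is known up front, so all three test runs can start at the same time.
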
Nearest Non-Failing Change
==========================
(aka 'The Jim Blair Algorithm')
* If a change fails, move it aside
* Cancel all test jobs behind it in the queue
* Reparent queue items on the nearest non-failing change
* Restart tests with new state
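
The steps above can be sketched roughly as follows. This is a simplification under stated assumptions, not Zuul's implementation: the queue is a plain list, ``results`` maps a change to ``True`` (passed), ``False`` (failed) or ``None`` (still running), and a parent of ``None`` means the branch tip. All names are hypothetical.

.. code:: python

   def reparent(queue, results):
       """Return (parents, restarted) after moving failed changes aside."""
       parents = {}
       restarted = []
       nearest_ok = None        # None stands for the branch tip
       behind_failure = False
       for change in queue:
           if results.get(change) is False:
               # Move the failing change aside; nothing builds on it.
               behind_failure = True
               continue
           parents[change] = nearest_ok
           if behind_failure:
               # Its old state included the failure: cancel and restart.
               restarted.append(change)
           nearest_ok = change
       return parents, restarted


   parents, restarted = reparent(
       ["A", "B", "C", "D"],
       {"A": None, "B": False, "C": None, "D": None},
   )
   print(parents)
   print(restarted)

Here ``B`` fails, so ``C`` is reparented onto ``A``, ``D`` stays on ``C``, and both ``C`` and ``D`` restart with the new state.
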
Zuul Simulation
===============
.. transition:: pan
.. container:: handout
* todo
.. ansi:: images/zsim-00.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-01.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-02.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-03.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-04.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-05.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-06.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-07.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-08.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-09.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-10.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-11.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-12.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-13.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-14.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-15.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-16.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-17.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-18.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-19.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-20.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-21.ans
Zuul Simulation
===============
.. transition:: cut
.. container:: handout
* todo
.. ansi:: images/zsim-22.ans
Lock Step Changes
=================
* Circular dependencies are deliberately not supported
* Rolling upgrades across interdependent services
* HOWEVER - many valid use cases (go/rust/c++) - support will be coming
Live Configuration Changes
==========================
.. container:: handout
Zuul is a distributed system, with a distributed configuration.
.. code:: yaml

   - tenant:
       name: openstack
       source:
         gerrit:
           config-repos:
             - opendev/project-config
           project-repos:
             - zuul/zuul-jobs
             - zuul/zuul
             - zuul/nodepool
             - ansible/ansible
             - openstack/openstacksdk

Zuul Startup
============
* Read config file
Zuul Startup
============
* Read config file
* Ask mergers for branches of each repo
.. ansi:: images/startup1.ans
Zuul Startup
============
* Read config file
* Ask mergers for branches of each repo
* Ask mergers for .zuul.yaml for each branch
of each repo
.. ansi:: images/startup2.ans
When .zuul.yaml Changes
=======================
.. container:: progressive
* Zuul looks for changes to .zuul.yaml
* Asks mergers for updated content
* Splices into configuration used for that change
* Works with cross-repo dependencies
("This change depends on a change to the job definition")
Explicit Cross-Project Dependencies
===================================
* Developers can mark changes as being dependent
* Depends-On: footer - in commit or PR
* Zuul uses depends-on when constructing virtual serial queue
* Will not merge changes in gate before depends-on changes
* Works cross-repo AND cross-source
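
As an illustration only, the footer can be pulled out of a commit message with a regular expression. The pattern and function name below are assumptions for this sketch, not the matcher Zuul itself uses, and the commit title is made up.

.. code:: python

   import re

   # Hypothetical pattern: one "Depends-On: <url>" footer per line.
   DEPENDS_ON = re.compile(
       r"^Depends-On:[ \t]*(\S+)[ \t]*$",
       re.MULTILINE | re.IGNORECASE,
   )


   def find_depends_on(message):
       """Return every Depends-On target found in a commit message."""
       return DEPENDS_ON.findall(message)


   message = """Hypothetical commit title

   Depends-On: https://review.openstack.org/643601
   """.replace("\n   ", "\n")   # strip the indentation of this sketch
   print(find_depends_on(message))
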
Depends-On Example
==================
* Service 'nova' talks to service 'ironic'
* Currently using 'python-ironicclient'
* Want to replace python-ironicclient with openstacksdk:
* https://review.openstack.org/643664
* Need some plumbing in nova first:
* https://review.openstack.org/642899
* That change "Depends-On" a change to openstacksdk
Depends-On Example - openstacksdk
=================================
* In openstacksdk, need a new method to extract config differently
* https://review.openstack.org/643601
* The nova plumbing change adds this:
::
Depends-On: https://review.openstack.org/643601
Depends-On Example - keystoneauth
=================================
* openstacksdk uses 'keystoneauth' library to make REST calls
* Config extraction change wants a new helper method in keystoneauth
* https://review.openstack.org/644251
* openstacksdk change adds:
::
Depends-On: https://review.openstack.org/644251
Depends-On Example - In the Gate
================================
* When Zuul prepares git repos for the Ironic nova change:
* Tip of nova, plus nova plumbing change, plus nova ironic change
* Tip of openstacksdk, plus config method change
* Tip of keystoneauth, plus helper method change
* Developers iterate on the nova service change
* BEFORE finalizing and releasing keystoneauth and openstacksdk changes
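
A rough sketch of how the per-repo states above fall out of the dependency chain. The data structure and function are hypothetical; the change numbers are the ones from the example, and the ``needs`` edges are this sketch's reading of the slides (the nova pair is assumed to be a plain git parent relationship).

.. code:: python

   def repo_states(change_id, changes):
       """Collect, per project, the changes to apply on top of its tip."""
       ordered = []
       seen = set()

       def visit(cid):
           if cid in seen:
               return
           seen.add(cid)
           for dep in changes[cid]["needs"]:
               visit(dep)            # dependencies are applied first
           ordered.append(cid)

       visit(change_id)
       states = {}
       for cid in ordered:
           states.setdefault(changes[cid]["project"], []).append(cid)
       return states


   changes = {
       "643664": {"project": "nova", "needs": ["642899"]},
       "642899": {"project": "nova", "needs": ["643601"]},
       "643601": {"project": "openstacksdk", "needs": ["644251"]},
       "644251": {"project": "keystoneauth", "needs": []},
   }
   states = repo_states("643664", changes)
   for project, applied in states.items():
       print("tip of", project, "+", applied)
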
Zuul Architecture
=================
We used to call "microservices" "distributed"
* Zuul comprises several services (mostly python3)
* zuul-scheduler
* zuul-executor
* zuul-merger
* zuul-web
* zuul-dashboard (javascript/react)
* zuul-fingergw
* zuul-proxy (c++)
* RDBMS
* Gearman
* Zookeeper
* Nodepool
Zuul Architecture
=================
.. ansi:: images/architecture.ans
Where Does Job Content Run?
===========================
Nodepool
========
* A separate program that works very closely with *Zuul*
* *Zuul* requires *Nodepool* but *Nodepool* can be used independently
* Creates and destroys zero or more node resources
* Resources can include VMs, containers, COE contexts or bare-metal machines
* Static driver for allocating pre-existing nodes to jobs
* Optionally periodically builds images and uploads to clouds
Nodepool Launcher
=================
Where build nodes should come from
* OpenStack
* Static
* Kubernetes Pod
* Kubernetes Namespace
* AWS
In work / coming soon:
* Azure
* GCE
What about job content?
=======================
* Written in Ansible
* Ansible is excellent at running one or more tasks in one or more places
* The answer to "how do I" is almost always "Ansible"
What Zuul Does
==============
* Listens for code events
* Prepares appropriate job config and git repo states
* Requests nodes for test jobs from *Nodepool*
* Runs user-defined Ansible playbooks with nodes in an inventory
* Collects/reports results
* Potentially merges change
Jobs
====
* Jobs define test node needs
* Metadata defined in Zuul's configuration
* Execution content in Ansible
* Jobs may be defined centrally or in the repo being tested
* Jobs have contextual variants that simplify configuration
Job
===
.. code:: yaml

   - job:
       name: base
       parent: null
       description: |
         The base job for Zuul.
       timeout: 1800
       nodeset:
         nodes:
           - name: primary
             label: ubuntu-bionic
       pre-run: playbooks/base/pre.yaml
       post-run:
         - playbooks/base/post-ssh.yaml
         - playbooks/base/post-logs.yaml
       secrets:
         - site_logs

Simple Job
==========
.. code:: yaml

   - job:
       name: tox
       pre-run: playbooks/setup-tox.yaml
       run: playbooks/tox.yaml
       post-run: playbooks/fetch-tox-output.yaml

Simple Job Inheritance
======================
.. code:: yaml

   - job:
       name: tox-py36
       parent: tox
       vars:
         tox_envlist: py36

Inheritance Works Like An Onion
===============================
* pre-run playbooks run in order of inheritance
* run playbook of job runs
* post-run playbooks run in reverse order of inheritance
* If pre-run playbooks fail, job is retried
* Post-run playbooks always run, but only for the levels whose pre-run playbooks ran
Inheritance Example
===================
For tox-py36 job
* base pre-run playbooks/base/pre.yaml
* tox pre-run playbooks/setup-tox.yaml
* tox run playbooks/tox.yaml
* tox post-run playbooks/fetch-tox-output.yaml
* base post-run playbooks/base/post-ssh.yaml
* base post-run playbooks/base/post-logs.yaml
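
The ordering listed above can be reproduced with a small sketch. It is an illustration of the onion rule, not Zuul's scheduler code; the ``jobs`` dictionary is a hand-copied, simplified form of the job definitions from the earlier slides, with the ``tox`` job's parent spelled out as ``base`` (an assumption of this sketch).

.. code:: python

   def as_list(value):
       """Accept a single playbook or a list of playbooks."""
       if value is None:
           return []
       return value if isinstance(value, list) else [value]


   def playbook_order(name, jobs):
       # Walk from the requested job up to the root (parent: null).
       chain = []
       while name is not None:
           chain.append(jobs[name])
           name = jobs[name].get("parent")

       order = []
       for job in reversed(chain):      # outermost (base) first
           order += [(job["name"], "pre-run", p)
                     for p in as_list(job.get("pre-run"))]
       for job in chain:                # innermost job that defines run
           if job.get("run"):
               order.append((job["name"], "run", job["run"]))
               break
       for job in chain:                # innermost first, base last
           order += [(job["name"], "post-run", p)
                     for p in as_list(job.get("post-run"))]
       return order


   jobs = {
       "base": {"name": "base", "parent": None,
                "pre-run": "playbooks/base/pre.yaml",
                "post-run": ["playbooks/base/post-ssh.yaml",
                             "playbooks/base/post-logs.yaml"]},
       # parent written explicitly here; an assumption of this sketch
       "tox": {"name": "tox", "parent": "base",
               "pre-run": "playbooks/setup-tox.yaml",
               "run": "playbooks/tox.yaml",
               "post-run": "playbooks/fetch-tox-output.yaml"},
       "tox-py36": {"name": "tox-py36", "parent": "tox"},
   }

   order = playbook_order("tox-py36", jobs)
   for step in order:
       print(*step)
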
Simple Job Variant
==================
.. code:: yaml

   - job:
       name: tox-py27
       branches: stable/mitaka
       nodeset:
         nodes:
           - name: ubuntu-trusty
             label: ubuntu-trusty

Nodesets for Multi-node Jobs
============================
.. code:: yaml

   - nodeset:
       name: ceph-cluster
       nodes:
         - name: controller
           label: centos-7
         - name: compute1
           label: fedora-28
         - name: compute2
           label: fedora-28
       groups:
         - name: ceph-osd
           nodes:
             - controller
         - name: ceph-monitor
           nodes:
             - controller
             - compute1
             - compute2

Multi-node Job
==============
* Nodesets are provided to Ansible as the job's inventory

.. code:: yaml

   - job:
       name: ceph-multinode
       nodeset: ceph-cluster
       run: playbooks/install-ceph.yaml

* Creates ansible inventory:

::

   controller ansible_host=1.2.3.4
   compute1 ansible_host=1.2.3.5
   compute2 ansible_host=1.2.3.6

   [ceph-osd]
   controller

   [ceph-monitor]
   controller
   compute1
   compute2

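
To make the mapping from nodeset to inventory concrete, here is a small sketch that renders the same text from a nodeset-shaped dictionary. The function and the ``addresses`` lookup are hypothetical; in a real run the addresses come from the nodes Nodepool hands back, and Zuul writes the inventory itself.

.. code:: python

   def render_inventory(nodeset, addresses):
       """Render an INI-style inventory like the one on this slide."""
       lines = [
           f"{node['name']} ansible_host={addresses[node['name']]}"
           for node in nodeset["nodes"]
       ]
       for group in nodeset.get("groups", []):
           lines.append(f"[{group['name']}]")
           lines.extend(group["nodes"])
       return "\n".join(lines)


   nodeset = {
       "nodes": [{"name": "controller"},
                 {"name": "compute1"},
                 {"name": "compute2"}],
       "groups": [
           {"name": "ceph-osd", "nodes": ["controller"]},
           {"name": "ceph-monitor",
            "nodes": ["controller", "compute1", "compute2"]},
       ],
   }
   addresses = {"controller": "1.2.3.4",
                "compute1": "1.2.3.5",
                "compute2": "1.2.3.6"}
   inventory = render_inventory(nodeset, addresses)
   print(inventory)
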
Multi-node Ceph Job Content
===========================
.. code:: yaml

   - hosts: all
     roles:
       - install-ceph

   - hosts: ceph-osd
     roles:
       - start-ceph-osd

   - hosts: ceph-monitor
     roles:
       - start-ceph-monitor

   - hosts: all
     roles:
       - do-something-interesting

Project With Central and Local Config
=====================================
.. code:: yaml

   # In opendev.org/openstack-infra/project-config:
   - project:
       name: openstack/nova
       templates:
         - openstack-tox-jobs

.. code:: yaml

   # In opendev.org/openstack/nova/.zuul.yaml:
   - project:
       check:
         jobs:
           - nova-placement-functional-devstack

zuul-jobs standard library
==========================
* https://opendev.org/openstack-infra/zuul-jobs
* Repo containing general purpose job definitions
* Add the git repo directly to a local Zuul config
Project with Job Dependencies
=============================
.. code:: yaml

   - project:
       release:
         jobs:
           - build-artifacts
           - upload-tarball:
               dependencies: build-artifacts
           - upload-pypi:
               dependencies: build-artifacts
           - notify-mirror:
               dependencies:
                 - upload-tarball
                 - upload-pypi

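
One way to read the dependency graph above is as waves of jobs that may run together. The sketch below uses Python's standard ``graphlib`` module (Python 3.9+) to compute those waves; it only illustrates the ordering and is not how Zuul schedules jobs. The ``deps`` mapping is copied by hand from the YAML.

.. code:: python

   from graphlib import TopologicalSorter

   # job -> jobs it depends on, copied from the project definition
   deps = {
       "build-artifacts": [],
       "upload-tarball": ["build-artifacts"],
       "upload-pypi": ["build-artifacts"],
       "notify-mirror": ["upload-tarball", "upload-pypi"],
   }


   def waves(deps):
       """Group jobs into waves; each wave only needs earlier waves."""
       sorter = TopologicalSorter(deps)
       sorter.prepare()
       result = []
       while sorter.is_active():
           ready = sorted(sorter.get_ready())
           result.append(ready)
           sorter.done(*ready)
       return result


   job_waves = waves(deps)
   print(job_waves)

The two upload jobs land in the same wave, and ``notify-mirror`` waits for both.
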
Secrets
=======
* Inspired by Kubernetes Secrets API
* Projects can add named encrypted secrets to their .zuul.yaml file
* Jobs can request to use secrets by name
* Jobs using secrets are not reconfigured speculatively
* Secrets can only be used by the same project they are defined in
* Public key per project:
``{{ zuul_url }}/{{ tenant }}/{{ project }}.pub``
::
GET https://zuul.openstack.org/openstack-infra/shade.pub
Secret Example (note, no admins had to enable this)
===================================================
.. code:: yaml

   # In opendev.org/openstack/loci/.zuul.yaml:
   - secret:
       name: loci_docker_login
       data:
         user: loci-username
         password: !encrypted/pkcs1-oaep
           - gUEX4eY3JAk/Xt7Evmf/hF7xr6HpNRXTibZjrKTbmI4QYHlzEBrBbHey27Pt/eYvKKeKw
             hk8MDQ4rNX7ZK1v+CKTilUfOf4AkKYbe6JFDd4z+zIZ2PAA7ZedO5FY/OnqrG7nhLvQHE
             5nQrYwmxRp4O8eU5qG1dSrM9X+bzri8UnsI7URjqmEsIvlUqtybQKB9qQXT4d6mOeaKGE
             5h6Ydkb9Zdi4Qh+GpCGDYwHZKu1mBgVK5M1G6NFMy1DYz+4NJNkTRe9J+0TmWhQ/KZSqo
             4ck0x7Tb0Nr7hQzV8SxlwkaCTLDzvbiqmsJPLmzXY2jry6QsaRCpthS01vnj47itoZ/7p
             taH9CoJ0Gl7AkaxsrDSVjWSjatTQpsy1ub2fuzWHH4ASJFCiu83Lb2xwYts++r8ZSn+mA
             hbEs0GzPI6dIWg0u7aUsRWMOB4A+6t2IOJibVYwmwkG8TjHRXxVCLH5sY+i3MR+NicR9T
             IZFdY/AyH6vt5uHLQDU35+5n91pUG3F2lyiY5aeMOvBL05p27GTMuixR5ZoHcvSoHHtCq
             7Wnk21iHqmv/UnEzqUfXZOque9YP386RBWkshrHd0x3OHUfBK/WrpivxvIGBzGwMr2qAj
             /AhJsfDXKBBbhGOGk1u5oBLjeC4SRnAcIVh1+RWzR4/cAhOuy2EcbzxaGb6VTM=

Secret Example
==============
.. code:: yaml

   # In opendev.org/openstack/loci/.zuul.yaml:
   - job:
       name: publish-loci-cinder
       parent: loci-cinder
       post-run: playbooks/push.yaml
       secrets:
         - loci_docker_login

   # In opendev.org/openstack/loci/playbooks/push.yaml:
   - hosts: all
     tasks:
       - include_vars: vars.yaml

       - name: Push project to DockerHub
         block:
           - command: docker login -u {{ loci_docker_login.user }} -p {{ loci_docker_login.password }}
             no_log: True
           - command: docker push openstackloci/{{ project }}:{{ branch }}-{{ item.name }}
             with_items: "{{ distros }}"

Speculative Container Images
============================
* Gating applied to continuously deployed container images
* Build and test images that depend on other images
* Build and test deployments comprising multiple images
* Without publishing to final location
* Publish the actual image that was built in the gate
Zuul is not New
===============
* Has been in production for OpenStack for six years
* Zuul is now a top-level effort of the OpenStack Foundation
* Zuul v3 is the first release where non-OpenStack use is a first-class use case
OpenDev - Largest Known Zuul
============================
* 2KJPH (2,000 jobs per hour)
* Build Nodes from 16 Regions of 5 Public and 3 Private OpenStack Clouds
* Rackspace, Internap, OVH, Vexxhost, CityCloud
* Linaro (ARM), Limestone, Packethost
* 10,000 changes merged per month
Not just for OpenStack
======================
* BMW (control plane in OpenShift)
* GoDaddy (control plane in private Kubernetes)
* GoodMoney (control plane in EKS, adding GKE)
* Le Bon Coin
* Easystack
* TungstenFabric
* OpenLab
* Red Hat
* others ...
Code Review Systems
===================
* Gerrit
* GitHub (Public and Enterprise)
In work / coming soon:
* Pagure
* Gitea
Commonly Requested:
* GitLab
* Bitbucket
Support for non-git
===================
.. container:: progressive
* Nope
* helix4git may work for perforce, but is untested
Installation of Software
========================
Ways to Install Zuul
====================
* Containers: https://hub.docker.com/_/zuul/
* Windmill: http://opendev.org/openstack/windmill
* Software Factory: https://softwarefactory-project.io/
* Puppet: http://opendev.org/openstack-infra/puppet-zuul
Zuul Containers
===============
* Published on every commit
* Application/Process containers
* Config / Data should be bind-mounted in
zuul/zuul-executor
==================
* In k8s, zuul-executor must be run privileged
* Uses bubblewrap for unprivileged sandboxing
* Restriction may be lifted in the future
Release Management
==================
* Zuul is continuously delivered and deployed upstream
* Some users deploy Zuul with Zuul
* Releases are tagged from code run for OpenDev
* There is no intent to have a 'stable' release
* 'stable' is a synonym for "old and buggy"
zuul/zuul-scheduler
===================
* SPOF
* We're working on it - HA/Distributed scheduler is coming
* Recommend running scheduler from tags
Quick Start
===========
* docker-compose
https://zuul-ci.org/docs/zuul/admin/quick-start.html
Important Links
===============
* https://zuul-ci.org/
* https://zuul-ci.org/docs/zuul
* https://zuul-ci.org/docs/zuul-jobs/
* freenode:#zuul
* https://opendev.org/zuul (https://git.zuul-ci.org/cgit/zuul)
Questions
=========
.. ansi:: images/questions.ans
Presentty
=========
.. hidetitle::
.. transition:: pan
.. figlet:: Presentty
* Console presentations written in reStructuredText
* Cross-fade, pan, tilt, cut transitions
* https://pypi.python.org/pypi/presentty