Test Slide

images/testslide.ans

Preshow

images/cursor.ans images/cursor2.ans

Zuul

images/title.ans

This Talk

git clone https://opendev.org/inaugust/inaugust.com
cd inaugust.com/src/zuulv3
  • Then:
cat overview.rst
  • Or:
pip install presentty
presentty overview.rst

Red Hat

images/redhat.ans

Ansible

images/ansible.ans

OpenDev

"most insane CI infrastructure I've ever been a part of"

  -- Alex Gaynor

"like the SpaceX of CI"

  -- Emily Dunham

Zuul

images/zuul.ans

What Zuul Does

  • "Speculative Future State"
  • gated changes
  • one or more git repositories
  • integrated deliverable
  • testing like deployment

Underlying Philosophy

  • All changes flow through code review
  • Changes only land if they pass all tests
  • End-to-end integration testing is essential
  • Computers are cheaper than humans

Ramifications of Philosophy

  • No direct push access for anyone
  • Software should be installable from source
  • Testing should be automated and repeatable
  • Developers write tests with their patches
  • Code always works

Getting to Gating

No Tests / Manual Tests

  • No test automation exists or ...
  • Developer runs test suite before pushing code
  • Prone to developer skipping tests for "trivial" changes
  • Doesn't scale organizationally

Periodic Testing

  • Developers push changes directly to shared branch
  • CI system runs tests from time to time - reports whether things still work
  • "Who broke the build?"
  • Leads to hacks like the nvie (git-flow) branching model

Post-Merge Testing

  • Developers push changes directly to shared branch
  • CI system is triggered by push - reports if push broke something
  • Frequently batched / rolled up
  • Easier to diagnose which change broke things
  • Reactive - the bad changes are already in

Pre-Review Testing

  • Changes are pushed to code review (Gerrit Change, GitHub PR, etc)
  • CI system is triggered by code review change creation
  • Test results inform review decisions
  • Proactive - testing code before it lands
  • Reviewers can get bored waiting for tests
  • Only tests code as written, not potential result of merging code

Gating

  • Changes are pushed to code review
  • Gating system is triggered by code review approval
  • Gating system merges code IFF tests pass
  • Proactive - testing code before it lands
  • Future state resulting from merge of code is tested
  • Reviewers can fire-and-forget safely

Mix and Match

  • Zuul supports all of those modes
  • Zuul users frequently combine them
  • Run pre-review (check) and gating (gate) on each change
  • Post-merge/post-tag for release/publication automation
  • Periodic for catching bitrot

Multi-repository integration

  • Multiple source repositories are needed for deliverable
  • Future state to be tested is the future state of all involved repos

To test proposed future state

  • Get tip of each project. Merge appropriate change(s). Test.
  • Changes must be serialized, otherwise state under test is invalid.
  • Integrated deliverable repos share serialized queue

Speculative Execution

  • Correct parallel processing of serialized future states
  • Create virtual serial queue of changes for each deliverable
  • Assume each change will pass its tests
  • Test successive changes with previous changes applied to starting state
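
A minimal sketch of the idea above, not Zuul's actual implementation: the function name and the "tip" placeholder are hypothetical. It assumes every change ahead in the virtual serial queue will pass, so each change is tested on the tip plus everything ahead of it.

```python
from typing import Dict, List


def speculative_states(tip: str, queue: List[str]) -> Dict[str, List[str]]:
    """Return, for each queued change, the ordered state it is tested on.

    Each change is assumed to pass, so every later change is tested on top
    of the branch tip plus all changes ahead of it in the queue.
    """
    states: Dict[str, List[str]] = {}
    applied = [tip]
    for change in queue:
        applied = applied + [change]  # new list, so earlier states stay intact
        states[change] = applied
    return states


states = speculative_states("tip", ["A", "B", "C"])
for change, state in states.items():
    # All three states can be handed to test jobs at the same time.
    print(f"{change}: {' + '.join(state)}")
```

Because the states are computed up front, the tests for A, B and C can run in parallel while still describing a strictly serialized merge order.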

Nearest Non-Failing Change

(aka 'The Jim Blair Algorithm')

  • If a change fails, move it aside
  • Cancel all test jobs behind it in the queue
  • Reparent queue items on the nearest non-failing change
  • Restart tests with new state
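
The four steps above can be sketched roughly as follows. This is an illustration under simplifying assumptions (a flat list as the queue, one failure at a time); `handle_failure` is a hypothetical name, not a Zuul function.

```python
from typing import List, Optional, Tuple


def handle_failure(
    queue: List[str], failed: str
) -> Tuple[List[str], List[str], Optional[str]]:
    """Move a failed change aside and reparent everything behind it.

    Returns (new_queue, changes_to_restart, new_parent). A new_parent of
    None stands for the branch tip.
    """
    index = queue.index(failed)
    ahead = queue[:index]        # unaffected: their state never included `failed`
    behind = queue[index + 1:]   # their jobs tested a state that is now invalid
    # The nearest non-failing change is the last one still ahead, if any.
    new_parent = ahead[-1] if ahead else None
    return ahead + behind, behind, new_parent


new_queue, restart, parent = handle_failure(["A", "B", "C", "D"], "B")
```

With B failing in a queue of A, B, C, D, the jobs for C and D are cancelled and restarted on top of A, while A keeps running untouched.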

Zuul Simulation

  • todo

images/zsim-00.ans
images/zsim-01.ans
images/zsim-02.ans
images/zsim-03.ans
images/zsim-04.ans
images/zsim-05.ans
images/zsim-06.ans
images/zsim-07.ans
images/zsim-08.ans
images/zsim-09.ans
images/zsim-10.ans
images/zsim-11.ans
images/zsim-12.ans
images/zsim-13.ans
images/zsim-14.ans
images/zsim-15.ans
images/zsim-16.ans
images/zsim-17.ans
images/zsim-18.ans
images/zsim-19.ans
images/zsim-20.ans
images/zsim-21.ans
images/zsim-22.ans

Lock Step Changes

  • Circular Dependencies are not supported on purpose
  • Rolling upgrades across interdependent services
  • HOWEVER - many valid use cases (go/rust/c++) - support will be coming

Live Configuration Changes

Zuul is a distributed system, with a distributed configuration.

- tenant:
    name: openstack
    source:
      gerrit:
        config-repos:
          - opendev/project-config
        project-repos:
          - zuul/zuul-jobs
          - zuul/zuul
          - zuul/nodepool
          - ansible/ansible
          - openstack/openstacksdk

Zuul Startup

  • Read config file

Zuul Startup

  • Read config file
  • Ask mergers for branches of each repo

images/startup1.ans

Zuul Startup

  • Read config file
  • Ask mergers for branches of each repo
  • Ask mergers for .zuul.yaml for each branch of each repo

images/startup2.ans

When .zuul.yaml Changes

  • Zuul looks for changes to .zuul.yaml
  • Asks mergers for updated content
  • Splices into configuration used for that change
  • Works with cross-repo dependencies ("This change depends on a change to the job definition")

Explicit Cross-Project Dependencies

  • Developers can mark changes as being dependent
  • Depends-On: footer - in commit or PR
  • Zuul uses depends-on when constructing virtual serial queue
  • Will not merge changes in gate before depends-on changes
  • Works cross-repo AND cross-source
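
A small sketch of reading Depends-On footers from a commit message. The regular expression and the sample message are assumptions for illustration; Zuul's own parsing may differ.

```python
import re
from typing import List

# One footer per line: "Depends-On: <url>". Matching stays within a single line.
DEPENDS_ON = re.compile(
    r"^Depends-On:[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def depends_on(message: str) -> List[str]:
    """Return the dependency URLs named in a commit message or PR body."""
    return DEPENDS_ON.findall(message)


# Hypothetical commit message; only the URL comes from the example slides.
message = (
    "Use new helper for config extraction\n"
    "\n"
    "Needs the helper method added to keystoneauth.\n"
    "\n"
    "Depends-On: https://review.openstack.org/644251\n"
)
deps = depends_on(message)
```

Because the footer is a URL, the dependency can point at a change in another repository or on another code review system.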

Depends-On Example

Depends-On Example - openstacksdk

Depends-On: https://review.openstack.org/643601

Depends-On Example - keystoneauth

  • openstacksdk uses 'keystoneauth' library to make REST calls
  • Config extraction change wants a new helper method in keystoneauth
  • https://review.openstack.org/644251
  • openstacksdk change adds:
Depends-On: https://review.openstack.org/644251

Depends-On Example - In the Gate

  • When Zuul prepares git repos for the Ironic nova change:
  • Tip of nova, plus nova plumbing change, plus nova ironic change
  • Tip of openstacksdk, plus config method change
  • Tip of keystoneauth, plus helper method change
  • Developers iterate on the nova service change
  • BEFORE finalizing and releasing keystoneauth and openstacksdk changes

Zuul Architecture

We used to call "microservices" "distributed"

  • Zuul comprises several services (mostly python3)
  • zuul-scheduler
  • zuul-executor
  • zuul-merger
  • zuul-web
  • zuul-dashboard (javascript/react)
  • zuul-fingergw
  • zuul-proxy (c++)
  • RDBMS
  • Gearman
  • Zookeeper
  • Nodepool

Zuul Architecture

images/architecture.ans

Where Does Job Content Run?

Nodepool

  • A separate program that works very closely with Zuul
  • Zuul requires Nodepool but Nodepool can be used independently
  • Creates and destroys zero or more node resources
  • Resources can include VMs, containers, COE contexts, or bare-metal machines
  • Static driver for allocating pre-existing nodes to jobs
  • Optionally periodically builds images and uploads to clouds

Nodepool Launcher

Where build nodes should come from

  • OpenStack
  • Static
  • Kubernetes Pod
  • Kubernetes Namespace
  • AWS

In work / coming soon:

  • Azure
  • GCE

What about job content?

  • Written in Ansible
  • Ansible is excellent at running one or more tasks in one or more places
  • The answer to "how do I" is almost always "Ansible"

What Zuul Does

  • Listens for code events
  • Prepares appropriate job config and git repo states
  • Requests nodes for test jobs from Nodepool
  • Runs user-defined Ansible playbooks with nodes in an inventory
  • Collects/reports results
  • Potentially merges change

Jobs

  • Jobs define test node needs
  • Metadata defined in Zuul's configuration
  • Execution content in Ansible
  • Jobs may be defined centrally or in the repo being tested
  • Jobs have contextual variants that simplify configuration

Job

- job:
    name: base
    parent: null
    description: |
      The base job for Zuul.
    timeout: 1800
    nodeset:
      nodes:
        - name: primary
          label: ubuntu-bionic
    pre-run: playbooks/base/pre.yaml
    post-run:
      - playbooks/base/post-ssh.yaml
      - playbooks/base/post-logs.yaml
    secrets:
      - site_logs

Simple Job

- job:
    name: tox
    pre-run: playbooks/setup-tox.yaml
    run: playbooks/tox.yaml
    post-run: playbooks/fetch-tox-output.yaml

Simple Job Inheritance

- job:
    name: tox-py36
    parent: tox
    vars:
      tox_envlist: py36

Inheritance Works Like An Onion

  • pre-run playbooks run in order of inheritance
  • run playbook of job runs
  • post-run playbooks run in reverse order of inheritance
  • If pre-run playbooks fail, job is re-tried
  • All post-run playbooks run - as far as pre-run playbooks got

Inheritance Example

For tox-py36 job

  • base pre-run playbooks/base/pre.yaml
  • tox pre-run playbooks/setup-tox.yaml
  • tox run playbooks/tox.yaml
  • tox post-run playbooks/fetch-tox-output.yaml
  • base post-run playbooks/base/post-ssh.yaml
  • base post-run playbooks/base/post-logs.yaml
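
The onion ordering can be reproduced with a short sketch. The dict layout and `playbook_order` are hypothetical, and `tox` is assumed to inherit from `base`, as the example above implies.

```python
from typing import Dict, List, Optional

jobs: Dict[str, dict] = {
    "base": {
        "parent": None,
        "pre-run": ["playbooks/base/pre.yaml"],
        "post-run": [
            "playbooks/base/post-ssh.yaml",
            "playbooks/base/post-logs.yaml",
        ],
    },
    "tox": {
        "parent": "base",  # assumption: tox inherits from base
        "pre-run": ["playbooks/setup-tox.yaml"],
        "run": "playbooks/tox.yaml",
        "post-run": ["playbooks/fetch-tox-output.yaml"],
    },
    "tox-py36": {"parent": "tox"},
}


def playbook_order(name: Optional[str], jobs: Dict[str, dict]) -> List[str]:
    """Return playbooks in execution order for a job and its ancestors."""
    chain = []
    while name is not None:  # walk from the job up to the root
        chain.append(jobs[name])
        name = jobs[name].get("parent")
    outer_first = list(reversed(chain))
    pre = [p for job in outer_first for p in job.get("pre-run", [])]
    # The run playbook comes from the nearest job that defines one.
    run = next((job["run"] for job in chain if "run" in job), None)
    # Post-run unwinds from the innermost job outward.
    post = [p for job in chain for p in job.get("post-run", [])]
    return pre + ([run] if run else []) + post


order = playbook_order("tox-py36", jobs)
```

The sketch leaves out retries and partial unwinding after a failed pre-run; it only shows the nesting order.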

Simple Job Variant

- job:
    name: tox-py27
    branches: stable/mitaka
    nodeset:
      nodes:
        - name: ubuntu-trusty
          label: ubuntu-trusty

Nodesets for Multi-node Jobs

- nodeset:
    name: ceph-cluster
    nodes:
      - name: controller
        label: centos-7
      - name: compute1
        label: fedora-28
      - name: compute2
        label: fedora-28
    groups:
      - name: ceph-osd
        nodes:
          - controller
      - name: ceph-monitor
        nodes:
          - controller
          - compute1
          - compute2

Multi-node Job

  • nodesets are provided to Ansible for jobs in inventory
- job:
    name: ceph-multinode
    nodeset: ceph-cluster
    run: playbooks/install-ceph.yaml
  • Creates ansible inventory:
controller ansible_host=1.2.3.4
compute1 ansible_host=1.2.3.5
compute2 ansible_host=1.2.3.6

[ceph-osd]
controller

[ceph-monitor]
controller
compute1
compute2
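
A sketch of how a nodeset could be rendered into the inventory shown above. `render_inventory` and the address mapping are illustrative assumptions; the addresses are the placeholder values from the slide, and this is not the code Zuul uses.

```python
from typing import Dict

nodeset = {
    "name": "ceph-cluster",
    "nodes": [
        {"name": "controller", "label": "centos-7"},
        {"name": "compute1", "label": "fedora-28"},
        {"name": "compute2", "label": "fedora-28"},
    ],
    "groups": [
        {"name": "ceph-osd", "nodes": ["controller"]},
        {"name": "ceph-monitor", "nodes": ["controller", "compute1", "compute2"]},
    ],
}

# Addresses would come from the nodes that were actually allocated.
addresses = {"controller": "1.2.3.4", "compute1": "1.2.3.5", "compute2": "1.2.3.6"}


def render_inventory(nodeset: dict, addresses: Dict[str, str]) -> str:
    """Render an INI-style Ansible inventory for a nodeset."""
    lines = [
        f"{node['name']} ansible_host={addresses[node['name']]}"
        for node in nodeset["nodes"]
    ]
    for group in nodeset.get("groups", []):
        lines.append("")                     # blank line before each group
        lines.append(f"[{group['name']}]")
        lines.extend(group["nodes"])
    return "\n".join(lines) + "\n"


inventory = render_inventory(nodeset, addresses)
print(inventory, end="")
```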

Multi-node Ceph Job Content

- hosts: all
  roles:
    - install-ceph

- hosts: ceph-osd
  roles:
    - start-ceph-osd

- hosts: ceph-monitor
  roles:
    - start-ceph-monitor

- hosts: all
  roles:
    - do-something-interesting

Project With Central and Local Config

# In opendev.org/openstack-infra/project-config:
- project:
    name: openstack/nova
    templates:
      - openstack-tox-jobs
# In opendev.org/openstack/nova/.zuul.yaml:
- project:
    check:
      jobs:
        - nova-placement-functional-devstack

zuul-jobs standard library

Project with Job Dependencies

- project:
    release:
      jobs:
        - build-artifacts
        - upload-tarball:
            dependencies: build-artifacts
        - upload-pypi:
            dependencies: build-artifacts
        - notify-mirror:
            dependencies:
              - upload-tarball
              - upload-pypi
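
The dependency graph above implies an execution order. A minimal sketch using Python's standard `graphlib` (Python 3.9+) shows which jobs could run together; the `waves` helper is hypothetical and says nothing about how Zuul schedules jobs internally.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Job name -> jobs it depends on, copied from the project definition above.
dependencies: Dict[str, List[str]] = {
    "build-artifacts": [],
    "upload-tarball": ["build-artifacts"],
    "upload-pypi": ["build-artifacts"],
    "notify-mirror": ["upload-tarball", "upload-pypi"],
}


def waves(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group jobs into waves; jobs in one wave have all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        result.append(ready)
        sorter.done(*ready)                 # pretend every job in the wave passed
    return result


order = waves(dependencies)
```

Here the two upload jobs land in the same wave because both depend only on build-artifacts, and notify-mirror waits for both.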

Secrets

  • Inspired by Kubernetes Secrets API
  • Projects can add named encrypted secrets to their .zuul.yaml file
  • Jobs can request to use secrets by name
  • Jobs using secrets are not reconfigured speculatively
  • Secrets can only be used by the same project they are defined in
  • Public key per project: {{ zuul_url }}/{{ tenant }}/{{ project }}.pub

GET https://zuul.openstack.org/openstack-infra/shade.pub

Secret Example (note, no admins had to enable this)

# In opendev.org/openstack/loci/.zuul.yaml:
- secret:
    name: loci_docker_login
    data:
      user: loci-username
      password: !encrypted/pkcs1-oaep
        - gUEX4eY3JAk/Xt7Evmf/hF7xr6HpNRXTibZjrKTbmI4QYHlzEBrBbHey27Pt/eYvKKeKw
          hk8MDQ4rNX7ZK1v+CKTilUfOf4AkKYbe6JFDd4z+zIZ2PAA7ZedO5FY/OnqrG7nhLvQHE
          5nQrYwmxRp4O8eU5qG1dSrM9X+bzri8UnsI7URjqmEsIvlUqtybQKB9qQXT4d6mOeaKGE
          5h6Ydkb9Zdi4Qh+GpCGDYwHZKu1mBgVK5M1G6NFMy1DYz+4NJNkTRe9J+0TmWhQ/KZSqo
          4ck0x7Tb0Nr7hQzV8SxlwkaCTLDzvbiqmsJPLmzXY2jry6QsaRCpthS01vnj47itoZ/7p
          taH9CoJ0Gl7AkaxsrDSVjWSjatTQpsy1ub2fuzWHH4ASJFCiu83Lb2xwYts++r8ZSn+mA
          hbEs0GzPI6dIWg0u7aUsRWMOB4A+6t2IOJibVYwmwkG8TjHRXxVCLH5sY+i3MR+NicR9T
          IZFdY/AyH6vt5uHLQDU35+5n91pUG3F2lyiY5aeMOvBL05p27GTMuixR5ZoHcvSoHHtCq
          7Wnk21iHqmv/UnEzqUfXZOque9YP386RBWkshrHd0x3OHUfBK/WrpivxvIGBzGwMr2qAj
          /AhJsfDXKBBbhGOGk1u5oBLjeC4SRnAcIVh1+RWzR4/cAhOuy2EcbzxaGb6VTM=

Secret Example

# In opendev.org/openstack/loci/.zuul.yaml:
- job:
    name: publish-loci-cinder
    parent: loci-cinder
    post-run: playbooks/push.yaml
    secrets:
      - loci_docker_login

# In opendev.org/openstack/loci/playbooks/push.yaml:
- hosts: all
  tasks:
    - include_vars: vars.yaml

    - name: Push project to DockerHub
      block:
        - command: docker login -u {{ loci_docker_login.user }} -p {{ loci_docker_login.password }}
          no_log: True
        - command: docker push openstackloci/{{ project }}:{{ branch }}-{{ item.name }}
          with_items: "{{ distros }}"

Speculative Container Images

  • Gating applied to continuously deployed container images
  • Build and test images that depend on other images
  • Build and test deployments comprising multiple images
  • Without publishing to final location
  • Publish the actual image that was built in the gate

Zuul is not New

  • Has been in Production for OpenStack for Six Years
  • Zuul is now a top-level effort of the OpenStack Foundation
  • Zuul v3 is the first release where non-OpenStack use is a first-class use case

OpenDev - Largest Known Zuul

  • 2KJPH (2,000 jobs per hour)
  • Build Nodes from 16 Regions of 5 Public and 3 Private OpenStack Clouds
  • Rackspace, Internap, OVH, Vexxhost, CityCloud
  • Linaro (ARM), Limestone, Packethost
  • 10,000 changes merged per month

Not just for OpenStack

  • BMW (control plane in OpenShift)
  • GoDaddy (control plane in private Kubernetes)
  • GoodMoney (control plane in EKS, adding GKE)
  • Le Bon Coin
  • Easystack
  • TungstenFabric
  • OpenLab
  • Red Hat
  • others ...

Code Review Systems

  • Gerrit
  • GitHub (Public and Enterprise)

In work / coming soon:

  • Pagure
  • Gitea

Commonly Requested:

  • GitLab
  • Bitbucket

Support for non-git

  • Nope
  • helix4git may work for perforce, but is untested

Installation of Software

Ways to Install Zuul

Zuul Containers

  • Published on every commit
  • Application/Process containers
  • Config / Data should be bind-mounted in

zuul/zuul-executor

  • In k8s, zuul-executor must be run privileged
  • Uses bubblewrap for unprivileged sandboxing
  • Restriction may be lifted in the future

Release Management

  • Zuul is continuously delivered and deployed upstream
  • Some users deploy Zuul with Zuul
  • Releases are tagged from code run for OpenDev
  • There is no intent to have a 'stable' release
  • 'stable' is a synonym for "old and buggy"

zuul/zuul-scheduler

  • SPOF
  • We're working on it - HA/Distributed scheduler is coming
  • Recommend running scheduler from tags

Quick Start

  • docker-compose

https://zuul-ci.org/docs/zuul/admin/quick-start.html

Important Links

Questions

images/questions.ans

Presentty
