
.. display in 68x40
.. display in 88x24

dissolve

Test Slide

images/testslide.ans

Preshow

images/cursor.ans images/cursor2.ans

Zuul

images/title.ans

Red Hat

images/redhat.ans

OpenStack

images/openstack.ans

OpenDev

"most insane CI infrastructure I've ever been a part of"

  -- Alex Gaynor

"like the SpaceX of CI"

  -- Emily Dunham

Zuul

images/zuul.ans

What Zuul does

  • "Speculative Future State"
  • multiple repositories
  • integrated deliverable
  • gated commits
  • testing like deployment

Developer Process In a Nutshell

  • Code Review (yay Gerrit!) - nobody has direct commit/push access
  • Gated Commits - nobody has submit permission
  • Every change gated on Code Analysis, Unit Tests, Functional Tests and End to End Integration Tests
  • Run all tests (at least) twice:
    • on patchset upload
    • between change approval and merge

Developer Workflow

  • Who has submitted a patch?
  • Who wants to?
  • (Who is here because the name of this talk is weird?)

    Hack             Review              Test
 =========         ==========         ==========

            push              approve
       +-------------+    +-------------+
       |             |    |             |
+------+--+       +--v----+--+       +--v-------+
|         |       |          |       |          |
| $EDITOR |       |  Gerrit  |       |   Zuul   |
|         |       |          |       |          |
+------^--+       +--+----^--+       +--+-------+
       |             |    |             |
       +-------------+    +-------------+
            clone             submit

Gerrit

Explain patch upload, Zuul runs, and test results displayed in Gerrit. This is all of Zuul's interface that users need to see.

Switch to actual Gertty screenshot.

Also show the Zuul status page.

But Zuul is doing a lot of work behind the scenes, and if you look closer, this is what you see:

images/color-gertty.ans

Gerrit Installation

  • 60G RAM
  • 16 VCPU
  • 8x git replicas running Gitea
  • 2.13 (sssh, don't tell Luca)

Zuul in a nutshell

  • Listens for code events
  • Prepares appropriate job config and git repo states
  • Allocates nodes for test jobs
  • Pushes git repo states to nodes
  • Runs user-defined Ansible playbooks
  • Collects/reports results
  • Potentially merges change

All in Service of Gating

No Tests / Manual Tests

  • No test automation exists or ...
  • Developer runs test suite before pushing code
  • Prone to developer skipping tests for "trivial" changes
  • Doesn't scale organizationally

Periodic Testing

  • Developers push changes directly to shared branch
  • CI system runs tests from time to time - report if things still work
  • "Who broke the build?"
  • Leads to hacks like the nvie (git-flow) branching model

Post-Merge Testing

  • Developers push changes directly to shared branch
  • CI system is triggered by push - reports if push broke something
  • Frequently batched / rolled up
  • Easier to diagnose which change broke things
  • Reactive - the bad changes are already in

Pre-Review Testing

  • Changes are pushed to code review (Gerrit Change, GitHub PR, etc)
  • CI system is triggered by code review change creation
  • Test results inform review decisions
  • Proactive - testing code before it lands
  • Reviewers can get bored waiting for tests
  • Only tests code as written, not potential result of merging code

Gating

  • Changes are pushed to code review
  • Gating system is triggered by code review approval
  • Gating system merges code IFF tests pass
  • Proactive - testing code before it lands
  • Future state resulting from merge of code is tested
  • Reviewers can fire-and-forget safely

Mix and Match

  • Zuul supports all of those modes
  • Zuul users frequently combine them
  • Run pre-review (check) and gating (gate) on each change
  • Post-merge/post-tag for release/publication automation
  • Periodic for catching bitrot

Check Jobs

  • Run on patchset upload
  • Verify patch as written
  • Avoid wasting reviewer time on broken changes

check pipeline

- pipeline:
    name: check
    manager: independent
    precedence: low
    require:
      gerrit:
        open: True
        current-patchset: True
    trigger:
      gerrit:
        - event: patchset-created
        - event: change-restored
        - event: comment-added
          comment: (?i)^(Patch Set [0-9]+:)?( [\w\\+-]*)*(\n\n)?\s*recheck
        - event: comment-added
          require-approval:
            - Verified: [-1, -2]
              username: zuul
          approval:
            - Workflow: 1
    success:
      gerrit:
        Verified: 1
    failure:
      gerrit:
        Verified: -1
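
As a rough illustration of the recheck trigger above, the comment pattern can be exercised in Python. This is only a sketch, not Zuul's own matching code: the helper name triggers_recheck and the sample comments are made up for this example, and re.match is assumed to be a fair stand-in because the pattern is anchored with ^.

```python
import re

# Comment pattern copied from the check pipeline's trigger.
RECHECK = re.compile(
    r"(?i)^(Patch Set [0-9]+:)?( [\w\\+-]*)*(\n\n)?\s*recheck"
)


def triggers_recheck(comment: str) -> bool:
    """Return True if a review comment matches the recheck pattern."""
    return RECHECK.match(comment) is not None


# Hypothetical review comments, for illustration only.
samples = [
    "recheck",
    "Patch Set 3:\n\nrecheck",
    "Patch Set 3: Code-Review+1\n\nRecheck",
    "Patch Set 3:\n\nLooks good to me",
    "please recheck",
]
for text in samples:
    print(repr(text), triggers_recheck(text))
```

The first three samples match; the last two do not, because the pattern only tolerates the "Patch Set N:" prefix and vote labels before the word recheck.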

Gate Triggering in Gerrit

  • Gate jobs run between Code Review Approval and Merging
  • "Workflow" Label in Gerrit
  • Approvers have ability to vote +1 in Workflow
  • Nobody sees the Submit button
  • Zuul runs Gate jobs on Workflow+1 - clicks Submit on Success

gate pipeline

- pipeline:
    name: gate
    manager: dependent
    post-review: True
    require:
      gerrit:
        open: True
        current-patchset: True
        approval:
          - Workflow: 1
    trigger:
      gerrit:
        - event: comment-added
          approval:
            - Workflow: 1
    start:
      gerrit:
        Verified: 0
    success:
      gerrit:
        Verified: 2
        submit: true
    failure:
      gerrit:
        Verified: -2

Submit Hook?

  • We'd love to have a hook point between Submit and Merge ...

Multi-repository integration

  • Multiple source repositories are needed for deliverable
  • Future state to be tested is the future state of all involved repos

To test proposed future state

  • Get tip of each project. Merge appropriate change(s). Test.
  • Changes must be serialized, otherwise state under test is invalid.
  • Integrated deliverable repos share serialized queue

Speculative Execution

  • Correct parallel processing of serialized future states
  • Create virtual serial queue of changes for each deliverable
  • Assume each change will pass its tests
  • Test successive changes with previous changes applied to starting state
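
The idea in the list above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Zuul's implementation: changes are plain strings, and speculative_states is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def speculative_states(queue: List[str]) -> Dict[str, Tuple[str, ...]]:
    """Map each queued change to the changes merged into its test state.

    Each change is tested on top of the branch tip plus every change
    ahead of it in the queue, assuming those changes will pass.
    """
    states = {}
    for position, change in enumerate(queue):
        states[change] = tuple(queue[: position + 1])
    return states


queue = ["A", "B", "C"]
for change, merged in speculative_states(queue).items():
    print(f"test {change} with tip + {' + '.join(merged)}")
# test A with tip + A
# test B with tip + A + B
# test C with tip + A + B + C
```

Because every state is known up front, all three test runs can start at the same time.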

Nearest Non-Failing Change

(aka 'The Jim Blair Algorithm')

  • If a change fails, move it aside
  • Cancel all test jobs behind it in the queue
  • Reparent queue items on the nearest non-failing change
  • Restart tests with new state
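
A minimal Python sketch of the steps above, assuming a queue of plain change names. The function names reparent and jobs_to_restart are hypothetical, and real queue items carry far more state than this.

```python
from typing import Dict, List, Optional, Set


def reparent(queue: List[str], failed: Set[str]) -> Dict[str, Optional[str]]:
    """Return the new parent of every non-failing change in the queue.

    A change's parent is the nearest change ahead of it that has not
    failed; None means it sits directly on the branch tip.
    """
    parents: Dict[str, Optional[str]] = {}
    nearest_ok: Optional[str] = None
    for change in queue:
        if change in failed:
            continue  # moved aside; nothing is built on top of it
        parents[change] = nearest_ok
        nearest_ok = change
    return parents


def jobs_to_restart(queue: List[str], failed_change: str) -> List[str]:
    """Changes behind the failed one must be retested on the new state."""
    index = queue.index(failed_change)
    return queue[index + 1:]


queue = ["A", "B", "C", "D"]
print(reparent(queue, failed={"B"}))  # {'A': None, 'C': 'A', 'D': 'C'}
print(jobs_to_restart(queue, "B"))    # ['C', 'D']
```

A keeps running untouched, while C and D are cancelled and restarted on a state that no longer includes B.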

Zuul Simulation

pan

  • todo

images/zsim-00.ans

Zuul Simulation

cut

  • todo

images/zsim-01.ans

Zuul Simulation

cut

  • todo

images/zsim-02.ans

Zuul Simulation

cut

  • todo

images/zsim-03.ans

Zuul Simulation

cut

  • todo

images/zsim-04.ans

Zuul Simulation

cut

  • todo

images/zsim-05.ans

Zuul Simulation

cut

  • todo

images/zsim-06.ans

Zuul Simulation

cut

  • todo

images/zsim-07.ans

Zuul Simulation

cut

  • todo

images/zsim-08.ans

Zuul Simulation

cut

  • todo

images/zsim-09.ans

Zuul Simulation

cut

  • todo

images/zsim-10.ans

Zuul Simulation

cut

  • todo

images/zsim-11.ans

Zuul Simulation

cut

  • todo

images/zsim-12.ans

Zuul Simulation

cut

  • todo

images/zsim-13.ans

Zuul Simulation

cut

  • todo

images/zsim-14.ans

Zuul Simulation

cut

  • todo

images/zsim-15.ans

Zuul Simulation

cut

  • todo

images/zsim-16.ans

Zuul Simulation

cut

  • todo

images/zsim-17.ans

Zuul Simulation

cut

  • todo

images/zsim-18.ans

Zuul Simulation

cut

  • todo

images/zsim-19.ans

Zuul Simulation

cut

  • todo

images/zsim-20.ans

Zuul Simulation

cut

  • todo

images/zsim-21.ans

Zuul Simulation

cut

  • todo

images/zsim-22.ans

Explicit Cross-Project Dependencies

  • Developers can mark changes as being dependent
  • Depends-On: footer - in commit or PR
  • Zuul uses depends-on when constructing virtual serial queue
  • Will not merge changes in gate before depends-on changes
  • Works cross-repo AND cross-source
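
One way to picture the ordering constraint is a topological sort over the Depends-On graph. The sketch below uses Python's standard graphlib module (Python 3.9+); the change identifiers are invented, and this is not how Zuul builds its queue internally.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def merge_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Order changes so each one comes after the changes it depends on.

    depends_on maps a change identifier to the identifiers named in
    its Depends-On footers.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical changes spanning three repositories.
graph = {
    "nova#1": ["sdk#7"],
    "sdk#7": [],
    "zuul#3": ["nova#1"],
}
print(merge_order(graph))  # ['sdk#7', 'nova#1', 'zuul#3']

try:
    merge_order({"a": ["b"], "b": ["a"]})
except CycleError:
    print("circular dependency: no valid serial order exists")
```

The second call shows why a cycle is a problem for a serialized queue: there is no position in which either change can go first.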

Cross Source

trigger:
  gerrit:
    - event: patchset-created
    - event: change-restored
    - event: comment-added
      comment: (?i)^(Patch Set [0-9]+:)?( [\w\\+-]*)*(\n\n)?\s*recheck
  github:
    - event: pull_request
      action:
        - opened
        - changed
        - reopened
    - event: pull_request
      action: comment
      comment: (?i)^\s*recheck\s*$
start:
  github:
    status: pending
    comment: false
success:
  gerrit:
    Verified: 1
  github:
    status: 'success'
failure:
  gerrit:
    Verified: -1
  github:
    status: 'failure'

Cross Source

  • Explicit Dependency between projects from different sources
  • Change to Zuul that depends on change to Gerrit

commit 737d61c116ff5f32770ef72e2dd82a031ab32591
Author: James E. Blair <jeblair@redhat.com>
Date:   Mon Aug 19 14:58:20 2019 -0700

    Add Support for Gerrit Checks Plugin

    Depends-On: https://gerrit-review.googlesource.com/c/plugins/checks/+/232079
    Change-Id: I8e5903f4429c5a1273a6120e0d09c57169e8f938
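
To make the footer concrete, here is a small Python sketch that pulls Depends-On URLs out of a commit message like the one above. The regular expression and the depends_on helper are assumptions made for this example, not Zuul's own parser.

```python
import re
from typing import List

# Assumed footer shape: "Depends-On: <url>" alone on a line.
FOOTER = re.compile(
    r"^[ \t]*Depends-On:[ \t]*(\S+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def depends_on(commit_message: str) -> List[str]:
    """Return every Depends-On target found in a commit message."""
    return FOOTER.findall(commit_message)


message = """Add Support for Gerrit Checks Plugin

Depends-On: https://gerrit-review.googlesource.com/c/plugins/checks/+/232079
Change-Id: I8e5903f4429c5a1273a6120e0d09c57169e8f938
"""
print(depends_on(message))
```

The host name in the URL is what lets a change in one source (here, OpenDev's Gerrit) point at a change in another (Gerrit's own Gerrit).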

Lock Step Changes

  • Circular Dependencies are not supported on purpose
  • Rolling upgrades across interdependent services
  • HOWEVER - many valid use cases (go/rust/c++) - support expected

Live Configuration Changes

Zuul is a distributed system, with a distributed configuration.

- tenant:
    name: openstack
    source:
      gerrit:
        config-projects:
          - opendev/project-config
        untrusted-projects:
          - zuul/zuul-jobs
          - zuul/zuul
          - zuul/nodepool
          - ansible/ansible
          - openstack/openstacksdk

Zuul Startup

  • Read config file

Zuul Startup

  • Read config file
  • Ask mergers for branches of each repo

images/startup1.ans

Zuul Startup

  • Read config file
  • Ask mergers for branches of each repo
  • Ask mergers for .zuul.yaml for each branch of each repo

images/startup2.ans

When .zuul.yaml Changes

  • Zuul looks for changes to .zuul.yaml
  • Asks mergers for updated content
  • Splices into configuration used for that change
  • Works with cross-repo dependencies
    ("This change depends on a change to the job definition")
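
The "splice" step can be pictured as layering a change's proposed configuration over the running one. The Python sketch below uses collections.ChainMap for that; the keys, values, and the config_for_change helper are all hypothetical, and Zuul's real configuration loader is considerably more involved.

```python
from collections import ChainMap
from typing import Dict, Tuple

ConfigKey = Tuple[str, str]  # (project, branch)

# Configuration as loaded at startup (placeholder contents).
running_config: Dict[ConfigKey, str] = {
    ("zuul/zuul", "master"): "tox job, original definition",
    ("zuul/nodepool", "master"): "tox job, original definition",
}


def config_for_change(proposed: Dict[ConfigKey, str]) -> ChainMap:
    """Layer a change's proposed .zuul.yaml content over the running config.

    The running configuration is left untouched, so other changes keep
    seeing the merged state.
    """
    return ChainMap(proposed, running_config)


view = config_for_change({("zuul/zuul", "master"): "tox job, edited definition"})
print(view[("zuul/zuul", "master")])            # edited definition
print(view[("zuul/nodepool", "master")])        # falls through to running config
print(running_config[("zuul/zuul", "master")])  # still the original definition
```

Only the change under test sees the edited definition, which matches the idea that configuration changes are themselves tested before they land.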

Zuul Architecture

Zuul is composed of several services (mostly python3)

  • zuul-scheduler
  • zuul-executor
  • zuul-merger
  • zuul-web
  • zuul-fingergw
  • zuul-dashboard (javascript/react)
  • zuul-proxy (c++)
  • nodepool-launcher
  • nodepool-builder
  • RDBMS
  • Gearman
  • Zookeeper

Where Does Job Content Run?

Nodepool

  • A separate service that works very closely with Zuul
  • Zuul requires Nodepool but Nodepool can be used independently
  • Creates and destroys zero or more node resources
  • Resources can include VMs, Containers, COE contexts or Bare Metal
  • Static driver for allocating pre-existing nodes to jobs
  • Optionally periodically builds images and uploads to clouds

Nodepool Launcher

Where build nodes should come from

  • OpenStack
  • Static
  • Kubernetes Pod
  • Kubernetes Namespace
  • AWS

In work / coming soon:

  • Azure
  • GCE

Jobs

  • Define node types needed from nodepool
  • Define which ansible playbooks to run
  • Jobs may be defined centrally or in the repo being tested
  • Jobs have contextual variants that simplify configuration
  • Job definitions support inheritance

Job Libraries

Simple Job

- job:
   name: tox
   pre-run: playbooks/setup-tox.yaml
   run: playbooks/tox.yaml
   post-run: playbooks/fetch-tox-output.yaml
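
To show how inheritance affects a job like tox, the Python sketch below resolves the order in which playbooks would run when a job has a parent. The "base" parent and its playbook paths are invented for illustration; the nesting (parent pre-run first, parent post-run last) follows Zuul's documented behavior, but the code is a simplified model, not Zuul's implementation.

```python
from typing import Dict, List

# Hypothetical job definitions; "base" and its playbooks are made up.
JOBS: Dict[str, dict] = {
    "base": {
        "parent": None,
        "pre-run": ["base/pre.yaml"],
        "post-run": ["base/post.yaml"],
    },
    "tox": {
        "parent": "base",
        "pre-run": ["playbooks/setup-tox.yaml"],
        "run": "playbooks/tox.yaml",
        "post-run": ["playbooks/fetch-tox-output.yaml"],
    },
}


def playbook_order(name: str) -> List[str]:
    """Return the playbooks for a job in the order they would run."""
    chain = []  # job definitions, child first
    current = name
    while current is not None:
        chain.append(JOBS[current])
        current = JOBS[current]["parent"]
    # Pre-run playbooks nest from the outermost parent inwards.
    pre = [p for job in reversed(chain) for p in job.get("pre-run", [])]
    # The nearest definition of "run" wins.
    run = next(job["run"] for job in chain if "run" in job)
    # Post-run playbooks unwind from the child back out to the parent.
    post = [p for job in chain for p in job.get("post-run", [])]
    return pre + [run] + post


for playbook in playbook_order("tox"):
    print(playbook)
```

The nesting means a parent job can set up and tear down an environment around whatever its children do.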

What about job content?

  • Written in Ansible
  • Ansible is excellent at running one or more tasks in one or more places
  • The answer to "how do I" is almost always "Ansible"

Checks Plugin

  • Learned about checks plugin in Gothenburg
  • Added support for checks plugin to Zuul
  • Expanded HTTP support in Zuul's Gerrit driver
  • Connected OpenDev Zuul to Gerrit's Gerrit

OpenDev Zuul Verifies Checks Plugins Changes Using Checks API

https://gerrit-review.googlesource.com/c/plugins/checks/+/245796

https://opendev.org/zuul/project-config/src/branch/master/zuul.d/pipelines.yaml#L183-L235

Future Work

Sub-checks

Using Zuul on Gerrit's Gerrit

  • Sub-checks implemented in Gerrit and Zuul
  • Add GCE support for Nodepool
  • Work with Luca to run GCE-enabled Zuul

Who Is Running Zuul?

  • Zuul has been in production for OpenStack for 7 years (running in OpenStack VMs)

Also running at:

  • Volvo
  • BMW (control plane in OpenShift)
  • GoDaddy (control plane in Kubernetes)
  • GoodMoney (control plane in EKS, adding GKE)
  • Le Bon Coin
  • Easystack
  • Western Digital
  • TungstenFabric
  • Huawei OpenLab
  • IBM
  • Red Hat
  • others ...

Zuul as a Service: https://vexxhost.com/solutions/managed-zuul/

Important Links

Questions

images/questions.ans

Presentty

pan

Presentty