.. display in 68x24
.. display in 88x24
.. pygments yaml? (only file breaks (---) tinted)
.. slide on high level v3 changes
.. slide on nodepool

.. transition:: dissolve
   :duration: 0.4

Test Slide
==========

.. hidetitle::
.. ansi:: images/testslide.ans

Preshow
=======

.. hidetitle::
.. ansi:: images/cursor.ans images/cursor2.ans

Zuul
====

.. hidetitle::
.. ansi:: images/title.ans

OpenDev
=======

::

  "most insane CI infrastructure I've ever been a part of"

  -- Alex Gaynor

  "like the SpaceX of CI"

  -- Emily Dunham

What Zuul does
==============

* "Speculative Future State"
* multiple repositories
* integrated deliverable
* gated commits
* open tooling
* nobody is special
* testing like deployment

OpenStack
=========

.. hidetitle::
.. ansi:: images/openstack.ans

OpenStack Is
============

* Federated
* Distributed
* Large
* Open
* Not Alone

Federated
=========

* Hundreds of involved companies
* No 'main' company
* "Decisions are made by those who show up"
* Union of priorities/use cases

Impact of being Federated
=========================

* No company can appoint people to positions in the project
* The project cannot fire anyone
* Variable background of contributors
* Heavy reliance on consensus

Distributed
===========

* There is no office
* Contributor base is global
* Multitude of contributor backgrounds
* "Accept patches from random people on the internet"

Impact of being Distributed
===========================

* Tooling must empower all contributors, regardless of background,
  skill level or cultural context
* Human mutexes are impossible
* Pessimistic threat model - patches are all assumed to be attacks

Large numbers of
================

* Contributors (\~2k in any given 6 month period)
* Changes (\~10k *merged* per month)
* Code Repositories (2281 as of 2021-09-30)

Not Bragging About Scale
========================

OpenStack Scale Comparison
==========================

* 2KJPH (2,000 jobs per hour)
* Build Nodes from 16 Regions of 5 Public and 3 Private OpenStack Clouds
* Rackspace, Internap, OVH, Vexxhost, CityCloud
* Linaro (ARM), Limestone, Packethost
* 10,000 changes merged per month

OpenStack Scale Comparison
==========================

* 2KJPH (2,000 jobs per hour)
* Build Nodes from 16 Regions of 5 Public and 3 Private OpenStack Clouds
* Rackspace, Internap, OVH, Vexxhost, CityCloud
* Linaro (ARM), Limestone, Packethost
* 10,000 changes merged per month
* By comparison, our friends at the amazing project Ansible received
  13,000 changes and had merged 8,000 of them in its first 4 years.

Impact of scale
===============

* Empower teams to take care of themselves (distributed)
* Efficiency gained from shared solutions (centralized)
* Zuul supports per-repo config, central config, and multiple tenants
* One Zuul install is all you need for all of BMW or ExxonMobil or IBM

Four Opens
==========

* Open Source (we don't hold back Enterprise features, we don't
  cripple things)
* Open Design (design process open to all, decisions are not made
  inside company doors)
* Open Development (public source code, public code review, all code
  is reviewed and gated)
* Open Community (lazy consensus, democratic leadership from
  participants, public logged meetings in IRC, public archived
  mailing lists)

Impact of Four Opens
====================

* Tooling is not exempt
* Fifth Open - Open Operations - all OpenDev infrastructure is run in
  the Open, via Gated GitOps

Infrastructure
==============

* Cloud Software - network drivers, kernel modules
* Some parts legitimately need root
* Infrastructure platforms interface with the dirty bits

Impact of Infrastructure
========================

* Simple approaches "just build a container" are insufficient
* High cost of failure for users running CD
* Integration testing may require multiple full computers with
  network interconnects
* Constructs developed resonate strongly in manufacturing and
  financial services

Developer Process In a Nutshell
===============================

* Code Review - nobody has direct push access
* 3rd-Party CI for vendors
* Gated Commits

Developer Workflow
==================

.. container:: handout

  * Who has submitted a patch?
  * Who wants to?
  * (Who is here because the name of this talk is weird?)

::

     Hack              Review              Test
   =========         ==========         ==========

              push              approve
         +-------------+    +-------------+
         |             |    |             |
  +------+--+       +--v----+--+       +--v-------+
  |         |       |          |       |          |
  | $EDITOR |       |  Gerrit  |       |   Zuul   |
  |         |       |          |       |          |
  +------^--+       +--+----^--+       +--+-------+
         |             |    |             |
         +-------------+    +-------------+
              clone              merge

Code Review
===========

.. hidetitle::
.. container:: handout

  explain patch upload, zuul runs, test results displayed in gerrit
  this is all the interface to zuul users need to see
  switch to actual gertty screenshot
  also show zuul status page

  but zuul is doing a lot of work behind the scenes, and if you look
  closer, this is what you see

.. ansi:: images/color-gertty.ans

Zuul is not New
===============

* Has been in Production for OpenStack for Eight Years
* Starting with v3, non-OpenStack is first-class use case
* Zuul is now an OpenInfra Foundation Project

Not just for OpenStack
======================

* Zuul is in production for OpenDev

Also running at:

* BMW (control plane in OpenShift)
* GoDaddy (control plane in Kubernetes)
* GoodMoney (control plane in EKS)
* Gerrit (control plane in GKE)
* Volvo
* Red Hat
* Le Bon Coin
* Easystack
* TungstenFabric
* others ...

Zuul eats its own dogfood
=========================

* Zuul development is managed by the Zuul run by OpenDev
* OpenDev's Zuul is deployed by OpenDev's Zuul

Zuul in a nutshell
==================

* Listens for code events
* Prepares appropriate job config and git repo states
* Allocates nodes for test jobs
* Pushes git repo states to nodes
* Runs user-defined Ansible playbooks
* Collects/reports results
* Potentially merges change

All in Service of Gating
========================

No Tests / Manual Tests
=======================

* No test automation exists or ...
* Developer runs test suite before pushing code
* Prone to developer skipping tests for "trivial" changes
* Doesn't scale organizationally

Periodic Testing
================

* Developers push changes directly to shared branch
* CI system runs tests from time to time - report if things still work
* "Who broke the build?"
* Leads to hacks like NVIE model

Post-Merge Testing
==================

* Developers push changes directly to shared branch
* CI system is triggered by push - reports if push broke something
* Frequently batched / rolled up
* Easier to diagnose which change broke things
* Reactive - the bad changes are already in

Pre-Review Testing
==================

* Changes are pushed to code review (Gerrit Change, GitHub PR, etc)
* CI system is triggered by code review change creation
* Test results inform review decisions
* Proactive - testing code before it lands
* Reviewers can get bored waiting for tests
* Only tests code as written, not potential result of merging code

Gating
======

* Changes are pushed to code review
* Gating system is triggered by code review approval
* Gating system merges code IFF tests pass
* Proactive - testing code before it lands
* Future state resulting from merge of code is tested
* Reviewers can fire-and-forget safely

Mix and Match
=============

* Zuul supports all of those modes
* Zuul users frequently combine them
* Run pre-review (check) and gating (gate) on each change
* Post-merge/post-tag for release/publication automation
* Periodic for catching bitrot

Multi-repository integration
============================

* Multiple source repositories are needed for deliverable
* Future state to be tested is the future state of all involved repos

To test proposed future state
=============================

* Get tip of each project. Merge appropriate change(s). Test.
* Changes must be serialized, otherwise state under test is invalid.
* Integrated deliverable repos share serialized queue

Speculative Execution
=====================

* Correct parallel processing of serialized future states
* Create virtual serial queue of changes for each deliverable
* Assume each change will pass its tests
* Test successive changes with previous changes applied to starting state

Nearest Non-Failing Change
==========================

* If a change fails, move it aside
* Cancel all test jobs behind it in the queue
* Reparent queue items on the nearest non-failing change
* Restart tests with new state

Zuul Simulation
===============

.. transition:: pan
.. container:: handout

  * todo

.. ansi:: images/zsim-00.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-01.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-02.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-03.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-04.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-05.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-06.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-07.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-08.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-09.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-10.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-11.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-12.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-13.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-14.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-15.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-16.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-17.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-18.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-19.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-20.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-21.ans

Zuul Simulation
===============

.. transition:: cut
.. container:: handout

  * todo

.. ansi:: images/zsim-22.ans

Explicit Cross-Project Dependencies
===================================

* Developers can mark changes as being dependent
* Depends-On: footer - in commit or PR
* Zuul uses depends-on when constructing virtual serial queue
* Will not merge changes in gate before depends-on changes
* Works cross-repo AND cross-source

Lock Step Changes
=================

* Circular Dependencies are not supported on purpose
* Rolling upgrades across interdependent services

Live Configuration Changes
==========================

.. container:: handout

  Zuul is a distributed system, with a distributed configuration.

.. code:: yaml

  - tenant:
      name: openstack
      source:
        gerrit:
          config-projects:
            - opendev/project-config
          untrusted-projects:
            - zuul/zuul-jobs
            - zuul/zuul
            - zuul/nodepool
            - ansible/ansible
            - openstack/openstacksdk

Zuul Startup
============

* Read config file

Zuul Startup
============

* Read config file
* Ask mergers for branches of each repo

.. ansi:: images/startup1.ans

Zuul Startup
============

* Read config file
* Ask mergers for branches of each repo
* Ask mergers for .zuul.yaml for each branch of each repo

.. ansi:: images/startup2.ans

When .zuul.yaml Changes
=======================

.. container:: progressive

  * Zuul looks for changes to .zuul.yaml
  * Asks mergers for updated content
  * Splices into configuration used for that change
  * Works with cross-repo dependencies
    ("This change depends on a change to the job definition")

Where Does Job Content Run?
===========================

Nodepool
========

* A separate service that works very closely with *Zuul*
* *Zuul* requires *Nodepool* but *Nodepool* can be used independently
* Creates and destroys zero or more node resources
* Resources can include VMs, containers, COE contexts or bare-metal
  machines
* Static driver for allocating pre-existing nodes to jobs
* Optionally builds images periodically and uploads them to clouds

Nodepool Launcher
=================

Where build nodes should come from

* OpenStack
* Static
* Kubernetes Pod
* Kubernetes Namespace
* AWS
* Azure
* GCE

Jobs
====

* Define node types needed from nodepool
* Define content to run on those nodes
* Jobs may be defined centrally or in the repo being tested
* Jobs have contextual variants that simplify configuration
* Job definitions support inheritance

Important Links
===============

* https://zuul-ci.org/
* https://opendev.org/zuul
* https://zuul-ci.org/docs/zuul
* https://zuul-ci.org/docs/zuul-jobs/
* https://docs.openstack.org/infra/openstack-zuul-jobs/
* freenode:#zuul

Questions
=========

.. ansi:: images/questions.ans

How do you use this thing?
==========================

.. transition:: tilt
.. hidetitle::
.. figlet:: Configuration

Job
===

.. code:: yaml

  - job:
      name: base
      parent: null
      description: |
        The base job for Zuul.
      timeout: 1800
      nodeset:
        nodes:
          - name: primary
            label: ubuntu-xenial
      pre-run: playbooks/base/pre.yaml
      post-run:
        - playbooks/base/post-ssh.yaml
        - playbooks/base/post-logs.yaml
      secrets:
        - site_logs

Simple Job
==========

.. code:: yaml

  - job:
      name: tox
      pre-run: playbooks/setup-tox.yaml
      run: playbooks/tox.yaml
      post-run: playbooks/fetch-tox-output.yaml

Simple Job Inheritance
======================

.. code:: yaml

  - job:
      name: tox-py36
      parent: tox
      vars:
        tox_envlist: py36

Inheritance Works Like An Onion
===============================

* pre-run playbooks run in order of inheritance
* run playbook of job runs
* post-run playbooks run in reverse order of inheritance
* If pre-run playbooks fail, job is re-tried
* All post-run playbooks run - as far as pre-run playbooks got

Inheritance Example
===================

For tox-py36 job

* base pre-run playbooks/base/pre.yaml
* tox pre-run playbooks/setup-tox.yaml
* tox run playbooks/tox.yaml
* tox post-run playbooks/fetch-tox-output.yaml
* base post-run playbooks/base/post-ssh.yaml
* base post-run playbooks/base/post-logs.yaml

Simple Job Variant
==================

.. code:: yaml

  - job:
      name: tox-py27
      branches: stable/mitaka
      nodeset:
        nodes:
          - name: primary
            label: ubuntu-trusty

Nodesets for Multi-node Jobs
============================

.. code:: yaml

  - nodeset:
      name: ceph-cluster
      nodes:
        - name: controller
          label: centos-7
        - name: compute1
          label: fedora-28
        - name: compute2
          label: fedora-28
      groups:
        - name: ceph-osd
          nodes:
            - controller
        - name: ceph-monitor
          nodes:
            - controller
            - compute1
            - compute2

Multi-node Job
==============

* nodesets are provided to Ansible for jobs in inventory

.. code:: yaml

  - job:
      name: ceph-multinode
      nodeset: ceph-cluster
      run: playbooks/install-ceph.yaml

Multi-node Ceph Job Content
===========================

.. code:: yaml

  - hosts: all
    roles:
      - install-ceph

  - hosts: ceph-osd
    roles:
      - start-ceph-osd

  - hosts: ceph-monitor
    roles:
      - start-ceph-monitor

  - hosts: all
    roles:
      - do-something-interesting

Projects
========

* Projects are git repositories
* Specify a set of jobs for each pipeline
* golang git repo naming has been adopted:

::

  zuul@ubuntu-xenial:~$ find /home/zuul/src -mindepth 3 -maxdepth 3 -type d
  /home/zuul/src/opendev.org/openstack-infra/shade
  /home/zuul/src/opendev.org/openstack/keystoneauth
  /home/zuul/src/opendev.org/openstack/os-client-config
  /home/zuul/src/github.com/ansible/ansible

Project Config
==============

* Specify a set of jobs for each pipeline

.. code:: yaml

  - project:
      check:
        jobs:
          - openstack-tox-py27
          - openstack-tox-py35
          - openstack-tox-docs
      gate:
        jobs:
          - openstack-tox-py27
          - openstack-tox-py35
          - openstack-tox-docs

Project with Local Variant
==========================

.. code:: yaml

  - project:
      check:
        jobs:
          - openstack-tox-py27
          - openstack-tox-py35
          - openstack-tox-py36:
              voting: false
          - openstack-tox-docs
      gate:
        jobs:
          - openstack-tox-py27
          - openstack-tox-py35
          - openstack-tox-docs

Project with More Local Variants
================================

.. code:: yaml

  - project:
      check:
        jobs:
          - openstack-tox-py27
          - openstack-tox-py35
          - openstack-tox-py36:
              voting: false
          - openstack-tox-docs:
              files: '^docs/.*$'

Project with Many Local Variants
================================

.. code:: yaml

  - project:
      check:
        jobs:
          - openstack-tox-py27:
              nodeset:
                nodes:
                  - name: centos-7
                    label: centos-7
          - openstack-tox-py27:
              branches: stable/newton
              nodeset:
                nodes:
                  - name: ubuntu-trusty
                    label: ubuntu-trusty
          - openstack-tox-py35
          - openstack-tox-py36:
              voting: false
          - openstack-tox-docs:
              files: '^docs/.*$'

Project With Central and Local Config
=====================================

.. code:: yaml

  # In opendev.org/openstack-infra/project-config:
  - project:
      name: openstack/nova
      templates:
        - openstack-tox-jobs

.. code:: yaml

  # In opendev.org/openstack/nova/.zuul.yaml:
  - project:
      check:
        jobs:
          - nova-placement-functional-devstack

Project with Job Dependencies
=============================

.. code:: yaml

  - project:
      release:
        jobs:
          - build-artifacts
          - upload-tarball:
              dependencies: build-artifacts
          - upload-pypi:
              dependencies: build-artifacts
          - notify-mirror:
              dependencies:
                - upload-tarball
                - upload-pypi

Playbooks
=========

* Jobs run playbooks
* Playbooks may be defined centrally or in the repo being tested
* Playbooks can use roles from current or other Zuul repos or Galaxy
* Playbooks are not allowed to execute content on 'localhost'

devstack-tempest Run Playbook
=============================

.. code:: yaml

  # Changes that run through devstack-tempest are likely to have an impact on
  # the devstack part of the job, so we keep devstack in the main play to
  # avoid zuul retrying on legitimate failures.
  - hosts: all
    roles:
      - run-devstack

  # We run tests only on one node, regardless how many nodes are in the system
  - hosts: tempest
    roles:
      - setup-tempest-run-dir
      - setup-tempest-data-dir
      - acl-devstack-files
      - run-tempest

Simple Shell Playbook
=====================

.. code:: yaml

  - hosts: controller
    tasks:
      - shell: |
          cd {{ zuul.project.src_dir }}
          ./run_tests.sh

Test Like Production
====================

If you use Ansible for deployment, your test and deployment processes
and playbooks are the same

What if you don't use Ansible?
==============================

OpenStack Infra Control Plane uses Puppet (for now)
===================================================

.. code:: yaml

  # In opendev.org/openstack-infra/project-config/roles/legacy-install-afs-with-puppet/tasks/main.yaml
  - name: Install puppet
    shell: ./install_puppet.sh
    args:
      chdir: "{{ ansible_user_dir }}/src/opendev.org/openstack-infra/system-config"
    environment:
      # Skip setting up pip, our images have already done this.
      SETUP_PIP: "false"
    become: yes

  - name: Copy manifest
    copy:
      src: manifest.pp
      dest: "{{ ansible_user_dir }}/manifest.pp"

  - name: Run puppet
    puppet:
      manifest: "{{ ansible_user_dir }}/manifest.pp"
    become: yes

Secrets
=======

* Inspired by Kubernetes Secrets API
* Projects can add named encrypted secrets to their .zuul.yaml file
* Jobs can request to use secrets by name
* Jobs using secrets are not reconfigured speculatively
* Secrets can only be used by the same project they are defined in
* Public key per project: ``{{ zuul_url }}/{{ tenant }}/{{ project }}.pub``

::

  GET https://zuul.openstack.org/openstack-infra/shade.pub

Secret Example (note, no admins had to enable this)
===================================================

.. code:: yaml

  # In opendev.org/openstack/loci/.zuul.yaml:
  - secret:
      name: loci_docker_login
      data:
        user: loci-username
        password: !encrypted/pkcs1-oaep
          - gUEX4eY3JAk/Xt7Evmf/hF7xr6HpNRXTibZjrKTbmI4QYHlzEBrBbHey27Pt/eYvKKeKw
            hk8MDQ4rNX7ZK1v+CKTilUfOf4AkKYbe6JFDd4z+zIZ2PAA7ZedO5FY/OnqrG7nhLvQHE
            5nQrYwmxRp4O8eU5qG1dSrM9X+bzri8UnsI7URjqmEsIvlUqtybQKB9qQXT4d6mOeaKGE
            5h6Ydkb9Zdi4Qh+GpCGDYwHZKu1mBgVK5M1G6NFMy1DYz+4NJNkTRe9J+0TmWhQ/KZSqo
            4ck0x7Tb0Nr7hQzV8SxlwkaCTLDzvbiqmsJPLmzXY2jry6QsaRCpthS01vnj47itoZ/7p
            taH9CoJ0Gl7AkaxsrDSVjWSjatTQpsy1ub2fuzWHH4ASJFCiu83Lb2xwYts++r8ZSn+mA
            hbEs0GzPI6dIWg0u7aUsRWMOB4A+6t2IOJibVYwmwkG8TjHRXxVCLH5sY+i3MR+NicR9T
            IZFdY/AyH6vt5uHLQDU35+5n91pUG3F2lyiY5aeMOvBL05p27GTMuixR5ZoHcvSoHHtCq
            7Wnk21iHqmv/UnEzqUfXZOque9YP386RBWkshrHd0x3OHUfBK/WrpivxvIGBzGwMr2qAj
            /AhJsfDXKBBbhGOGk1u5oBLjeC4SRnAcIVh1+RWzR4/cAhOuy2EcbzxaGb6VTM=

Secret Example
==============

.. code:: yaml

  # In opendev.org/openstack/loci/.zuul.yaml:
  - job:
      name: publish-loci-cinder
      parent: loci-cinder
      post-run: playbooks/push
      secrets:
        - loci_docker_login

  # In opendev.org/openstack/loci/playbooks/push.yaml:
  - hosts: all
    tasks:
      - include_vars: vars.yaml

      - name: Push project to DockerHub
        block:
          - command: docker login -u {{ loci_docker_login.user }} -p {{ loci_docker_login.password }}
            no_log: True
          - command: docker push openstackloci/{{ project }}:{{ branch }}-{{ item.name }}
            with_items: "{{ distros }}"

Important Links
===============

* https://zuul-ci.org/
* https://git.zuul-ci.org/cgit/zuul
* https://zuul-ci.org/docs/zuul
* https://zuul-ci.org/docs/zuul-jobs/
* https://docs.openstack.org/infra/openstack-zuul-jobs/
* freenode:#zuul

Questions
=========

.. ansi:: images/questions.ans

Presentty
=========

.. hidetitle::
.. transition:: pan
.. figlet:: Presentty

* Console presentations written in reStructuredText
* Cross-fade, pan, tilt, cut transitions
* Figlet, cowsay!
* https://pypi.python.org/pypi/presentty
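
Sketch: Check and Gate Pipelines
================================

.. container:: handout

  Backup slide. The Mix and Match slide says projects typically run
  pre-review (check) and gating (gate) on each change, and the project
  configs attach jobs to those pipeline names. This is a minimal
  sketch of how such a pair of pipelines might be declared, not the
  configuration of any real deployment. It assumes a Gerrit connection
  named ``gerrit``; the ``Verified`` and ``Workflow`` label names and
  vote values are illustrative assumptions. The ``dependent`` manager
  is what gives the gate its shared, speculative queue; the
  ``independent`` manager tests each change on its own.

.. code:: yaml

  # Assumed: a connection named "gerrit"; labels/values are illustrative.
  - pipeline:
      name: check
      manager: independent
      trigger:
        gerrit:
          - event: patchset-created
      success:
        gerrit:
          Verified: 1

  - pipeline:
      name: gate
      manager: dependent
      trigger:
        gerrit:
          - event: comment-added
            approval:
              - Workflow: 1
      success:
        gerrit:
          Verified: 2
          submit: true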