---
author: "David Moreau Simard"
categories:
  - community
tags:
  - ansible
  - openstack
  - zuul
date: 2018-04-09
title: "Scaling ARA to a million Ansible playbooks a month"
slug: scaling-ara-to-a-million-ansible-playbooks-a-month
type: post
---

The [OpenStack](https://www.openstack.org/) community runs over 300 000 CI jobs with [Ansible](https://www.ansible.com/) every month with the help of the awesome [Zuul](https://zuul-ci.org/).

![Zuul Pipelines](zuul-pipelines.png)

It even provides ARA reports for ARA's [integration test jobs](https://github.com/ansible-community/ara#contributing-testing-issues-and-bugs) in a sort-of nested way. Zuul's Ansible ends up installing Ansible and ARA. It makes my brain hurt sometimes... but in an awesome way.

![Zuul ARA Report](zuul-ci.png)

As a core contributor of the infrastructure team there, I get to witness issues and get a lot of feedback directly from the users.

[Static HTML report generation](https://ara.readthedocs.io/en/latest/usage.html#generating-a-static-html-version-of-the-web-application) in ARA is simple but didn't scale very well for us.

One day, I was randomly chatting with Ian Wienand and he pointed out an [attempt](https://review.openstack.org/#/c/120317/) at a WSGI middleware that would serve extracted logs. That inspired me to write something similar but for dynamically loading ARA sqlite databases instead...

This resulted in an awesome feature that I had not yet taken the time to explain very well... until now.

*Excerpt from the [documentation](https://ara.readthedocs.io/en/latest/advanced.html#serving-ara-sqlite-databases-over-http)*

> To put this use case into perspective, it was “benchmarked” against a single job from the [OpenStack-Ansible](https://github.com/openstack/openstack-ansible) project:
>
> - 4 playbooks
> - 4647 tasks
> - 4760 results
> - 53 hosts, of which 39 had gathered host facts
> - 416 saved files
>
> Generating a static report from that database takes ~1min30s on an average machine.
> The result contains 5321 files and 5243 directories for an aggregate size of 63MB (or 27MB recursively gzipped).
>
> This middleware allows you to host the exact same report on your web server just by storing the sqlite database which is just one file and weighs 5.6MB.
> This middleware can be useful if you're not interested in aggregating data in a central database server like MySQL or PostgreSQL.

The OpenStack CI use case is decentralized: each of the >300 000 Zuul CI jobs has its own sqlite database uploaded as part of the log and artifact collection.

There are a lot of benefits to doing things this way:

- There's no network latency to a remote database server: the first bottleneck is your local disk speed.
  - Even if it's a 5ms round trip, this adds up over hundreds of hosts and thousands of tasks.
  - Oh, and contrary to popular belief, [sqlite is pretty damn fast](https://sqlite.org/speed.html).
- There's no risk of a network interruption or central database server crash which would make ARA (and your sysadmins) panic.
- Instead of one large database with lots of rows, you have more databases ("shards") with fewer rows.
- Instead of generating thousands of files and directories, you're dealing with one small sqlite file.
- There's no database cluster to maintain, just standard file servers with a web server in front.

Another benefit is that you can easily have as many individual reports as you'd like: all you have to do is configure ARA to use a custom database location.

When I announced that we'd be switching to the sqlite middleware on [openstack-dev](http://lists.openstack.org/pipermail/openstack-dev/2018-March/128902.html), I mentioned that projects could leverage this within their jobs and OpenStack-Ansible was the first to take a stab at it: [https://review.openstack.org/#/c/557921/](https://review.openstack.org/#/c/557921/).
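As a rough illustration of what "a custom database location" can mean in practice, here is a minimal Python sketch. It assumes ARA 0.x reads its database URL from the `ARA_DATABASE` environment variable in SQLAlchemy URL form; the helper name `ara_env_for` and the paths are hypothetical and only mirror the kind of layout a job might use, not what any particular project actually does.

```python
import os
from pathlib import Path


def ara_env_for(report_dir, base_env=None):
    """Return a copy of the environment with ARA pointed at a per-job database.

    `ara_env_for` is a hypothetical helper for illustration, not part of ARA.
    """
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    db_path = Path(report_dir).resolve() / "ansible.sqlite"
    # Assumption: ARA 0.x reads a SQLAlchemy-style URL from ARA_DATABASE,
    # i.e. "sqlite:///" followed by the path to the database file.
    env["ARA_DATABASE"] = "sqlite:///" + db_path.as_posix()
    return env


# Illustrative path only; adjust it to wherever your job collects its logs.
env = ara_env_for("logs/ara-report", base_env={"PATH": os.environ.get("PATH", "")})
print(env["ARA_DATABASE"])
```

The returned mapping could then be passed as `env=` to `subprocess.run` when invoking `ansible-playbook`, assuming the ARA callback plugin is already enabled in your Ansible configuration.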
OpenStack-Ansible's job logs now look like this:

```
ara-report/ansible.sqlite  # ARA report for this Zuul job
logs/                      # Job's logs
└── ara-report/            # ARA report for this OpenStack-Ansible deployment
    └── ansible.sqlite     # Database for this OpenStack-Ansible deployment
```

The performance improvements for the OpenStack community at large are significant.

Even if each job spends only 1 minute generating and transferring thousands of HTML files, that's >300 000 minutes worth of compute every month that could be spent running other jobs.

How expensive are 300 000 minutes (or 208 days!) of compute? What about bandwidth and storage?

## Unfreezing ARA's stable release for development

The latest version of ARA is currently 0.14.6 and ARA was more or less in feature-freeze mode while all the work was focused on the next major release, "[1.0](https://dmsimard.com/2017/11/22/status-update-ara-1.0/)".

However, there is a growing number of large-scale users (me included!) who are really pushing the current limitations of ARA, and 1.0 (or 2.0!) won't be ready for a while still.

I couldn't afford to let performance issues and memory leaks ruin the experience of a tool that would otherwise be very useful to them.

These improvement opportunities have convinced me that there will be a 0.15.0 release for ARA.

Stay tuned for the 0.15.0 release notes and another update about 2.0 in the near future :)