OpenStack CI log processing tool


Daniel Pawlik eee765f8c6 Add information about subunit index in logstash role With the commit [1], the logsender would push the tempest subunit results to the Opensearch, but to the new index: 'subunit'. [1] https://review.opendev.org/c/openstack/ci-log-processing/+/858373 Change-Id: I1ebfdaec384f7d81a0c246a1a1e6c2eaaad3ede0		2022-09-27 15:02:26 +02:00

OpenStack CI log processing

The goal of this repository is to provide and verify the functionality of a new log processing system based on the Zuul log scraper tool.

Log Scraper

The Log Scraper tool periodically checks the Zuul CI API for new builds and, if there are any, pushes the information to the log processing system.
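The "check for new builds" step can be illustrated with a minimal sketch. It is not the logscraper implementation: the helper name `new_builds`, the `seen_uuids` set and the sample data are hypothetical, and the build records are assumed to be dictionaries carrying `uuid` and `end_time` keys, as a Zuul builds listing might return.

```python
def new_builds(builds, seen_uuids):
    """Return builds that were not processed yet, oldest first.

    `builds` is assumed to be a list of dicts with "uuid" and "end_time"
    keys; `seen_uuids` is a hypothetical set of already handled builds.
    """
    fresh = [build for build in builds if build["uuid"] not in seen_uuids]
    # ISO 8601 timestamps in the same format sort correctly as strings.
    return sorted(fresh, key=lambda build: build["end_time"])


# Hypothetical sample data, for illustration only.
builds = [
    {"uuid": "c3", "end_time": "2022-02-28T16:46:01"},
    {"uuid": "a1", "end_time": "2022-02-28T16:44:59"},
    {"uuid": "b2", "end_time": "2022-02-28T16:45:30"},
]
pending = new_builds(builds, seen_uuids={"a1"})
print([build["uuid"] for build in pending])
```

Only the builds in `pending` would then be handed over to the log processing system.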

Log Sender

The Zuul Log Sender tool periodically checks a directory for files that should be sent to the Elasticsearch service. NOTE: build directories that do not provide the buildinfo and inventory.yaml files are skipped.
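The skip rule from the note above can be sketched as follows. This is an illustration only, not the logsender code: the function name `sendable_build_dirs` and the layout (one subdirectory per build inside the download directory) are assumptions.

```python
import tempfile
from pathlib import Path

# Files named in the note above; a build directory missing any of them is skipped.
REQUIRED_FILES = ("buildinfo", "inventory.yaml")


def sendable_build_dirs(download_dir):
    """Return build directories that contain every required file."""
    root = Path(download_dir)
    return sorted(
        entry
        for entry in root.iterdir()
        if entry.is_dir()
        and all((entry / name).is_file() for name in REQUIRED_FILES)
    )


# Demonstration on a temporary directory with one complete and one partial build.
with tempfile.TemporaryDirectory() as tmp:
    complete = Path(tmp) / "build-complete"
    partial = Path(tmp) / "build-partial"
    complete.mkdir()
    partial.mkdir()
    for name in REQUIRED_FILES:
        (complete / name).touch()
    (partial / "buildinfo").touch()  # inventory.yaml is missing here
    found = [entry.name for entry in sendable_build_dirs(tmp)]

print(found)
```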

Available workflows

The OpenStack CI Log Processing project provides two configurations for sending logs from Zuul CI to the Opensearch host.

Logscraper, log gearman client, log gearman worker, logstash

With this solution, the log workflow looks like:

+------------------+  1. Get last builds info  +----------------+
|                  |-------------------------> |                |
|    Logscraper    |                           |     Zuul API   |
|                  |<------------------------- |                |
+------------------+  2. Fetch data            +----------------+
         |
         |
         +-------------------+
                             |
                             |
             3. Send queue   |
             logs to gearman |
             client          |
                             v
                   +------------------+
                   |                  |
                   |  Log gearman     |
                   |     client       |
                   +------------------+
        +--------------           --------------+
        |                    |                  |
        |  4. Consume queue, |                  |
        | download log files |                  |
        |                    |                  |
        v                    v                  v
+---------------+   +----------------+  +---------------+
|  Log gearman  |   |   Log gearman  |  | Log gearman   |
|  worker       |   |   worker       |  | worker        |
+---------------+   +----------------+  +---------------+
       |                     |                  |
       |   5. Send to        |                  |
       |      Logstash       |                  |
       |                     v                  |
       |            +----------------+          |
       |            |                |          |
       +--------->  |    Logstash    |  <-------+
                    |                |
                    +----------------+
                            |
              6. Send to    |
                 Opensearch |
                            |
                   +--------v--------+
                   |                 |
                   |    Opensearch   |
                   |                 |
                   +-----------------+

In the beginning, this project was designed to use that solution, but it has a few bottlenecks:

- the log gearman client can use a lot of memory when the log gearman worker is not fast enough,
- one log gearman worker is not enough, even on a small infrastructure,
- the logstash service can fail,
- logscraper checks whether log files are available, then log gearman downloads the logs, which can cause problems on the log server when that host has no free sockets.

You can deploy this log workflow with the example Ansible playbook found in ansible/playbooks/check-services.yml in this project.

Logscraper, logsender

This workflow removes the bottlenecks by removing the log gearman client, the log gearman worker and the logstash service. Logs are downloaded to disk when they become available, then parsed by logsender and sent directly to the Opensearch host.

With this solution, the log workflow looks like:

+----------------+  1. Get last builds info      +-----------------+
|                | ----------------------------> |                 |
|   Logscraper   |                               |  Zuul API       |
|                | <---------------------------- |                 |
+----------------+    2. Fetch data              +-----------------+
         |
         +------------------------------------------------+
                                                          |
                                  3. Download logs;       |
                                  include inventory.yaml  |
                                  and build info          |
                                                          |
                                                          V
                +----------------+               +----------------+
                |                |               |                |
                |   Logsender    | <------------ |  Download dir  |
                |                |               |                |
                +----------------+               +----------------+
                        |
 4. Parse log files;    |
 add required fields;   |
 send to Opensearch     |
                        |
                        v
               +-----------------+
               |                 |
               |   Opensearch    |
               |                 |
               +-----------------+

You can deploy this log workflow with the example Ansible playbook found in ansible/playbooks/check-services-sender.yml in this project.
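Step 4 of the diagram (parse log files, add required fields, send to Opensearch) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the logsender implementation: the function `build_documents` and the field names `build_uuid`, `project` and `message` are hypothetical, and the actual upload to Opensearch is left out.

```python
def build_documents(log_lines, build_fields):
    """Turn raw log lines into documents enriched with build metadata.

    `build_fields` stands in for the data that would come from the
    buildinfo and inventory.yaml files (hypothetical field names).
    """
    documents = []
    for line in log_lines:
        message = line.rstrip("\n")
        if not message.strip():
            continue  # skip empty lines, they carry no information
        document = dict(build_fields)  # copy, so each document is independent
        document["message"] = message
        documents.append(document)
    return documents


docs = build_documents(
    ["first log line\n", "\n", "second log line\n"],
    {"build_uuid": "a1", "project": "example/project"},
)
print(len(docs))
```

Each resulting document could then be sent to the Opensearch host, for example in batches.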

Testing

The OpenStack CI log processing project runs a complete testing and continuous-integration environment, powered by Zuul.

Any changes to the logscraper script or tests will trigger jobs to thoroughly test those changes.

To run a single test: tox -epy38 <test_name>

Benchmarking

Large Zuul CI deployments require a lot of CI log processing resources. To compare the two log processing deployments, we can run a benchmark. All tests do the same:

- send 100 log builds to Elasticsearch running on the same host
- logscraper uses 4 workers
- the VM has 8 vCPUs and 16 GB of RAM

Testing workflows:

loggearman and logstash

This workflow spawns 3 additional loggearman workers, because this service is a bottleneck in that CI log workflow.

You can do it with the command:

for i in {1..3}; do
  podman run --network host -d --name loggearman-worker-$i \
   --volume /etc/loggearman:/etc/loggearman:z \
   --volume /var/log/loggearman:/var/log/loggearman:z \
   quay.io/software-factory/loggearman:latest \
   log-gearman-worker -c /etc/loggearman/worker.yml --foreground -d /var/log/loggearman/worker.log
done

To remove:

for i in {1..3}; do
  podman stop loggearman-worker-$i ; podman rm loggearman-worker-$i
done

In the end, a basic calculation:

import datetime
start = datetime.datetime.fromisoformat("2022-02-28 16:44:59")
stop = datetime.datetime.fromisoformat("2022-02-28 16:46:01")
print((stop-start).total_seconds())

Running logscraper and waiting for all loggearman workers to finish took 62 seconds and used 680 MB of RAM.

logsender workflow

This workflow uses only the logsender tool and pushes the logs directly to the Elasticsearch service. As in the previous test, it is executed with 4 processes.

To download logs:

logscraper \
 --zuul-api-url https://zuul.opendev.org/api/tenant/openstack \
 --checkpoint-file /tmp/results-checkpoint \
 --worker 8 \
 --max-skipped 100 \
 --download True \
 --directory /tmp/logscraper

This operation took 30 seconds and used 130 MB of RAM.

To send logs:

logsender --username admin --password mypassword --host localhost --port 9200 --insecure --workers 4

Running logsender and waiting for all of its workers to finish took 35 seconds and used 520 MB of RAM.

Conclusion:

The logsender workflow seems to use less memory (in the Opendev deployment, the logstash service runs on a different host, but 4096 MB of RAM is not enough) and it is faster. Note, however, that the logscraper and logsender processes were executed one after the other: first logscraper downloaded the logs, then logsender sent them to Elasticsearch.

Continuous Deployment

Once changes are reviewed and committed, they will be applied automatically to the production hosts.

Contributing

Contributions are welcome!

Currently only unit tests are available. Functional tests will be added in the future.

Documentation

The latest documentation is available at http://docs.openstack.org/infra/ci-log-processing

That documentation is generated from this repository. You can generate it yourself with tox -e docs.