zuul/playbooks/zuul-stream/functional.yaml
James E. Blair 93f102d546 Limit command stdout/stderr to 1GiB
If an Ansible command task produces a very large amount of data,
that will be sent back to the ansible-playbook process on the executor
and deserialized from JSON.  If it is sufficiently large, it may
cause an OOM.  While we have adjusted settings to encourage the oom-
killer to kill ansible-playbook rather than zuul-executor, it's still
not a great situation to invoke the oom-killer in the first place.

To avoid this in what we presume is an obviously avoidable situation,
we will limit the output sent back to the executor to 1GiB.  This
should be much larger than necessary; in fact, the limit may be too
high, but it seems unlikely to be too low.

Other methods were considered: limiting to 50% of the total RAM on
the executor (likely to produce a value even higher than 1GiB), or
50% of the available RAM on the executor (may be too variable depending
on executor load).  In the end, 1GiB seems like a good starting point.

Because this affects the structured data returned by ansible and
that may be used by later tasks in the same playbook to check
the returned values, if we hit this limit, we should consider the
task a failure so that users do not inadvertently use invalid data
(consider a task that checks for the presence of some token in
stdout).  To that end, if we hit the limit, we will kill the command
process and raise an exception which will cause Ansible to fail
the task (incidentally, it will not include the oversized stdout/
stderr).  The cause of the error will be visible in the json and
text output of the job.
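
The kill-and-fail behaviour described above can be sketched roughly as
follows.  This is a minimal illustration, not the code in Zuul's command
module: the function name, the exception name, and the choice to merge
stderr into stdout are all assumptions made for the example.

```python
import io
import subprocess
import sys


class OutputLimitExceeded(Exception):
    """Hypothetical exception name; not taken from Zuul."""


def run_with_output_limit(argv, max_bytes, chunk_size=65536):
    """Run argv and return (returncode, output bytes).

    If the combined output grows past max_bytes, the child process is
    killed and an exception is raised instead of returning the data.
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    buf = io.BytesIO()
    total = 0
    try:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break  # EOF: the child closed its output
            total += len(chunk)
            if total > max_bytes:
                # Stop the producer first, then fail without
                # returning the oversized output.
                proc.kill()
                raise OutputLimitExceeded(
                    "output exceeded %d bytes" % max_bytes)
            buf.write(chunk)
    finally:
        proc.stdout.close()
        proc.wait()  # reap the child in both the normal and error paths
    return proc.returncode, buf.getvalue()


if __name__ == "__main__":
    rc, out = run_with_output_limit(
        [sys.executable, "-c", "print('hello')"], max_bytes=1024)
    print(rc, out)
```

Raising rather than truncating means a later task cannot silently act
on partial stdout, which is the concern the paragraph above raises.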

This is not a setting that users or operators should be adjusting,
and normally we would not expose something like this through a
configuration option.  But because we will fail the tasks, we provide
an escape valve for users who upgrade to this version and suddenly
find they are relying on 1GiB+ stdout values.  A deprecated configuration
option is added to adjust the value used.  We can remove it in a
later major version of Zuul.
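
For reference, the functional test playbook in this change passes
ZUUL_OUTPUT_MAX_BYTES: 1073741824 in the task environment, which is
exactly 1GiB.  A minimal sketch of how a module might pick up such a
value is shown here; the helper name and the fallback behaviour are
assumptions for illustration and may not match how Zuul reads the
setting.

```python
import os

ONE_GIB = 1024 ** 3  # 1073741824 bytes


def get_output_max_bytes(environ=os.environ):
    """Return the output limit in bytes.

    Hypothetical helper: falls back to 1GiB when the variable is unset.
    """
    raw = environ.get("ZUUL_OUTPUT_MAX_BYTES")
    if raw is None:
        return ONE_GIB
    return int(raw)
```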

While we're working on the command module, make it more memory-efficient
for large values by using a BytesIO class instead of concatenating
strings.  This reduces by 1 the number of complete copies of the
stdout/stderr values on the remote node (but does nothing for
the ansible-playbook process on the executor).
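
The difference between the two accumulation styles can be illustrated
with a small sketch.  Both function names are hypothetical and the code
is not taken from the command module; it only contrasts repeated
concatenation with writing into an io.BytesIO buffer.

```python
import io


def collect_by_concatenation(chunks):
    """Accumulate output by building a new bytes object per chunk."""
    data = b""
    for chunk in chunks:
        # Each concatenation allocates a fresh object holding
        # everything read so far plus the new chunk.
        data = data + chunk
    return data


def collect_with_bytesio(chunks):
    """Accumulate output in a growable in-memory buffer."""
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)  # appends to the existing buffer
    return buf.getvalue()
```

Both functions return the same bytes; the buffer-based version avoids
holding the old and new concatenated values at the same time while the
output is being read.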

Finally, add a "-vvv" argument to the test invocation; this was useful
in debugging this change and will likely be so for future changes.

Change-Id: I3442b09946ecd0ad18817339b090e49f00d51e93
2025-01-16 13:50:58 -08:00

129 lines
5.4 KiB
YAML

- hosts: controller
  tasks:

    - name: Set python path fact
      set_fact:
        # This value is used by Ansible to find the zuul.ansible code
        # that Zuul's ansible plugins consume. It must be updated when
        # the python version of the platform is changed.
        python_path: "/usr/local/lib/python3.11/dist-packages"

    - name: Run ansible that should succeed against testing console
      command: >
        /usr/lib/zuul/ansible/{{ zuul_ansible_version }}/bin/ansible-playbook
        -vvv
        -e "new_console=true"
        src/opendev.org/zuul/zuul/playbooks/zuul-stream/fixtures/test-stream.yaml
      environment:
        # Setup by test-stream.yaml so we start a new zuul_console
        # from this checkout.
        ZUUL_CONSOLE_PORT: 19887
        ZUUL_JOB_LOG_CONFIG: "{{ ansible_user_dir}}/logging.json"
        ZUUL_JOBDIR: "{{ ansible_user_dir}}"
        ZUUL_ANSIBLE_SPLIT_STREAMS: False
        ZUUL_OUTPUT_MAX_BYTES: 1073741824
        PYTHONPATH: "{{ python_path }}"
      register: _success_output

    - name: Save raw output to file
      copy:
        content: '{{ _success_output.stdout }}'
        dest: 'console-job-output-success-19887.txt'

    - name: Save output
      shell: |
        mv job-output.txt job-output-success-19887.txt
        mv job-output.json job-output-success-19887.json

    # Streamer puts out a line like
    #  [node1] Starting to log 916b2084-4bbb-80e5-248e-000000000016-1-node1 for task TASK: Print binary data
    # One of the tasks in job-output shows find: results;
    # the console file for this task should not be there.
    - name: Validate temporary files removed
      shell: |
        for f in $(grep 'Starting to log' console-job-output-success-19887.txt | awk '{print $5}'); do
          echo "Checking ${f}"
          if grep -q '"path": "/tmp/console-'${f}'.log"' job-output-success-19887.txt; then
            echo "*** /tmp/console-${f}.log still exists"
            exit 1
          fi
        done

    # NOTE(ianw) 2022-07 : we deliberately have this second step to run
    # against the console setup by the infrastructure executor in the
    # job pre playbooks as a backwards compatibility sanity check.
    # The py27 container job (node3) is not running an existing
    # console streamer, so that will not output anything -- limit this
    # out.
    - name: Run ansible that should succeed against extant console
      command: >
        /usr/lib/zuul/ansible/{{ zuul_ansible_version }}/bin/ansible-playbook
        -e "new_console=false" --limit="node1,node2"
        src/opendev.org/zuul/zuul/playbooks/zuul-stream/fixtures/test-stream.yaml
      environment:
        ZUUL_JOB_LOG_CONFIG: "{{ ansible_user_dir}}/logging.json"
        ZUUL_JOBDIR: "{{ ansible_user_dir}}"
        ZUUL_ANSIBLE_SPLIT_STREAMS: False
        ZUUL_OUTPUT_MAX_BYTES: 1073741824
        PYTHONPATH: "{{ python_path }}"
      register: _success_output

    - name: Save raw output to file
      copy:
        content: '{{ _success_output.stdout }}'
        dest: 'console-job-output-success-19885.txt'

    - name: Save output
      shell: |
        mv job-output.txt job-output-success-19885.txt
        mv job-output.json job-output-success-19885.json

    - name: Validate text outputs
      include_tasks: validate.yaml
      loop:
        - { node: 'node1', filename: 'job-output-success-19887.txt' }
        - { node: 'node2', filename: 'job-output-success-19887.txt' }
        - { node: 'node1', filename: 'job-output-success-19885.txt' }
        - { node: 'node2', filename: 'job-output-success-19885.txt' }
        # node3 only listens on 19887
        - { node: 'node3', filename: 'job-output-success-19887.txt' }

    # This shows that zuul_console_disabled has activated and set the
    # UUID to "skip"
    - name: Validate json output
      shell: |
        egrep 'zuul_log_id": "skip"' job-output-success-19885.json
        egrep 'zuul_log_id": "skip"' job-output-success-19887.json

    # failure case

    - name: Run ansible playbook that should fail
      command: >
        /usr/lib/zuul/ansible/{{ zuul_ansible_version }}/bin/ansible-playbook
        src/opendev.org/zuul/zuul/playbooks/zuul-stream/fixtures/test-stream-failure.yaml
      register: failed_results
      failed_when: "failed_results.rc != 2"
      environment:
        ZUUL_CONSOLE_PORT: 19887
        ZUUL_JOB_LOG_CONFIG: "{{ ansible_user_dir}}/logging.json"
        ZUUL_JOBDIR: "{{ ansible_user_dir}}"
        ZUUL_ANSIBLE_SPLIT_STREAMS: False
        ZUUL_OUTPUT_MAX_BYTES: 1073741824
        PYTHONPATH: "{{ python_path }}"

    - name: Save output
      shell: |
        mv job-output.txt job-output-failure.txt
        mv job-output.json job-output-failure.json

    - name: Validate output - failure shell task with exception
      shell: |
        egrep "^.+\| node1 \| Exception: Test module failure exception fail-task" job-output-failure.txt
        egrep "^.+\| node2 \| Exception: Test module failure exception fail-task" job-output-failure.txt
        egrep "^.+\| node3 \| Exception: Test module failure exception fail-task" job-output-failure.txt

    - name: Validate output - failure item loop with exception
      shell: |
        egrep "^.+\| node1 \| Exception: Test module failure exception fail-loop" job-output-failure.txt
        egrep "^.+\| node2 \| Exception: Test module failure exception fail-loop" job-output-failure.txt
        egrep "^.+\| node3 \| Exception: Test module failure exception fail-loop" job-output-failure.txt