If an Ansible command task produces a very large amount of data, that output is sent back to the ansible-playbook process on the executor and deserialized from JSON. If it is sufficiently large, it may cause an OOM. While we have adjusted settings to encourage the oom-killer to kill ansible-playbook rather than zuul-executor, it is still not a great situation to invoke the oom-killer in the first place.

To avoid this in what we presume is an obviously avoidable situation, we will limit the output sent back to the executor to 1GiB. This should be much larger than necessary; in fact, the limit may be too high, but it seems unlikely to be too low. Other methods were considered: limiting to 50% of the total RAM on the executor (likely to produce a value even higher than 1GiB), or 50% of the available RAM on the executor (may be too variable depending on executor load). In the end, 1GiB seems like a good starting point.

Because this affects the structured data returned by Ansible, which may be used by later tasks in the same playbook to check the returned values, hitting this limit should be treated as a task failure so that users do not inadvertently use invalid data (consider a task that checks for the presence of some token in stdout). To that end, if we hit the limit, we will kill the command process and raise an exception, which will cause Ansible to fail the task (incidentally, the result will not include the oversized stdout/stderr). The cause of the error will be visible in the JSON and text output of the job.

This is not a setting that users or operators should be adjusting, and normally we would not expose something like this through a configuration option. But because we will fail the tasks, we provide an escape valve for users who upgrade to this version and suddenly find they are relying on 1GiB+ stdout values: a deprecated configuration option is added to adjust the value used. We can remove it in a later major version of Zuul.
While we're working on the command module, make it more memory-efficient for large values by using a BytesIO object instead of concatenating strings. This reduces by one the number of complete copies of the stdout/stderr values held on the remote node (but does nothing for the ansible-playbook process on the executor).

Finally, add a "-vvv" argument to the test invocation; this was useful in debugging this change and will likely be so for future changes.

Change-Id: I3442b09946ecd0ad18817339b090e49f00d51e93
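The memory-efficiency change above can be illustrated with a small sketch (the function name is ours, not Zuul's): appending chunks to a `BytesIO` grows one buffer in place, whereas `data = data + chunk` builds a fresh full-size copy on every iteration.

```python
import io


def collect_chunks(chunks):
    """Accumulate byte chunks without repeated concatenation.

    Writing into a single BytesIO avoids the extra full-size
    intermediate copy that ``data = data + chunk`` creates each
    time around the loop.
    """
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf.getvalue()
```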
- hosts: controller
  tasks:

    - name: Set python path fact
      set_fact:
        # This value is used by Ansible to find the zuul.ansible code
        # that Zuul's ansible plugins consume. It must be updated when
        # the python version of the platform is changed.
        python_path: "/usr/local/lib/python3.11/dist-packages"

    - name: Run ansible that should succeed against testing console
      command: >
        /usr/lib/zuul/ansible/{{ zuul_ansible_version }}/bin/ansible-playbook
        -vvv
        -e "new_console=true"
        src/opendev.org/zuul/zuul/playbooks/zuul-stream/fixtures/test-stream.yaml
      environment:
        # Setup by test-stream.yaml so we start a new zuul_console
        # from this checkout.
        ZUUL_CONSOLE_PORT: 19887
        ZUUL_JOB_LOG_CONFIG: "{{ ansible_user_dir }}/logging.json"
        ZUUL_JOBDIR: "{{ ansible_user_dir }}"
        ZUUL_ANSIBLE_SPLIT_STREAMS: False
        ZUUL_OUTPUT_MAX_BYTES: 1073741824
        PYTHONPATH: "{{ python_path }}"
      register: _success_output

    - name: Save raw output to file
      copy:
        content: '{{ _success_output.stdout }}'
        dest: 'console-job-output-success-19887.txt'

    - name: Save output
      shell: |
        mv job-output.txt job-output-success-19887.txt
        mv job-output.json job-output-success-19887.json

    # Streamer puts out a line like
    #  [node1] Starting to log 916b2084-4bbb-80e5-248e-000000000016-1-node1 for task TASK: Print binary data
    # One of the tasks in job-output shows find: results;
    # the console file for this task should not be there.
    - name: Validate temporary files removed
      shell: |
        for f in $(grep 'Starting to log' console-job-output-success-19887.txt | awk '{print $5}'); do
          echo "Checking ${f}"
          if grep -q '"path": "/tmp/console-'${f}'.log"' job-output-success-19887.txt; then
            echo "*** /tmp/${f}.log still exists"
            exit 1
          fi
        done

    # NOTE(ianw) 2022-07 : we deliberately have this second step to run
    # against the console setup by the infrastructure executor in the
    # job pre playbooks as a backwards compatibility sanity check.
    # The py27 container job (node3) is not running an existing
    # console streamer, so that will not output anything -- limit it
    # out.
    - name: Run ansible that should succeed against extant console
      command: >
        /usr/lib/zuul/ansible/{{ zuul_ansible_version }}/bin/ansible-playbook
        -e "new_console=false" --limit="node1,node2"
        src/opendev.org/zuul/zuul/playbooks/zuul-stream/fixtures/test-stream.yaml
      environment:
        ZUUL_JOB_LOG_CONFIG: "{{ ansible_user_dir }}/logging.json"
        ZUUL_JOBDIR: "{{ ansible_user_dir }}"
        ZUUL_ANSIBLE_SPLIT_STREAMS: False
        ZUUL_OUTPUT_MAX_BYTES: 1073741824
        PYTHONPATH: "{{ python_path }}"
      register: _success_output

    - name: Save raw output to file
      copy:
        content: '{{ _success_output.stdout }}'
        dest: 'console-job-output-success-19885.txt'

    - name: Save output
      shell: |
        mv job-output.txt job-output-success-19885.txt
        mv job-output.json job-output-success-19885.json

    - name: Validate text outputs
      include_tasks: validate.yaml
      loop:
        - { node: 'node1', filename: 'job-output-success-19887.txt' }
        - { node: 'node2', filename: 'job-output-success-19887.txt' }
        - { node: 'node1', filename: 'job-output-success-19885.txt' }
        - { node: 'node2', filename: 'job-output-success-19885.txt' }
        # node3 only listens on 19887
        - { node: 'node3', filename: 'job-output-success-19887.txt' }

    # This shows that zuul_console_disabled has activated and set the
    # UUID to "skip"
    - name: Validate json output
      shell: |
        egrep 'zuul_log_id": "skip"' job-output-success-19885.json
        egrep 'zuul_log_id": "skip"' job-output-success-19887.json

    # failure case

    - name: Run ansible playbook that should fail
      command: >
        /usr/lib/zuul/ansible/{{ zuul_ansible_version }}/bin/ansible-playbook
        src/opendev.org/zuul/zuul/playbooks/zuul-stream/fixtures/test-stream-failure.yaml
      register: failed_results
      failed_when: "failed_results.rc != 2"
      environment:
        ZUUL_CONSOLE_PORT: 19887
        ZUUL_JOB_LOG_CONFIG: "{{ ansible_user_dir }}/logging.json"
        ZUUL_JOBDIR: "{{ ansible_user_dir }}"
        ZUUL_ANSIBLE_SPLIT_STREAMS: False
        ZUUL_OUTPUT_MAX_BYTES: 1073741824
        PYTHONPATH: "{{ python_path }}"

    - name: Save output
      shell: |
        mv job-output.txt job-output-failure.txt
        mv job-output.json job-output-failure.json

    - name: Validate output - failure shell task with exception
      shell: |
        egrep "^.+\| node1 \| Exception: Test module failure exception fail-task" job-output-failure.txt
        egrep "^.+\| node2 \| Exception: Test module failure exception fail-task" job-output-failure.txt
        egrep "^.+\| node3 \| Exception: Test module failure exception fail-task" job-output-failure.txt

    - name: Validate output - failure item loop with exception
      shell: |
        egrep "^.+\| node1 \| Exception: Test module failure exception fail-loop" job-output-failure.txt
        egrep "^.+\| node2 \| Exception: Test module failure exception fail-loop" job-output-failure.txt
        egrep "^.+\| node3 \| Exception: Test module failure exception fail-loop" job-output-failure.txt