We occasionally see this task fail for the first element in the
zuul.projects list with a MODULE FAILURE and a return code of -13
(SIGPIPE) [1]. So far we couldn't identify the root cause, so try to
mitigate this issue by retrying on failure. This solution is similar to
the one used for the "Synchronize repos" task [2].
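
A minimal sketch of such a retry, assuming a generic command task; the
task name, command, registered variable name, loop expression, and the
retry/delay values are hypothetical and not taken from the role itself:

```yaml
# Hypothetical task: module, command, and variable names are
# illustrative only, not the actual role contents.
- name: Example task looping over zuul.projects
  ansible.builtin.command: "true"
  loop: "{{ zuul.projects.values() | list }}"
  loop_control:
    loop_var: zj_project
  register: zj_result             # hypothetical variable name
  until: zj_result is not failed  # re-run the failing item until it succeeds
  retries: 3                      # assumed number of retries
  delay: 5                        # assumed seconds between attempts
```

With `until` on a looped task, each item is retried on its own, so only
the failing first element would be re-run.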

Since it's only the first element in the loop that is failing while
subsequent elements are successful, we currently have three assumptions:
1. As the task before is using a `delegate_to: localhost` [3],
there might be a problem with Ansible when switching the connection
from localhost to the remote host (node).
2. Since the task before is using the same SSH connection [4] that is
used by Ansible to push the git repository, there might be some
"leftovers" on the connection that make the next task fail.
3. There is also a bug report in Ansible [5] which might be causing
that error.
[1]:
{
    "ansible_loop_var": "zj_project",
    "changed": false,
    "failed": true,
    "module_stderr": "",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": -13,
    "zj_project": {...}
}
[2]: 3b3495e255/roles/mirror-workspace-git-repos/tasks/main.yaml (L32)
[3]: 3b3495e255/roles/mirror-workspace-git-repos/tasks/main.yaml (L25)
[4]: 3b3495e255/roles/mirror-workspace-git-repos/tasks/main.yaml (L16)
[5]: https://github.com/ansible/ansible/issues/81777
Change-Id: I0c4cb87bb076b9b40c9c446dbe5db437daff5897