tripleo-upgrade/tasks/common/l3_agent_connectivity_check_start_script.yml
Sofer Athlan-Guyot a87fcb7ef4 Improve ping test coverage during update.
The ping test starts at the beginning of the "update run" phase and
stops after it finishes, that is, after all roles have been updated.

With this patch we stop, test and restart the ping in-between each
role update.

This means:

 1. we detect errors earlier;
 2. we detect errors related to new flows being created during the
    update run.

Point 2 turned out to be an important test, as OVN can have existing
flows still working while new flows break. With this new behavior the
ping test catches such errors.
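
As a rough sketch of the per-role flow described above (not the actual
tasks of this patch): the file names per_role_update.yml,
update_one_role.yml and l3_agent_connectivity_check_stop_script.yml, and
the variable current_role, are assumptions made for illustration.

```yaml
---
# per_role_update.yml (hypothetical), meant to be included once per role,
# for example from a loop with loop_control.loop_var set to current_role.
- name: wait for the VM and start the ping for this role
  include_tasks: l3_agent_connectivity_check_start_script.yml

- name: update the current role
  # update_one_role.yml is a placeholder for the real update step.
  include_tasks: update_one_role.yml
  vars:
    role_name: "{{ current_role }}"

- name: stop the ping and evaluate the result for this role
  # Assumed counterpart of the start task file.
  include_tasks: l3_agent_connectivity_check_stop_script.yml
```

Because the ping is restarted for every role, each run has to set up
fresh traffic, which is what exercises new flows rather than only the
ones that existed before the update.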

The downside is that we become even more sensitive in any
percentage-based testing, as the same number of errors gives a higher
percentage because each run spends less time in the test. This could
be seen as another improvement.
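
To make the percentage effect concrete, here is a small Ansible task
with made-up numbers (the packet counts are purely illustrative, not
measurements): the same 6 lost pings weigh about 0.17% over 3600 pings
but 1% over 600.

```yaml
---
# Illustrative numbers only: 6 lost pings in a long run and in a short run.
- name: show how the same loss count weighs more in a shorter run
  debug:
    msg: "{{ item.label }}: {{ (item.lost / item.sent * 100) | round(2) }}% loss"
  loop:
    - { label: "whole update run", sent: 3600, lost: 6 }
    - { label: "single role", sent: 600, lost: 6 }
```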

We're splitting the ping test into two stages so that if the ping
fails to start (as it would with this particular issue) we detect it
immediately instead of waiting for the end of the run.

When doing a batch update (all roles in parallel) we deactivate that
mechanism and fall back to the previous one, as there is no
in-between-role step there.

We also prevent the stop ping from searching all home subdirectories,
as I had an issue in local testing where one subdirectory had
unreadable files (after a local podman run). This shouldn't happen in
CI, but it is good to have for local testing.
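
One way to limit such a search is `-maxdepth 1` on find. The task below
is only a sketch: the `ping_results*` file name pattern and the
`ping_result_log` variable are assumptions, not taken from this patch.

```yaml
---
- name: locate the latest ping result log without descending into subdirectories
  shell: |
    # -maxdepth 1 keeps find out of subdirectories such as unreadable
    # container storage left behind by a local podman run.
    find "$HOME" -maxdepth 1 -type f -name 'ping_results*' | sort | tail -n 1
  register: ping_result_log  # hypothetical variable name
  changed_when: false
```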

Change-Id: I7f30f5361773b96de13325f5038c89477b575e65
2020-11-25 22:31:22 +01:00

15 lines
380 B
YAML

---
- name: l3 agent connectivity wait until vm is ready
  shell: |
    source {{ overcloud_rc }}
    {{ l3_agent_connectivity_check_wait_script }}
  when: l3_agent_connectivity_check

- name: start l3 agent connectivity check
  shell: |
    source {{ overcloud_rc }}
    {{ l3_agent_connectivity_check_start_script }}
  when: l3_agent_connectivity_check
  async: 21660
  poll: 0
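
With `poll: 0` the start task returns immediately and the ping keeps
running in the background for up to 21660 seconds (6 hours and 1
minute). A minimal sketch of how a later task could check on that
job, assuming the start task also had `register: l3_ping_job` (which
the file above does not do; both variable names are hypothetical):

```yaml
---
- name: confirm the background ping is still running
  async_status:
    jid: "{{ l3_ping_job.ansible_job_id }}"
  register: l3_ping_status
  # One possible policy: treat a ping that has already exited as a failure.
  failed_when: l3_ping_status.finished | bool
  when: l3_agent_connectivity_check
```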