For some time, we've had a timeout bug with tempurls which we would
occasionally encounter in CI, depending on the variable performance
of CI.
What is basically happening:
T-75 Seconds - Deploy is requested
T-65 Seconds - Configdrive regeneration starts
T-50 Seconds - Config Drive rebuilt, uploading to Swift.
T-10 Seconds - New Boot image is generated
T+0 Seconds - Power is turned on for the deploying machine
T+60 Seconds - Kernel starts
T+580 Seconds - Ramdisk checks in with Ironic.
Keep in mind, by the time the ramdisk checks in we're already past the
600 second tempurl lifetime, counted from the upload, which is what we
get by default based upon the CI config.
So when we finally get to writing the configuration drive after the
OS image, we're around T+650, and the configdrive tempurl has expired.
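
As a rough illustration of the arithmetic above, here is a small sketch
using only the numbers from the timeline. It assumes the tempurl
lifetime starts counting at the upload (T-50); the names are
hypothetical and are not taken from the Ironic code.

```python
# Offsets in seconds relative to power-on (T+0), taken from the timeline.
UPLOAD_AT = -50             # config drive uploaded to Swift
RAMDISK_CHECKIN_AT = 580    # ramdisk checks in with Ironic
CONFIGDRIVE_WRITE_AT = 650  # approximate time the config drive is written

# Tempurl lifetime in effect in CI prior to this change.
TEMPURL_TIMEOUT = 600

# Assumption: the lifetime is counted from the upload.
expires_at = UPLOAD_AT + TEMPURL_TIMEOUT


def is_expired(at: int) -> bool:
    """Return True if the tempurl is no longer valid at offset `at`."""
    return at > expires_at


print(f"tempurl expires at T+{expires_at}")
print("expired at ramdisk check-in:", is_expired(RAMDISK_CHECKIN_AT))
print("expired at config drive write:", is_expired(CONFIGDRIVE_WRITE_AT))
```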
This change adds 120 seconds to all default config drive tempurl
timeouts which are based upon the deploy wait timeout. The value is
set by first looking for a specific config drive tempurl timeout, then
falling back to a value based upon the deploy wait timeout, and then
finally to a static 30 minute timeout.
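
The fallback order described above can be sketched roughly as follows.
This is a minimal illustration, not the actual Ironic code: the
function and parameter names are hypothetical, and treating an unset
value as None is an assumption.

```python
from typing import Optional

# Padding added on top of the deploy wait timeout by this change.
DEPLOY_WAIT_PADDING = 120
# Static last-resort timeout of 30 minutes.
STATIC_FALLBACK = 30 * 60


def configdrive_tempurl_timeout(
    explicit_timeout: Optional[int],
    deploy_wait_timeout: Optional[int],
) -> int:
    """Pick a config drive tempurl timeout in seconds (hypothetical helper)."""
    # 1. A specific config drive tempurl timeout always wins.
    if explicit_timeout:
        return explicit_timeout
    # 2. Otherwise derive it from the deploy wait timeout, plus padding.
    if deploy_wait_timeout:
        return deploy_wait_timeout + DEPLOY_WAIT_PADDING
    # 3. Finally fall back to the static 30 minute timeout.
    return STATIC_FALLBACK
```

With a deploy wait timeout of 600 seconds and no specific tempurl
timeout, this sketch yields 720 seconds rather than 600.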
CI, unfortunately, has a set deploy wait timeout as well, so prior
to this change we would just use 600 seconds in CI, and then there
would be sadness.
Explicitly:
- Adds 120 seconds to config drive tempurl timeouts derived from the
  deploy wait timeout.
- Sets a tempurl timeout in CI.
- Adds a troubleshooting doc item for this issue.
- Updates the tests.
Claude was kind enough to take care of the unit tests for me.
Assisted-by: Claude Code - Claude Sonnet 4.5
Change-Id: Iecb7837816401b75bf953af763803e2432519c38
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>