nova/releasenotes/notes/deprecate-live-migration-progress-timeout-b4640047dc5c8eed.yaml
John Garbutt 510fe1353d Default live_migration_progress_timeout to off
live_migration_progress_timeout aims to time out a live-migration well
before the live_migration_completion_timeout limit, by detecting when
no progress appears to have been made copying the memory between the
hosts. However, it turns out there are several problems with the way we
monitor progress. In production and in stress testing, having
live_migration_progress_timeout > 0 has caused random timeout failures
for live-migrations that take longer than live_migration_progress_timeout.

One problem is that block_migrations appear to show no progress, as it
seems we only look for progress in copying memory. Also, the way we query
QEMU via libvirt breaks when there are multiple iterations of memory
copying.
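
As a rough illustration of the second problem, here is a minimal Python
sketch of a low-watermark progress check. It is not Nova's code: the
function name, the sample trace and the assumption that the "remaining"
counter climbs again when a new copy iteration starts are all
hypothetical, chosen only to show how a migration that keeps moving can
still be reported as stalled.

```python
def first_progress_timeout(samples, progress_timeout):
    """Return the time at which a watermark check would abort, or None.

    samples: iterable of (elapsed_seconds, memory_remaining) pairs.
    progress_timeout: seconds allowed without a new low watermark;
        0 disables the check.
    """
    watermark = None          # lowest "remaining" value seen so far
    last_progress_at = None   # time at which that low was recorded
    for now, remaining in samples:
        if watermark is None or remaining < watermark:
            # A new low counts as progress and resets the clock.
            watermark = remaining
            last_progress_at = now
        elif progress_timeout > 0 and now - last_progress_at > progress_timeout:
            return now
    return None


# Hypothetical trace: the first pass gets down to 512, then a second
# pass starts higher (assumed behaviour) and shrinks steadily without
# ever dropping below the old low.
SAMPLES = [
    (0, 4096), (30, 2048), (60, 512),
    (90, 1536), (120, 1280), (150, 1024),
    (180, 896), (210, 768), (240, 640),
]

print(first_progress_timeout(SAMPLES, 150))  # 240
print(first_progress_timeout(SAMPLES, 0))    # None
```

With a 150 second limit the check fires at 240 seconds, although the
second pass is shrinking on every sample. With the limit set to 0 the
check never fires.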

We need to revisit this bug and either fix the progress mechanism or
remove all the code that checks for progress (including the
automatic trigger for post-copy). But in the meantime, let's default to
having no timeout, and warn users that have overridden this
configuration by deprecating the live_migration_progress_timeout
configuration option.

For users concerned about live-migration timeout errors, I have
cleaned up the configuration option descriptions, so they have a better
chance of stopping the live-migration timeout errors they may come
across.

Related-Bug: #1644248

Change-Id: I1a1143ddf8da5fb9706cf53dbfd6cbe84e606ae1
2017-02-08 18:19:12 +00:00

---
issues:
  - |
    The live-migration progress timeout controlled by the configuration option
    ``[libvirt]/live_migration_progress_timeout`` has been discovered to
    frequently cause live-migrations to fail with a progress timeout error,
    even though the live-migration is still making good progress.
    To minimize problems caused by these checks we have changed the default
    to 0, which means do not trigger a timeout.
    To modify when a live-migration will fail with a timeout error, please now
    look at ``[libvirt]/live_migration_completion_timeout`` and
    ``[libvirt]/live_migration_downtime``.
deprecations:
  - |
    ``[libvirt]/live_migration_progress_timeout`` has been deprecated as this
    feature has been found not to work. See bug 1644248 for more details.