Add release note for live_migration_progress_timeout issue

live_migration_progress_timeout aims to time out a live-migration well
before the live_migration_completion_timeout limit, by detecting when
it appears that no progress has been made copying the memory between the
hosts. However, there are several problems with the way we monitor
progress. In production and in stress testing, setting
live_migration_progress_timeout > 0 has caused random timeout failures
for live-migrations that take longer than live_migration_progress_timeout.

One problem is that block migrations appear to show no progress, as it
seems we only look for progress in copying memory. Also, the way we query
QEMU via libvirt breaks when there are multiple iterations of memory
copying.

We need to revisit this bug and either fix the progress mechanism or
remove all the code that checks for progress (including the automatic
trigger for post-copy). In the meantime we recommend disabling the
timeout, and in Ocata and Pike we have already deprecated the
live_migration_progress_timeout configuration option.
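
As a rough sketch of the recommended workaround, assuming the option is
set in the [libvirt] section of nova.conf on each compute node (the file
name and location are assumptions about a typical deployment), disabling
the progress timeout would look something like this:

```ini
[libvirt]
# 0 means the progress check never triggers a timeout.
live_migration_progress_timeout = 0
```

The overall limit on migration time is then governed only by
live_migration_completion_timeout.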

Co-Authored-By: John Garbutt <john.garbutt@rackspace.com>
Change-Id: Ib86ee25f2ccf841a8cc9e70acf7d9d1de359e671
Related-Bug: #1644248
Chris Friesen 2017-02-27 11:51:03 -06:00 committed by Matt Riedemann
parent 642caf0c58
commit 64a482c24d
1 changed files with 13 additions and 0 deletions


@@ -0,0 +1,13 @@
---
issues:
  - |
    The live-migration progress timeout controlled by the configuration option
    ``[libvirt]/live_migration_progress_timeout`` has been discovered to
    frequently cause live-migrations to fail with a progress timeout error,
    even though the live-migration is still making good progress.
    To minimize problems caused by these checks we recommend setting the value
    to 0, which means do not trigger a timeout. (This has been made the
    default in Ocata and Pike.)
    To modify when a live-migration will fail with a timeout error, please now
    look at ``[libvirt]/live_migration_completion_timeout`` and
    ``[libvirt]/live_migration_downtime``.