Default live_migration_progress_timeout to off
live_migration_progress_timeout aims to time out a live-migration well before the live_migration_completion_timeout limit, by looking for when it appears that no progress has been made copying the memory between the hosts. However, it turns out there are several problems with the way we monitor progress. In production, and in stress testing, having live_migration_progress_timeout > 0 has caused random timeout failures for live-migrations that take longer than live_migration_progress_timeout.

One problem is that block migrations appear to show no progress, as it seems we only look for progress in copying memory. Also, the way we query QEMU via libvirt breaks when there are multiple iterations of memory copying.

We need to revisit this bug and either fix the progress mechanism or remove all the code that checks for progress (including the automatic trigger for post-copy). But in the meantime, let's default to having no timeout, and warn users that have overridden this configuration by deprecating the live_migration_progress_timeout configuration option.

For users concerned about live-migration timeout errors, I have cleaned up the configuration option descriptions, so they have a better chance of stopping the live-migration timeout errors they may come across.

Related-Bug: #1644248
Change-Id: I1a1143ddf8da5fb9706cf53dbfd6cbe84e606ae1
parent f40467b0eb
commit 510fe1353d
@@ -340,8 +340,14 @@ Please refer to the libvirt documentation for further details.
 Maximum permitted downtime, in milliseconds, for live migration
 switchover.
 
-Will be rounded up to a minimum of %dms. Use a large value if guest liveness
-is unimportant.
+Will be rounded up to a minimum of %dms. You can increase this value
+if you want to allow live-migrations to complete faster, or avoid
+live-migration timeout errors by allowing the guest to be paused for
+longer during the live-migration switch over.
+
+Related options:
+
+* live_migration_completion_timeout
 """ % LIVE_MIGRATION_DOWNTIME_MIN),
     # TODO(hieulq): Need to add min argument by moving from
     # LIVE_MIGRATION_DOWNTIME_STEPS_MIN constant.
@@ -373,16 +379,27 @@ data before aborting the operation.
 Value is per GiB of guest RAM + disk to be transferred, with lower bound of
 a minimum of 2 GiB. Should usually be larger than downtime delay * downtime
 steps. Set to 0 to disable timeouts.
-Default is 800.
+
+Related options:
+
+* live_migration_downtime
+* live_migration_downtime_steps
+* live_migration_downtime_delay
 """),
     cfg.IntOpt('live_migration_progress_timeout',
-               default=150,
+               default=0,
+               deprecated_for_removal=True,
+               deprecated_reason="Serious bugs found in this feature.",
                mutable=True,
                help="""
 Time to wait, in seconds, for migration to make forward progress in
 transferring data before aborting the operation.
 
 Set to 0 to disable timeouts.
+
+This is deprecated, and now disabled by default because we have found serious
+bugs in this feature that caused false live-migration timeout failures. This
+feature will be removed or replaced in a future release.
 """),
     cfg.BoolOpt('live_migration_permit_post_copy',
                 default=False,
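The help text for live_migration_completion_timeout in the hunk above describes a per-GiB value with a lower bound of 2 GiB. The following is a minimal sketch of that arithmetic, assuming the rule is applied exactly as the help text words it. The function name `effective_completion_timeout` and its parameters are hypothetical and are not taken from the Nova source.

```python
from typing import Optional

# Lower bound on the amount of data, as stated in the help text.
MIN_DATA_GIB = 2


def effective_completion_timeout(per_gib_timeout: int,
                                 ram_gib: float,
                                 disk_gib: float = 0.0) -> Optional[int]:
    """Return the overall timeout in seconds, or None when disabled.

    per_gib_timeout stands in for live_migration_completion_timeout.
    disk_gib is the disk data to copy (assumed 0 when no disk is copied).
    """
    if per_gib_timeout == 0:
        # "Set to 0 to disable timeouts."
        return None
    data_gib = max(MIN_DATA_GIB, ram_gib + disk_gib)
    return int(per_gib_timeout * data_gib)


# A small guest is still treated as 2 GiB of data.
print(effective_completion_timeout(800, ram_gib=1))
# A guest with 4 GiB of RAM and 20 GiB of disk to copy.
print(effective_completion_timeout(800, ram_gib=4, disk_gib=20))
```

Under these assumptions, a configured value of 800 (the figure the removed help line named as the default) gives 1600 seconds for the small guest and 19200 seconds for the larger one, which suggests why the completion timeout alone can bound most migrations once the progress timeout is off.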
@@ -0,0 +1,16 @@
+---
+issues:
+  - |
+    The live-migration progress timeout controlled by the configuration option
+    ``[libvirt]/live_migration_progress_timeout`` has been discovered to
+    frequently cause live-migrations to fail with a progress timeout error,
+    even though the live-migration is still making good progress.
+    To minimize problems caused by these checks we have changed the default
+    to 0, which means do not trigger a timeout.
+    To modify when a live-migration will fail with a timeout error, please now
+    look at ``[libvirt]/live_migration_completion_timeout`` and
+    ``[libvirt]/live_migration_downtime``.
+deprecations:
+  - |
+    ``[libvirt]/live_migration_progress_timeout`` has been deprecated as this
+    feature has been found not to work. See bug 1644248 for more details.
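The release note above asks operators to stop relying on ``[libvirt]/live_migration_progress_timeout``. As a rough illustration, the sketch below scans nova.conf-style text with Python's standard `configparser` and reports a non-zero override. The helper name `find_progress_timeout_override` is hypothetical, and Nova itself reads configuration through oslo.config, which this sketch does not use.

```python
import configparser
from typing import Optional

SECTION = "libvirt"
OPTION = "live_migration_progress_timeout"


def find_progress_timeout_override(conf_text: str) -> Optional[int]:
    """Return the configured value if it is above 0, otherwise None."""
    # Interpolation is disabled so '%' characters in other values are harmless.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_option(SECTION, OPTION):
        return None
    # getint raises ValueError if the value is not an integer.
    value = parser.getint(SECTION, OPTION)
    return value if value > 0 else None


sample = """
[libvirt]
live_migration_progress_timeout = 150
live_migration_completion_timeout = 800
"""

override = find_progress_timeout_override(sample)
if override is not None:
    print(f"[{SECTION}]/{OPTION} = {override}: deprecated, consider 0")
```

A value of 0, or no entry at all, matches the new default and is reported as no override.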