Add spec for expose-auto-converge-post-copy
Change-Id: I08ab3fb105fd83a0b3095ea30a0f0518cfaa24ea
This commit is contained in:
261
specs/train/approved/expose-auto-converge-post-copy.rst
Normal file
261
specs/train/approved/expose-auto-converge-post-copy.rst
Normal file
@@ -0,0 +1,261 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
==================================
|
||||
Expose auto converge and post copy
|
||||
==================================
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/expose-auto-converge-post-copy
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Currently auto converge and post copy can only be enabled/disabled via
|
||||
configuration, which is somewhat inflexible. If an application sensitive to
|
||||
reduced performance (some scientific computing applications may be more
|
||||
sensitive to memory access latency) is on a host with these options enabled,
|
||||
live migration may cause the application to raise an error. Therefore, the user
|
||||
wants to control whether to enable/disable auto converge or post copy during
|
||||
live migration.
|
||||
|
||||
Use Cases
|
||||
---------
|
||||
|
||||
* Some applications do not want increased risk of being rebooted due to a
|
||||
network failure or memory page access failure during post-copy
|
||||
live-migration.
|
||||
|
||||
* Some applications are performance sensitive (such as some scientific
|
||||
computing applications); such applications do not want performance throttled
|
||||
back by the auto-converge feature during live-migration.
|
||||
|
||||
* Some applications would like to avoid reboot risk and performance
|
||||
throttling. If the network between two compute nodes is interrupted during
|
||||
post-copy live-migration, the live-migration will fail and the user will need
|
||||
to reset the instance to make it available. Therefore such applications do
|
||||
not want use both features during live-migration.
|
||||
|
||||
* For the above problems, the operator wants to control whether a single
|
||||
instance enables auto converge or post copy during live migration. But
|
||||
currently the minimum unit that can be controlled is the compute node.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
Support for auto converge and post copy requires QEMU version >= 2.5.0. Since
|
||||
the Rocky release, the minimum required version of QEMU is 2.5.0 [1]_.
|
||||
Therefore, all compute nodes using the libvirt driver should support these
|
||||
features. There are flags from the libvirt ``virDomainMigrateFlags`` enum
|
||||
[2]_::
|
||||
|
||||
...
|
||||
VIR_MIGRATE_AUTO_CONVERGE = 8192
|
||||
VIR_MIGRATE_POSTCOPY = 32768
|
||||
...
|
||||
|
||||
The configurations ``live_migration_permit_auto_converge`` and
|
||||
``live_migration_permit_post_copy`` can only affect the hypervisor by
|
||||
modifying the configuration, but traits can affect a single instance.
|
||||
|
||||
In order to request the feature (scheduling an instance to nodes that provide
|
||||
the feature) we propose defining two new traits. The traits are reported by the
|
||||
libvirt driver, regardless of the conf:
|
||||
|
||||
* ``COMPUTE_MIGRATE_AUTO_CONVERGE``
|
||||
* ``COMPUTE_MIGRATE_POST_COPY``
|
||||
|
||||
Introduce two new flavor extra specs:
|
||||
|
||||
* ``compute:live_migration_auto_converge=true/false``
|
||||
* ``compute:live_migration_post_copy=true/false``
|
||||
|
||||
And introduce two new image properties:
|
||||
|
||||
* ``compute_live_migration_auto_converge=true/false``
|
||||
* ``compute_live_migration_post_copy=true/false``
|
||||
|
||||
Use these properties, instead of asking the operator to set
|
||||
``required``/``forbidden`` on the traits. Before calling placement, when
|
||||
``compute:live_migration_auto_converge=true`` or
|
||||
``compute:live_migration_post_copy=true``, we add required traits
|
||||
for the corresponding feature to the placement request. When
|
||||
``compute:live_migration_auto_converge=false`` and
|
||||
``compute:live_migration_post_copy=false``, we just add nothing to
|
||||
the placement request. Thus we still can schedule an instance on a host with
|
||||
the features but we disable these two features for that instance. We use these
|
||||
keys in the scheduler to optionally add required traits to ensure that the
|
||||
instance can land on a host that is capable of the requested behavior. The
|
||||
libvirt driver will then interpret the values to decide whether to use the
|
||||
features during live migration. For example, if the flavor says "false":
|
||||
|
||||
* We don't add the trait to the scheduling request, so the instance can land
|
||||
anywhere.
|
||||
* The driver will **not** use the feature for live-migrate, regardless of what
|
||||
the compute's config says.
|
||||
|
||||
By default, when the operator creates an instance without any related metadata,
|
||||
the scheduler will not care whether the host supports auto-converge or
|
||||
post-copy. If the configurations ``live_migration_permit_auto_converge`` or
|
||||
``live_migration_permit_post_copy`` are True, the libvirt driver will prefer to
|
||||
use auto-converge or post-copy. These can be used when the operator wants **all
|
||||
instances** on a given compute node to use auto-converge/post-copy. For
|
||||
example:
|
||||
|
||||
* If an instance that has not requested related metadata is scheduled to a host
|
||||
that enabled ``live_migration_permit_auto_converge`` or
|
||||
``live_migration_permit_post_copy``, then libvirt will try to use
|
||||
auto-converge or post-copy during live migration.
|
||||
|
||||
If the operator creates instance with
|
||||
``compute:live_migration_auto_converge`=true/false`` or
|
||||
``compute:live_migration_post_copy=true/false``,
|
||||
these metadata will override the configurations:
|
||||
``live_migration_permit_auto_converge`` or
|
||||
``live_migration_permit_post_copy``.
|
||||
|
||||
When ``compute:live_migration_auto_converge`` and
|
||||
``compute_live_migration_post_copy`` are both true or flavor extra specs
|
||||
is in conflict with image properties, the 'create' API call will raise an
|
||||
exception.
|
||||
|
||||
When using auto-converge during live migration, if the operator calls the force
|
||||
complete API, libvirt will not be converted to use post-copy because it's not
|
||||
required in flavor extra specs or image properties.
|
||||
|
||||
According to this spec [3]_, if post-copy is enabled during live migration, the
|
||||
abort API call will be rejected by libvirt driver. Now we can reject the
|
||||
request in the API by checking ``hw_live_migration_permit_reboot_risk``
|
||||
properties.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
Another method is to use traits in flavor extra_specs/image properties. This
|
||||
could work well when the operators need auto-converge/post-copy. But it can't
|
||||
be used to disable auto-converge/post-copy.
|
||||
Since the Rocky release, all libvirt hypervisor hosts support
|
||||
auto-converge/post-copy, which means every libvirt hypervisor host would have
|
||||
traits ``COMPUTE_MIGRATE_AUTO_CONVERGE`` and ``COMPUTE_MIGRATE_POST_COPY``.
|
||||
If operators want to not use auto-converge or post-copy, they would use
|
||||
forbidden traits: ``traits:COMPUTE_MIGRATE_AUTO_CONVERGE=forbidden`` or
|
||||
``traits:COMPUTE_MIGRATE_POST_COPY=forbidden``. Which means **don't** schedule
|
||||
my vm to the hosts who support auto-converge/post-copy, as the above says, this
|
||||
means that all libvirt compute nodes will be ignored. The result will be that
|
||||
the vm creation failed because the compute node can't be scheduled.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
Add the two image properties to the ImageMeta object:
|
||||
|
||||
* compute_live_migration_auto_converge
|
||||
* compute_live_migration_post_copy
|
||||
|
||||
The ImageMeta is stored in table instance_system_metadata, no schema
|
||||
modification is needed.
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
None
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
None
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
None
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None
|
||||
|
||||
Upgrade impact
|
||||
--------------
|
||||
|
||||
None
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Ya Wang
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Support for new placement traits.
|
||||
|
||||
* Libvirt driver changes to report traits to placement, the traits will be
|
||||
reported by the libvirt driver as part of ``update_provider_tree``. This will
|
||||
*not* be added to the generic compute capabilities dict inherited by all the
|
||||
virt drivers because these traits are libvirt-specific.
|
||||
|
||||
* Scheduler changes to translate metadata to traits.
|
||||
|
||||
* Recalculate ``_live_migration_flags`` before live migration start in
|
||||
the libvirt driver.
|
||||
|
||||
* Add functional tests and unit tests.
|
||||
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None
|
||||
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
Unit tests and functional tests will be included to test the new functionality.
|
||||
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
* The live migration document should be changed to introduce this new feature.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [1] https://wiki.openstack.org/wiki/LibvirtDistroSupportMatrix
|
||||
.. [2] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMigrateFlags
|
||||
.. [3] https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/auto-live-migration-completion.html
|
||||
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
.. list-table:: Revisions
|
||||
:header-rows: 1
|
||||
|
||||
* - Release Name
|
||||
- Description
|
||||
* - Train
|
||||
- Introduced
|
||||
Reference in New Issue
Block a user