Currently Nova provides a single-VM snapshot API (createImage), which takes a
consistent snapshot of a VM and its associated volumes, and automatically
quiesces and unquiesces the VM when guest agent support is available. This
method works well for a consistent snapshot of a single VM, but there is no
way to take a consistent snapshot of an application that consists of multiple
VMs.

Provide quiesce and unquiesce APIs in Nova so that a consistent snapshot can
be taken of an application that consists of a group of VMs, for the purpose
of disaster recovery at another site.

Atomic quiesce/unquiesce APIs make it possible to snapshot a group of VMs in
a transactional way, for example: quiesce VM1, quiesce VM2, quiesce VM3,
snapshot VM1's volumes, snapshot VM2's volumes, snapshot VM3's volumes,
unquiesce VM3, unquiesce VM2, unquiesce VM1. For some telecom applications,
the order is important for a group of VMs with a strong relationship.

APIImpact: Expose quiesce unquiesce API
DocImpact: Expose quiesce unquiesce API
Blueprint: https://blueprints.launchpad.net/nova/+spec/expose-quiesce-unquiesce-api
Change-Id: I3cc247fba7a07dceb42704022444a450c31ea0e8
Signed-off-by: Chaoyi Huang <joehuang@huawei.com>

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=====================================================================
Expose Quiesce Unquiesce API
=====================================================================

https://blueprints.launchpad.net/nova/+spec/expose-quiesce-unquiesce-api

Provide quiesce and unquiesce APIs in Nova so that a consistent snapshot can
be taken of an application that consists of a group of VMs, for the purpose
of disaster recovery at another site.

Problem description
===================

Currently Nova provides a single-VM snapshot API (createImage), which takes a
consistent snapshot of a VM and its associated volumes, and automatically
quiesces and unquiesces the VM when guest agent support is available. This
method works well for a consistent snapshot of a single VM, but there is no
way to take a consistent snapshot of an application, which often consists of
multiple VMs.

Use Cases
---------

In an NFV scenario, a VNF (telecom application) often consists of a group
of VMs. To make it possible to restore the VNF at another site after a
catastrophic failure, the snapshot, backup, and restore of this group of VMs
should be done in a transactional way, so that consistency is guaranteed at
the application level and not only at the single-VM level. For example:
quiesce VM1, quiesce VM2, quiesce VM3, snapshot VM1's volumes, snapshot VM2's
volumes, snapshot VM3's volumes, unquiesce VM3, unquiesce VM2, unquiesce VM1.
For some telecom applications, the order is very important for a group of VMs
with a strong relationship.

Therefore the OPNFV multisite project expects Nova to provide quiesce and
unquiesce APIs, so that a consistent snapshot of a group of VMs (not only of
a single VM) can be taken in a transactional way.

The disaster recovery process will work like this:

1. The DR (geo-site disaster recovery) software gets the volumes of each VM
   in the VNF from Nova.

2. The DR software calls the Nova quiesce API to guarantee that the VMs are
   quiesced in the desired order.

3. The DR software takes snapshots of these volumes in Cinder. (NOTE: Because
   storage usually provides fast snapshots, the interval between quiesce and
   unquiesce is short.)

4. The DR software calls the Nova unquiesce API to unquiesce the VMs of the
   VNF in reverse order.

5. The DR software creates volumes from the snapshots just taken in Cinder.

6. The DR software creates (incremental) backups of these volumes to remote
   backup storage (Swift, Ceph, and so on) in Cinder.

7. If this site fails:

   1. The DR software restores these backup volumes in the remote Cinder at
      the backup site.

   2. The DR software boots VMs from the bootable volumes in the remote
      Cinder at the backup site and attaches the corresponding data volumes.

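The ordering in steps 2 to 4 can be sketched as follows. This is a minimal
illustration, not part of the proposed Nova API: the function name
``consistent_group_snapshot`` and the three callables are hypothetical
placeholders for whatever the DR software uses to call Nova and Cinder, and
the ``try``/``finally`` cleanup is an assumed design choice rather than
something this spec mandates.

.. code-block:: python

    from typing import Callable, List, Sequence


    def consistent_group_snapshot(
        vms: Sequence[str],
        quiesce: Callable[[str], None],
        snapshot_volumes: Callable[[str], None],
        unquiesce: Callable[[str], None],
    ) -> None:
        """Quiesce VMs in order, snapshot them, unquiesce in reverse order.

        Each callable is assumed to return only once its operation has
        completed (for example, after polling an asynchronous API).
        """
        quiesced: List[str] = []
        try:
            for vm in vms:
                quiesce(vm)
                quiesced.append(vm)
            for vm in vms:
                snapshot_volumes(vm)
        finally:
            # Release every VM that was quiesced, last one first, even if
            # a quiesce or snapshot call failed part-way through.
            for vm in reversed(quiesced):
                unquiesce(vm)


    # Record the call order with stand-in callables.
    log: List[str] = []
    consistent_group_snapshot(
        ["VM1", "VM2", "VM3"],
        quiesce=lambda vm: log.append("quiesce " + vm),
        snapshot_volumes=lambda vm: log.append("snapshot " + vm),
        unquiesce=lambda vm: log.append("unquiesce " + vm),
    )

With three VMs the recorded order is quiesce VM1, VM2, VM3, then the three
snapshots, then unquiesce VM3, VM2, VM1, matching the example given earlier
in this section.
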
Note: How the API is used depends on the DR policy and on the characteristics
of the VNF. Some VNFs may allow quiesce/unquiesce to be performed on the
standby of the VNF or on a member of the cluster, to avoid interfering with
the service provided by the VNF. Other VNFs may tolerate a short period of
unavailability for DR purposes.

The API benefits not only VNFs (telecom applications); it should also be
usable by any other application that needs a consistent snapshot at the
application level.


Project Priority
----------------

None

Proposed change
===============

Expose 'quiesce' and 'unquiesce' admin API actions so that DR software can
take application-level consistent snapshots for application disaster
recovery.

'quiesce' and 'unquiesce' have already been implemented for VM createImage,
but no API is exposed; they are only applied in the single-VM snapshot
scenario.

The prerequisites for this feature are that the hypervisor driver supports
the operation and that the guest agent is installed and enabled.

This BP mainly focuses on the Nova API part, that is, exposing the API. The
nova.virt driver interface (driver.py) already provides 'quiesce' and
'unquiesce'. Other hypervisor drivers may support this feature now or in the
future; that is outside the scope of this BP.

The 'quiesce' and 'unquiesce' APIs should work asynchronously, which means
that the caller of the API should check whether the operation finished
successfully. The DR software is responsible for guaranteeing the order of
the API calls when quiescing and unquiescing multiple VMs.

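Because the calls are asynchronous, a caller has to poll for completion
before moving on to the next VM. The following is a minimal sketch of such a
check, assuming the caller can read the instance's ``vm_state`` and
``task_state``; the helper name ``wait_for_task_done``, the ``get_states``
callable, and the timeout values are hypothetical and not defined by this
spec.

.. code-block:: python

    import time
    from typing import Callable, Optional, Tuple


    def wait_for_task_done(
        get_states: Callable[[], Tuple[str, Optional[str]]],
        expected_vm_state: str,
        timeout: float = 30.0,
        interval: float = 0.5,
    ) -> bool:
        """Poll until task_state is None, then compare vm_state.

        Returns True if the VM ended up in expected_vm_state, False if the
        task finished in any other state (for example 'active' when quiesce
        is unsupported, or 'error'). Raises TimeoutError if the task is
        still running when the timeout expires.
        """
        deadline = time.monotonic() + timeout
        while True:
            vm_state, task_state = get_states()
            if task_state is None:
                return vm_state == expected_vm_state
            if time.monotonic() >= deadline:
                raise TimeoutError("still in task_state %s" % task_state)
            time.sleep(interval)


    # Stand-in for successive reads of the server's state.
    _fake_states = iter([
        ("active", "quiescing"),
        ("active", "quiescing"),
        ("quiesced", None),
    ])
    result = wait_for_task_done(
        lambda: next(_fake_states), "quiesced", interval=0.0
    )

In this sketch a ``False`` result tells the DR software that the VM was not
quiesced, so it can abort the group operation and unquiesce the VMs it has
already handled.
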
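One vm_state, 'quiesced', will be added. Two task_states, 'quiescing' and
'unquiescing', will be added too.

Requirements for commands:

=========  ==================  ====================  ============
Command    Required VM States  Required Task States  Target State
=========  ==================  ====================  ============
quiesce    active              None                  quiesced
unquiesce  quiesced            None                  active
=========  ==================  ====================  ============

VM states and possible commands:

========  =========
VM State  Commands
========  =========
quiesced  unquiesce
========  =========
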
If the hypervisor does not support quiesce/unquiesce, the VM state is kept as
active, the task_state is set to None, and an instance action is used to tell
the user what happened.

If an exception is caught during the quiesce or unquiesce action, the VM
state is set to error and the exception is saved to the DB, as for other
operations.

Whether the VM is in the quiesced or the ERROR state, the admin reset-state
action can take the VM to the desired state.

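The state rules above can be summarised in a small transition table. The
sketch below is illustrative only: ``COMMANDS``, ``InvalidState``, ``start``
and ``finish`` are hypothetical names rather than Nova code, and the handling
of the unsupported case (leaving the VM state unchanged) is one reading of
the text above.

.. code-block:: python

    from typing import Dict, Optional, Tuple

    # command -> (required vm_state, task_state while running, target vm_state)
    COMMANDS: Dict[str, Tuple[str, str, str]] = {
        "quiesce": ("active", "quiescing", "quiesced"),
        "unquiesce": ("quiesced", "unquiescing", "active"),
    }


    class InvalidState(Exception):
        """Hypothetical stand-in for the 409 invalid-instance-state error."""


    def start(
        command: str, vm_state: str, task_state: Optional[str]
    ) -> Tuple[str, str]:
        """Validate the current state and return the in-progress state."""
        required, running, _target = COMMANDS[command]
        if vm_state != required or task_state is not None:
            raise InvalidState(
                "%s requires vm_state=%s and no task in progress"
                % (command, required)
            )
        return vm_state, running


    def finish(
        command: str, vm_state: str, supported: bool = True, failed: bool = False
    ) -> Tuple[str, Optional[str]]:
        """Return (vm_state, task_state) once the operation has ended."""
        _required, _running, target = COMMANDS[command]
        if failed:
            # An exception during the action puts the VM into error.
            return "error", None
        if not supported:
            # Hypervisor cannot quiesce: leave the VM state as it was.
            return vm_state, None
        return target, None

For example, ``start("quiesce", "active", None)`` yields the in-progress pair
``("active", "quiescing")``, and ``finish("quiesce", "active")`` then yields
``("quiesced", None)``.
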
Alternatives
------------

1. Usually a Nova API action manipulates one VM. One proposal is to expose a
   single quiesce/unquiesce API action that operates on multiple VMs in
   order. This would break with Nova API convention and lead to
   implementation complexity, especially in a cells deployment.
2. Another proposal is to make the quiesce/unquiesce API synchronous, given
   the short execution time of quiesce and unquiesce. A synchronous
   implementation is not the convention for web services or for the Nova API.

Data model impact
-----------------

None

REST API impact
---------------

* URL:

  * /v2/{tenant_id}/servers/{server_id}/action
  * /v2.1/servers/{server_id}/action

* Request method:

  * POST

* JSON request body for 'quiesce'::

    {
        "quiesce": null
    }

* JSON request body for 'unquiesce'::

    {
        "unquiesce": null
    }

* This operation does not return a response body.

* Normal response code:

  * 202: Accepted

* Error response codes:

  * 409: Invalid instance state. Quiesce expects the VM to be in the active
    state before the command is executed; unquiesce expects the quiesced
    state. Any other VM state leads to a 409 response.

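As an illustration of the request described above, the following sketch
builds (but does not send) the POST for the proposed actions using only the
Python standard library. The helper name ``build_action_request`` and the
endpoint, server ID, and token values are placeholders; the actions exist
only as proposed in this spec.

.. code-block:: python

    import json
    import urllib.request


    def build_action_request(
        endpoint: str, server_id: str, action: str, token: str
    ) -> urllib.request.Request:
        """Build the POST request for a proposed quiesce/unquiesce action."""
        if action not in ("quiesce", "unquiesce"):
            raise ValueError("unsupported action: %s" % action)
        url = "%s/servers/%s/action" % (endpoint.rstrip("/"), server_id)
        # The request body is a single key whose value is JSON null.
        body = json.dumps({action: None}).encode("utf-8")
        return urllib.request.Request(
            url,
            data=body,
            method="POST",
            headers={
                "Content-Type": "application/json",
                "X-Auth-Token": token,
            },
        )


    # Placeholder values; nothing is sent over the network here.
    req = build_action_request(
        "http://nova.example.com/v2.1", "SERVER_ID", "quiesce", "TOKEN"
    )

A caller would pass the request to ``urllib.request.urlopen`` and treat a 202
status as "accepted, still in progress"; a 409 surfaces as
``urllib.error.HTTPError``.
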
Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

While a VM is quiesced, disk writes from the instance are blocked.

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  joehuang

Work Items
----------

1. Add 'quiesce' and 'unquiesce' server admin action APIs to Nova.

Dependencies
============

None

Testing
=======

1. Live quiesce/unquiesce of VMs with a guest booted with qemu-guest-agent
   should be added to the scenario tests.
2. A tempest test should also be added for this.
3. Note that this requires an environment whose hypervisor supports the
   action.

Documentation Impact
====================

New REST APIs (server admin actions) should be added to the API documentation.
The operations guide also needs to document how to use this feature. It
currently recommends using the fsfreeze tool manually; otherwise quiescing
happens invisibly inside the VM createImage action.

References
==========

.. [1] nova-specs: 'Quiesce filesystems with QEMU guest agent during image
       snapshot':
       https://review.openstack.org/#/c/126966/

.. [2] 'quiesce' and 'unquiesce' methods for libvirt driver:
       https://blueprints.launchpad.net/nova/+spec/quiesced-image-snapshots-with-qemu-guest-agen/atomic/async

.. [3] A VNF (telecom application) should be able to be restored at another
       site after a catastrophic failure:
       https://git.opnfv.org/cgit/multisite/tree/multisite-vnf-gr-requirement.rst