Move implemented mitaka specs

Note that resource-classes was already moved but the redirects file
wasn't updated; that's fixed here. There are some partial blueprints
that were marked completed in mitaka but are still being worked on in
newton, like the config option work. I've moved those to implemented
here as well.

Change-Id: I16f279b4794127cb7abc40ffc22cc237702d14ed

specs/mitaka/implemented/abort-live-migration.rst (new file, 338 lines)

@@ -0,0 +1,338 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================================================
Provide a way to abort an ongoing live migration
================================================

Blueprint:
https://blueprints.launchpad.net/nova/+spec/abort-live-migration

At present, intervention at the hypervisor level is required to cancel
a live migration. This spec proposes adding a new operation on the
instance object to cancel a live migration of that instance.

Problem description
===================

It may be that an operator decides, after starting a live migration,
that they would like to cancel it. Effectively this would mean
rolling back any partial migration that has happened and leaving the
instance on the source node. It may be that the migration is taking too
long, or some operational problem is discovered with the target node.
As the set of operations that can be performed on an instance during
live migration is restricted (only delete is currently allowed), it may
be that an instance owner has requested that their instance be
made available urgently.

Currently, aborting a live migration requires intervention at the
hypervisor level, which Nova then recognises, resetting the instance
state.

Use Cases
---------

As an operator of an OpenStack cloud, I would like the ability to
query, stop and roll back an ongoing live migration. This is required
for a number of reasons.

1. The migration may be failing to complete due to the instance's
   workload. In some cases the solution to this issue may be to pause
   the instance, but in other cases the migration may need to be
   abandoned or at least postponed.
2. The migration may be having an adverse impact on the instance,
   i.e. the instance owner may be observing degraded performance of
   their application and be requesting that the cloud operator address
   this issue.
3. The instance migration may be taking too long due to the large
   amount of data to be copied (e.g. the instance's ephemeral disk is
   very full) and the cloud operator may have consulted with the
   instance owner and decided to abandon the live migration and employ
   a different strategy. For example, stop the instance, perform the
   hypervisor maintenance, then restart the instance.

Proposed change
===============

New API operations on the instance object are proposed which can be used
to obtain details of migration operations on the instance and abort
an active operation. This will include a GET to obtain details of
migration operations. If the instance does not exist (or is not
visible to the tenant id being used) or has not been the subject of any
migrations, the GET will return a 404 response code. If the GET
returns details of an active migration, a DELETE can be used to abort
the migration operation. Again, if the instance does not exist (as in
the case where it has been deleted since the GET call) or no migration
is in progress (i.e. it has ended since the GET call) the DELETE will
return a 404 response code. Otherwise it will return a 202 response
code.

Rolling back a live migration should be very quick, as the source host
is still active until the migration finishes. However this depends on
the approach implemented by the virtualization driver. For example Qemu
is planning to implement a 'post copy' feature -
https://www.redhat.com/archives/libvir-list/2014-December/msg00093.html
In this situation a cancellation request should be declined because
rolling back to the source node would be more work than completing the
migration. In fact it is probably impossible! Nova would need to be
involved in the switch from pre-copy to post-copy so that it could
switch the networking to the target host. Thus nova would know that the
instance has switched and decline any cancellation requests. If the
instance migration were to encounter difficulties completing during the
post copy, the instance would need to be paused to allow the migration
to complete.

The GET /servers/{id}/migrations operation will entail the API server
verifying the existence and task state of the instance. If the
instance does not exist (or is not visible to the user invoking this
operation) a 404 response code will be returned. Otherwise the API
server will return details of all the running migration operations for
the instance. It will use a new method on the migration class called
get_by_instance_and_status, specifying the instance uuid and a status of
running. If no migration objects are returned, an empty list will be
returned in the API response. If one or more migration objects are
returned, then the new_instance_type_id and old_instance_type_id fields
will be used to retrieve flavor objects for the relevant flavors to
obtain the flavor id. These values will be included in the response
as new_flavor_id and old_flavor_id. This will mean that a user will be
able to use this information to obtain details of the flavors.
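
A minimal sketch of that lookup and response assembly
(``Migration.get_by_instance_and_status`` follows the text above, but the
other object names and signatures are assumptions for illustration)::

    def list_migrations(context, instance_uuid):
        # Only migrations still in flight are interesting here.
        migrations = Migration.get_by_instance_and_status(
            context, instance_uuid, status='running')
        body = []
        for migration in migrations:
            entry = migration.to_dict()
            # Translate internal instance-type ids into the flavor ids
            # that API users actually work with.
            entry['new_flavor_id'] = Flavor.get_by_id(
                context, migration.new_instance_type_id).flavorid
            entry['old_flavor_id'] = Flavor.get_by_id(
                context, migration.old_instance_type_id).flavorid
            body.append(entry)
        return {'migrations': body}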

The DELETE /servers/{id}/migrations/{id} operation will entail the API
server calling the migration_get method on the migration class to
verify the existence of an ongoing live migration operation on the
instance. It will then call a method on the ServersController class
called live_migrate_abort.

If the invoking user does not have authority to perform the operation
(as defined in the policy.json file) then a 403 response code will be
returned. The policy.json file will be updated to define
live_migrate_abort as accessible to cloud admin users only.

If the API server determines that the operation can proceed, it will
send an async message to the compute manager and return a 202
response code to the user.

The compute manager will emit a notification message indicating that
the live_migrate_abort operation has started. It will then invoke a
method on the driver to abort the migration. If the driver is unable
to perform this operation, a new exception called
'AbortMigrationNotSupported' will be raised.

The compute manager method invoked will be wrapped with the decorators
that cause it to generate instance action and notification events. The
exception generated here would be processed by those wrappers and thus
the user would be able to query the instance actions to discover the
outcome of the cancellation operation.

Note the instance task state will not be updated by the
live_migrate_abort operation. If the operator were to execute the
operation multiple times, the subsequent invocations would simply fail.
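
A hedged sketch of that compute-manager entry point; the decorator and
notification event names mirror common nova conventions but are assumptions
here, not the final implementation::

    @wrap_instance_event
    @wrap_instance_fault
    def live_migration_abort(self, context, instance, migration_id):
        migration = objects.Migration.get_by_id(context, migration_id)
        self._notify_about_instance_usage(
            context, instance, 'live.migration.abort.start')
        # Raises AbortMigrationNotSupported if the driver cannot abort;
        # the wrapping decorators record a failure as an instance action.
        self.driver.live_migration_abort(instance)
        self._notify_about_instance_usage(
            context, instance, 'live.migration.abort.end')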

In the case of the libvirt driver, it will obtain the domain object for
the target instance and invoke job abort on it. If there is no job
active, an error will be returned. This could occur if the instance
migration has recently finished, or has completed the libvirt migration
and is executing the post migration phase. It could also occur if the
migration is still executing the pre migration phase. Finally, it
could mean the libvirt job has failed but nova has not updated the
task state. In all of these cases an exception will be returned to the
compute manager to indicate that the operation was unsuccessful.

If the libvirt job abort operation succeeds then the thread performing
the live migration will receive an error from the libvirt driver and
perform the live migration rollback steps, including resetting the
instance's task state to none.
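
A minimal sketch of the libvirt side, using the ``abortJob`` call from the
libvirt-python binding (error handling simplified; how the failure is
surfaced to the compute manager is an assumption here)::

    import libvirt

    def live_migration_abort(domain):
        """Abort the job (e.g. a live migration) running on a domain.

        ``domain`` is a ``libvirt.virDomain``; ``abortJob()`` raises
        ``libvirtError`` when no job is currently active.
        """
        try:
            domain.abortJob()
        except libvirt.libvirtError as ex:
            # No abortable job: the migration already finished, has not
            # started yet, or failed without nova updating the task state.
            raise RuntimeError('no abortable migration job: %s' % ex)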

Alternatives
------------

One alternative is not doing this, leaving it up to operators to roll
up their sleeves and get to work on the hypervisor.

The topic of cancelling an ongoing live migration has been mooted
before in Nova, and has been thought of as being suitable for a
"Tasks API" for managing long-running tasks [#]_. There is not
currently any Tasks API, but if one were to be added to Nova, it would
be suitable.

Data model impact
-----------------

None

REST API impact
---------------

To be added in a new microversion.

* Obtain details of live migration operations on an instance that have
  a status of running. There should only be one migration per instance
  in this state, but the API call supports returning more than one.

  The operation will return the id of the active migration operation
  for the instance.

  `GET /servers/{id}/migrations`

  Body::

    None

  Normal http response code: `200 OK`

  Body::

    {
      "migrations": [
        {
          "created_at": "2013-10-29T13:42:02.000000",
          "dest_compute": "compute3",
          "id": 6789,
          "server_uuid": "6ff1c9bf-09f7-4ce3-a56f-fb46745f3770",
          "new_flavor_id": 2,
          "old_flavor_id": 1,
          "source_compute": "compute2",
          "status": "running",
          "updated_at": "2013-10-29T14:42:02.000000"
        }
      ]
    }

  Expected error http response code: `404 Not Found`

  - the instance does not exist

  Expected error http response code: `403 Forbidden`

  - Policy violation if the caller is not granted access to
    'os_compute_api:servers:migrations:index' in policy.json

* Stop an in-progress live migration

  The operation will return the instance task state to none.

  `DELETE /servers/{id}/migrations/{id}`

  Body::

    None

  Normal http response code: `202 Accepted`

  No response body is needed.

  Expected error http response code: `404 Not Found`

  - the instance does not exist

  Expected error http response code: `403 Forbidden`

  - Policy violation if the caller is not granted access to
    'os_compute_api:servers:migrations:delete' in policy.json

  Expected error http response code: `400 Bad Request`

  - the instance state is invalid for cancellation, i.e. the task
    state is not 'migrating', or the migration is not in a running
    state with a type of 'live-migration'

Security impact
---------------

None

Notifications impact
--------------------

Emit notification messages indicating the start and outcome of the
migration cancellation operation.

Other end user impact
---------------------

A new python-novaclient command will be available, e.g. ::

  nova live-migration-abort <instance>

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Paul Carlton (irc: paul-carlton2)

Other assignees:
  Claudiu Belu

Work Items
----------

* python-novaclient 'nova live-migration-abort'
* Cancel live migration API operation
* Cancelling a live migration per hypervisor

  * libvirt
  * hyper-v
  * vmware

Dependencies
============

None

Testing
=======

Unit tests will be added using the fake virt driver to simulate a live
migration. The fake driver implementation will simply wait for the
cancellation. We also want to test attempts to cancel a migration
during the pre or post migration phases, which can be done using a fake
implementation of those steps that also waits for an indication
that the cancel attempt has been performed.

The functional testing will utilize the new live migration CI job.
An instance with memory activity and a large disk will be used so we
can test all aspects of live migration, including aborting the live
migration.

Documentation Impact
====================

The new API needs to be documented:

* Compute API extensions documentation
  http://developer.openstack.org/api-ref-compute-v2.1.html

* nova.compute.api documentation
  http://docs.openstack.org/developer/nova/api/nova.compute.api.html

References
==========

Some details of how this can be done with libvirt:
https://www.redhat.com/archives/libvirt-users/2014-January/msg00008.html

.. [#] http://lists.openstack.org/pipermail/openstack-dev/2015-February/055751.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced

specs/mitaka/implemented/add-os-win-library.rst (new file, 177 lines)

@@ -0,0 +1,177 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=====================
Add os-win dependency
=====================

https://blueprints.launchpad.net/nova/+spec/add-os-win-library

Hyper-V is involved in many OpenStack components (nova, neutron, cinder,
ceilometer, etc.) and will be involved with other components in the future.

A common library named os-win has been created in order to reduce the code
duplication between all these components (utils classes which interact
directly with Hyper-V through WMI), making it easier to maintain, review and
propose new changes to current and future components.

Problem description
===================

There are many Hyper-V utils modules duplicated across several projects,
which can be refactored into os-win, reducing the code duplication and making
it easier to maintain. Plus, the review process will be simplified, as
reviewers won't have to review Hyper-V related code, in which not everyone is
proficient.

Use Cases
---------

This blueprint impacts Developers and Reviewers.

Developers will be able to submit Hyper-V related commits directly to os-win.

Reviewers will not have to review low level Hyper-V related code. Thus, the
amount of code that needs to be reviewed will be reduced by approximately 50%.

Proposed change
===============

In order to implement this blueprint, minimal changes are necessary, as the
behaviour will stay the same.

The primary changes that need to be done in nova are as follows:

* add os-win to requirements.txt
* replace ``nova.virt.hyperv.vmutils.HyperVException`` references with
  ``os_win.HyperVException``
* replace all ``nova.virt.hyperv.utilsfactory`` imports used by the
  ``HyperVDriver`` with ``os_win.utilsfactory``
* remove all utils modules and their unit tests in ``nova.virt.hyperv``, since
  they will no longer be used.
* other trivial changes, which are to be seen in the implementation.

Changes that need to be done in other projects:

* add os-win to global-requirements.txt [1]
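
A brief sketch of what the import swap could look like on the nova side
(``os_win.utilsfactory`` is named above, but the specific utils methods shown
are assumptions for illustration)::

    # before:
    #   from nova.virt.hyperv import utilsfactory
    #   from nova.virt.hyperv import vmutils
    # after:
    from os_win import exceptions as os_win_exc
    from os_win import utilsfactory

    vmutils = utilsfactory.get_vmutils()

    def destroy(instance_name):
        if not vmutils.vm_exists(instance_name):
            # The exception now comes from os-win instead of
            # nova.virt.hyperv.vmutils.
            raise os_win_exc.HyperVException(
                'VM %s not found' % instance_name)
        vmutils.destroy_vm(instance_name)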

Alternatives
------------

Originally, os-win was planned to be part of Oslo, but it was suggested that
os-win should be a standalone project, as otherwise the Oslo team would also
have to maintain it, and there aren't many people who specialize in Windows /
Hyper-V related code.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

The os-win dependency will have to be installed in order for the HyperVDriver
to be used.

Developer impact
----------------

In a typical scenario, a blueprint implementation for the Hyper-V driver will
require two parts:

* an os-win commit, adding the Hyper-V related utils required in order to
  implement the blueprint.
* a nova commit, implementing the blueprint and using the changes made in
  os-win.

If a nova commit requires a newer version of os-win, the patch to
global-requirements should be referenced with Depends-On in the commit
message.

For bugfixes, there are chances that they require two patches: one for nova
and one for os-win. The backported bugfix must be a squashed version of the
two patches, referencing both commit IDs in the commit message::

    (cherry picked from commit <nova-commit-id>)
    (cherry picked from commit <os-win-commit-id>)

If the bugfix requires only one patch to either project, backporting will
proceed as before.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Claudiu Belu <cbelu@cloudbasesolutions.com>

Other contributors:
  Lucian Petrut <lpetrut@cloudbasesolutions.com>

Work Items
----------

As described in the `Proposed change` section.

Dependencies
============

Adds the os-win library as a dependency.

Testing
=======

* Unit tests
* Hyper-V CI

Documentation Impact
====================

The Hyper-V documentation page [3] will have to be updated to include os-win
as a dependency.

References
==========

[1] os-win added to global-requirements.txt:
    https://review.openstack.org/#/c/230394/

[2] os-win repository:
    https://github.com/openstack/os-win

[3] Hyper-V virtualization platform documentation page:
    http://docs.openstack.org/liberty/config-reference/content/hyper-v-virtualization-platform.html

History
=======

Mitaka: Introduced

@@ -0,0 +1,241 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================================================================
Show the 'project_id' and 'user_id' information in os-server-groups API
=======================================================================

https://blueprints.launchpad.net/nova/+spec/add-project-id-and-user-id

Show the 'project_id' and 'user_id' information of the server
groups in the os-server-groups API. This fix will allow admin users
to identify server groups more easily.


Problem description
===================

The os-server-groups API currently allows an admin user to list server
groups for all projects, but the response body doesn't contain the project
id information of each server group, so it is hard to identify which
server group belongs to which project in a multi-tenant environment.


Use Cases
---------

As a cloud administrator, I want to easily identify which server group
belongs to which project when sending a GET request.


Proposed change
===============

Add a new API microversion to the os-server-groups API extension such that:

* If the version on the API 'list' request satisfies the minimum version,
  include the 'project_id' and 'user_id' information of the server groups in
  the response data.
* If the version on the API 'show' request satisfies the minimum version,
  include the 'project_id' and 'user_id' information of the server group in
  the response data.
* If the version on the API 'create' request satisfies the minimum version,
  include the 'project_id' and 'user_id' information of the server group in
  the response data.
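
A minimal sketch of that version gating in the server-groups controller (the
helper and attribute names are assumptions for illustration; '2.12' mirrors
the microversion used in the examples below)::

    from nova.api.openstack import api_version_request

    def format_server_group(req, group):
        server_group = {
            'id': group.uuid,
            'name': group.name,
            'policies': group.policies,
            'members': group.members or [],
            'metadata': {},
        }
        # Expose the new fields only when the request opted in to a
        # microversion that includes them.
        if api_version_request.is_supported(req, min_version='2.12'):
            server_group['project_id'] = group.project_id
            server_group['user_id'] = group.user_id
        return server_group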

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

The proposed change updates the GET response data in the os-server-groups
API extension to include the 'project_id' and 'user_id' fields if the request
satisfies the minimum supported version.

The proposed change also updates the POST response data in the
os-server-groups API extension to include the 'project_id' and 'user_id'
fields if the request satisfies the minimum supported version.

* Modifications for the method

  * Add project id information to the current response data.
  * Add user id information to the current response data.
  * GET requests' response data will be affected.
  * POST requests' response data will be affected.

* Example use case:

  Request::

    GET --header "X-OpenStack-Nova-API-Version: 2.12" \
    http://127.0.0.1:8774/v2.1/e0c1f4c0b9444fa086fa13881798144f/os-server-groups

  Response::

    {
        "server_groups": [
            {
                "user_id": "ed64bccd0227444fa02dbd7695769a7d",
                "policies": [
                    "affinity"
                ],
                "name": "test1",
                "members": [],
                "project_id": "b8112a8d8227490eba99419b8a8c2555",
                "id": "e64b6ae1-4d05-4faa-9f53-72c71f8e6f1a",
                "metadata": {}
            },
            {
                "user_id": "9128b975e91846f882eb63dc35c2ffd8",
                "policies": [
                    "anti-affinity"
                ],
                "name": "test2",
                "members": [],
                "project_id": "b8112a8d8227490eba99419b8a8c2555",
                "id": "b1af831c-69b5-4d42-be44-d710f2b8954c",
                "metadata": {}
            }
        ]
    }

  Request::

    GET --header "X-OpenStack-Nova-API-Version: 2.12" \
    http://127.0.0.1:8774/v2.1/e0c1f4c0b9444fa086fa13881798144f/os-server-groups/
    e64b6ae1-4d05-4faa-9f53-72c71f8e6f1a

  Response::

    {
        "user_id": "ed64bccd0227444fa02dbd7695769a7d",
        "policies": [
            "affinity"
        ],
        "name": "test1",
        "members": [],
        "project_id": "b8112a8d8227490eba99419b8a8c2555",
        "id": "e64b6ae1-4d05-4faa-9f53-72c71f8e6f1a",
        "metadata": {}
    }

  Request::

    POST --header "X-OpenStack-Nova-API-Version: 2.12" \
    http://127.0.0.1:8774/v2.1/e0c1f4c0b9444fa086fa13881798144f/os-server-groups \
    -d '{"server_group": {"name": "test", "policies": ["affinity"]}}'

  Response::

    {
        "user_id": "ed64bccd0227444fa02dbd7695769a7d",
        "policies": [
            "affinity"
        ],
        "name": "test",
        "members": [],
        "project_id": "b8112a8d8227490eba99419b8a8c2555",
        "id": "e64b6ae1-4d05-4faa-9f53-72c71f8e6f1a",
        "metadata": {}
    }

* There should not be any impacts to policy.json files for this change.

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

* The python-novaclient server-group-list, server-group-show and
  server-group-create commands will be updated to handle microversions
  and to show the 'project_id' and 'user_id' information in their output
  if the requested microversion provides that information.

Performance Impact
------------------

None

Other deployer impact
---------------------

None; if a deployer is using the required minimum version of the API to get
the 'project_id' and 'user_id' data they can begin using it, otherwise they
won't see a change.

Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Zhenyu Zheng <zhengzhenyu@huawei.com>

Work Items
----------

* Add a new microversion and change
  nova/api/openstack/compute/server_groups.py to use it to determine
  if the 'project_id' and 'user_id' information of the server group
  should be returned.


Dependencies
============

None


Testing
=======

* Unit tests and API samples functional tests in the nova tree.
* There are currently no compute API microversions tested in Tempest
  beyond v2.1. We could add support for testing the new version in Tempest,
  but so far the API is already at least at v2.10 without changes to Tempest.


Documentation Impact
====================

* The nova/api/openstack/rest_api_version_history.rst document will be
  updated.
* The api-ref at https://github.com/openstack/api-site will be updated.


References
==========

* Originally reported as a bug:
  https://bugs.launchpad.net/python-novaclient/+bug/1481210

specs/mitaka/implemented/boot-from-uefi.rst (new file, 172 lines)

@@ -0,0 +1,172 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

====================
Boot From UEFI image
====================

https://blueprints.launchpad.net/nova/+spec/boot-from-uefi

The nova compute libvirt driver does not support booting from UEFI images.
This is a problem because there is a slow but steady trend for OSes to move
to the UEFI format and in some cases to make the UEFI format their only
format. Microsoft Windows is moving in this direction and Clear Linux is
already in this category. Given this, we propose enabling UEFI boot with
the libvirt driver. Additionally, we propose using the well tested and
battle hardened Open Virtual Machine Firmware (OVMF) as the VM firmware
for x86_64.

Unified Extensible Firmware Interface (UEFI) is a standard firmware designed
to replace BIOS. Booting a VM using UEFI/OVMF has been supported by libvirt
since version 1.2.9.

OVMF is a port of Intel's tianocore firmware to the qemu virtual machine; in
other words, this project enables UEFI support for virtual machines.

Problem description
===================

Platform vendors have been increasingly adopting UEFI for the platform
firmware over traditional BIOS. This, in part, is leading OS vendors to also
shift to supporting or providing UEFI images. However, as adoption of UEFI
for OS images increases, it has become apparent that OpenStack, through its
Nova compute libvirt driver, does not support UEFI image boot. This is
problematic and needs to be resolved.

Use Cases
---------

1. A user wants to launch a VM with UEFI. In this case the user needs to be
   able to tell Nova everything that is needed to launch the desired VM. The
   only additional information that should be required is a new image
   property indicating which firmware type, uefi or bios, will be used.

Proposed change
===============

Add the missing elements when generating the XML definition in the libvirt
driver to support OVMF firmware. Also add a new image metadata value to
specify which firmware type will be used.

The following is the new metadata value:

* 'hw_firmware_type': fields.EnumField()

  This indicates which firmware type will be used to boot the VM.
  This property can be set to 'uefi' or 'bios'. 'uefi' indicates that
  uefi firmware will be used. If the property is not set, 'bios' firmware
  will be used. A selection sketch follows this list.
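
A minimal sketch (not the driver's actual code) of how that property could be
interpreted; the OVMF loader path is an assumption and varies by
distribution::

    def firmware_type(image_properties):
        """Return the firmware type for a guest, defaulting to BIOS."""
        fw = image_properties.get('hw_firmware_type', 'bios')
        if fw not in ('bios', 'uefi'):
            raise ValueError('unsupported hw_firmware_type: %r' % fw)
        return fw

    def loader_xml(image_properties):
        # For 'uefi' the libvirt guest XML gains an OVMF loader element
        # under <os>; for 'bios' nothing extra is needed.
        if firmware_type(image_properties) == 'uefi':
            return ("<loader readonly='yes' type='pflash'>"
                    "/usr/share/OVMF/OVMF_CODE.fd</loader>")
        return ''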

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

The following packages should be added to the system:

* ovmf

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  qiaowei-ren

Other contributors:
  Victor Morales <victor.morales@intel.com>
  Xin Xiaohui <xiaohui.xin@intel.com>


Work Items
----------

The primary work items are:

* Add the 'hw_firmware_type' field to the ImageMetaProps object
* Update the libvirt guest XML configuration when the UEFI image
  property is present

Dependencies
============

This spec only implements UEFI boot for x86_64 and arm64, and depends
on the following:

* libvirt >= 1.2.9
* OVMF from EDK2

Testing
=======

New unit tests would be needed. Without some kind of functional testing,
a warning is emitted when this feature is used, saying it is untested
and therefore considered experimental.

Documentation Impact
====================

Some minor additions for launching a UEFI image with Nova: a note on the
extra config option and metadata property, and operator / installation
information for the UEFI firmware. In addition, the hypervisor support
matrix should also be updated.

References
==========

* http://www.linux-kvm.org/downloads/lersek/ovmf-whitepaper-c770f8c.txt

* https://libvirt.org/formatdomain.html#elementsOSBIOS

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced

specs/mitaka/implemented/cells-db-connection-switching.rst (new file, 160 lines)

@@ -0,0 +1,160 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================================
Database connection switching for cells
=======================================

https://blueprints.launchpad.net/nova/+spec/cells-db-connection-switching

In order for the Nova API to perform queries on cell databases, the database
connection information for the target cell must be used. The Nova API must
pass the cell database connection information to the DB API layer.


Problem description
===================

In Cells v2, instead of using a nova-cells proxy, nova-api will interact
directly with the database and message queue of the cell for an instance.
Instance -> cell mappings are stored in a table in the API level database.
Each InstanceMapping refers to a CellMapping, and the CellMapping contains
the connection information for the cell. We need a way to communicate the
database connection information from the CellMapping to the DB layer, so
when we update an instance, it will be updated in the cell database where
the instance's data resides.

Use Cases
---------

* Operators want to partition their deployments into cells for scaling,
  failure domain, and buildout reasons. When partitioned, we need a way to
  route queries to the cell database for an instance.

Proposed change
===============

We propose to store the database connection information for a cell in the
RequestContext, where it can be used by the DB API layer to interact with
the cell database. Currently, there are two databases that can be used at
the DB layer, 'main' and 'api', selected by the caller by method
name. We will want to consolidate the two methods into one that takes a
parameter to choose which EngineFacade to use. The field 'db_connection'
will be added to RequestContext to store the key to use for looking up the
EngineFacade.

When a request comes in, nova-api will look up the instance mapping in the
API database. It will get the database information from the instance's
CellMapping and store a key based on it in the RequestContext 'db_connection'
field. Then, the DB layer will look up the EngineFacade object for interacting
with the cell database using the 'db_connection' key stored in the
RequestContext.
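
A minimal sketch of this flow, with assumed names (nova's real interfaces may
differ)::

    import contextlib

    _ENGINE_FACADES = {}  # cache keyed by the 'db_connection' value

    @contextlib.contextmanager
    def target_cell(context, cell_mapping):
        """Point a RequestContext at a cell database for the duration."""
        original = getattr(context, 'db_connection', None)
        context.db_connection = cell_mapping.database_connection
        try:
            yield context
        finally:
            context.db_connection = original

    def get_engine(context):
        """DB-layer lookup of the EngineFacade for the targeted cell."""
        key = getattr(context, 'db_connection', None) or 'main'
        if key not in _ENGINE_FACADES:
            _ENGINE_FACADES[key] = _create_facade(key)  # assumed helper
        return _ENGINE_FACADES[key]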

Alternatives
------------

One alternative would be to add an argument to DB API methods to optionally
take database connection information to use instead of the configuration
setting, and pass it when taking action on objects. This would require
changing the signatures of all the DB API methods to take the keyword
argument, or otherwise finding a way to let all of the DB API methods derive
from such an interface. There is also precedent for allowing use of a field
in the RequestContext to communicate "read_deleted" to the DB API model_query.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

The database connection field in the RequestContext could contain sensitive
data.

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

This change on its own does not introduce a performance impact. The overall
design of keeping only mappings in the API DB and instance details in the
cell databases introduces an additional database lookup for the cell database
connection information. This can however be addressed by caching mappings.

Other deployer impact
---------------------

None

Developer impact
----------------

This change means that developers should be aware that cell database
connection information is contained in the RequestContext and be mindful that
it could contain sensitive data. Developers will need to use the interfaces
for getting database connection information from a CellMapping and setting it
in a RequestContext in order to query a cell database.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  melwitt

Other contributors:
  dheeraj-gupta4

Work Items
----------

* Add a database connection field to RequestContext

* Add a context manager to nova.context that populates a RequestContext with
  the database connection information given a CellMapping

* Modify nova.db.sqlalchemy.api get_session and get_engine to use the database
  connection information from the context, if it's set

Dependencies
============

* https://blueprints.launchpad.net/nova/+spec/cells-v2-mapping

* https://blueprints.launchpad.net/nova/+spec/cells-instance-mapping

Testing
=======

Since no user visible changes will occur with this change, the current suite
of Tempest or functional tests should be sufficient.

Documentation Impact
====================

Developer documentation could be written to describe how to use the new
interfaces.

References
==========

* https://etherpad.openstack.org/p/kilo-nova-cells

specs/mitaka/implemented/centralize-config-options.rst (new file, 374 lines)

@@ -0,0 +1,374 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Centralize Config Options
=========================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/nova/+spec/centralize-config-options

Nova has around 800 config options*. Those config options are the interface
to the cloud operators. Unfortunately they often lack good documentation
which

* explains their impact,
* shows their interdependency with other config options and
* explains which of the Nova services they influence.

This cloud operator interface needs to be consolidated, and one way of doing
this is to move the config options from their declaration in multiple modules
to a few centrally managed modules. These centrally managed modules should
also provide the bigger picture of the configuration surface we provide. This
was already discussed on the ML [1].

\* see the "nova.flagmappings" file which gets generated in the
"openstack-manuals" project for the "configuration reference" manual.

Problem description
===================

Same as above

Use Cases
---------

* As an end user I'm not affected by this change and won't notice a
  difference.
* As a developer I will find all config options in one place and will add
  further config options to that central place.
* As a cloud operator I will see more helpful descriptions on the config
  options. The default values, names and sections won't change in any way,
  and my ``nova.conf`` files will work as before.

Proposed change
===============

The change consists of two views:

* a technical one, which describes how the refactoring is done in terms
  of code placement,
* and a quality view, which describes the standard a good config option
  help text has to fulfill.

Technical View
--------------

There was a proof of concept in Gerrit which shows the intention [2]. The
steps are as follows:

#. There will be a new package called ``nova/conf``.

#. This package contains a module for each natural grouping (mostly the
   section name) in the ``nova.conf`` file. For example:

   * ``nova/conf/default.py``
   * ``nova/conf/ssl.py``
   * ``nova/conf/cells.py``
   * ``nova/conf/libvirt.py``
   * [...]

#. All ``CONF.import_opt(...)`` calls get removed from the functional modules
   as they no longer serve a purpose. That's because after the import
   of ``nova.conf``, all config options will be available.

#. All ``CONF.register_opts(...)`` calls get moved to the modules
   ``nova/conf/<module-name>.py``. By that, these modules can control
   themselves under which group name the options get registered. The module
   ``nova/conf/__init__.py`` imports those modules and triggers the
   registration with ``<module-name>.register_opts(CONF)``. This allows the
   usage of::

       import nova.conf

       CONF = nova.conf.CONF

       if CONF.<section>.<config-option>:
           # do something

   This means that the normal functional code, which uses the config
   options, doesn't need to be changed. A sketch of the package layout
   follows this list.

#. There will be only one ``nova/conf/opts.py`` module, which is necessary to
   build the ``nova.conf.sample`` file. This ``opts.py`` module is the single
   point of entry for that. All other ``opts.py`` modules will be removed at
   the end, for example the ``nova/virt/opts.py`` file.
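
A compact sketch of the proposed layout, with assumed module contents (the
``serial_console`` option is reused from the examples further below)::

    # nova/conf/serial_console.py
    from oslo_config import cfg

    serial_console_group = cfg.OptGroup(
        'serial_console', title='The serial console feature')

    ALL_OPTS = [
        cfg.StrOpt('base_url', default='ws://127.0.0.1:6083/',
                   help='Location of serial console proxy.'),
    ]

    def register_opts(conf):
        conf.register_group(serial_console_group)
        conf.register_opts(ALL_OPTS, group=serial_console_group)

    def list_opts():
        # consumed by the single nova/conf/opts.py entry point when
        # generating the nova.conf.sample file
        return [(serial_console_group, ALL_OPTS)]

    # nova/conf/__init__.py
    from oslo_config import cfg
    from nova.conf import serial_console

    CONF = cfg.CONF
    serial_console.register_opts(CONF)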

Quality View
------------

Operators will work with this interface, so the documentation has to be
precise and non-ambiguous. So let's look at some negative examples and
why I consider them insufficient. After that, the changed positive example
should show the direction we should go in. This section closes with a
generic template for config options which should be applied during this
refactoring.

**Negative Examples:**

The following example is from the *serial console* feature::

    cfg.StrOpt('base_url',
               default='ws://127.0.0.1:6083/',
               help='Location of serial console proxy.'),

It lacks a description of which services use this, how one can decide to
use another port, and what impact this has.

Another example, from the *image cache* feature::

    cfg.IntOpt('image_cache_manager_interval',
               default=2400,
               help='Number of seconds to wait between runs of the '
                    'image cache manager. Set to -1 to disable. '
                    'Setting this to 0 will run at the default rate.'),

On the plus side, it shows the possible values and their impact, but it does
not describe which service consumes this or whether it has interdependencies
with other config options.

**Positive Example:**

Here is an example of how this could look for a config option of the
*serial console* feature::

    serial_opt_base_url = cfg.StrOpt('base_url',
        default='ws://127.0.0.1:6083/',
        help="""The token enriched URL which is
    returned to the end user to connect to the nova-serialproxy service.

    This URL is the handle an end user will get (enriched with a token at
    the end) to establish the connection to the console of a guest.

    Services which consume this:

    * ``nova-compute``

    Possible values:

    * A string which is a URL

    Related options:

    * The IP address must be identical to the address to which the
      ``nova-serialproxy`` service is listening (see option
      ``serialproxy_host`` in section ``[serial_console]``).
    * The port must be the same as in the option ``serialproxy_port``
      of section ``[serial_console]``.
    * If you choose to use a secured websocket connection, start this
      option with ``wss://`` instead of the unsecured ``ws://``.
      The options ``cert`` and ``key`` in the ``[DEFAULT]`` section
      have to be set for that."""),

    serial_console_group = cfg.OptGroup(name="serial_console",
        title="The serial console feature",
        help="""The serial console feature
    allows you to connect to a guest in case a graphical console like VNC or
    SPICE is not available.""")

    CONF.register_opt(serial_opt_base_url, group=serial_console_group)

Another example can be made for the *image cache* feature::

    cfg.IntOpt('image_cache_manager_interval',
        default=2400,
        min=-1,
        help="""Number of seconds to wait between runs of
    the image cache manager.

    The image cache manager is responsible for ensuring that local disk
    doesn't fill with backing images that aren't currently in use. It should
    be noted that if local disk is too full to start a new instance, and
    cleaning the image cache would free enough space to make the hypervisor
    node usable, then the hypervisor node won't be usable until the next run
    of the image cache manager. In other words, the cache manager is not run
    more frequently as a hypervisor node becomes resource constrained.

    Services which consume this:

    * ``nova-compute``

    Possible values:

    * ``-1`` Disables the cleaning of the image cache.
    * ``0`` Runs the cleaning at the default rate.
    * Other values greater than ``0`` describe the number of seconds
      between two cleanups.

    Related options:

    * None
    """),

**Generic Template**

Based on the positive examples above, the generic template a config option
should fulfill to be descriptive for operators would be::

    help="""# A short description of what it does. If it is a unit
    # (e.g. a timeout), describe the unit which is used (seconds,
    # megabyte, mebibyte, ...).

    # A long description of what the impact and scope is. The operators
    # should know the expected change in the behavior of Nova if they
    # tweak this.

    Services which consume this:

    # A list of services which consume this option. Operators should not
    # read code to know which one of the services will change its behavior.
    # Nor should they set this in every ``nova.conf`` file to be sure.

    Possible values:

    # A description of possible values. Especially if this is an option
    # with numeric values (int, float), describe the edge cases (like the
    # min value, max value, 0, -1).

    Related options:

    # Which other config options have to be considered when I change this
    # one? If it stands solely on its own, use "None".
    """),


Alternatives
------------

The ML discussion [1] concluded that the following ideas wouldn't work for
us:

#. *Move all of the config options into one single ``flags.py`` module.*
   It was reasoned that this file would be very large and that merge
   conflicts for the contributors would be unavoidable.

#. *Ship the config options in data files with the code rather than being*
   *inside the Python code itself.* It was reasoned that this could cause a
   missing update of the config option's description if it was used in a
   different way than before.

#. *Don't use config options directly in the functional code. Make a*
   *dependency injection to the object which needs the configured value*
   *and depend only on that object's attributes.* Yes, this is the one with
   the most benefit in terms of testability, clean code, OOP practices and
   so on. The outcome of this blueprint is also to get a feeling for how that
   approach could be done in the end. A first proof of concept [3] was a bit
   cumbersome.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

#. It could also be that we want to deprecate options because they don't get
   used anymore.

#. Otherwise the deployer should become increasingly happy with the helpful
   texts and descriptions.

Developer impact
----------------

#. Contributors who are actively working on config options could have merge
   conflicts and need to rebase.
#. New config options should be added directly to the new central place at
   ``nova/conf/<section>.py``.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Markus Zoeller (markus_z)
  https://launchpad.net/~mzoeller

Other contributors:
  None (but highly welcome)

Work Items
----------

#. create the folder ``nova/conf`` with modules for each ``nova.conf``
   section
#. move options from the functional modules to the section modules from
   above
#. enhance the help texts of config options and option groups


Dependencies
============

#. Depending on the outcome of the discussion of [4], which proposes to
   enrich the config option object with interdependencies, we could use
   that. But this blueprint doesn't have a hard dependency on it.
#. Depending on the outcome of the discussion of [5], which proposes to
   allow formatting the config option help text with a markup language, we
   could use that. But this blueprint doesn't have a hard dependency on it.

Testing
=======

The ``nova.conf`` sample gets generated as part of the ``docs`` build.
If this fails we know that something went wrong.


Documentation Impact
====================

None


References
==========

[1] MailingList "openstack-dev"; July 2015; "Streamlining of config options
    in nova":
    http://lists.openstack.org/pipermail/openstack-dev/2015-July/070306.html

[2] Gerrit; PoC; "DO NOT MERGE: Example of config options reshuffle":
    https://review.openstack.org/#/c/214581

[3] Gerrit; PoC; "DO NOT MERGE: replace global CONF access by object":
    https://review.openstack.org/#/c/218319

[4] Launchpad; oslo.config; blueprint "option-interdependencies":
    https://blueprints.launchpad.net/oslo.config/+spec/option-interdependencies

[5] Launchpad; oslo.config; blueprint "help-text-markup":
    https://blueprints.launchpad.net/oslo.config/+spec/help-text-markup

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced

specs/mitaka/implemented/check-destination-on-migrations.rst (new file, 315 lines)

@@ -0,0 +1,315 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================================================
Check the destination host when migrating or evacuating
=======================================================

https://blueprints.launchpad.net/nova/+spec/check-destination-on-migrations

Provide a way to make sure that resource allocation is consistent for all
operations, even if a destination host is provided.

Problem description
===================

Live migrations and evacuations allow the possibility to either specify a
destination host or not. The former option totally bypasses the scheduler by
calling the destination Compute RPC API directly.

Unfortunately, there are some cases where migrating a VM breaks the
scheduler rules, and so it potentially breaks future boot requests due
to some constraints not being enforced when migrating/evacuating (like
allocation ratios).

We should modify that logic to explicitly call the Scheduler any time a move
(i.e. either a live-migration or an evacuation) is requested (whether the
destination host is provided or not) so that the Scheduler would verify the
destination host through all the enabled filters and, if successful, consume
the instance usage from its internal HostState.

That said, we also understand that there are use cases where an
operator wants to move an instance manually and not call the scheduler, even
if the operator knows that he explicitly breaks scheduler rules (e.g. a
filter not passing, an affinity policy violated or an instance taking an
already allocated pCPU in the context of CPU pinning).

Use Cases
---------

Some of the normal use cases (verifying the destination) could be:

As an operator, I want to make sure that the destination host I'm providing
when live migrating a specific instance would be correct and wouldn't break
my internal cloud because of a discrepancy between how I calculate the
destination host capacity and how the scheduler is taking into account the
memory allocation ratio (see the References section below).

As an operator, I want to make sure that live-migrating an instance to a
specific destination wouldn't impact my existing instances running on that
destination host because of some affinity that I missed.
Proposed change
|
||||
===============
|
||||
|
||||
This spec goes beyond what the persist-request-spec blueprint [1] by making
|
||||
sure that before each call to select_destinations(), the RequestSpec object is
|
||||
read from the current instance to schedule and will make sure that after the
|
||||
result of select_destinations(), the RequestSpec object will be persisted.
|
||||
|
||||
That way, we will be able to get the original RequestSpec from the
|
||||
corresponding instance from the user creating the VM including the scheduler
|
||||
hints. Given that, we propose to amend the RequestSpec object to include a new
|
||||
field called ``requested_destination`` which would be a ComputeNode object (at
|
||||
least having the host and hypervisor_hostname fields set) and would be set by
|
||||
the conductor for each method (here live-migrate and rebuild_instance
|
||||
respectively) accepting an optional destination host.
|
||||
|
||||
Note that this new field would nothing have in common with a migration object
|
||||
or an Instance.host field, since it would just be a reference to an equivalent
|
||||
scheduler hint saying 'I want to go there' (and not the ugly force_hosts
|
||||
information passed as an Availability Zone hack...).
|
||||
|
||||
It will be the duty of the conductor (within the live_migrate and evacuate
|
||||
methods) to get the RequestSpec related to the instance, add the
|
||||
``requested_destination`` field, set the related Migration object to
|
||||
``scheduled`` and call the scheduler's ``select_destinations`` method.
|
||||
The last step would be of course to store the updated RequestSpec object.
|
||||
If the requested destination is unacceptable for the scheduler, then the
|
||||
conductor will change the Migration status to ``conflict``.
|
||||
|
||||
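For illustration only, the conductor-side flow could look roughly like the
sketch below; all names and signatures are indicative, not the final
implementation (the evacuate path would be similar)::

    # Rough sketch of the conductor live-migrate flow described above;
    # helper names are illustrative, not the actual Nova code.
    def _live_migrate(context, instance, destination, migration):
        spec = objects.RequestSpec.get_by_instance_uuid(
            context, instance.uuid)
        if destination:
            # Record the operator-requested destination as the equivalent
            # of an 'I want to go there' scheduler hint.
            spec.requested_destination = objects.ComputeNode(
                host=destination, hypervisor_hostname=destination)
        try:
            scheduler_client.select_destinations(context, spec)
        except exception.NoValidHost:
            # The scheduler refused the requested destination.
            migration.status = 'conflict'
            migration.save()
            raise
        # Persist the amended RequestSpec for future moves.
        spec.save()
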
The idea behind that is that the Scheduler would check that field in the
_schedule() method of FilterScheduler and would then call the filters only
for that destination.

As the RequestSpec object blueprint cares about backwards compatibility by
providing the legacy ``request_spec`` and ``filter_properties`` to the old
``select_destinations`` API method, we wouldn't pass the new
``requested_destination`` field as a key for the request_spec.


Since this BP also provides a way for operators to bypass the Scheduler, we
will amend the API for all migrations including a destination host by adding an
extra request body argument called ``force`` (accepting True or False,
defaulted to False) and the corresponding CLI methods will expose that
``force`` option. If the microversion asked by the client is older than the
version providing the field, then it won't be passed (neither True nor False,
rather the key won't exist) to the conductor, so the conductor won't call the
scheduler - to keep the existing behaviour (see the REST API section below for
further details).

In order to keep track of those forced calls, we propose to log as an instance
action the fact that the migration has been forced so that the operator could
potentially reschedule the instance later on if they wish. For that, we propose
to add two new possible actions, called ``FORCED_MIGRATE`` (when
live-migrating) and ``FORCED_REBUILD`` (when evacuating).
That means that an operator can get all the instances having either
``FORCED_MIGRATE`` or ``FORCED_REBUILD`` just by calling the
/os-instance-actions API resource for each instance, and we could also later
add a new blueprint (out of that spec scope) for getting the list of instances
having the last specific action set to something (here FORCED_something).

Alternatives
------------

We could just provide a way to call the scheduler for an answer on whether the
destination host is valid or not, but it wouldn't consume the instance usage,
which is from our perspective the key problem with the existing design.


Data model impact
-----------------

None.

REST API impact
---------------

The proposed change just updates the POST request body for the
``os-migrateLive`` and ``evacuate`` actions to include the
optional ``force`` boolean field, defaulted to False, if the requested
microversion is recent enough to provide it.

Depending on whether the ``host`` and ``force`` fields are set or null, the
actions and return codes are:

- If a host parameter is supplied in the request body, the scheduler will now
  be asked to verify that the requested target compute node is actually able to
  accommodate the request, including honouring all previously-used scheduler
  hints. If the scheduler determines the request cannot be accommodated by the
  requested target host node, the related Migration object will change the
  ``status`` field to ``conflict``.

- If a host parameter is supplied in the request body, a new --force parameter
  may also be supplied in the request body. If present, the scheduler shall
  **not** be consulted to determine if the target compute node can be
  accommodated, and no Migration object will be updated.

- If the --force parameter is supplied in the request body but the host
  parameter is either null (for live-migrate) or not provided (for evacuate),
  then an HTTP 400 Bad Request will be served to the user.

Of course, since it's a new request body attribute, it will get a new API
microversion, meaning that if the attribute is not provided, the scheduler
won't be called by the conductor (to keep the existing behaviour where setting
a host bypasses the scheduler).

* JSON schema definition for the body data of ``os-migrateLive``:

::

    migrate_live = {
        'type': 'object',
        'properties': {
            'os-migrateLive': {
                'type': 'object',
                'properties': {
                    'block_migration': parameter_types.boolean,
                    'disk_over_commit': parameter_types.boolean,
                    'host': host,
                    'force': parameter_types.boolean
                },
                'required': ['block_migration', 'disk_over_commit', 'host'],
                'additionalProperties': False,
            },
        },
        'required': ['os-migrateLive'],
        'additionalProperties': False,
    }
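
For illustration only (the host name below is made up), a request that forces
a live migration to a given destination could look like::

    POST /v2.1/{tenant_id}/servers/{server_id}/action

    {
        "os-migrateLive": {
            "block_migration": false,
            "disk_over_commit": false,
            "host": "target-compute-01",
            "force": true
        }
    }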

* JSON schema definition for the body data of ``evacuate``:

::

    evacuate = {
        'type': 'object',
        'properties': {
            'evacuate': {
                'type': 'object',
                'properties': {
                    'host': parameter_types.hostname,
                    'force': parameter_types.boolean,
                    'onSharedStorage': parameter_types.boolean,
                    'adminPass': parameter_types.admin_password,
                },
                'required': ['onSharedStorage'],
                'additionalProperties': False,
            },
        },
        'required': ['evacuate'],
        'additionalProperties': False,
    }


* There should be no policy change as we're not changing the action by itself
  but rather just providing a new option.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

Python-novaclient will accept a ``force`` option for the following methods:

- evacuate
- live-migrate

Performance Impact
------------------

A new RPC call will be made by default when migrating or evacuating,
but it shouldn't really impact the performance since it's the normal behaviour
for a general migration. In order to keep that RPC asynchronous from the API
query, we won't give the result of the check within the original request, but
rather modify the Migration object status (see the REST API impact section
above).

Other deployer impact
---------------------

None.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  sylvain-bauza


Work Items
----------

- Read any existing RequestSpec before calling ``select_destinations()`` in all
  the conductor methods calling it
- Amend the RequestSpec object with a ``requested_destination`` field
- Modify the conductor methods for evacuate and live_migrate to fill in
  ``requested_destination``, call ``scheduler_client.select_destinations()``
  and persist the amended RequestSpec object right after the call.
- Modify FilterScheduler._schedule() to introspect ``requested_destination``
  and, if set, call the filters for only that host.
- Extend the API (and bump a new microversion) to add a ``force`` attribute for
  both above API resources with the appropriate behaviours.
- Bypass the scheduler if the flag is set and log either a ``FORCED_REBUILD``
  or ``FORCED_MIGRATE`` action.
- Add a new ``force`` option to python-novaclient and expose it in the CLI for
  both the ``evacuate`` and ``live-migrate`` commands


Dependencies
============

As said above in the proposal, since scheduler hints are part of the request
and are not persisted yet, we need to depend on persisting the RequestSpec
object [1] before calling ``select_destinations()`` so that a future migration
would read that RequestSpec and provide it again.


Testing
=======

API samples will need to be updated and unit tests will cover the behaviour.
In-tree functional tests will be amended to cover that option.

Documentation Impact
====================

As said, API samples will be modified to include the new attribute.


References
==========

[1] http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/persist-request-spec.html

Lots of bugs mention the caveat we described above. Below are the ones
I identified and which will be closed once the spec implementation lands:

- https://bugs.launchpad.net/nova/+bug/1451831
  Specifying a destination node with nova live_migration does not take into
  account overcommit setting (ram_allocation_ratio)
- https://bugs.launchpad.net/nova/+bug/1214943
  Live migration should use the same memory over subscription logic as instance
  boot
- https://bugs.launchpad.net/nova/+bug/1452568
  nova allows to live-migrate instance from one availability zone to another
194
specs/mitaka/implemented/cinder-backend-report-discard.rst
Normal file
@@ -0,0 +1,194 @@

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.

http://creativecommons.org/licenses/by/3.0/legalcode

============================================================
Add ability to support discard/unmap/trim for Cinder backend
============================================================

https://blueprints.launchpad.net/nova/+spec/cinder-backend-report-discard

Currently, libvirt/qemu has support for a discard option when attaching a
volume to an instance. With this feature, the unmap/trim command can be sent
from the guest to the physical storage device.

A Cinder back-end will report a connection capability that Nova will use
when attaching a volume.

Problem description
===================

Currently there is no way for Nova to know if a Cinder back end supports
discard/trim/unmap functionality. Functionality is being added in Cinder
to supply this information. This spec seeks to add the ability to consume
that information.

Use Cases
---------

If a Cinder backend uses media that can make use of discard functionality,
there should be a way to use it. This will improve the long term performance
of such back ends.

Proposed change
===============

Code will be added to check for a 'discard' property returned to Nova from
the Cinder attach API. When present and set to True we will modify the config
returned by the libvirt volume driver to contain::

    driver_discard = "unmap"

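In the guest domain XML, this corresponds to a ``discard`` attribute on the
disk's ``<driver>`` element. The following snippet is illustrative only (the
device paths are made up)::

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' discard='unmap'/>
      <source dev='/dev/sdb'/>
      <target dev='sdb' bus='scsi'/>
    </disk>
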
This will only give the desired support if the instance is configured with an
interface and bus type that support Trim/Unmap commands. In the case where
it is possible to detect that discard will not actually work for the instance,
we will log a warning, but continue on with the attach anyway.

Currently the virtio-blk backend does not support discard.

There will be several ways to get an instance that will support discard; one
example is to use the virtio-scsi storage interface with a scsi bus type. To
create an instance with this support it must be booted from an image
configured with ``hw_scsi_model=virtio-scsi`` and ``hw_disk_bus=scsi``.

It is important to note that the nova.conf option hw_disk_discard is NOT read
for this feature. We rely entirely on Cinder to specify whether or not discard
should be used for the volume.

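A minimal sketch of the attach-time check described above, using hypothetical
helper names rather than the actual Nova code::

    # Honour a 'discard' hint from the Cinder connection info when
    # building the guest disk config (illustrative only).
    def _set_discard(connection_info, conf):
        data = connection_info.get('data', {})
        if data.get('discard', False):
            # Rendered as <driver ... discard='unmap'/> in the guest XML.
            conf.driver_discard = 'unmap'
        return conf
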
Alternatives
------------

Alternatives include adding discard for all drives if the operator has set
hw_disk_discard, but it was decided this was not a good way to solve the
problem as you could not mix different underlying volume providers easily.

We could also hot-plug a SCSI controller that is capable of supporting discard
when attaching Cinder volumes. This would allow for mixing a non-trim boot
disk from an image and then attaching a Cinder volume that would get the
benefits. The risk is that the instance may not be able to actually support
doing UNMAP.


Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

There will be a performance gain for back ends that benefit from having
discard functionality.

See https://en.wikipedia.org/wiki/Trim_(computing) for more info.

Other deployer impact
---------------------

Deployers wanting to use this feature with their Cinder backend will need to
ensure the instances are configured with a disk model and bus that support
discard; this includes SCSI, IDE, AHCI, and Xen disks. virtio-blk is the only
backend missing this support.

A simple way to enable this is to modify Glance images to contain the
following properties::

    hw_scsi_model=virtio-scsi
    hw_disk_bus=scsi

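For example, the properties could be set on an existing image with the Glance
CLI (the image ID is a placeholder)::

    $ glance image-update <image-uuid> \
        --property hw_scsi_model=virtio-scsi \
        --property hw_disk_bus=scsi
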
In addition compute nodes will need to be using libvirt 1.0.6 or higher and
QEMU 1.6.0 or higher.

Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  * Patrick East

Work Items
----------

* Modify the volume attach code in the libvirt driver to check for the new
  Cinder connection property.
* Add unit tests for the new functionality, and modify any existing ones as
  needed.
* Configure the Pure Storage 3rd party CI system to enable the feature and
  validate it as a Cinder CI. This configuration change will be made available
  to any other 3rd party CI maintainer to allow additional systems to test with
  this feature enabled.

Dependencies
============

Cinder Blueprint (Completed and released in Liberty):
https://blueprints.launchpad.net/cinder/+spec/cinder-backend-report-discard


Testing
=======

Unit tests need to include all permutations of the discard
flag from Cinder.

We could enable one of the jenkins jobs to be configured to enable this. A nice
starting point would maybe be the Ceph jobs. Potentially a Tempest test could
be added behind a config option to validate that volume attachments do get the
correct discard settings.


Documentation Impact
====================

We may want to add documentation to the Cloud Administrator Guide on how to
utilize this feature.

References
==========

Cinder Blueprint:
https://blueprints.launchpad.net/cinder/+spec/cinder-backend-report-discard

Cinder Spec:
http://specs.openstack.org/openstack/cinder-specs/specs/liberty/cinder-backend-report-discard.html


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced
229
specs/mitaka/implemented/get-valid-server-state.rst
Normal file
@@ -0,0 +1,229 @@

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.

http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Get valid server state
==========================================

https://blueprints.launchpad.net/nova/+spec/get-valid-server-state

When a compute service fails, the power states of the hosted VMs are not
updated. A normal user querying his or her VMs does not get any indication
about the failure. Also there is no indication about maintenance.

Problem description
===================

A VM query does not give the user the needed information about a compute host
that is failed or unreachable, a nova-compute service that is failed or
stopped, or a nova-compute service that is explicitly marked as failed or
disabled. The user should get the information about the nova-compute state
when querying his or her VMs to get a better understanding of the situation.

Use Cases
---------

As a user I want to have accurate VM state information even when the compute
service fails or the host is down, so I can take quick actions for my VMs.
Mostly the failure information is critical to a user having HA-type VMs that
need to make a quick service switch-over. Another need is for the user or
admin to do something for the VMs on the host. The action might be case and
deployment specific, as some admin actions can be automated by an external
service and some left to the user. Normally a user can just delete or
re-create a VM.

As a user I want to get information about maintenance, so I can take actions
for my VMs. When the user gets information about a host being in maintenance
(service disabled), the user knows to plan what to do with his or her VMs as
the host may be rebooted soon.

Proposed change
===============

A new ``host_status`` field will be added to the ``/servers/{server_id}`` and
``/servers/detail`` endpoints. ``host_status`` will be ``UP`` if nova-compute's
state is up, ``DOWN`` if nova-compute is forced_down, ``UNKNOWN`` if
nova-compute's last_seen_up is not up-to-date, and ``MAINTENANCE`` if
nova-compute's state is disabled. The needed information can be retrieved via
the host API and the servicegroup API if the new policy allows. The
forced_down flag handling is described in this spec:
http://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/mark-host-down.html

A new policy element will be added to control access to ``host_status``. This
can be used both to prevent this host-based data being disclosed as well as to
eliminate the performance impact of this feature.
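
A rough sketch of the mapping, using an illustrative helper rather than the
actual implementation (the precedence between the states shown here is also
just one possible choice)::

    # Derive host_status from a compute service record; is_up() stands in
    # for the servicegroup API check against last_seen_up.
    def host_status(service):
        if service.forced_down:
            return 'DOWN'
        if service.disabled:
            return 'MAINTENANCE'
        if not is_up(service):
            return 'UNKNOWN'
        return 'UP'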

Alternatives
------------

When returning the VM power_state, check the service status for the host. If
the service is ``forced_down``, return ``UNKNOWN`` instead. This would be an
API-only change; it is NOT proposed that we update the DB value to
``UNKNOWN``. This means we retain a record of the VM power state independent
of the service state, which may be interesting in case the host lost network
rather than power. Community feedback indicated that as the power_state is only
true for a point in time anyway, technically the state is always ``UNKNOWN``.

``os-services/force-down`` could mark all VMs managed by the affected service
as ``UNKNOWN`` in the db. This would sometimes be wrong as a VM can be up even
if its host is unreachable. This would also create a need to remove this state
data in case the VM is evacuated to another compute node.

A possible extension is a host ``NEEDS_MAINTENANCE`` state, which would show
that maintenance is required soon. This would allow users who monitor this info
to prepare their VMs for downtime and enter maintenance at a time convenient
for them.

An extension could be added for filtering the ``/servers`` and
``/servers/detail`` endpoints' response by ``host_status``.

Data model impact
-----------------

None

REST API impact
---------------

GET ``/v2.1/{tenant_id}/servers/{server_id}`` and
``/v2.1/{tenant_id}/servers/detail`` will return the ``host_status`` field if
the "os_compute_api:servers:show:host_status" policy is defined for the user.
This will require a microversion.

Case where nova-compute is enabled and reporting normally::

    GET /v2.1/{tenant_id}/servers/{server_id}

    200 OK
    {
        "server": {
            "host_status": "UP",
            ...
        }
    }

Case where nova-compute is enabled, but not reporting normally::

    GET /v2.1/{tenant_id}/servers/{server_id}

    200 OK
    {
        "server": {
            "host_status": "UNKNOWN",
            ...
        }
    }

Case where nova-compute is enabled, but forced_down::

    GET /v2.1/{tenant_id}/servers/{server_id}

    200 OK
    {
        "server": {
            "host_status": "DOWN",
            ...
        }
    }

Case where nova-compute is disabled::

    GET /v2.1/{tenant_id}/servers/{server_id}

    200 OK
    {
        "server": {
            "host_status": "MAINTENANCE",
            ...
        }
    }

This may be presented by python-novaclient as::

    +-------+------+--------+------------+-------------+----------+-------------+
    | ID    | Name | Status | Task State | Power State | Networks | Host Status |
    +-------+------+--------+------------+-------------+----------+-------------+
    | 9a... | vm1  | ACTIVE | -          | RUNNING     | xnet=... | UP          |
    +-------+------+--------+------------+-------------+----------+-------------+

New policy element to be added to allow assigning permission to see
host_status:

::

    "os_compute_api:servers:show:host_status": "rule:admin_api"

Security impact
---------------

Normal users may be able to correlate host states across multiple VMs to draw
conclusions about the cloud topology. This can be prevented by not granting the
policy.

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

An additional database query will be required to look up the service when a
server detail request is received.

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee: Tomi Juvonen
Other contributors: None

Work Items
----------

* Expose host_status as detailed.
* Update python-novaclient.

Dependencies
============

None

Testing
=======

Unit and functional test cases need to be added.

Documentation Impact
====================

The API change needs to be documented:

* Compute API extensions documentation.
  http://developer.openstack.org/api-ref-compute-v2.1.html

References
==========

* https://blueprints.launchpad.net/nova/+spec/mark-host-down
* OPNFV Doctor project: https://wiki.opnfv.org/doctor
387
specs/mitaka/implemented/image-verification.rst
Normal file
@@ -0,0 +1,387 @@

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.

http://creativecommons.org/licenses/by/3.0/legalcode

===========================
Nova Signature Verification
===========================

https://blueprints.launchpad.net/nova/+spec/nova-support-image-signing

OpenStack currently does not support signature validation of uploaded signed
images. Equipping Nova with the ability to validate image signatures will
provide end users with stronger assurances of the integrity of the image data
they are using to create servers. This change will use the same data model for
image metadata as the accompanying functionality in Glance, which will allow
the end user to sign images and verify these image signatures upon upload [1].


Problem description
===================

Currently, OpenStack's protection against unexpected modification of images is
limited to verifying an MD5 checksum. While this may be sufficient for
protecting against accidental modifications, MD5 is a hash function, not an
authentication primitive [2], and thus provides no protection against
deliberate, malicious modification of images. An image could potentially be
modified in transit, such as when it is uploaded to Glance or transferred to
Nova. An image that is modified could include malicious code. Providing
support for signature verification would allow Nova to verify the signature
before booting and alert the user of successful signature verification via a
future API change. This feature will secure OpenStack against the following
attack scenarios:

* Man-in-the-Middle Attack - An adversary with access to the network between
  Nova and Glance is altering image data as Nova downloads the data from
  Glance. The adversary is potentially incorporating malware into the image
  and/or altering the image metadata.

* Untrusted Glance - In a hybrid cloud deployment, Glance is hosted on
  machines which are located in a physically insecure location or is hosted by
  a company with limited security infrastructure. Adversaries may be able to
  compromise the integrity of Glance and/or the integrity of images stored by
  Glance through physical access to the host machines or through poor network
  security on the part of the company hosting Glance.

Please note that our threat model considers only threats to the integrity of
images while they are in transit between the end user and Glance, while they
are at rest in Glance and while they are in transit between Glance and Nova.
This threat model does not include, and this feature therefore does not
address, threats to the integrity, availability, or confidentiality of Nova.

Use Cases
---------

* A user wants a high degree of assurance that a customized image which they
  have uploaded to Glance has not been accidentally or maliciously modified
  prior to booting the image.

With this proposed change, Nova will verify the signature of a signed image
while downloading that image. If the image signature cannot be verified, then
Nova will not boot the image and instead place the instance into an error
state. The user will begin to use this feature by uploading the image and the
image signature metadata to Glance via the Glance API's image-create method.
The required image signature metadata properties are as follows (an
illustrative upload command is shown after the list):

* img_signature - A string representation of the base 64 encoding of the
  signature of the image data.

* img_signature_hash_method - A string designating the hash method used for
  signing. Currently, the supported values are SHA-224, SHA-256, SHA-384 and
  SHA-512. MD5 and other cryptographically weak hash methods will not be
  supported for this field. Any image signed with an unsupported hash
  algorithm will not pass validation.

* img_signature_key_type - A string designating the signature scheme used to
  generate the signature.

* img_signature_certificate_uuid - A string encoding the certificate
  uuid used to retrieve the certificate from the key manager.

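For example, a signed image could be uploaded with the Glance CLI roughly as
follows; the property values are placeholders, and the key type shown is just
one plausible scheme::

    $ glance image-create --name signed-image \
        --container-format bare --disk-format qcow2 \
        --property img_signature='<base64-signature>' \
        --property img_signature_hash_method='SHA-256' \
        --property img_signature_key_type='RSA-PSS' \
        --property img_signature_certificate_uuid='<certificate-uuid>' \
        --file image.qcow2
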
The image verification functionality in Glance uses the signature_utils
module to verify this signature metadata before storing the image. If the
signature is not valid or the metadata is incomplete, this API method will
return a 400 error status and put the image into a "killed" state. Note that,
if the signature metadata is simply not present, the image will be stored as
it normally would be.

The user would then create an instance from this image using the Nova API's
boot method. If the verify_glance_signatures flag in nova.conf is set to
'True', Nova will call out to Glance for the image's properties, which include
the properties necessary for image signature verification. Nova will pass the
image data and image properties to the signature_utils module, which will
verify the signature. If signature verification fails, or if the image
signature metadata is either incomplete or absent, booting the instance will
fail and Nova will log an exception. If signature verification succeeds, Nova
will boot the instance and log a message indicating that image signature
verification succeeded along with detailed information about the signing
certificate.


Proposed change
===============

The first component in this change is the creation of a standalone module
responsible for the bulk of the functionality necessary for image signature
verification. This module will primarily consist of three public-facing
methods: an initializing method, an updating method, and a verifying method.
The initializing method will take the signing certificate uuid and the
specified hash method as inputs. This method will then fetch the signing
certificate by interfacing with the key manager through Castellan, extract the
public key, store the public key, certificate and hash method as attributes,
and return an instance of the signature verification module. As the image's
data is downloaded, the signature verification module will be updated by
passing chunks of image data to the verifying module via the update method.
When all chunks of image data have been passed to the verifier, the service
desiring verification will call the verify method, passing it the image
signature. More specifically, this module will apply the public key to the
signature, and compare this result to the result of applying the hash
algorithm to the image data. This workflow is essentially a wrapped version of
the workflow by which signature verification occurs in pyca/cryptography.

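A minimal sketch of that init/update/verify workflow, assuming an RSA key
with PKCS#1 v1.5 padding; the class and helper names are illustrative, not
the actual signature_utils API::

    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, utils


    class SignatureVerifier(object):
        def __init__(self, certificate, hash_method=hashes.SHA256()):
            # In the real module the certificate would be fetched from the
            # key manager through Castellan using its uuid.
            self._public_key = certificate.public_key()
            self._hash_method = hash_method
            self._hasher = hashes.Hash(hash_method, default_backend())

        def update(self, chunk):
            # Called once per downloaded chunk of image data.
            self._hasher.update(chunk)

        def verify(self, signature):
            # Raises cryptography.exceptions.InvalidSignature on mismatch.
            digest = self._hasher.finalize()
            self._public_key.verify(
                signature, digest, padding.PKCS1v15(),
                utils.Prehashed(self._hash_method))
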
We then propose an initial implementation by incorporating this module into
Nova's control flow for booting instances from images. Upon downloading an
image, Nova will check whether the verify_glance_signatures configuration flag
is set in nova.conf. If so, the module will perform image signature
verification using image properties passed to Nova by Glance. If this fails,
or if the image signature metadata is incomplete or missing, Nova will not
boot the image. Instead, Nova will throw an exception and log an error. If the
signature verification succeeds, Nova will proceed with booting the instance.

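Enabling the check is then a one-line nova.conf change; the option group
shown here is illustrative and may vary by release::

    [glance]
    verify_glance_signatures = True
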
The next component will be to add functionality to the pyca/cryptography
library which will validate a given certificate chain against a pool of given
root certificates which are known to be trusted. This algorithm for validating
chains of certificates against a set of trusted root certificates is a
standard, and has been outlined in RFC 5280 [3].

Once the certificate validation functionality has been added to the
pyca/cryptography library, we will amend the signature_utils module by
incorporating certificate validation into the signature verification workflow.
We will implement functionality in the signature_utils module which will use
GET requests to dynamically fetch the certificate chain for a given
certificate. Any service using the signature_utils module will now call the
signature_utils module's initializing method with an additional parameter: a
list of references representing a pool of trusted root certificates. This
module will then use its certificate chain fetching functionality to build the
certificate chain for the signing certificate, fetch the root certificates
through Castellan, and will verify this chain against the trusted root
certificates using the functionality in the pyca/cryptography library. If the
chain fails validation, then an exception will be thrown and signature
verification will fail. Nova will retrieve the root certificate references
necessary to call the updated functionality of the signature_utils module by
reading the references in from a root_certificate_references configuration
option in nova.conf.

Future API changes are necessary to mitigate attacks that are possible when
Glance is untrusted, such as Glance returning a different signed image than the
image that was requested. Possible changes include the following extensions:

* Modify the REST API to accept a specific signature required to verify the
  integrity of the image. If the specified signature cannot be verified, then
  Nova refuses to boot the image and returns an appropriate error message to
  the end user. This change builds upon a spec that allows overriding image
  properties at boot time [4].

* Modify the REST API to provide metadata back to the end user for successful
  boot requests. This metadata would include the signing certificate ownership
  information and a base64 encoding of the signature. The user can use an out-
  of-band mechanism to manually verify that the encoded version of the
  signature matches the expected signature.

The first approach is preferred since it may be fully automated whereas the
second approach requires manual verification by the end user.

The certificate references will be used to access the certificates from a key
manager through the interface provided by Castellan.

Alternatives
------------

An alternative to signing the image's data directly is to support signatures
which are created by signing a hash of the image data. This introduces
unnecessary complexity to the feature by requiring an additional hashing stage
and an additional metadata option. Due to the Glance community's performance
concerns associated with hashing image data, we initially pursued an
implementation which produced the signature by signing an MD5 checksum which
was already computed by Glance. This approach was rejected by the Nova
community due to the security weaknesses of MD5 and the unnecessary complexity
of performing a hashing operation twice and maintaining information about both
hash algorithms.

An alternative to using pyca/cryptography for the hashing and signing
functionality is to use PyCrypto. We are electing to use pyca/cryptography
based on both the shift away from PyCrypto in OpenStack's requirements and the
recommendations of cryptographers reviewing the accompanying Glance spec [5].

An alternative to using certificates for signing and signature verification
would be to use a public key. However, this approach presents the significant
weakness that an attacker could generate their own public key in the key
manager, use this to sign a tampered image, and pass the reference to their
public key to Nova along with their signed image. In contrast, the use of
certificates provides a means of attributing such attacks to the certificate
owner, and follows common cryptographic standards by placing the root of trust
at the certificate authority.

An alternative to using the verify_glance_signatures configuration flag to
specify that Nova should perform image signature verification is to use
"trusted" flavors to specify that individual instances should be created from
signed images. The user, when using the Nova CLI to boot an instance, would
specify one of these "trusted" flavors to indicate that image signature
verification should occur as part of the control flow for booting the
instance. This may be added in a later change, but will not be included in the
initial implementation. If added, the trusted flavors option will work
alongside the configuration option approach. In this case, Nova would perform
image signature verification if either the configuration flag is set, or if
the user has specified booting an instance of the "trusted" flavor.

Supporting the untrusted Glance use case requires future modifications to the
REST API as previously described. An alternative to the proposed approach uses
a "sign-the-hash" method for signatures instead of signing the image content
directly. In this case, Nova's REST API can be modified to allow the user to
specify a hash algorithm and expected hash value as part of the boot command.
If the actual hash value does not match, then Nova will not boot the image.
Signing the hash instead of the image directly is useful because hashes are
commonly provided for cloud images and users can obtain these hashes
out-of-band.

Data model impact
-----------------

The accompanying work in Glance introduced additional Glance image properties
necessary for image signing. The initial implementation in Nova will introduce
a configuration flag indicating whether Nova should perform image signature
verification before booting an image. The updated implementation which
includes certificate validation will introduce an additional configuration
flag for specifying the trusted root certificates.

REST API impact
---------------

A future change will modify the request or response to the boot command. This
change supports the untrusted Glance use cases by giving the user additional
assurance that the desired image has been booted.

Security impact
---------------

Nova currently lacks a mechanism to validate images prior to booting them. The
checksum included with an image protects against accidental modifications but
provides little protection against an adversary with access to Glance or to
the communication network between Nova and Glance. This feature facilitates
the creation of a logical trust boundary between Nova and Glance; this trust
boundary permits the end user to have high assurance that Nova is booting an
image signed by a trusted user.

Although Nova will use certificates to perform this task, the certificates
will be stored by a key manager and accessed via Castellan.

Notifications impact
--------------------

None

Other end user impact
---------------------

If the verification of a signature fails, then Nova will not boot an instance
from the image, and an error message will be logged. The user would then have
to edit the image's metadata through the Glance API, the Nova API, or the
Horizon interface; or reinitiate an upload of the image to Glance with the
correct signature metadata in order to boot the image.

Performance Impact
------------------

This feature will only be used if the verify_glance_signatures configuration
flag is set.

When signature verification occurs there will be latency as a result of
retrieving certificates from the key manager through the Castellan interface.
There will also be CPU overhead associated with hashing the image data and
decrypting a signature using a public key.

Other deployer impact
---------------------

In order to use this feature, a key manager must be deployed and configured.
Additionally, Nova must be configured to use a root certificate which has a
root of trust that can respond to an end user's certificate signing requests.

Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  dane-fichter

Other contributors:
  brianna-poulos
  joel-coffman

Reviewers
---------

Core reviewer(s):
  None

Work Items
----------

The feature will be implemented in the following stages:

* Create a standalone signature_utils module which handles interfacing with a
  key manager through Castellan and verifying signatures.

* Add functionality to Nova which calls the standalone module when Nova
  downloads a Glance image and the verify_glance_signatures configuration flag
  is set.

* Add certificate validation functionality to the pyca/cryptography library.

* Add functionality to the signature_utils module which fetches certificate
  chains. Incorporate this method, along with the pyca/cryptography library's
  certificate validation functionality, into the signature_utils module's
  functionality for verifying image signatures.

* Amend the initial implementation in Nova to utilize this change by allowing
  Nova to fetch root certificate references and pass them to the image
  signature verification method.

* Implement a REST API change to respond to a successful boot request with
  information relevant to the signing data and/or implement a REST API change
  to allow the end user to specify the expected signature at boot time.


Dependencies
============

The pyca/cryptography library, which is already a Nova requirement, will be
used for hash creation and signature verification. The certificate validation
portion of this change is dependent upon adding certificate validation
functionality to the pyca/cryptography library.

In order to simplify the interaction with the key manager and allow multiple
key manager backends, this feature will use the Castellan library [6]. Since
Castellan currently only supports integration with Barbican, using Castellan
in this feature indirectly requires Barbican. In the future, as Castellan
supports a wider variety of key managers, our feature will require minimal
upkeep to support these key managers; we will simply update Nova's and
Glance's requirements to use the latest Castellan version.


Testing
=======

Unit tests will be sufficient to test the functionality implemented in Nova.
We will need to implement Tempest and functional tests to test the
interoperability of this feature with the accompanying functionality in
Glance.


Documentation Impact
====================

Instructions for how to use this functionality will need to be documented.


References
==========

Cryptography API: https://pypi.python.org/pypi/cryptography/0.2.2

[1] https://review.openstack.org/#/c/252462/
[2] https://en.wikipedia.org/wiki/MD5#Security
[3] https://tools.ietf.org/html/rfc5280#section-6.1
[4] https://review.openstack.org/#/c/230382/
[5] https://review.openstack.org/#/c/177948/
[6] http://git.openstack.org/cgit/openstack/castellan
245
specs/mitaka/implemented/instance-crash-dump.rst
Normal file
@@ -0,0 +1,245 @@

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.

http://creativecommons.org/licenses/by/3.0/legalcode

=========================================
Support triggering crash dump in a server
=========================================

https://blueprints.launchpad.net/nova/+spec/instance-crash-dump

This spec adds a new API to trigger a crash dump in a server (instance or
baremetal) by injecting a driver-specific signal into the server.

Problem description
===================
Currently, we cannot trigger a crash dump in a server from Nova, but users
need this functionality for debugging purposes.

If the OS hits a bug (kernel panic), it triggers the kernel crash dump by
itself. But if the OS is *stalling*, we need to trigger the crash dump from
the hardware side. Different platforms could have different ways to trigger a
crash dump in a server, and Nova drivers need to implement them.

For the x86 platform, an NMI (Non-maskable Interrupt) can trigger a crash
dump in the OS. The user should configure the OS to trigger a crash dump when
it receives an NMI. In Linux, it can be done by::

    $ echo 1 > /proc/sys/kernel/panic_on_io_nmi

Many hypervisors support injecting an NMI into an instance.

* Libvirt supports the command "virsh inject-nmi" [1].

* Ipmitool supports the command "ipmitool chassis power diag" [2].

* Hyper-V Cmdlets support the command
  "Debug-VM -InjectNonMaskableInterrupt" [3].

So we should add a driver-level API to inject an NMI into a server. The
libvirt driver has implemented such an API [6], and so will the ironic driver
for baremetal. We will then add a Nova API to trigger a crash dump in a
server.

This should be optional for drivers.

Use Cases
---------
An end user needs an interface to trigger a crash dump in their servers. On
the trigger, the kernel crash dump mechanism dumps the production memory image
as a dump file and reboots the kernel again. After that, the end user can get
the dump file from the server's disk and investigate the cause of the problem
based on the file.

This spec only implements the process of triggering the crash dump. Where the
dump file ends up depends on how the user configures the dump mechanism in the
server. Take Linux as an example:

* If the user configures kdump to store the dump file on local disk, then the
  user needs to reboot the server and access the dump file on local disk.
* If the user configures kdump to copy the dump file to NFS storage, then the
  user can find the dump file on NFS storage without rebooting the server.

Proposed change
===============
* Add a libvirt driver API to inject an NMI into an instance.
  (Already merged in Liberty. [6])

* Add an ironic driver API to inject an NMI into a baremetal node.

* Add a Nova API to trigger a crash dump in a server using the driver API
  introduced above. If the hypervisor doesn't support injecting an NMI,
  NotImplementedError will be raised. This method does not modify the
  instance's task_state or vm_state.

* A new instance action will be introduced.

Alternatives
------------
None

Data model impact
-----------------
None

REST API impact
---------------

* Specification for the method

  * A description of what the method does suitable for use in user
    documentation

    * Trigger a crash dump in a server.

  * Method type

    * POST

  * Normal http response code

    * 202: Accepted

  * Expected error http response code

    * badRequest(400)

      * When the RPC API doesn't support this API, this error will be
        returned. If a driver does not implement the API, the error is
        handled via the new instance action because the API is asynchronous.

    * itemNotFound(404)

      * There is no instance or baremetal node which has the specified uuid.

    * conflictingRequest(409)

      * The server status must be ACTIVE, PAUSED, RESCUED, RESIZED or ERROR.
        If not, this code is returned.

      * If the specified server is locked, this code is returned to a user
        without administrator privileges, because when using the kernel dump
        mechanism the trigger causes a server reboot. So, only administrators
        can send an NMI to a locked server, as with other power actions.

  * URL for the resource

    * /v2.1/servers/{server_id}/action

  * Parameters which can be passed via the url

    * A server uuid is passed.

  * JSON schema definition for the body data

    ::

      {
          "trigger_crash_dump": null
      }

  * JSON schema definition for the response data

    * When the result is successful, no response body is returned.

    * When an error occurs, the response data includes the error message [5].

* This REST API will require an API microversion.

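For illustration (the endpoint, token and microversion header value are
placeholders, since the spec does not fix a microversion number), the action
could be invoked as::

    $ curl -X POST http://nova-api:8774/v2.1/servers/<server-uuid>/action \
        -H "X-Auth-Token: $TOKEN" \
        -H "Content-Type: application/json" \
        -H "X-OpenStack-Nova-API-Version: <new-microversion>" \
        -d '{"trigger_crash_dump": null}'
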
Security impact
---------------
None

Notifications impact
--------------------
None

Other end user impact
---------------------

* A client API for this new API will be added to python-novaclient

* A CLI for the new API will be added to python-novaclient. ::

    nova trigger-crash <server>

Performance Impact
------------------
None

Other deployer impact
---------------------
The default policy for this API allows admins and owners.

Developer impact
----------------
This spec will implement the new API in the libvirt driver, the ironic driver,
and nova itself.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Tang Chen (tangchen)

Other contributors:
  shiina-horonori (hshiina)

Work Items
----------
* Add a new REST API.

* Add a new driver API.

* Implement the API in the libvirt driver.

* Implement the API in the ironic driver.

Dependencies
============
This spec is related to the blueprint in ironic.

* https://blueprints.launchpad.net/ironic/+spec/enhance-power-interface-for-soft-reboot-and-nmi

Testing
=======
Unit tests will be added.

Documentation Impact
====================
* The new API should be added to the documentation.

* The support matrix below will be updated because this functionality is
  optional for drivers.
  http://docs.openstack.org/developer/nova/support-matrix.html

References
==========
[1] http://linux.die.net/man/1/virsh

[2] http://linux.die.net/man/1/ipmitool

[3] https://technet.microsoft.com/en-us/library/dn464280.aspx

[4] https://review.openstack.org/#/c/183456/

[5] http://docs.openstack.org/developer/nova/v2/faults.html

[6] https://review.openstack.org/#/c/202380/

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Liberty
     - Introduced
   * - Mitaka
     - Change API action name, and add ironic driver plan
265
specs/mitaka/implemented/internal-dns-resolution.rst
Normal file
@@ -0,0 +1,265 @@

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.

http://creativecommons.org/licenses/by/3.0/legalcode

===============================
Neutron DNS Using Nova Hostname
===============================

https://blueprints.launchpad.net/nova/+spec/neutron-hostname-dns

Users of an OpenStack cloud would like to look up their instances by name in an
intuitive way using the Domain Name System (DNS).

Problem description
===================

Users boot an instance using Nova and they give that instance an "Instance
Name" as it is called in the Horizon interface. That name is used as the
foundation for the hostname from the perspective of the operating system
running in the instance. It is reasonable to expect some integration of this
name with DNS.

Neutron already enables DNS lookup for instances using an internal dnsmasq
instance. It generates a generic hostname based on the private IP address
assigned to the system. For example, if the instance is booted with
*10.224.36.4* then the hostname generated is *host-10-224-36-4.openstacklocal.*
The generated name from Neutron is not presented anywhere in the API and
therefore cannot be presented in any UI either.

Use Cases
----------

#. DNS has a name matching the hostname, which is something that sudo looks
   for each time it is run [#]_. Other software exists which wants to be able
   to look up the hostname in DNS. Sudo still works, but a number of people
   complain about the warning generated::

       $ sudo id
       sudo: unable to resolve host vm-1
       uid=0(root) gid=0(root) groups=0(root)

#. The End User has a way to know the DNS name of a new instance. These names
   are often easier to use than the IP address.
#. Neutron can automatically share the DNS name with an external DNS system
   [#]_ such as Designate. This isn't in the scope of this blueprint but is
   something that cannot be done without it.

.. [#] https://bugs.launchpad.net/nova/+bug/1175211
.. [#] https://review.openstack.org/#/c/88624/

Proposed change
|
||||
===============
|
||||
|
||||
This blueprint will reconcile the DNS name between Nova and Neutron. Nova will
|
||||
pass the *hostname* to the Neutron API as part of any port create or update
|
||||
using a new *dns_name* field in the Neutron API. Neutron DHCP offers will use
|
||||
the instance's hostname. Neutron DNS will reply to queries for the new
|
||||
hostname.
|
||||
|
||||
To handle existing installations, Neutron will fall back completely to the
|
||||
current behavior in the event that a dns_name is not supplied on the port.
|
||||
|
||||
Nova will pass its sanitized hostname when it boots using an existing Neutron
|
||||
port by updating the port with the dns_name field. This will be augmented in
|
||||
the following ways:
|
||||
|
||||
#. Nova will pass the VM hostname using a new *dns_name* field in the port
|
||||
rather than the *name* field on create or update. The handling of the
|
||||
hostname will be consistent with cloud-init.
|
||||
|
||||
- If Nova is creating the port, or updating a port where dns_name is not
|
||||
set, then it sets dns_name to the VM hostname.
|
||||
- If an existing port is passed to Nova with dns_name set then Nova will
|
||||
reject that as an invalid network configuration and fail the request,
|
||||
unless the hostname and the port's dns_name match. Nova will not attempt
|
||||
to adopt the name from the port. This is confusing to the user and a
|
||||
source of errors if a port is reused between instances.
|
||||
|
||||
#. Nova will recognize an error from the Neutron API server and retry without
|
||||
*dns_name* if it is received. This error will be returned if Neutron has
|
||||
not been upgraded to handle the dns_name field. This check will be
|
||||
well-documented in the code as a short-term issue and will be deprecated in
|
||||
a following release. Adding this check will save deployers from having to
|
||||
coordinate deployment of Nova and Neutron.
|
||||
#. Neutron will insure the dns_name passed to it for DNS label validity and
|
||||
also for uniqueness within the scope of the configured domain name. If it
|
||||
fails, then both the port create and the instance boot will fail. Neutron
|
||||
will *only* begin to fail port creations *after* it has been upgraded with
|
||||
the corresponding changes *and* the user has enabled DNS resolution on the
|
||||
network by associating a domain name other than the default openstack.local.
|
||||
This will avoid breaking existing work-flows that might use unacceptable DNS
|
||||
names.
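
A minimal sketch of the retry behaviour described in the second item above,
assuming a python-neutronclient style client; the helper name is illustrative
and the exact exception raised by an older Neutron is an assumption::

    from neutronclient.common import exceptions as neutron_exc

    def create_port_with_hostname(client, network_id, hostname):
        # Try the new behaviour first: pass the instance hostname as dns_name.
        body = {'port': {'network_id': network_id, 'dns_name': hostname}}
        try:
            return client.create_port(body)
        except neutron_exc.BadRequest:
            # Older Neutron rejects the unknown dns_name attribute; fall back
            # to the pre-upgrade behaviour and create the port without it.
            del body['port']['dns_name']
            return client.create_port(body)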

.. NOTE:: If the user updates the dns_name on the Neutron port after the VM
   has already booted then there will be an inconsistency between the hostname
   in DNS and the instance hostname. This blueprint will not do any special
   handling of this case. The user should not be managing the hostname through
   both Nova and Neutron. I don't see this as a big concern for user
   experience.

Alternatives
------------

Move Validation to Nova
~~~~~~~~~~~~~~~~~~~~~~~

Duplicate name detection could be attempted in Nova. I've seen duplicate names
in the wild. Nova likely does not have the information necessary to check for
duplicate names within the appropriate scope. For example, I would like to
check duplicate names per domain across networks, which would be difficult for
Nova.

Move Port Creation Earlier
~~~~~~~~~~~~~~~~~~~~~~~~~~

It may be better if Nova could attempt port creation with Neutron before the
API operation completes so that the API operation will fail if the port
creation fails. In the current design, the Nova API call will succeed and the
port creation failure will cause the instance to go to an error state. I
believe the thing preventing this is the use case where a bare-metal instance
is being booted. In that use case, Nova must wait until the instance has been
scheduled before it can get the mac address of the interface to give to port
creation.

This change would make for a better user experience in the long run. However,
this work is out of the scope of this blueprint and can be done as follow-up
work independently. One possibility that should be explored is to allow
updating the Neutron port with the mac address when it is known.

Send Neutron DNS name back to Nova
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I briefly considered a design where instead of returning an error to Nova,
Neutron would accept whatever Nova passed as the hostname. If it failed
validation then Neutron would fall back to its old behavior and generate a DNS
name based on the IP address. This generated name would've been fed back to
Nova through the existing port status notifications that Neutron already sends
back to Nova. It would then be written in to the Nova database so that it can
be shown to the user.

Feedback from the community told me that this would create a poor user
experience because the system would be making a decision to ignore user input
without a good mechanism for communicating that back to the user.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

This will provide a better user experience overall. With the hostname being
fed to Neutron, it will be available in the DNS in Neutron and optionally --
in the future -- in DNSaaS externally, as specified in [#]_. This improves the
integration of these services from the user's point of view.

.. [#] https://review.openstack.org/#/c/88624/

Performance Impact
------------------

If the Nova upgrade is deployed before the corresponding Neutron upgrade then
there will be a period of time where Nova will make two calls to Neutron for
every port create. The first call will fail and then Nova will make a second
call without the *dns_name* field, which will be expected to pass as before.

To avoid undue performance impact in situations where the Nova upgrade is
deployed but Neutron is not upgraded for a significant period of time, a
configuration option will be implemented to enable or disable the behavior
described in the previous paragraph. The default value will be disabled.
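
For illustration, such a switch could look like the following in nova.conf;
the option name and section are invented here, since the spec does not fix
them::

    [DEFAULT]
    # Hypothetical option: when False, Nova never sends dns_name, so the
    # fallback second port-create call is never needed.
    use_neutron_dns_name = False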

Other deployer impact
---------------------

This change was carefully designed to allow new Nova and Neutron code to be
deployed independently. The new feature will be available when both upgrades
are complete.

DNS names will only be passed for new instances: Nova will begin passing
dns_name to Neutron after an upgrade only for newly booted instances.

If Neutron is upgraded before Nova, there is no problem because the dns_name
field is not required and behavior defaults to old behavior.

If Nova is upgraded before Neutron then Nova will see errors from the
Neutron API when it tries passing the dns_name field. Once again, Nova
should recognize this error and retry the operation without the dns_name.

The deployer should be aware of the `Performance Impact`_ discussed.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  `miguel-lavalle <https://launchpad.net/~minsel>`_

Other contributors:
  `zack-feldstein <https://launchpad.net/~zack-feldstein>`_

Work Items
----------

#. Modify existing proposal to pass hostname using *dns_name* field rather
   than *host*.
#. Handle expected errors by retrying without dns_name set.

Dependencies
============

In order for this to work end to end, the corresponding changes in Neutron,
which merged during the Liberty cycle, are required.

https://blueprints.launchpad.net/neutron/+spec/internal-dns-resolution


Testing
=======

Tempest tests should be added or modified for the following use cases:

- An instance created using the nova API can be looked up using the instance
  name.

Documentation Impact
====================

Mention in the documentation that instance names will be used for DNS. Be
clear that it will be the Nova *hostname* that will be used. Also, detail the
scenarios where instance creation will fail.

#. It will only fail when DNS has been enabled for the Neutron network by
   associating a domain other than openstack.local.
#. An invalid DNS label was given.
#. Duplicate names were found on the same domain.

References
==========

None
202
specs/mitaka/implemented/libvirt-aio-mode.rst
Normal file
@@ -0,0 +1,202 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Libvirt: AIO mode for disk devices
==========================================

https://blueprints.launchpad.net/nova/+spec/libvirt-aio-mode

Libvirt and qemu provide two different modes for asynchronous IO (AIO),
"native" and "threads". Right now nova always uses the default threads mode.
Depending on the disk type that is used for backing guest disks,
it can be beneficial to use the native IO mode instead.


Problem description
===================

Storage devices that are presented to instances can be backed by a variety
of different storage backends. The storage device can be an image residing
in the file system of the hypervisor, a block device which is passed to the
guest, or a device provided via a network. Images can have different formats
(raw, qcow2 etc.) and block devices can be backed by different hardware
(ceph, iSCSI, fibre channel etc.).

These different image formats and block devices require different settings
in the hypervisor for optimizing IO performance. Libvirt/qemu offers a
configurable asynchronous IO mode which increases performance when it
is set correctly for the underlying image/block device type.

Right now nova sticks with the default setting, using userspace threads
for asynchronous IO.

Use Cases
----------

A deployer or operator wants to make sure that the users get the best
possible IO performance based on the hardware and software stack that is
used.

Users may have workloads that depend on optimal disk performance.

Both users and deployers would prefer that the nova libvirt driver
automatically picks the asynchronous IO mode that best fits the
underlying hardware and software.


Proposed change
===============

The goal is to enhance the nova libvirt driver to let it choose the disk
IO mode based on the knowledge it already has about the device in use.

For cinder volumes, different LibvirtVolumeDriver implementations exist
for the different storage types. A new interface will be added to let
the respective LibvirtVolumeDriver choose the AIO mode.

For ephemeral storage, the XML is generated by LibvirtConfigGuestDisk,
which also allows distinguishing between file, block and network
attachment of the guest disk.

Restrictions on when to use native AIO mode
-------------------------------------------

* Native AIO mode will not be enabled for sparse images as it can cause
  Qemu threads to be blocked when filesystem metadata need to be updated.
  This issue is much less likely to appear when using preallocated
  images. For the full discussion, see the IRC log in `[4]`_.
* AIO mode has no effect if using the in-qemu network clients (any disks
  that use <disk type='network'>). It is only relevant if using the
  in-kernel network drivers (source: danpb).

In the scenarios above, the default AIO mode (threads) will be used.

Cases where native AIO mode is beneficial
-----------------------------------------

* Raw images and pre-allocated images in qcow2 format
* Cinder volumes that are located on iSCSI, NFS or FC devices.
* Quobyte (reported by Silvan Kaiser)
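
As an illustration of the end result, the chosen mode would appear in the
*io* attribute of the generated libvirt disk element. The snippet below is a
plausible example for a raw image, not output copied from an implementation::

    <disk type='file' device='disk'>
      <!-- io='native' selects kernel AIO; io='threads' is the default -->
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/var/lib/nova/instances/<uuid>/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>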

Alternatives
------------

An alternative implementation would be to let the user specify the AIO mode
for disks, similar to the current configurable caching mode which allows
distinguishing between file and block devices. However, the AIO mode that
fits best for a given storage type does not depend on the workload
running in the guest, and it would be beneficial not to bother the operator
with additional configuration parameters.

Another option would be to stick with the current approach - using the
libvirt/qemu defaults. As there is no single AIO mode that fits best for
all storage types, this would leave many users with inefficient settings.

Data model impact
-----------------

No changes to the data model are expected; code changes would only impact the
libvirt/qemu driver and persistent data are not affected.

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

IO performance for instances that run on backends which can exploit the
native IO mode will be improved. No adverse effect on other components.

Other deployer impact
---------------------

None

Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  alexs-h

Work Items
----------

* Collect performance data for comparing AIO modes on different storage types
* Implement AIO mode selection for cinder volumes
* Implement AIO mode selection for ephemeral storage


Dependencies
============

None


Testing
=======

Unit tests will be provided that verify the libvirt XML changes generated
by this feature.

Also, CI systems that run libvirt/qemu would use the new AIO mode
configuration automatically.


Documentation Impact
====================

Wiki pages that cover IO configuration with libvirt/qemu as a hypervisor
should be updated.


References
==========

* _`[1]` General overview on AIO:
  http://www.ibm.com/developerworks/library/l-async/

* _`[2]` Best practices: Asynchronous I/O model for KVM guests
  https://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaat/liaatbpkvmasynchio.htm

* _`[3]` Libvirt and QEMU Performance Tweaks for KVM Guests
  http://wiki.mikejung.biz/KVM/_Xen#AIO_Modes

* _`[4]` qemu irc log
  http://paste.openstack.org/show/480498/


History
=======

None
@@ -0,0 +1,329 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

======================================
Libvirt hardware policy from libosinfo
======================================

https://blueprints.launchpad.net/nova/+spec/libvirt-hardware-policy-from-libosinfo

When launching an instance Nova needs to make decisions about how to configure
the virtual hardware. Currently these decisions are often hardcoded, or driven
by nova.conf settings, and sometimes by Glance image properties. The goal of
this feature is to allow the user to specify the guest OS type and then drive
decisions from this fact, using the libosinfo database.

Problem description
===================

When launching an instance Nova needs to make decisions about how to configure
the virtual hardware in order to optimize operation of the guest OS. The right
decision inevitably varies depending on the type of operating system being
run. The right decision for a Linux guest might be the wrong decision for a
Windows guest or vice-versa. The most important example is the choice of the
disk and network device models. All Linux guests want to use virtio, since it
offers by far the best performance, but this is not available out of the box in
Windows images so is a poor default for them. A second example is whether the
BIOS clock is initialized with UTC (preferred by UNIX) or localtime (preferred
by Windows). Related to the clock are various timer policy settings which
control behaviour when the hypervisor cannot keep up with the required
interrupt injection rate. The Nova defaults work for Linux and Windows, but
are not suitable for some other proprietary operating systems.

While it is possible to continue to allow overrides of config via glance
image properties this is not a particularly appealing approach. A number of
the settings are pretty low level and so not the kind of thing that a cloud
application should directly expose to users. The more hypervisor specific
settings are placed on a glance image, the harder it is for one image to be
used to boot VMs across multiple different hypervisors. It also creates a
burden on the user to remember a long list of settings they must place on the
images to obtain optimal operation.

Historically most virtualization applications have gone down the route of
creating a database of hardware defaults for each operating system. Typically
though, each project has tried to reinvent the wheel each time, duplicating
each other's work and leading to a plethora of incomplete & inconsistent
databases. The libosinfo project started as an attempt to provide a common
solution for virtualization applications to use when configuring virtual
machines. It provides a user extensible database of information about
operating systems, including facts such as the supported device types, minimum
resource level requirements, installation media and more. Around this database
is a C API for querying information, made accessible to non-C languages
(including python) via the magic of GObject Introspection. This is in use by
the virt-manager and GNOME Boxes applications for configuring KVM and Xen
guests and is easily consumable from Nova's libvirt driver.

Use Cases
----------

The core goal is to make it simpler for an end user to boot a disk image with
the optimal virtual hardware configuration for the guest operating system.

Consider that Nova is configured to use virtio disk & network devices by
default, to optimize performance for the common Linux guests. In modern
Linux though, there is the option of using a better virtio SCSI driver.
Currently the user has to set properties like::

  # glance image-update \
      --property hw_disk_bus=scsi \
      --property hw_scsi_model=virtio-scsi \
      ...other properties...
      name-of-my-fedora21-image

There's a similar issue if the user wants to run guests which do not
support virtio drivers at all::

  # glance image-update \
      --property hw_disk_bus=ide \
      --property hw_nic_model=e1000 \
      ...other properties...
      name-of-my-windows-xp-image

We also wish to support per-OS timer drift policy settings and do not
wish to expose them as properties, since it would be even more onerous
on the user. eg::

  # glance image-update \
      --property hw_rtc_policy=catchup \
      --property hw_pit_policy=delay \
      ...other properties...
      name-of-my-random-os-image

With this feature, in the common case it will be sufficient to just inform
Nova of the operating system name::

  # glance image-update \
      --property os_name=fedora21 \
      name-of-my-fedora-image

Project Priority
-----------------

None.

Proposed change
===============

There is an existing 'os_type' glance property that can be used to indicate
the overall operating system family (windows vs linux vs freebsd). This is too
coarse to be able to correctly configure all the different versions of these
operating systems. ie the right settings for Windows XP are not the same as the
right settings for Windows 2008. The intention is to declare support for a
new standard property 'os_name'. The acceptable values for this property will
be taken from the libosinfo database, from either of these attributes:

* 'short-id' - the short name of the OS
  eg fedora21, winxp, freebsd9.3

* 'id' - the unique URI identifier of the OS
  eg http://fedoraproject.org/fedora/21, http://microsoft.com/win/xp,
  http://freebsd.org/freebsd/9.3

For example the user can set one of::

  # glance image-update \
      --property os_name=fedora21 \
      name-of-my-fedora-image

  # glance image-update \
      --property os_name=http://fedoraproject.org/fedora/21 \
      name-of-my-fedora-image

When building the guest configuration, the Nova libvirt driver will look
for this 'os_name' property and query the libosinfo database to locate
the operating system records. It will then use this to choose the default
disk bus and network model. If available it will also look up clock and
timer settings, but this requires further development in libosinfo before
it can be used.
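
To make the lookup concrete, here is a rough sketch of the kind of query
involved, using the libosinfo GObject Introspection bindings. It is
illustrative only and is not necessarily the code the driver will use::

    import gi
    gi.require_version('Libosinfo', '1.0')
    from gi.repository import Libosinfo

    # Load the system osinfo database.
    loader = Libosinfo.Loader()
    loader.process_default_path()
    db = loader.get_db()

    # Find the OS record matching the glance property, eg os_name=fedora21.
    flt = Libosinfo.Filter()
    flt.add_constraint('short-id', 'fedora21')
    matches = db.get_os_list().new_filtered(flt)
    if matches.get_length() > 0:
        os_rec = matches.get_nth(0)
        # The driver would then inspect the OS record's supported devices
        # to pick defaults such as the disk bus and network model.
        print(os_rec.get_short_id())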

In the case that libosinfo is not installed on the compute host, the
current Nova libvirt driver functionality will be unchanged.

It may be desirable to add a new nova.conf setting in the '[libvirt]'
section to turn on/off the use of libosinfo for hardware configuration.
This would make it easier for the cloud admin to control behaviour
without having to change which RPMs/packages are installed. eg::

  [libvirt]
  hardware_config=default|fixed|libosinfo

Where

* default - try to use libosinfo, otherwise fallback to fixed defaults
* fixed - always use fixed defaults even if libosinfo is installed
* libosinfo - always use libosinfo and abort if not installed

In the future it might be possible to automatically detect what operating
system is present inside a disk image using libguestfs. This would remove
the need to even set the 'os_name' image property, and thus allow people to
obtain optimal guest performance out of the box with no special config tasks
required. Such auto-detection is out of scope for this blueprint though.

Alternatives
------------

A first alternative would be for Nova to maintain its own database of
preferred hardware settings for each operating system. This is the trap most
previous virtualization applications have fallen into. This has a significant
burden because of the huge variety of operating systems in existence. It is
undesirable to attempt to try to reinvent the libosinfo wheel which is already
mostly round in shape.

A second alternative would be for Nova to expose glance image properties for
every single virtual hardware configuration aspect that needs to vary per
guest operating system type. This would mean the user is required to have a
lot of knowledge about low level hardware configuration which goes against
the general cloud paradigm. It is also a significant burden to remember to
set so many values.

Data model impact
-----------------

There will be no database schema changes.

There will be a new standard glance image property defined which will be stored
in the existing database tables, and should be considered a long term supported
setting.

REST API impact
---------------

There are no API changes required. The existing glance image property support
is sufficient to achieve the goals of this blueprint.

Security impact
---------------

Since this is simply about tuning the choice of virtual hardware settings there
should not be any impact on security of the host / cloud system.

Notifications impact
--------------------

No change.

Other end user impact
---------------------

The end user will need to know about the 'os_name' glance property and the list
of permissible values, as defined by the libosinfo project. This is primarily a
documentation task.

Performance Impact
------------------

Broadly speaking there should be no performance impact on the operation of the
OpenStack services themselves. Some choices of guest hardware, however, might
impose extra CPU overhead on the hypervisors. Since users already have the
ability to choose different disk/net models directly, this potential
performance impact is not a new (or significant) concern. It falls under the
general problem space of achieving strong separation between guest virtual
machines via resource utilization limits.

Other deployer impact
---------------------

There is likely to be a new configuration option in the nova.conf file under
the libvirt group. Most deployers can ignore this and leave it on its default
value which should just "do the right thing" in normal operation. It is there
as an override to force a specific usage policy.

Deployers may wish to install the libosinfo library on their compute nodes, in
order to allow the Nova libvirt driver to use this new feature. If they do not
install the libosinfo library, operation of Nova will be unchanged vs previous
releases. Installation can be done with the normal distribution package
management tools. It is expected that OpenStack specific provisioning tools
will eventually choose to automate this during cloud deployment.

In the case of private cloud deployments, the cloud administrator may wish to
provide additional libosinfo database configuration files, to optimize any
custom operating systems their organization uses.

Developer impact
----------------

Maintainers of other virtualization drivers may wish to engage with the
libosinfo project to collaborate on extending its database to be suitable for
use with more virtualization technologies beyond KVM and Xen. This would
potentially enable its use with other virt drivers within Nova. It is
nonetheless expected that the non-libvirt virt drivers will simply ignore this
new feature in the short-to-medium term at least.

The new 'os_name' property might be useful for VMWare which has a mechanism for
telling the VMWare hypervisor what guest operating system is installed in a VM.
This would entail defining some mapping between libosinfo values and the VMWare
required values, which is a fairly straightforward task.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  vladikr

Other contributors:
  berrange

Work Items
----------

* Integrate with libosinfo for setup of default disk/network device
  models in the Nova libvirt driver

* Extend devstack to install the libosinfo & gobject introspection packages

* Work with the libosinfo community to define metadata for clock and timer
  preferences per OS type

* Extend the Nova libvirt driver to configure clock/timer settings based on
  the libosinfo database

Dependencies
============

The Nova libvirt driver will gain an optional dependency on the libosinfo
project/library. This will be accessed by the GObject introspection Python
bindings. On Fedora / RHEL systems this will entail installation of the
'libosinfo' packages and either the 'pygobject2' or 'python3-gobject' packages
(yes, both Python 2 and 3 are supported). Other modern Linux distros also
have these packages commonly available.

Note that although the GObject Introspection framework was developed under the
umbrella of the GNOME project, it does not have any direct requirements for the
graphical desktop infrastructure. It is part of their low level gobject library
which is a reusable component leveraged by many non-desktop related projects
now.

Testing
=======

The unit tests will of course cover the new code.

To test in Tempest would need a gate job which has the suitable packages
installed. This can be achieved by updating devstack to install the necessary
bits. Some new tests would need to be created to set the new glance image
property and then verify that the guest virtual machine has received the
expected configuration changes.

Documentation Impact
====================

The new glance image property will need to be documented. It is also likely
that we will want to document the list of valid values for this property.
Alternatively, document how the user can go about learning the valid values
defined by libosinfo.

References
==========

* http://libosinfo.org
* https://wiki.gnome.org/action/show/Projects/GObjectIntrospection
* https://live.gnome.org/PyGObject
352
specs/mitaka/implemented/libvirt-real-time.rst
Normal file
@@ -0,0 +1,352 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

===========================
Libvirt real time instances
===========================

https://blueprints.launchpad.net/nova/+spec/libvirt-real-time

The CPU pinning feature added the ability to assign guest virtual CPUs
to dedicated host CPUs, providing guarantees for CPU time and improved worst
case latency for CPU scheduling. The real time feature builds on that work
to provide stronger guarantees for worst case scheduler latency for vCPUs.

Problem description
===================

The CPU pinning feature allowed guest vCPUs to be given dedicated access to
individual host pCPUs. This means virtual instances will no longer suffer
from "steal time" where their vCPU is pre-empted in order to run a vCPU
belonging to another guest. Removing overcommit eliminates the high level
cause of guest vCPU starvation, but guest vCPUs are still susceptible to
latency spikes from various areas in the kernel.

For example, there are various kernel tasks that run on host CPUs, such as
interrupt processing, that can preempt guest vCPUs. QEMU itself has a number
of sources of latency, due to its big global mutex. Various device models
have sub-optimal characteristics that will cause latency spikes in QEMU,
as may the underlying host hardware. Avoiding these problems requires that
the host kernel and operating system be configured in a particular manner, as
well as the careful choice of which QEMU features to exercise. It also
requires that suitable scheduler policies are configured for the guest
vCPUs.

Assigning huge pages to a guest ensures that guest RAM cannot be swapped out
on the host, but there are still other arbitrary memory allocations for the
QEMU emulator. If parts of QEMU get swapped out to disk, this can have an
impact on the performance of the realtime guest.

Enabling realtime is not without cost. In order to meet the strict worst
case requirements for CPU latency, overall throughput of the system must
necessarily be compromised. As such it is not reasonable to have the
real time feature unconditionally enabled for an OpenStack deployment.
It must be an opt-in that is used only in the case where the guest workload
actually demands it.

As an indication of the benefits and tradeoffs of realtime, it is useful
to consider some real performance numbers. With bare metal and dedicated
CPUs but non-realtime scheduler policy, worst case latency is on the order
of 150 microseconds, and mean latency is approx 2 microseconds. With KVM
and dedicated CPUs and a realtime scheduler policy, worst case latency
is 14 microseconds, and mean latency is < 10 microseconds. This shows
that while realtime brings significant benefits in worst case latency,
the mean latency is still significantly higher than that achieved on
bare metal with a non-realtime policy. This serves to reinforce the point
that realtime is not something to unconditionally use; it is only
suitable for specific workloads that require latency guarantees. Many
apps will find dedicated CPUs alone to be sufficient for their needs.


Use Cases
---------

Tenants who wish to run workloads where CPU execution latency is important
need to have the guarantees offered by a real time KVM guest configuration.
The NFV appliances commonly deployed by members of the telco community are
one such use case, but there are plenty of other potential users. For example,
stock market trading applications greatly care about scheduling latency, as
may scientific processing workloads.

It is expected that this feature would predominantly be used in private
cloud deployments. As well as real-time compute guarantees, tenants will
usually need corresponding guarantees in the network layer between the
cloud and the service/system it is communicating with. Such networking
guarantees are largely impractical to achieve when using remote public
clouds across the internet.


Proposed change
===============

The intention is to build on the previous work done to enable use of NUMA
node placement policy, dedicated CPU pinning and huge page backed guest
RAM.

The primary requirement is to have a mechanism to indicate whether realtime
must be enabled for an instance. Since real time has strict pre-requisites
in terms of host OS setup, the cloud administrator will usually not wish
to allow arbitrary use of this feature. Realtime workloads are likely to
comprise a subset of the overall cloud usage, so it is anticipated that
there will be a mixture of compute hosts, only some of which provide a
realtime capability.

For this reason, an administrator will need to make use of host aggregates
to partition their compute hosts into those which support real time and
those which do not.

There will then need to be a property available on the flavor

* hw:cpu_realtime=yes|no

which will indicate whether instances booted with that flavor will be
run with a realtime policy. Flavors with this property set to 'yes'
will need to be associated with the host aggregate that contains hosts
supporting realtime.
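
For illustration, such a setup could look like the following; the aggregate,
metadata key and flavor names are examples only, and the last line assumes
the standard aggregate extra specs scheduler filter is enabled::

  # nova aggregate-create realtime-hosts
  # nova aggregate-set-metadata realtime-hosts realtime=true
  # nova aggregate-add-host realtime-hosts compute1
  # nova flavor-key rt.medium set hw:cpu_realtime=yes \
        aggregate_instance_extra_specs:realtime=true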

A pre-requisite for enabling the realtime feature on a flavor is that
it must also have 'hw:cpu_policy' set to 'dedicated'. ie all real
time guests must have exclusive pCPUs assigned to them. You cannot give
a real time policy to vCPUs that are susceptible to overcommit, as that
would lead to starvation of the other guests on that pCPU, as well as
degrading the latency guarantees.

The precise actions that a hypervisor driver takes to configure a guest
when real time is enabled are implementation defined. Different hypervisors
will have different configuration steps, but the commonality is that all
of them will be providing vCPUs with an improved worst case latency
guarantee, as compared to non-realtime instances. The tenant user does
not need to know the details of how the requirements are met, merely
that the cloud can support the necessary latency guarantees.

In the case of the libvirt driver with the KVM hypervisor, it is expected
that setting the real time flavor will result in the following guest
configuration changes (a sketch of the resulting XML follows the list)

* Entire QEMU and guest RAM will be locked into memory
* All vCPUs will be given a fixed realtime scheduler priority
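
A plausible fragment of the guest XML that the driver could generate is shown
below; the exact elements and priority value are illustrative, based on the
memory locking and vCPU scheduler features libvirt exposes::

  <memoryBacking>
    <!-- lock all of QEMU and guest RAM into host memory -->
    <locked/>
  </memoryBacking>
  <cputune>
    <!-- give vCPUs 2-3 a fixed FIFO realtime priority; vCPUs 0-1 stay
         with the normal scheduler for housekeeping tasks -->
    <vcpusched vcpus='2-3' scheduler='fifo' priority='1'/>
  </cputune>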

As well as the vCPU workload, most hypervisors have one or more other
threads running in the control plane which do work on behalf of the
virtual machine. Most hypervisors hide this detail from users, but
the QEMU/KVM hypervisor exposes it via the concept of emulator
threads. With the initial support for dedicated CPUs, Nova was set
to confine the emulator threads to run on the same set of pCPUs
that the guest's vCPUs are placed on. This is highly undesirable in
the case of realtime, because these emulator threads will be
doing work that can impact latency guarantees. There is thus a
need to place emulator threads in a more precise fashion.

Most guest OSes will run with multiple vCPUs and have at least one of
their vCPUs dedicated to running non-realtime house keeping tasks.
Given this, the intention is that the emulator threads be co-located
with the vCPU that is running non-realtime tasks. This will in turn
require another tunable, which can be set either on the flavor, or
on the image. This will indicate which vCPUs will have realtime policy
enabled:

* hw:cpu_realtime_mask=^0-1

This indicates that all vCPUs except vCPUs 0 and 1 will have
a realtime policy. ie vCPUs 0 and 1 will remain non-realtime.
The vCPUs which have a non-realtime policy will also be used to
run the emulator thread(s). At least one vCPU must be reserved
for non-realtime workloads; it is an error to configure all
vCPUs to be realtime. If the property is not set, then the
default behaviour will be to reserve vCPU 0 for non-realtime
tasks. This property will be overridable on the image too via
the hw_cpu_realtime_mask property.
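
For example, the tunable could be set like this (the flavor and image names
are illustrative)::

  # nova flavor-key rt.medium set hw:cpu_realtime_mask=^0-1
  # glance image-update --property hw_cpu_realtime_mask=^0 my-rt-image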

In the future it may be desirable to allow emulator threads to
be run on a host pCPU that is completely separate from those
running the vCPUs. This would, for example, allow for running
of guest OSes where all vCPUs must be real-time capable, and so
cannot reserve a vCPU for non-realtime tasks. This would require
the scheduler to treat the emulator threads as essentially being
a virtual CPU in their own right. Such an enhancement is considered
out of scope for this blueprint in order to remove any dependency
on scheduler modifications. It will be dealt with in a new blueprint

* https://blueprints.launchpad.net/nova/+spec/libvirt-emulator-threads-policy

A significant portion of the work required will be documenting the
required compute host and guest OS setup, as much of this cannot be
automatically performed by Nova itself. It is anticipated that the
developers of various OpenStack deployment tools will use the
documentation to extend their tools to be able to deploy realtime
enabled compute hosts. This is out of scope of this blueprint,
however, which will merely document the core requirements. Tenants
building disk images will also need to consume this documentation
to determine how to configure their guest OS.

Alternatives
------------

One option would be to always enable a real time scheduler policy when the
guest is using dedicated CPU pinning and always enable memory locking when
the guest has huge pages. As explained in the problem description, this is
highly undesirable as an approach. The real time guarantees are only achieved
by reducing the overall throughput of the system. So unconditionally enabling
realtime for hosts / guests which do not require it would significantly waste
potential compute resources. As a result it is considered mandatory to have
an opt-in mechanism for enabling real time.

Do nothing is always an option. In the event of doing nothing, guests would
have to put up with the latencies inherent in non-real time scheduling, even
with dedicated pCPUs. Some of those latencies could be further mitigated by
careful host OS configuration, but extensive performance testing has shown
that even with a carefully configured host and dedicated CPUs, worst case
latencies for a non-realtime task will be at least a factor of x10 worse than
when realtime is enabled. Thus not supporting realtime guests within OpenStack
will exclude Nova from use in a variety of scenarios, forcing users to
deploy alternative non-OpenStack solutions, or requiring OpenStack
vendors to fork the code and ship their own custom realtime solutions. Neither
of these are attractive options for OpenStack users or vendors in the long
term, as it would either lose user share, or balkanize the OpenStack
ecosystem.

Data model impact
-----------------

None required

REST API impact
---------------

None required

Security impact
---------------

The enablement of real time will only affect the pCPUs that are assigned to
the guest. Thus if the tenant is already permitted to use dedicated pCPUs
by the operator, enabling real time does not imply any further privileges.
Thus real time is not considered to introduce any new security concerns.

Notifications impact
--------------------

None

Other end user impact
---------------------

The tenant will have the ability to request real time via an image property.
They will need to carefully build their guest OS images to take advantage
of the realtime characteristics. They will need to obtain information from
their cloud provider as to the worst case latencies their deployment is
capable of satisfying, to ensure that it can achieve the requirements of
their workloads.

Performance Impact
------------------

There will be no new performance impact to Nova as a whole. This is building
on the existing CPU pinning and huge pages features, so the scheduler logic is
already in place. Likewise the impact on the host is restricted to pCPUs which
are already assigned to a guest.

Other deployer impact
---------------------

The operator will have the ability to define real time flavors by setting a
flavor extra spec property.

The operator will likely wish to make use of host aggregates to assign a
certain set of compute nodes for use in combination with huge pages and CPU
pinning. This is a pre-existing impact from those features, and real time does
not alter that.

Developer impact
----------------

Other virt drivers may wish to support the flavor/image properties for
enabling real time scheduling of their instances, if their hypervisor has
such a feature.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  sahid

Other contributors:
  berrange

Work Items
----------

The primary work items are

* Add the 'hw_cpu_realtime_mask' field to the ImageMetaProps object
* Update the libvirt guest XML configuration when the real time flavor or
  image properties are present
* Update the Nova deployment documentation to outline what host OS setup
  steps are required in order to make best use of the real time feature

Dependencies
============

* The libvirt project needs to add support for the XML feature to enable
  real time scheduler priority for guests. Merged as of 1.2.13.
* The KVM/kernel project needs to produce recommendations for optimal
  host OS setup. Partially done - see KVM Forum talks. Collaboration
  will be ongoing during development to produce Nova documentation.

If the libvirt emulator threads policy blueprint is implemented, then
the restriction that real-time guests must be SMP can be lifted, to
allow for UP realtime guests. This is not a strict pre-requisite
though, merely a complementary piece of work to allow real-time to
be used in a broader range of scenarios.

* https://blueprints.launchpad.net/nova/+spec/libvirt-emulator-threads-policy
* https://review.openstack.org/225893

Testing
=======

None of the current OpenStack community test harnesses check the performance
characteristics of guests deployed by Nova, which is what would be needed to
validate this feature.

The key functional testing requirement is around correct operation of
the existing Nova CPU pinning and huge pages features and their
scheduler integration. This is outside the scope of this particular
blueprint.

Documentation Impact
====================

The deployment documentation will need to be updated to describe how to set
up hosts and guests to take advantage of real time scheduler prioritization.
Since this requires very detailed knowledge of the system, it is expected
that the feature developers will write the majority of the content for this
documentation, as the documentation team cannot be expected to learn the
details required.

References
==========

* KVM Forum 2015: Real-Time KVM (Rik van Riel)

  * https://www.youtube.com/watch?v=cZ5aTHeDLDE
  * http://events.linuxfoundation.org/sites/events/files/slides/kvmforum2015-realtimekvm.pdf

* KVM Forum 2015: Real-Time KVM for the Masses (Jan Kiszka)

  * https://www.youtube.com/watch?v=SyhfctYqjc8
  * http://events.linuxfoundation.org/sites/events/files/slides/KVM-Forum-2015-RT-OpenStack_0.pdf

* KVM Forum 2015: Realtime KVM (Paolo Bonzini)

  * https://lwn.net/Articles/656807/

* Linux Kernel Realtime

  * https://rt.wiki.kernel.org/index.php/Main_Page
413
specs/mitaka/implemented/live-migration-progress-report.rst
Normal file
@@ -0,0 +1,413 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Report more live migration progress detail
==========================================

Blueprint:
https://blueprints.launchpad.net/nova/+spec/live-migration-progress-report

When live migrations take a long time, an operator might want to act on
them, such as pausing the VM being migrated, cancelling the live migration
operation, or doing some performance optimization. All these actions need
to be based on the details of the migration's progress.

This spec proposes adding more progress detail reporting for live migration
in the os-migrations API.

Problem description
===================

Some busy enterprise workloads are hosted on large sized VMs, such as SAP ERP
systems or VMs running memory write intensive workloads; this may lead to the
migration not converging.

Currently nova cannot report details of migration statistics, such as how
much data has been transferred and how much data remains.
Without those details, the operator cannot decide how to take the next action
on the migration.

Use Cases
----------

* As an operator of an OpenStack cloud, I would like to know the details of a
  migration, so that I can pause/cancel it or do some performance
  optimization.

* Some other projects, such as the watcher project, want to implement
  strategies to optimize performance dynamically during live migration. Such
  strategies depend on detailed migration status.

Proposed change
===============

Extend the os-migrations API. Some new fields will be added to the migration
DB and the os-migrations API response.

The new fields will be updated on the migration object in the
live_migration_monitor method of the libvirt driver, so that the API call
just needs to retrieve the object from the db; traditionally, API calls do
not block while they send a request to the compute node and wait for a reply.

New fields:

* memory_total: the total guest memory size.
* memory_processed: the amount of memory that has been transferred.
* memory_remaining: the amount of memory remaining to transfer.
* disk_total: the total disk size.
* disk_processed: the amount of disk that has been transferred.
* disk_remaining: the amount of disk remaining to transfer.

Note, the migration is always an unbounded job: memory_total may be less than
the final sum of memory_processed + memory_remaining in the event that the
hypervisor has to re-transfer some memory, such as due to dirtied pages
during migration.

The same is true of the disk numbers, and the disk fields will all be zero
when not block migrating.

For cold migration, only the disk fields will be populated; for drivers
that don't expose migration detail, the memory and disk fields will be null.
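
Because the totals can grow as pages are dirtied, any progress figure derived
from these fields is approximate. A sketch of how a client could compute one
from the API response (the helper is illustrative, not part of the
proposal)::

    def migration_progress(migration):
        # Sum memory and disk work; fields may be null for some drivers.
        total = ((migration['memory_total'] or 0) +
                 (migration['disk_total'] or 0))
        remaining = ((migration['memory_remaining'] or 0) +
                     (migration['disk_remaining'] or 0))
        if not total:
            return None
        # Totals are lower bounds (dirtied pages are re-sent), so clamp.
        return min(100.0, 100.0 * (total - remaining) / total)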
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
Add a new API to report the migration status details.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
The `nova.objects.migration.Migration` object would have 6 new fields.
|
||||
|
||||
For the database schema, the following table constructs would suffice ::
|
||||
|
||||
CREATE TABLE migrations(
|
||||
`created_at` datetime DEFAULT NULL,
|
||||
`updated_at` datetime DEFAULT NULL,
|
||||
`deleted_at` datetime DEFAULT NULL,
|
||||
`id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
|
||||
`source_compute` varchar(255) DEFAULT NULL,
|
||||
`dest_compute` varchar(255) DEFAULT NULL,
|
||||
`dest_host` varchar(255) DEFAULT NULL,
|
||||
`status` varchar(255) DEFAULT NULL,
|
||||
`instance_uuid` varchar(36) DEFAULT NULL,
|
||||
`old_instance_type_id` int(11) DEFAULT NULL,
|
||||
`new_instance_type_id` int(11) DEFAULT NULL,
|
||||
`source_node` varchar(255) DEFAULT NULL,
|
||||
`dest_node` varchar(255) DEFAULT NULL,
|
||||
`deleted` int(11) DEFAULT NULL,
|
||||
`migration_type` enum('migration','resize','live-migration',
|
||||
'evacuation') DEFAULT NULL,
|
||||
`hidden` tinyint(1) DEFAULT NULL,
|
||||
`memory_total` bigint DEFAULT NULL,
|
||||
`memory_processed` bigint DEFAULT NULL,
|
||||
`memory_remaining` bigint DEFAULT NULL,
|
||||
`disk_total` bigint DEFAULT NULL,
|
||||
`disk_processed` bigint DEFAULT NULL,
|
||||
`disk_remaining` bigint DEFAULT NULL,
|
||||
index(`instance_uuid`),
|
||||
index(`deleted`)
|
||||
);
|
||||
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
* Extend migrations resource to get migrations statistics in a new
|
||||
microversion. Then user can get the progress details of live-migration.
|
||||
|
||||
* GET `GET /servers/{id}/migrations`
|
||||
|
||||
* JSON schema definition for new fields::
|
||||
|
||||
non_negative_integer_with_null = {
|
||||
'type': ['integer', 'null'],
|
||||
'minimum': 0
|
||||
}
|
||||
|
||||
{
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'migrations': {
|
||||
'type': 'array',
|
||||
'items': {
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'memory_total': non_negative_integer_with_null,
|
||||
'memory_remaining': non_negative_integer_with_null,
|
||||
'disk_total': non_negative_integer_with_null,
|
||||
'disk_processed': non_negative_integer_with_null,
|
||||
'disk_remainning': non_negative_integer_with_null,
|
||||
..{all existing fields}...
|
||||
}
|
||||
'additionalProperties': False,
|
||||
'required': ['memory_total', 'memory_remaining', 'disk_total',
|
||||
'disk_processed', 'disk_remainning',
|
||||
..{all existing fields}...]
|
||||
}
|
||||
}
|
||||
},
|
||||
'additionalProperties': False,
|
||||
'required': ['migrations']
|
||||
}
|
||||
|
||||
* The example of response body::
|
||||
|
||||
{
|
||||
"migrations": [
|
||||
{
|
||||
"created_at": "2012-10-29T13:42:02.000000",
|
||||
"dest_compute": "compute2",
|
||||
"id": 1234,
|
||||
"server_uuid": "6ff1c9bf-09f7-4ce3-a56f-fb46745f3770",
|
||||
"new_flavor_id": 2,
|
||||
"old_flavor_id": 1,
|
||||
"source_compute": "compute1",
|
||||
"status": "running",
|
||||
"updated_at": "2012-10-29T13:42:02.000000",
|
||||
"memory_total": 1057024,
|
||||
"memory_processed": 3720,
|
||||
"memory_remaining": 1053304,
|
||||
"disk_total": 20971520,
|
||||
"disk_processed": 20880384,
|
||||
"disk_remaining": 91136,
|
||||
},
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
The old top-level resource `/os-migrations` won't be extended anymore, any
|
||||
new features will be go to the `/servers/{id}/migrations`. The old top-level
|
||||
resource `/os-migrations` just keeps for admin query, may replaced by
|
||||
`/servers/{id}/migrations` totally in the future. So we should add
|
||||
link in the old top-level resource `/os-migrations` for guiding people to
|
||||
get the new details of migration resource.
|
||||
|
||||
* Proposes adding new method to get each migration resource
|
||||
|
||||
* GET /servers/{id}/migrations/{id}
|
||||
|
||||
* Normal http response code: 200
|
||||
|
||||
* Expected error http response code
|
||||
|
||||
* 404: the specific in-progress migration can not found.
|
||||
|
||||
* JSON schema definition for the response body::

    {
        'type': 'object',
        'properties': {
            ...{all existing fields}...
        },
        'additionalProperties': False,
        'required': [...{all existing fields}...]
    }
|
||||
|
||||
* Example response body::

    {
        "created_at": "2012-10-29T13:42:02.000000",
        "dest_compute": "compute2",
        "id": 1234,
        "server_uuid": "6ff1c9bf-09f7-4ce3-a56f-fb46745f3770",
        "new_flavor_id": 2,
        "old_flavor_id": 1,
        "source_compute": "compute1",
        "status": "running",
        "updated_at": "2012-10-29T13:42:02.000000",
        "memory_total": 1057024,
        "memory_processed": 3720,
        "memory_remaining": 1053304,
        "disk_total": 20971520,
        "disk_processed": 20880384,
        "disk_remaining": 91136
    }
|
||||
|
||||
* A new policy 'os_compute_api:servers:migrations:show' will be added; the
  default permission is admin only.
|
||||
|
||||
* Propose adding a ref link to `/servers/{id}/migrations/{id}` in
  `/os-migrations`
|
||||
|
||||
* GET /os-migrations
|
||||
|
||||
* JSON schema definition for the response body::

    {
        'type': 'object',
        'properties': {
            'migrations': {
                'type': 'array',
                'items': {
                    'type': 'object',
                    'properties': {
                        'links': {
                            'type': 'array',
                            'items': {
                                'type': 'object',
                                'properties': {
                                    'href': {
                                        'type': 'string',
                                        'format': 'uri'
                                    },
                                    'rel': {
                                        'type': 'string',
                                        'enum': ['self', 'bookmark'],
                                    }
                                },
                                'additionalProperties': False,
                                'required': ['href', 'rel']
                            }
                        },
                        ...
                    },
                    'additionalProperties': False,
                    'required': ['links', ...]
                }
            }
        },
        'additionalProperties': False,
        'required': ['migrations']
    }
|
||||
|
||||
* Example response body::

    {
        "migrations": [
            {
                "created_at": "2012-10-29T13:42:02.000000",
                "dest_compute": "compute2",
                "dest_host": "1.2.3.4",
                "dest_node": "node2",
                "id": 1234,
                "instance_uuid": "instance_id_123",
                "new_instance_type_id": 2,
                "old_instance_type_id": 1,
                "source_compute": "compute1",
                "source_node": "node1",
                "status": "done",
                "updated_at": "2012-10-29T13:42:02.000000",
                "links": [
                    {
                        "href": "http://openstack.example.com/v2.1/openstack/servers/0e44cc9c-e052-415d-afbf-469b0d384170/migrations/1234",
                        "rel": "self"
                    },
                    {
                        "href": "http://openstack.example.com/openstack/servers/0e44cc9c-e052-415d-afbf-469b0d384170/migrations/1234",
                        "rel": "bookmark"
                    }
                ]
            },
            {
                "created_at": "2013-10-22T13:42:02.000000",
                "dest_compute": "compute20",
                "dest_host": "5.6.7.8",
                "dest_node": "node20",
                "id": 5678,
                "instance_uuid": "instance_id_456",
                "new_instance_type_id": 6,
                "old_instance_type_id": 5,
                "source_compute": "compute10",
                "source_node": "node10",
                "status": "done",
                "updated_at": "2013-10-22T13:42:02.000000",
                "links": [
                    {
                        "href": "http://openstack.example.com/v2.1/openstack/servers/0e44cc9c-e052-415d-afbf-469b0d384170/migrations/5678",
                        "rel": "self"
                    },
                    {
                        "href": "http://openstack.example.com/openstack/servers/0e44cc9c-e052-415d-afbf-469b0d384170/migrations/5678",
                        "rel": "bookmark"
                    }
                ]
            }
        ]
    }
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
None
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
New python-novaclient commands will be available, e.g.::
|
||||
|
||||
nova server-migration-list <instance>
|
||||
nova server-migration-show <instance> <migration_id>
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
None
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
ShaoHe Feng <shaohe.feng@intel.com>
|
||||
|
||||
Other contributors:
|
||||
Yuntong Jin <yuntong.jin@intel.com>
|
||||
|
||||
Work Items
|
||||
----------
|
||||
* Add migration progress detail fields in the DB.
* Write migration progress detail fields to the DB.
* Update the migration object in the _live_migration_monitor method of the
  libvirt driver.
* The API call to list os-migrations simply returns data about the migration
  objects, i.e. what is in the DB.
* Implement new commands 'server-migration-list' and 'server-migration-show'
  in python-novaclient.
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None
|
||||
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
Unit tests and functional tests in Nova
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
Document the API change in the API Reference:
|
||||
http://developer.openstack.org/api-ref-compute-v2.1.html
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
os-migrations-v2.1:
|
||||
http://developer.openstack.org/api-ref-compute-v2.1.html#os-migrations-v2.1
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
Mitaka: Introduced
|
||||
285
specs/mitaka/implemented/making_live_migration_api_friendly.rst
Normal file
@@ -0,0 +1,285 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
======================================
|
||||
Making the live-migration API friendly
|
||||
======================================
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/making-live-migration-api-friendly
|
||||
|
||||
The current live-migration API is difficult to use, so we need to make the
API more friendly to users and external systems.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
The current live-migration API requires the user to specify whether block
migration should be used via the `block_migration` flag. Block migration
requires that the source and destination hosts do not share storage, while
live migration without block migration requires that the source and
destination hosts be on the same shared storage.
|
||||
|
||||
There are two problems with this flag:

* For external systems and cloud operators, it is hard to know which value
  should be used for a specific destination host. Before specifying the
  value of the `block_migration` flag, the user needs to figure out whether
  the source and destination hosts are on the same shared storage.
* When the user passes the `host` flag with value None, the scheduler will
  choose a host for the user. If the scheduler selects a destination host
  which is on the same shared storage as the source host, and the user
  specified `block_migration` as True, the request will fail. That is, the
  scheduler doesn't know the storage topology, so it can't select a suitable
  host.
|
||||
|
||||
For the `host` flag, a value of None means the scheduler should choose a host.
|
||||
For ease of use, the 'host' flag can be optional.
|
||||
|
||||
The `disk_over_commit` flag is libvirt driver specific. If the value is True,
the libvirt virt driver will check the image's virtual size against the
usable disk size. If the value is False, the libvirt virt driver will check
the image's actual size against the usable disk size. The Nova API shouldn't
expose any hypervisor-specific detail. This flag confuses users as well, as
normally the user simply wants the same resource-usage policy the scheduler
already applies.
|
||||
|
||||
Use Cases
|
||||
---------
|
||||
|
||||
* API Users and external systems can use the live-migration API without
|
||||
having to manually determine the storage topology of the Nova deployment.
|
||||
* API Users should be able to have the scheduler select the destination host.
|
||||
* Users don't want to know whether disk overcommit is needed; Nova should
  just do the right thing.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
Make the `block_migration` flag optional, with a default value of None. When
the value is None, Nova will detect whether the source and destination hosts
are on shared storage. If they are on shared storage, the live-migration
won't do block migration; if they aren't, block migration will be performed.
|
||||
|
||||
Make the `host` flag optional, with a default value of None; the behaviour
won't change.
|
||||
|
||||
Remove the `disk_over_commit` flag and remove the disk usage check from libvirt
|
||||
virt driver.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
Ideally the live-migration API will be improved continuously. For the
`block_migration` flag, there are two opinions on this:

* When the `block_migration` flag is False, the scheduler will choose a host
  which is on shared storage with the original host. When the value is True,
  the scheduler will choose a host which isn't on shared storage with the
  original host. This needs some work in Nova to track the shared storage so
  that the scheduler can choose the right host.
* Remove the `block_migration` flag entirely; the API would then always
  migrate the instance within one storage pool, which is what people want in
  most cases.

Either way, shared storage can be tracked once this BP is implemented:
https://blueprints.launchpad.net/nova/+spec/resource-providers
So that will be future work.
|
||||
|
||||
The logic for `disk_over_commit` does not match how the ResourceTracker does
resource counting. Ideally we should have the ResourceTracker consume disk
usage; that will be done by another bug fix or proposal.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
|
||||
The `block_migration` and `host` flags will become optional and the
`disk_over_commit` flag will be removed; the JSON schema is as below::
|
||||
|
||||
    boolean = {
        'type': ['boolean', 'string', 'null'],
        'enum': [True, 'True', 'TRUE', 'true', '1', 'ON', 'On', 'on',
                 'YES', 'Yes', 'yes',
                 False, 'False', 'FALSE', 'false', '0', 'OFF', 'Off', 'off',
                 'NO', 'No', 'no'],
    }

    {
        'type': 'object',
        'properties': {
            'os-migrateLive': {
                'type': 'object',
                'properties': {
                    'block_migration': boolean,
                    'host': host,
                },
                'additionalProperties': False,
            },
        },
        'required': ['os-migrateLive'],
        'additionalProperties': False,
    }
|
||||
|
||||
This change will need a new microversion, and the old version API will keep the
|
||||
same behaviour as before.
|
||||
|
||||
For upgrades, if the user specifies a host running an old version node with
the new API version, the API will return `HTTP BadRequest 400` when
`block_migration` or `disk_over_commit` is None. If the user didn't specify
a host and an old version node is selected by the scheduler, the scheduler
will retry to find another host until a new compute node is found or the
maximum number of retries is reached.
|
||||
|
||||
Currently the response body is empty, but the user needs to know whether
nova decided to do a block migration. The following response body is
proposed::

    {
        'type': 'object',
        'properties': {
            'block_migration': parameter_types.boolean,
            'host': host
        },
        'required': ['block_migration', 'host'],
        'additionalProperties': False
    }
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
None
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
The user no longer needs to figure out whether the destination host is on
the same shared storage as the source host before invoking the
live-migration API. But this may cause a block migration, which puts more
load on the live-migration network and may be unexpected to the user. If the
user clearly doesn't want a block migration, they can set `block_migration`
to False explicitly. This will be improved in the future.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
The new REST API version won't work for old compute nodes when doing a
rolling upgrade. This is because `disk_over_commit` was removed, so there is
no valid value provided by the API any more. Users can only use the old
version live-migration API with old compute nodes.
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
The detection of block_migration
|
||||
--------------------------------
|
||||
|
||||
For the virt driver interface, there are two interfaces to check if the
|
||||
destination and source hosts satisfy the migration conditions. They are
|
||||
`check_can_live_migrate_destination` and `check_can_live_migrate_source`. After
|
||||
the check, the virt driver will return `migrate_data` to nova conductor.
|
||||
|
||||
We propose that when a request is made with `block_migration` set to None,
those two driver interfaces will calculate the new value for
`block_migration` based on the shared storage checks implemented in the virt
driver. The new value of `block_migration` will be returned in
`migrate_data`.
|
||||
|
||||
Currently only three virt drivers implement live-migration: the libvirt,
xenapi, and hyperv drivers.
|
||||
|
||||
The libvirt driver already implements the detection of shared storage. The
results of the checks are in the dict `dest_check_data`, in the values
`is_shared_block_storage` and `is_shared_instance_path`. So when
`block_migration` is None, the driver will set `block_migration` to False if
`is_shared_block_storage` or `is_shared_instance_path` is True; otherwise
the driver will set `block_migration` to True. Finally the new value of
`block_migration` will be returned in `migrate_data`.
|
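A minimal sketch of this detection, assuming the `dest_check_data` values
described above (the helper name is illustrative)::

    def _decide_block_migration(block_migration, dest_check_data):
        """Fill in block_migration when the API left it as None."""
        if block_migration is None:
            shared = (dest_check_data.get('is_shared_block_storage') or
                      dest_check_data.get('is_shared_instance_path'))
            # Shared storage means a plain live migration suffices; without
            # shared storage the disks must be block migrated.
            block_migration = not shared
        return block_migration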
||||
|
||||
For the xenapi driver, the shared storage check is based on aggregates:
live migration without block migration requires that the destination host be
in the same aggregate / hypervisor_pool as the source host. So
`block_migration` will be False when the destination host is in that
aggregate and True otherwise. The new value is likewise passed back in
`migrate_data`.
|
||||
|
||||
For the hyperv driver, although it supports live-migration, there isn't any
code implementing the `block_migration` flag. So we won't implement this
until hyperv supports that flag.
|
||||
|
||||
Remove the check of disk_over_commit
|
||||
------------------------------------
|
||||
|
||||
The `disk_over_commit` flag still needs to work with older microversions.
For this proposal, we pass a None value when the request comes in with a
newer microversion. In the libvirt driver, if the value of
`disk_over_commit` is None, the driver won't do any disk usage check;
otherwise the check does the same thing as before.
|
||||
|
||||
The upgrade concern
|
||||
-------------------
|
||||
|
||||
This proposal adds a new value of `None` for `block_migration` and
`disk_over_commit`. When an OpenStack cluster is in the progress of a
rolling upgrade, old version compute nodes don't know this new value. So a
check is added in the Compute RPC API: if the client can't send the new
version of the Compute RPC call, a fault will be returned.
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Alex Xu <hejie.xu@intel.com>
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Implement the value detection of `block_migration` in the libvirt and
  xenapi drivers.
* Skip the disk usage check when the `disk_over_commit` value is None.
* Make the `block_migration` and `host` flags optional, and remove the
  `disk_over_commit` flag in the API.
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
Unit tests and functional tests in Nova
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
Document the API change in the API Reference:
|
||||
http://developer.openstack.org/api-ref-compute-v2.1.html
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
None
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
Mitaka: Introduced
|
||||
141
specs/mitaka/implemented/no-more-soft-delete.rst
Normal file
@@ -0,0 +1,141 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
===================
|
||||
No more soft delete
|
||||
===================
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/no-more-soft-delete
|
||||
|
||||
There was widespread agreement at the YVR summit not to soft-delete any more
|
||||
things. To codify this, we should remove the SoftDeleteMixin from NovaBase.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Soft deletion of rows imposes a management overhead to later delete or
archive those rows. It has also proved less necessary than initially
imagined. We would prefer that additional soft-deletes not be added, so it
does not make sense to automatically inherit the `SoftDeleteMixin` when
inheriting from NovaBase.
|
||||
|
||||
Use Cases
|
||||
---------
|
||||
|
||||
As an operator, adding new soft deleted things means I need to extend my
|
||||
manual cleanup to cover those things. If I don't, those tables will become
|
||||
very slow to query.
|
||||
|
||||
As a developer, I don't want to tempt operators to read soft-deleted rows
|
||||
directly. That risks turning the DB schema into an unofficial API.
|
||||
|
||||
As a developer/DBA, providing `deleted` and `deleted_at` columns on tables
|
||||
which are not soft-deleted is confusing. One might also say it's confusing to
|
||||
soft-delete from tables where deleted rows are never read.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
This spec proposes removing the `SoftDeleteMixin` from NovaBase and re-adding
|
||||
it to all tables which currently inherit from NovaBase. The removal of
|
||||
SoftDeleteMixin from those tables which don't need it will be left for future
|
||||
work.
|
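As a rough sketch, assuming oslo.db's model mixins (the table definitions
here are illustrative, not nova's real models)::

    from oslo_db.sqlalchemy import models
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    BASE = declarative_base()


    class NovaBase(models.TimestampMixin, models.ModelBase):
        """Base class that no longer inherits SoftDeleteMixin."""


    class Instance(BASE, NovaBase, models.SoftDeleteMixin):
        """Existing soft-deleted table: the mixin is re-added explicitly."""
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        uuid = Column(String(36))


    class BrandNewThing(BASE, NovaBase):
        """A new table: gets no deleted/deleted_at columns by default."""
        __tablename__ = 'brand_new_things'
        id = Column(Integer, primary_key=True)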
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
We could choose not to do this. That means we keep an extra two columns on
new tables, and it stays slightly easier to start soft-deleting new tables.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None.
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
None.
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None.
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
None.
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
None.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None.
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
None.
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None.
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
alexisl
|
||||
|
||||
Other contributors:
|
||||
None
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Remove `SoftDeleteMixin` from NovaBase.
|
||||
* Add it to all models which inherited from NovaBase.
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None.
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
None.
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
None.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
None.
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
.. list-table:: Revisions
|
||||
:header-rows: 1
|
||||
|
||||
* - Release Name
|
||||
- Description
|
||||
* - Liberty
|
||||
- Introduced
|
||||
* - Mitaka
|
||||
- Simplified and re-proposed
|
||||
@@ -0,0 +1,177 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
==========================================
|
||||
Remove shared storage flag in evacuate API
|
||||
==========================================
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/remove-shared-storage-flag-in-evacuate-api
|
||||
|
||||
Today the evacuate API expects an onSharedStorage flag to be provided by the
admin; however, this information can be detected by the virt driver as well.
To ease the work of the admin and to allow easier automation of evacuation
tasks, this spec proposes to remove the onSharedStorage flag from the API in
a new microversion.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Use Cases
|
||||
---------
|
||||
When an instance needs to be evacuated from a failed host, the admin has to
check whether the instance was stored on shared storage to issue the
evacuate command properly. The admin wants to rely on the virt driver to
detect whether the instance data is available on the target host and to use
it if possible for the evacuation.
An external automatic evacuation engine also wants to let nova decide
whether the instance can be evacuated without rebuilding it on the target
host.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
In the compute manager, the on_shared_storage flag of the rebuild_instance
function was made optional by a previous spec, so the onSharedStorage
parameter can now be removed from the evacuate API.
|
||||
|
||||
The evacuate API optionally supports providing a new admin password. This
makes the solution a bit more complicated.
Nova can only decide whether the instance is on shared storage once the
target host of the evacuation is known, i.e. only after the scheduler has
selected the new host, because nova needs to check whether the disk of the
instance is visible from the target host. However, the evacuate API call
returns the new admin password in the response. This logic cannot be fully
kept if the onSharedStorage flag is removed.
|
||||
|
||||
There are two cases to consider if the onSharedStorage flag is removed:
|
||||
|
||||
* Client doesn't provide admin password. Nova will generate a new password.
|
||||
If nova finds that the instance is on shared storage then
|
||||
the instance will be rebooted and will use the same admin password as before.
|
||||
If nova finds that the instance is not on shared storage then the instance
|
||||
will be recreated and the newly generated admin password will be used.
|
||||
* Client provides admin password.
|
||||
If nova finds that the instance is on shared storage then
|
||||
the password the client provided will be silently ignored. If nova finds
|
||||
that the instance is not on shared storage then the provided password will
|
||||
be injected to the recreated instance.
|
||||
|
||||
This spec proposes to:
|
||||
|
||||
* Remove the onSharedStorage parameter of the
|
||||
/v2.1/{tenant_id}/servers/{server_id}/action API
|
||||
* Remove adminPass from the response body of the API call. Admin user can still
|
||||
access the generated password via
|
||||
/v2.1/{tenant_id}/servers/{server_id}/os-server-password API
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
For the automation use case, the alternative would be to reimplement the
check of instance disk availability in the theoretical external evacuation
engine. However, this would be clear code duplication, as nova already
contains this check in the virt driver.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
None
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
The onSharedStorage parameter of the
/v2.1/{tenant_id}/servers/{server_id}/action API will be removed,
so the related JSON schema would change to the following::
|
||||
|
||||
    {
        'type': 'object',
        'properties': {
            'evacuate': {
                'type': 'object',
                'properties': {
                    'host': parameter_types.hostname,
                    'adminPass': parameter_types.admin_password,
                },
                'required': [],
                'additionalProperties': False,
            },
        },
        'required': ['evacuate'],
        'additionalProperties': False,
    }
|
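For illustration, a request body valid under this schema might look like
the following (the host name and password are made up)::

    {
        "evacuate": {
            "host": "compute-2",
            "adminPass": "MySecretPass"
        }
    }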
||||
|
||||
Also, adminPass will be removed from the response body. This would make the
response body empty; therefore the API will not return a response body at
all.
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
None
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
None
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
None
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
None
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
None
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
None
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
balazs-gibizer
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Remove onSharedStorage from the evacuate REST API
|
||||
* Remove adminPass and therefore the whole response body of the evacuate API
|
||||
|
||||
|
||||
Dependencies
|
||||
============
|
||||
None
|
||||
|
||||
Testing
|
||||
=======
|
||||
Unit and functional test coverage will be provided.
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
Admin guide needs to be updated with the new behavior of the evacuate
|
||||
function.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
[1] The bp that made the on_shared_storage optional in compute manager in
|
||||
Liberty https://blueprints.launchpad.net/nova/+spec/optional-on-shared-storage-flag-in-rebuild-instance
|
||||
[2] The code that made the on_shared_storage optional in compute manager in
|
||||
Liberty https://review.openstack.org/#/c/197951/
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
.. list-table:: Revisions
|
||||
:header-rows: 1
|
||||
|
||||
* - Release Name
|
||||
- Description
|
||||
* - Mitaka
|
||||
- Introduced
|
||||
@@ -0,0 +1,192 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
===============================================
|
||||
Make os-instance-actions read deleted instances
|
||||
===============================================
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/os-instance-actions-read-deleted-instances
|
||||
|
||||
Change the os-instance-actions API to read deleted instances so the owner can
|
||||
see the actions performed on their deleted instance.
|
||||
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
The os-instance-actions API currently does not read deleted instances [#f1]_.
|
||||
|
||||
Also, instance_actions are not soft deleted when an instance is deleted, so
|
||||
we can still read them out of the DB without needing the read_deleted='yes'
|
||||
flag.
|
||||
|
||||
The point of instance actions is auditing, and in the case of a post-mortem
on a deleted instance, instance_actions would be used for this; but because
of the API limitation, you can't retrieve them through the API for a deleted
instance.
|
||||
|
||||
Use Cases
|
||||
---------
|
||||
|
||||
#. Multiple users are in the same project/tenant.
|
||||
#. User A deletes a shared instance.
|
||||
#. User B wants to know what happened to it (or who deleted it).
|
||||
|
||||
User B should be able to look up the instance actions on the instance since
they are in the same project as user A.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
Add a microversion change to the os-instance-actions API so that we mutate the
|
||||
context and set the read_deleted='yes' attribute when looking up the instance
|
||||
by uuid.
|
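A minimal sketch of the lookup change, reusing nova's existing
`utils.temporary_mutation` helper (the microversion number shown is
illustrative)::

    from nova.api.openstack import api_version_request
    from nova.api.openstack import common
    from nova import utils


    def _get_instance(self, req, context, server_id):
        # '2.21' stands in for whatever microversion this lands in.
        if api_version_request.is_supported(req, min_version='2.21'):
            # Temporarily read deleted rows so the actions of a deleted
            # instance remain visible.
            with utils.temporary_mutation(context, read_deleted='yes'):
                return common.get_instance(self.compute_api, context,
                                           server_id)
        return common.get_instance(self.compute_api, context, server_id)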
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
* We can assume that operators are listening for nova notifications and storing
|
||||
those off for later lookup in the case that they need to determine who
|
||||
deleted an instance. This is not a great assumption since it relies on an
|
||||
external monitoring system being set up outside of nova, which is optional.
|
||||
|
||||
* Operators can query the database directly to get the instance actions for a
|
||||
deleted instance, but then they have to know the nova data model. And only
|
||||
operators can do that; it doesn't allow tenant users to do this lookup
|
||||
themselves (so they'd have to open a support ticket to the operator to do
|
||||
the lookup for them).
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None.
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
Impacted API: os-instance-actions
|
||||
|
||||
Impacted methods: GET
|
||||
|
||||
The os-instance-actions API only has two GET requests:
|
||||
|
||||
#. index: list the instance actions by instance uuid
|
||||
#. show: show details on an instance action by instance uuid and request id
|
||||
including, if authorized, the related instance action events.
|
||||
|
||||
The request and response values do not change in the API. The expected response
|
||||
codes do not change - there is still a 404 returned if the instance or instance
|
||||
action is not found.
|
||||
|
||||
The only change is that when looking up the instance, we set the
|
||||
read_deleted='yes' flag on the context. This will be done within a conditional
|
||||
block based on the microversion in the request.
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None.
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
None.
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
We can bump the max supported API version in python-novaclient automatically
for this change since it's self-contained in the server-side API code; the
client does not have to do anything except opt into the microversion.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None.
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
None.
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None.
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Matt Riedemann <mriedem@us.ibm.com>
|
||||
|
||||
Other contributors:
|
||||
None
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* If the microversion in the request satisfies the minimum version required,
|
||||
temporarily mutate the context when reading the instance by uuid from the
|
||||
database. For example:
|
||||
|
||||
::
|
||||
|
||||
with utils.temporary_mutation(context, read_deleted='yes'):
|
||||
instance = common.get_instance(self.compute_api, context, server_id)
|
||||
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None.
|
||||
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
#. Unit tests will be updated.
|
||||
#. Functional tests (API sample tests) will be provided for the microversion
|
||||
change. The scenarios are basically:
|
||||
|
||||
* Delete an instance and try to get its instance actions where the
  microversion requested does not meet the minimum requirement, and assert
  that nothing is returned.
* Delete an instance and try to get its instance actions where the
  microversion requested does meet the minimum requirement, and assert that
  the related instance actions are returned.
|
||||
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
* http://docs.openstack.org/developer/nova/api_microversion_history.html will
|
||||
be updated.
|
||||
* http://developer.openstack.org/api-ref-compute-v2.1.html will be updated to
|
||||
point out the microversion change.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
* Mailing list: http://lists.openstack.org/pipermail/openstack-dev/2015-November/080039.html
|
||||
|
||||
.. [#f1] API: https://github.com/openstack/nova/blob/12.0.0/nova/api/openstack/compute/instance_actions.py#L56
|
||||
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
.. list-table:: Revisions
|
||||
:header-rows: 1
|
||||
|
||||
* - Release Name
|
||||
- Description
|
||||
* - Mitaka
|
||||
- Introduced
|
||||
233
specs/mitaka/implemented/oslo_db-enginefacade.rst
Normal file
@@ -0,0 +1,233 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
=====================================
|
||||
Use the new enginefacade from oslo_db
|
||||
=====================================
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/new-oslodb-enginefacade
|
||||
|
||||
Implement the new oslo.db enginefacade interface described here:
|
||||
|
||||
https://blueprints.launchpad.net/oslo.db/+spec/make-enginefacade-a-facade
|
||||
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
The linked oslo.db spec contains the details of the proposal, including its
|
||||
general advantages to all projects. In summary, we transparently track database
|
||||
transactions using the RequestContext object. This means that if there is
|
||||
already a transaction in progress we will use it by default, only creating a
|
||||
separate transaction if explicitly requested.
|
||||
|
||||
|
||||
Use Cases
|
||||
----------
|
||||
|
||||
These changes will only affect developers.
|
||||
|
||||
* Allow a class of database races to be fixed
|
||||
|
||||
Nova currently only exposes database transactions in
nova/db/sqlalchemy/api.py, which means that every db api call is in its own
transaction. Although this will remain the same initially, the new interface
allows a caller to extend a transaction across several db api calls if they
wish. This will enable callers that need several calls to be atomic to
achieve this, which includes the save operation on several Nova objects.
|
||||
|
||||
* Reduce connection load on the database
|
||||
|
||||
Many database api calls currently create several separate database connections,
|
||||
which increases load on the database. By reducing these to a single connection,
|
||||
load on the db will be decreased.
|
||||
|
||||
* Improve atomicity of API calls
|
||||
|
||||
By ensuring that database api calls use a single transaction, we fix a class of
|
||||
bug where failure can leave a partial result.
|
||||
|
||||
* Make greater use of slave databases for read-only transactions
|
||||
|
||||
The new api marks sections of code as either readers or writers, and enforces
|
||||
this separation. This allows us to automatically use a slave database
|
||||
connection for all read-only transactions. It is currently only used when
|
||||
explicitly requested in code.
|
||||
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
Code changes
|
||||
------------
|
||||
|
||||
* Decorate the RequestContext class
|
||||
|
||||
nova.RequestContext is annotated with the
|
||||
@enginefacade.transaction_context_provider decorator. This adds several code
|
||||
hooks which provide access to the transaction context via the RequestContext
|
||||
object.
|
||||
|
||||
* Update database apis incrementally
|
||||
|
||||
Database apis will be updated in batches, by function: for example, service
apis, quota apis, instance apis. Individual calls will be annotated as
either readers or writers (see the sketch after this list). Existing
transaction management will be replaced. Calls into apis which have not been
upgraded yet will continue to explicitly pass the session or connection
object.
|
||||
|
||||
* Remove uses of use_slave wherever possible
|
||||
|
||||
The use_slave parameter will be removed from all upgraded database apis, which
|
||||
will involve updating call sites and tests. Where the caller no longer uses the
|
||||
use_slave parameter anywhere, the removal will be propagated as far as
|
||||
possible. The exception will be external interfaces. All uses of use_slave
|
||||
will be removed. External interfaces will continue to accept it, but will not
|
||||
use it.
|
||||
|
||||
* Cells 'api' database calls
|
||||
|
||||
get_api_engine() and get_api_session() will be replaced by a context manager
|
||||
which changes the current transaction manager.
|
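A minimal sketch of the reader/writer annotation pattern, using oslo.db's
new enginefacade (the db api functions shown are illustrative)::

    from oslo_db.sqlalchemy import enginefacade

    from nova.db.sqlalchemy import models


    @enginefacade.transaction_context_provider
    class RequestContext(object):
        """Carries the transaction state between db api calls."""


    @enginefacade.reader
    def service_get(context, service_id):
        # The decorator supplies context.session, joining any transaction
        # already in progress on this context.
        return context.session.query(models.Service).get(service_id)


    @enginefacade.writer
    def service_update(context, service_id, values):
        service = context.session.query(models.Service).get(service_id)
        service.update(values)
        return service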
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
Alternatives were examined during the design of the oslo.db code. The goal of
|
||||
this change is to implement a solution which is common across OpenStack
|
||||
projects.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None.
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
None.
|
||||
|
||||
This change obsoletes the use_slave parameter everywhere it is used, which
|
||||
includes several apis with external interfaces. We remove it from all internal
|
||||
interfaces. For external interfaces we leave it in place, but ignore it. Slave
|
||||
connections will be used everywhere automatically, whenever possible.
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
Nothing obvious.
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
None.
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
None.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
By reducing connection load on the database, the change is expected to provide
|
||||
a small performance improvement. However, the primary purpose is correctness.
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
None.
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
The initial phase of this work will be to implement the new engine facade in
|
||||
nova/db/sqlalchemy/api.py only, and the couple of cells callers which access
|
||||
the database outside this module. There will be some minor changes to function
|
||||
signatures in this module due to removing use_slave, but all callers will be
|
||||
updated as part of this work. Callers will not have to consider transaction
|
||||
context if they do not currently do so, as it will be created and destroyed
|
||||
automatically.
|
||||
|
||||
This change will allow developers to explicitly extend database transaction
|
||||
context to cover several database calls. This allows the caller to make
|
||||
multiple database changes atomically.
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
mbooth-9
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Enable use of the new api in Nova
|
||||
|
||||
* Migrate api bundles along functional lines:
|
||||
* Service
|
||||
* ComputeNode
|
||||
* Certificate
|
||||
* FloatingIP
|
||||
* DNSDomain
|
||||
* FixedIP
|
||||
* VIF
|
||||
* Instance, InstanceInfoCache, InstanceExtra, InstanceMetadata,
|
||||
InstanceSystemMetadata, InstanceFault, InstanceGroup, InstanceTag
|
||||
* KeyPair
|
||||
* Network
|
||||
* Quota
|
||||
* EC2
|
||||
* BDM
|
||||
* SecurityGroup
|
||||
* ProviderFWRule
|
||||
* Migration
|
||||
* ConsolePool
|
||||
* Flavor
|
||||
* Cells
|
||||
* Agent
|
||||
* Bandwidth
|
||||
* Volume
|
||||
* S3
|
||||
* Aggregate
|
||||
* Action
|
||||
* Task
|
||||
* PCIDevice
|
||||
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
A version of oslo.db including the new enginefacade api:
|
||||
|
||||
https://review.openstack.org/#/c/138215/
|
||||
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
This change is intended to have no immediate functional impact. The current
|
||||
tests should continue to pass, except where:
|
||||
|
||||
* An internal API is modified to remove use_slave
|
||||
* The change exposes a bug
|
||||
* The tests assumed implementation details which have changed
|
||||
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
None.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
https://blueprints.launchpad.net/oslo.db/+spec/make-enginefacade-a-facade
|
||||
197
specs/mitaka/implemented/pause-vm-during-live-migration.rst
Normal file
@@ -0,0 +1,197 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
===============================================
|
||||
Provide a way to pause VM during live migration
|
||||
===============================================
|
||||
|
||||
Blueprint:
|
||||
https://blueprints.launchpad.net/nova/+spec/pause-vm-during-live-migration
|
||||
|
||||
When using live migration, an operator might want the ability to increase
the chance that a migration succeeds, even at the cost of longer VM
downtime. This spec proposes a new nova API for pausing a VM during live
migration.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
The most common use case of live migration is host maintenance for various
purposes, e.g. an OpenStack upgrade to a newer version or even a hardware
upgrade. Hypervisors have features such as CPU throttling or memory
compression that make it possible to live migrate every VM to other hosts.
However, a VM might run a workload that prevents live migration from
finishing. In such a case the operator might want to pause the VM during
live migration to stop memory writes on the VM.
|
||||
|
||||
Another use case is imminent host failure, where live migration duration
might be crucial to keeping VMs running, regardless of VM downtime during
the transition to the destination host.
|
||||
|
||||
Currently, to pause a VM during live migration the operator needs to pause
the VM through libvirt/the hypervisor. This pause is transparent to Nova, as
it is the same thing that happens during the 'pause-and-copy' step of live
migration.
|
||||
|
||||
Use Cases
|
||||
----------
|
||||
|
||||
As an operator of an OpenStack cloud, I would like the ability to pause a VM
during live migration. This operation prevents the VM from dirtying memory
and therefore forces the live migration to complete.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
A new API method for pausing a VM during live migration. This will make an
asynchronous RPC call to the compute node to pause the VM through libvirt.
It will also introduce a new instance action 'live-migration-paused-vm'.
The Migration object and MigrationList object will be used to establish
which migrations exist, with additional optional data provided by the
compute driver.
|
||||
|
||||
This will need an increment to the rpcapi version too.
|
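A minimal sketch of the driver-side hook, assuming the libvirt driver's
existing guest wrapper (the method name mirrors the API action and is
illustrative)::

    def live_migration_force_complete(self, instance):
        # Pausing the domain stops new memory writes, so the remaining
        # dirty pages can be copied and the migration can converge.
        guest = self._host.get_guest(instance)
        guest.pause()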
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
One alternative is not doing this and letting the operator pause the VM
manually through the hypervisor.
|
||||
|
||||
Another alternative is to reuse the existing pause operation in nova.
However, it might confuse operators. Libvirt preserves the VM state that was
in effect when live migration started; when live migration completes,
libvirt reverts the VM state to the preserved one. Example workflow:
|
||||
|
||||
* VM is active
* Operator starts live migration
* Libvirt preserves the active state of the VM
* Operator pauses the VM during transition (e.g., nova pause VM)
* Live migration finishes
* Libvirt reverts the VM state to the preserved one - in this case active.
|
||||
|
||||
Because of this behavior it is not recommended to reuse the existing pause
operation; it would be confusing for operators if a single operation were
used for two different purposes.
|
||||
|
||||
Also, in the future there might be multiple methods to force a live
migration to finish. This API can be extended to give hints to do things
other than pausing the VM during live migration.

This will also be suitable for the Tasks API.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None. The Migration objects used are already created and tracked by nova.
|
||||
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
To be added in a new microversion.
|
||||
|
||||
* Force live migration to complete by pausing VM
|
||||
|
||||
`POST /servers/{id}/migrations/{id}/action`
|
||||
|
||||
Body::
|
||||
|
||||
{
|
||||
"force_complete": null
|
||||
}
|
||||
|
||||
Normal http response code: `202 Accepted`
|
||||
No response body is needed
|
||||
|
||||
Expected error http response code: `400 Bad Request`
|
||||
- the instance state is invalid for forcing a live migration to complete,
  i.e., the task state is not 'migrating', or there is no migration of type
  'live-migration' in a 'running' state. This error is also returned while
  a live migration cancel action is in progress.
|
||||
|
||||
Expected error http response code: `403 Forbidden`
|
||||
- Policy violation if the caller is not granted access to
|
||||
'os_compute_api:servers:migrations:force_complete' in policy.json
|
||||
|
||||
Expected error http response code: `404 Not Found`
|
||||
- the instance does not exist
|
||||
|
||||
Because this is an async call, there might be errors that are not exposed
through the API, for instance if the hypervisor does not support pausing a
VM during live migration. Such errors will be logged by the compute service.
|
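For illustration, forcing completion from python code might look like the
following once client support lands (the microversion, credentials and ids
are made up, and the client-side method name is assumed)::

    from novaclient import client

    nova = client.Client('2.22', 'admin', 'secret', 'demo',
                         'http://controller:5000/v2.0')
    server = nova.servers.get('6ff1c9bf-09f7-4ce3-a56f-fb46745f3770')
    nova.server_migrations.live_migration_force_complete(server, 1234)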
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
There will be a new notification to indicate the start and outcome of
pausing a VM during an ongoing live migration.
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
python-novaclient will be extended with a new operation to force an ongoing
live migration to complete by pausing the VM during the transition to the
destination host.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
None
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Pawel Koniszewski (irc: pkoniszewski)
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Pausing VM during live migration through libvirt
|
||||
* python-novaclient 'nova live-migration-force-complete'
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
* Unit and functional tests in Nova
* Tempest tests, if it is possible to slow down a live migration or start a
  never-ending live migration
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
New API needs to be documented:
|
||||
|
||||
* Compute API extensions documentation
|
||||
http://developer.openstack.org/api-ref-compute-v2.1.html
|
||||
|
||||
* nova.compute.api documentation
|
||||
http://docs.openstack.org/developer/nova/api/nova.compute.api.html
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
None
|
||||
138
specs/mitaka/implemented/persist-request-spec.rst
Normal file
@@ -0,0 +1,138 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
==========================
|
||||
Persist RequestSpec object
|
||||
==========================
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/persist-request-spec
|
||||
|
||||
Persist the RequestSpec object used for scheduling an instance.
|
||||
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
There are several cases where it would be useful to have the RequestSpec
used for originally scheduling an instance but it is not currently
available, such as during a resize/migrate. In order to have later
scheduling requests operate under the same constraints as the original, we
should retain the RequestSpec for these later scheduling calls.
|
||||
|
||||
Going forward with cells it will be necessary to store a RequestSpec before an
|
||||
instance is created so that the API can return details on the instance before
|
||||
it has been scheduled.
|
||||
|
||||
Use Cases
|
||||
---------
|
||||
|
||||
* Operators/users want to move an instance through a migration or resize and
|
||||
want the destination to satisfy the same requirements as the source.
|
||||
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
A save() method will be added to the RequestSpec object. This will store the
RequestSpec in the database. Since this is also a part of the cells effort,
it will be possible to store it in both the api and the regular nova
database. Which database it's stored in on save() will be determined by the
context used.
|
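A minimal sketch of the save() flow (the db api call below is a
hypothetical placeholder, since the schema is intentionally not finalized
here)::

    from oslo_serialization import jsonutils

    from nova import db


    def request_spec_save(context, spec):
        """Persist the serialized spec; the context used determines
        whether the api or the cell database is written."""
        row = {'instance_uuid': spec.instance_uuid,
               'spec': jsonutils.dumps(spec.obj_to_primitive())}
        # request_spec_create_or_update is illustrative, not an existing
        # nova db api function.
        db.request_spec_create_or_update(context, spec.instance_uuid, row)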
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
Parts of it could be put into the instance_extra table. But because this
will later be persisted in the api database before scheduling and then moved
to the cell database after scheduling, it is beneficial to just store it in
a table that can exist in both.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
A new database table will be added to both the api and cell database. The
|
||||
schema will match what is necessary for the RequestSpec object to be stored.
|
||||
Since it is not yet implemented it's of little use to finalize the design here.
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
None
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
None here, but this will allow for resizes to be scheduled like the original
|
||||
boot request.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
An additional database write will be incurred.
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
Same as for users: nothing here, but this opens up future changes.
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
alaski
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Add a new table to the api and cell/current database
|
||||
* Add the save() method to the RequestSpec object
|
||||
* Call the save() method in the code at the appropriate place
|
||||
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/request-spec-object
|
||||
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
New unit tests will be added. This is not externally facing in a way that
|
||||
Tempest can test.
|
||||
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
Devref documentation will be added explaining the existence of this data for
|
||||
use in scheduling.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
None
|
||||
281
specs/mitaka/implemented/rbd-instance-snapshots.rst
Normal file
@@ -0,0 +1,281 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
======================
|
||||
RBD Instance Snapshots
|
||||
======================
|
||||
|
||||
https://blueprints.launchpad.net/nova/+spec/rbd-instance-snapshots
|
||||
|
||||
When using RBD as storage for glance and nova, instance snapshots are
|
||||
slow and inefficient, resulting in poor end user experience. Using
|
||||
local disk for the upload increases operator costs for supporting
|
||||
instance snapshots.
|
||||
|
||||
As background reading, the following link provides an overview of the
snapshotting capabilities available in ceph.
|
||||
|
||||
http://docs.ceph.com/docs/master/rbd/rbd-snapshot/
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
RBD is often used to back glance images and nova disks. When using rbd
|
||||
for nova's disks, nova 'snapshots' are slow, since they create full
|
||||
copies by downloading data from rbd to a local file, uploading it to
|
||||
glance, and putting it back into rbd. Since raw images are normally
|
||||
used with rbd to enable copy-on-write clones, this process removes any
|
||||
sparseness in the data uploaded to glance. This is a problem of user
|
||||
experience, since this slow, inefficient process takes much longer
|
||||
than necessary to let users customize images.
|
||||
|
||||
For operators, this is also a problem of efficiency and cost. For
|
||||
rbd-backed nova deployments, this is the last part that uses
|
||||
significant local disk space.
|
||||
|
||||
Use Cases
|
||||
----------
|
||||
|
||||
This allows end users to quickly iterate on images, for example to
|
||||
customize or update them, and start using the snapshots far more
|
||||
quickly.
|
||||
|
||||
For operators, this eliminates any need for large local disks on
|
||||
compute nodes, since instance data in rbd stays in rbd. It also
|
||||
prevents lots of wasted space.
|
||||
|
||||
Project Priority
|
||||
-----------------
|
||||
|
||||
None
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
Instead of copying all the data to local disk, keep it in RBD by
|
||||
taking an RBD snapshot in Nova and cloning it into Glance. Rather
|
||||
than uploading the data, just tell Glance about its location in
|
||||
RBD. This way data stays in the Ceph cluster, and the snapshot is
|
||||
far more rapidly usable by the end user.
|
||||
|
||||
In broad strokes, the workflow is as follows:

1. Create an RBD snapshot of the ephemeral disk via Nova in
   the ceph pool Nova is configured to use.

2. Clone the RBD snapshot into Glance's RBD pool. [7]

3. To keep from having to manage dependencies between snapshots
   and clones, deep-flatten the RBD clone in Glance's RBD pool and
   detach it from the Nova RBD snapshot in ceph. [7]

4. Remove the RBD snapshot created in (1) from ceph, as it is no
   longer needed.

5. Update Glance with the location of the RBD clone created and
   flattened in (2) and (3).
|
||||
|
||||
This is the reverse of how images are cloned into nova instance disks
|
||||
when both are on rbd [0].
|
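A minimal sketch of steps 1-4 using the python rbd bindings (the pool
handles and image names are illustrative; this is not nova's actual driver
code)::

    import rbd


    def snapshot_into_glance_pool(nova_ioctx, glance_ioctx,
                                  disk_name, snap_name, clone_name):
        disk = rbd.Image(nova_ioctx, disk_name)
        try:
            disk.create_snap(snap_name)                    # step 1
            disk.protect_snap(snap_name)                   # clones need this
            rbd.RBD().clone(nova_ioctx, disk_name, snap_name,
                            glance_ioctx, clone_name)      # step 2
            clone = rbd.Image(glance_ioctx, clone_name)
            try:
                clone.flatten()             # step 3: detach from the parent
            finally:
                clone.close()
            disk.unprotect_snap(snap_name)
            disk.remove_snap(snap_name)                    # step 4
        finally:
            disk.close()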
||||
|
||||
If any of these steps fails, clean up any partial state and fall back
to the current full copy method. Failure of the RBD snapshot method
will be quick and usually transient in nature. The cloud admin can
monitor for these failures and address the underlying Ceph issues
causing the RBD snapshot to fail.
|
||||
|
||||
Failures will be reported in the form of stack traces in the nova
|
||||
compute logs.
|
||||
|
||||
There are a few reasons for falling back to full copies instead of
|
||||
bailing out if efficient snapshots fail:
|
||||
|
||||
* It makes upgrades graceful, since nova snapshots still work
|
||||
before glance has enough permissions for efficient snapshots
|
||||
(see Security Impact for glance permission details).
|
||||
|
||||
* Nova snapshots still work when efficient snapshots are not
|
||||
possible due to architecture choices, such as not using rbd as
|
||||
a glance backend, or using different ceph clusters for glance
|
||||
and nova.
|
||||
|
||||
* This is consistent with existing rbd behavior in nova and cinder.
|
||||
If cloning from a glance image fails, both projects fall back
|
||||
to full copies when creating volumes or instance disks.
|
||||
|
||||
Alternatives
------------

The clone flatten step could be handled as a background task in a
green thread, or completely asynchronously as a periodic task. This
would increase user-facing performance, as the snapshots would be
available for use immediately, but it would also introduce
race-condition-like issues around deleting dependent images.

The flatten step could be omitted completely, and glance could be
made responsible for tracking the various image dependencies. At
the rbd level, an instance snapshot would consist of three things
for each disk. This is true of any instance, regardless of whether
it was created from a snapshot itself, or is just created from a
usual image. In rbd, there would be:

1. a snapshot of the instance disk

2. a clone of the instance disk

3. a snapshot of the clone

(3) is exposed through glance's backend location.
(2) is an internal detail of glance.
(1) is an internal detail that nova and glance handle.

At the rbd level, a disk with snapshots can't be deleted. Hide this
from the user if they delete an instance with snapshots by making
glance responsible for their eventual deletion, once their dependent
snapshots are deleted. Nova does this by renaming instance disks that
it deletes in rbd, so glance is aware that they can be deleted.

When a glance snapshot is deleted, it deletes (3), then (2), and
(1). If nova has renamed its parent in rbd with a preset suffix, the
instance has been destroyed already, so glance tries to delete the
original instance disk. The original instance disk will be
successfully deleted when the last snapshot is removed.

If glance snapshots are created but deleted before the instance is
destroyed, nova will delete the instance disks as usual.

The mechanism nova uses to let glance know it needs to clean up the
original disk could be different. It could use an image property with
certain restrictions which aren't possible in the current glance api:

* it must be writeable only once

* to avoid exposing backend details, it would need to be hidden
  from end users

State stored in ceph is much easier to keep consistent with ceph
than state in an external database, which could become out of sync.
It would also be an odd abstraction leak in the glance_store api,
since upper layers don't need to be aware of it at all.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

Glance will need to be configured with direct_url support enabled
in order for Nova to determine what to clone the image from, and
where. Depending on system configuration, this could leak backend
credentials [5]. Devstack has already been updated to switch
behaviors when Ceph support is requested [6].

Documentation has typically recommended using different ceph pools
for glance and nova, with different access to each. Since nova
would need to be able to create the snapshot in the pool used by
glance, it would need write access to this pool as well.

Notifications impact
--------------------

None

Performance Impact
------------------

Snapshots of RBD-backed instances would be significantly faster.

Other end user impact
---------------------

Snapshots of RBD-backed instances would be significantly faster.

Other deployer impact
---------------------

To use this in an existing installation with cephx, adding 'allow
rwx pool=images' to nova's ceph user capabilities is necessary. The
'ceph auth caps' command can be used for this [1]. If these permissions
are not updated, nova will continue using the existing full copy
mechanism for instance snapshots, because the fast snapshot will fail
and nova compute will fall back to the full copy method.

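For example, assuming nova's ceph user is named ``client.nova`` and its
instance disks live in a ``vms`` pool (both names are illustrative, not
prescribed by this spec), the capability update might look like::

    ceph auth caps client.nova mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=vms, allow rwx pool=images'

Note that 'ceph auth caps' replaces the whole capability set, so any
existing caps must be repeated in the command.
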
Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  nic

Other contributors:
  jdurgin
  pbrady
  nagyz
  cfb-n/cburgess

Work Items
----------

Implementation: [4]

The libvirt imagebackend does not currently recognize AMI images
as raw (and therefore cloneable), so this proposed change is of
limited utility with a very popular image format. This should be
addressed in a separate change.

Dependencies
============

A Havana or newer version of glance is needed, as direct URL
support was added in Havana.

Testing
=======

The existing tempest tests with ceph in the gate cover instance
snapshots generically. As fast snapshots are enabled automatically, there
is no need to change the tempest tests. Additionally, unit tests in nova
will verify error handling (falling back to full copies if the process
fails), and make sure that when configured correctly rbd snapshots and
clones are used rather than full copies.

Documentation Impact
====================

See the security and other deployer impact sections above.

References
==========

[0] http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/rbd-clone-image-handler.html

[1] Ceph authentication docs: http://ceph.com/docs/master/rados/operations/user-management/#modify-user-capabilities

[2] Alternative: Glance cleanup patch: https://review.openstack.org/127397

[3] Alternative: Nova patch: https://review.openstack.org/125963

[4] Nova patch: https://review.openstack.org/205282

[5] https://bugs.launchpad.net/glance/+bug/880910

[6] https://review.openstack.org/206039

[7] http://docs.ceph.com/docs/master/dev/rbd-layering/

239
specs/mitaka/implemented/request-spec-object-mitaka.rst
Normal file
@@ -0,0 +1,239 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Create RequestSpec Object
=========================

https://blueprints.launchpad.net/nova/+spec/request-spec-object-mitaka

Add a structured, documented object that represents a specification for
launching multiple instances in a cloud. This spec is a follow-up from the
previously approved and partially implemented request-spec-object spec.

Problem description
===================

The main interface into the scheduler, the `select_destinations()` method,
accepts a `request_spec` parameter that is a nested dict. This nested dict is
constructed in `nova.scheduler.utils.build_request_spec()`, however the
structure of the request spec is not documented anywhere. The filters in the
scheduler take a laissez-faire approach to querying the object during
scheduling, as well as modifying the `request_spec` object during loops of the
`nova.scheduler.host_manager.HostManager.get_filtered_hosts()` method,
which calls the filter object's `host_passes` method, supplying a
`filter_properties` parameter, which itself has a key called `request_spec`
that contains the aforementioned nested dict.

This situation makes it very difficult to understand exactly what is going on
in the scheduler, and cleaning up this parameter in the scheduler interface is
a prerequisite to making a properly-versioned and properly-documented
interface in preparation for a split-out of the scheduler code.

Use Cases
----------

This is a pure refactoring effort for cleaning up all the interfaces between
Nova and the scheduler so the scheduler could be split out by the next cycle.

Proposed change
===============

A new class called `RequestSpec` will be created that models a request to
launch multiple virtual machine instances. The first version of the
`RequestSpec` object will simply be an objectified version of the current
dictionary parameter. The scheduler will construct this `RequestSpec` object
from the `request_spec` dictionary itself.

The existing
`nova.scheduler.utils.build_request_spec` method will be removed in favor of a
factory method on `nova.objects.request_spec.RequestSpec` that will construct
a `RequestSpec` from the existing key/value pairs in the `request_spec`
parameter supplied to `select_destinations`.

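A minimal sketch of such a factory method, assuming a hypothetical
``from_primitives`` name and only a handful of the fields listed later in
this spec, might look like:

.. code:: python

    @classmethod
    def from_primitives(cls, context, request_spec, filter_properties):
        """Hypothetical factory building a RequestSpec from legacy dicts."""
        spec = cls()
        spec.num_instances = request_spec.get('num_instances', 1)
        # The flat resource fields come from the old instance_type entry.
        flavor = request_spec['instance_type']
        spec.memory_mb = flavor['memory_mb']
        spec.vcpus = flavor['vcpus']
        # Scheduler hints were previously carried in filter_properties.
        spec.scheduler_hints = filter_properties.get('scheduler_hints') or {}
        return spec
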
Alternatives
------------

None.

Data model impact
-----------------

This spec is not focusing on persisting the RequestSpec object, but another
blueprint (and spec) will be proposed, with this one as a dependency, to
provide a save() method on the RequestSpec object, which would allow it to
be persisted in (probably) the instance_extra DB table.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

Performance Impact
------------------

None.

Other deployer impact
---------------------

None.

Developer impact
----------------

None, besides making the scheduler call interfaces gradually easier to read
and understand.

Implementation
==============

The `request_spec` dictionary is currently constructed by the nova-conductor
when it calls the `nova.scheduler.utils.build_request_spec()` function, which
looks like this:

.. code:: python

    def build_request_spec(ctxt, image, instances, instance_type=None):
        """Build a request_spec for the scheduler.

        The request_spec assumes that all instances to be scheduled are the
        same type.
        """
        instance = instances[0]
        if isinstance(instance, obj_base.NovaObject):
            instance = obj_base.obj_to_primitive(instance)

        if instance_type is None:
            instance_type = flavors.extract_flavor(instance)
        # NOTE(comstud): This is a bit ugly, but will get cleaned up when
        # we're passing an InstanceType internal object.
        extra_specs = db.flavor_extra_specs_get(ctxt,
                                                instance_type['flavorid'])
        instance_type['extra_specs'] = extra_specs
        request_spec = {
            'image': image or {},
            'instance_properties': instance,
            'instance_type': instance_type,
            'num_instances': len(instances),
            # NOTE(alaski): This should be removed as logic moves from the
            # scheduler to conductor. Provides backwards compatibility now.
            'instance_uuids': [inst['uuid'] for inst in instances]}
        return jsonutils.to_primitive(request_spec)

As the filter_properties dictionary is hydrated with the request_spec
dictionary, this proposal is merging both dictionaries into a single object.

A possible first version of a class interface for the `RequestSpec`
class would look like this, in order to be as close to a straight conversion
from the nested dict's keys to object attribute notation:

.. code:: python

    class RequestSpec(base.NovaObject):

        """Models the request to launch one or more instances in the cloud."""

        VERSION = '1.0'

        fields = {
            'image': fields.ObjectField('ImageMeta', nullable=False),
            'root_gb': fields.IntegerField(nullable=False),
            'ephemeral_gb': fields.IntegerField(nullable=False),
            'memory_mb': fields.IntegerField(nullable=False),
            'vcpus': fields.IntegerField(nullable=False),
            'numa_topology': fields.ObjectField('InstanceNUMATopology',
                                                nullable=True),
            'project_id': fields.StringField(nullable=True),
            'os_type': fields.StringField(nullable=True),
            'availability_zone': fields.StringField(nullable=True),
            'instance_type': fields.ObjectField('Flavor', nullable=False),
            'num_instances': fields.IntegerField(default=1),
            'force_hosts': fields.StringField(nullable=True),
            'force_nodes': fields.StringField(nullable=True),
            'pci_requests': fields.ListOfObjectsField('PCIRequest',
                                                      nullable=True),
            'retry': fields.ObjectField('Retry', nullable=True),
            'limits': fields.ObjectField('Limits', nullable=True),
            'group': fields.ObjectField('GroupInfo', nullable=True),
            'scheduler_hints': fields.DictOfStringsField(nullable=True)
        }

This blueprint aims to provide a new Scheduler API method which would only
accept RequestSpec objects, replacing select_destinations(), which would be
deprecated and removed in a later cycle.

That RPC API method could have the following signature:

.. code:: python

    def select_nodes(RequestSpec):
        # ...

As said above in the data model impact section, this blueprint does not aim
to persist this object at the moment.

Assignee(s)
-----------

Primary assignee:
  bauzas

Other contributors:
  None

Work Items
----------

- Convert all filter classes to operate against the `RequestSpec` object
  instead of the nested `request_spec` dictionary.

- Change the Scheduler RPC API to accept a Spec object for select_destinations

- Modify conductor methods to directly hydrate a Spec object

- Add developer reference documentation for what the request spec models.

Dependencies
============

None.

Testing
=======

The existing unit tests of the scheduler filters will be modified to access
the `RequestSpec` object in the `filter_properties` dictionary.

Documentation Impact
====================

Update any developer reference material that might be referencing the old
dictionary accesses.

References
==========

This blueprint is part of an overall effort to clean up, version, and stabilize
the interfaces between the nova-api, nova-scheduler, nova-conductor and
nova-compute daemons that involve scheduling and resource decisions.

230
specs/mitaka/implemented/service-status-notification.rst
Normal file
@@ -0,0 +1,230 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================================
Add notification for administrative service status change
==========================================================

https://blueprints.launchpad.net/nova/+spec/service-status-notification

Today external systems cannot get notification-based information about nova
service status. Nova service status can be changed administratively via the
os-services/disable API.

Having such a notification helps to measure the length of maintenance windows
or indirectly notify users about maintenance actions that possibly affect the
operation of the infrastructure.


Problem description
===================

Use Cases
---------

Deployer wants to measure the time certain nova services were disabled
administratively due to troubleshooting or maintenance actions, as this
information might be part of the agreement between Deployer and End User.

Deployer wants to measure the time certain nova services were forced down due
to an externally detected error, as this information might be part of the
agreement between Deployer and End User.

Proposed change
===============

An easy solution for the problem above is to add an oslo.messaging
notification for the following actions:

* /v2/{tenant_id}/os-services/disable

* /v2/{tenant_id}/os-services/enable

* /v2/{tenant_id}/os-services/disable-log-reason

* /v2/{tenant_id}/os-services/force-down

Then ceilometer can receive these notifications and the length of the
maintenance window can be calculated via ceilometer queries.

Alternatively, other third party tools like StackTach can receive the new
notifications via AMQP.

Alternatives
------------

The only alternative is to poll the /v2/{tenant_id}/os-services/ API
periodically; however, this means slower information flow and creates load
on the nova API and DB services.

Data model impact
-----------------

No database schema change is foreseen.

The following new objects will be added to nova:

.. code-block:: python

    @base.NovaObjectRegistry.register
    class ServiceStatusNotification(notification.NotificationBase):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'payload': fields.ObjectField('ServiceStatusPayload')
        }

    @base.NovaObjectRegistry.register
    class ServiceStatusPayload(base.NovaObject):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'service': fields.ObjectField('Service')
        }

The definition of NotificationBase can be found in the Versioned notification
spec [3].

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

A new notification service.status.update will be introduced with INFO priority
and the payload of the notification will be the serialized form of the already
existing Service versioned object. This notification will be the first that
uses a versioned object as its payload, but there is an initiative to
use versioned objects as notification payload for every nova notification [3].
This new notification will not support emitting the legacy format.

During the implementation of this spec we will provide the minimum
infrastructure to emit versioned notifications based on [3], but all the
advanced things like sample and doc generation will be done during the
implementation of [3].

For example, after the following API call::

    PUT /v2/{tenant_id}/os-services/disable-log-reason
    {"host": "Devstack",
     "binary": "nova-compute",
     "disabled_reason": "my reason"}


The notification would contain the following payload::

    {
        "nova_object.version":"1.0",
        "nova_object.name":"ServiceStatusPayload",
        "nova_object.namespace":"nova",
        "nova_object.data":{
            "service":{
                "nova_object.version":"1.19",
                "nova_object.name":"Service",
                "nova_object.namespace":"nova",
                "nova_object.data":{
                    "id": 1,
                    "host": "Devstack",
                    "binary": "nova-compute",
                    "topic": "compute",
                    "report_count": 32011,
                    "disabled": true,
                    "disabled_reason": "my reason",
                    "availability_zone": "nova",
                    "last_seen_up": "2015-10-15 07:29:13",
                    "forced_down": false,
                    "version": 2
                },
                "nova_object.changes":[
                    "disabled",
                    "disabled_reason"
                ]
            }
        }
    }

Please note that the compute_node field will not be serialized into the
notification payload, as that would bring in a lot of additional data not
needed here.

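As a sketch of how an external system might consume this notification with
oslo.messaging (the transport URL and topic here are illustrative
assumptions, not values mandated by this spec):

.. code-block:: python

    import oslo_messaging
    from oslo_config import cfg

    class ServiceStatusEndpoint(object):
        # Only handle the new notification; everything else is ignored.
        filter_rule = oslo_messaging.NotificationFilter(
            event_type='service.status.update')

        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            data = payload['nova_object.data']['service']['nova_object.data']
            print('%s on %s disabled=%s' % (data['binary'],
                                            data['host'],
                                            data['disabled']))

    transport = oslo_messaging.get_notification_transport(
        cfg.CONF, url='rabbit://guest:guest@localhost:5672/')
    targets = [oslo_messaging.Target(topic='notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [ServiceStatusEndpoint()])
    listener.start()
    listener.wait()
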
Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  balazs-gibizer


Work Items
----------

* Send a new notification if the disabled, disabled_reason or forced_down
  field of the Service object is updated


Dependencies
============

This work is part of the Versioned notification API [3] work, but it does not
directly depend on it. At the summit we agreed to add this new notification as
the first step of the versioned notification API work, to serve as a carrot
motivating operators to start consuming the new versioned notifications.

Testing
=======

Besides unit tests, new functional test cases will be added to cover the
new notification.


Documentation Impact
====================

None

References
==========

[1] This idea has already been discussed on the ML:
    http://lists.openstack.org/pipermail/openstack-dev/2015-April/060645.html

[2] This work is related to, but does not depend on, the bp mark-host-down:
    https://blueprints.launchpad.net/nova/+spec/mark-host-down

[3] Versioned notification spec: https://review.openstack.org/#/c/224755/


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced

196
specs/mitaka/implemented/service-version-behavior.rst
Normal file
@@ -0,0 +1,196 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================================
Service Version Behavior Changes
================================

https://blueprints.launchpad.net/nova/+spec/service-version-behavior

There are a lot of situations where operators may have multiple
versions of nova code running in a single deployment, either
intentionally or accidentally. There are several things we can do to
make this safer and smoother in code to make the operator's life easier.


Problem description
===================

When running multiple versions of Nova code, care must be taken to
avoid sending RPC messages that are too new (or too old) for some of
the services to understand, as well as avoid accessing the database
with object models that are not able to handle the potential schema
skew.

Right now, during an upgrade, operators must calculate and set version
pins on the relevant RPC interfaces so that newer services (conductor,
api, etc) can speak to older services (compute) while a mix of
versions are present. This involves a lot of steps, config tweaking,
and service restarting. The potential for incorrectly executed or
missed steps is high.

Further, during normal operation, an older compute host that may have
been offlined for an extended period of time could be restarted and
attempt to join the system after compatibility code (or
configurations) have been removed.

In both of these cases, nova should be able to help identify, avoid,
and automate complex tasks that ultimately boil down to just a logical
decision based on reported versions.

Use Cases
----------

As an operator, I want live upgrades to be easier with fewer required
steps and more forgiving behavior from nova.

As an operator, I want more automated checks preventing an ancient
compute node from trying to rejoin after an extended hiatus.

Proposed change
===============

In Liberty, we landed a global service version counter. This records
each service's version in the database, and provides some historical
information (such as the compute rpc version at each global version
bump). In Mitaka, we should take advantage of this to automate some
tasks.

The first thing we will automate is the compute RPC version
selection. Right now, operators set the version pin in the config file
during a live upgrade and remove it after the upgrade is complete. We
will add an option to set this to "auto", which will select the
compute RPC version based on the reported service versions in the
database. By looking up the minimum service version, we can consult
the SERVICE_VERSION_HISTORY structure to determine what compute RPC
version is supported by the oldest nodes. We can make this transparent
to other code by doing the lookup in the compute_rpcapi module once at
startup, and again on signals like SIGHUP.

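A minimal sketch of that lookup, using the Service minimum-version query and
a simplified stand-in for the real SERVICE_VERSION_HISTORY structure, might
look like:

.. code:: python

    from nova import objects

    # Simplified stand-in for SERVICE_VERSION_HISTORY: maps a global
    # service version to the compute RPC version it supports.
    SERVICE_VERSION_HISTORY = {
        1: '4.0',
        2: '4.5',
    }

    def determine_rpc_version(context, configured_pin):
        """Return the compute RPC version to use for outgoing calls."""
        if configured_pin != 'auto':
            return configured_pin
        # Find the oldest nova-compute service version in the deployment...
        minimum = objects.Service.get_minimum_version(context, 'nova-compute')
        # ...and pin to the newest RPC version that version still understands.
        return SERVICE_VERSION_HISTORY[minimum]
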
This will only be done if the version pin is set to "auto", requiring
operators to opt-in to this new behavior while it is smoke tested. In
the case where we choose the version automatically, the decision (and
whether it is the latest, or a backlevel version) will be logged for
audit purposes.

The second thing we will automate is checking of the minimum
service version during service record create/update. This will prevent
ancient services from joining the deployment if they are too old. This
will be done in the Service object, and it will compare its own
version to the minimum version of other services in the database. If
it is older than all the other nodes, then it will refuse to start. If
we refuse to start, we'll log the versions involved and the reason for
the refusal visibly, to make it clear what happened and what needs
fixing.

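A sketch of that startup check, reusing the same minimum-version query (the
exception name here is illustrative, not necessarily what the implementation
will use):

.. code:: python

    from nova import exception
    from nova import objects

    def check_can_start(context, my_version, binary='nova-compute'):
        """Refuse to start a service older than the deployment minimum."""
        minimum = objects.Service.get_minimum_version(context, binary)
        if my_version < minimum:
            # ServiceTooOld is an illustrative exception name.
            raise exception.ServiceTooOld(
                'service version %d is older than the minimum (%d) of '
                'other services in the deployment' % (my_version, minimum))
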
Alternatives
------------

We could continue to document both of these procedures and require
manual steps for the operators.

Data model impact
-----------------

There are no data(base) model impacts prescribed by the work here, as
those were added preemptively in Liberty.

The Service object will gain at least one remotable method for
determining the minimum service version.

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

Checking the minimum version in the database on compute_rpcapi module
startup will incur a small performance penalty and additional database
load. This will only happen once per startup (or signal) and is
expected to be massively less impactful than the effort required to
manually perform the steps being automated.

It would also be trivial for conductor to cache the minimum versions
for some TTL in order to avoid hitting the database during a storm of
services starting up.

Other deployer impact
---------------------

Deployer impact should be entirely positive. One of the behaviors will
be opt-in only initially, and the other is purely intended to prevent
the operators from shooting themselves in their feet.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  danms

Work Items
----------

* Add a minimum version query to the Service object
* Automate selection of the compute RPC version when the pin is set to auto
* Automate service failure on startup when the service version is too old
* Hook re-checking of the minimum version to receiving a SIGHUP

Dependencies
============

None

Testing
=======

As with all things that affect nova service startup, unit tests will
be the only way to test that the service fails to start up when the
version is too old.

The compute RPC pin selection can and will be tested by configuring
grenade's partial-ncpu job to use "auto" instead of an explicit
pin. This will verify that the correct version is selected by the fact
that tempest continues to pass with nova configured in that way.

Documentation Impact
====================

A bit of documentation will be required for each change, merely to
explain the newly-allowed value for the compute_rpc version pin and
the potential new behavior of starting an older service.


References
==========

* https://review.openstack.org/#/c/201733/
* http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/service-version-number.html

224
specs/mitaka/implemented/soft-affinity-for-server-group.rst
Normal file
@@ -0,0 +1,224 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Add soft affinity support for server group
==========================================

https://blueprints.launchpad.net/nova/+spec/soft-affinity-for-server-group

As a tenant I would like to schedule instances on the same host if possible,
so that I can achieve co-location. However, if it is not possible to schedule
some instances to the same host, then I still want the subsequent
instances to be scheduled together on another host. In this way I can express
a good-to-have relationship between a group of instances.

As a tenant I would like to schedule instances on different hosts if possible.
However, if it is not possible, I still want my instances to be scheduled,
even if it means that some of them are placed on the same host.


Problem description
===================

Use Cases
---------

End User might want to have a less strict affinity and anti-affinity
rule than what is available today in the server-group API extension.
With the proposed good-to-have affinity rule the End User can request nova
to schedule the instances to the same host (i.e. stack them) if possible.
However, if it is not possible (e.g. due to resource limitations), then the
End User still wants to keep the instances on a small number of different
hosts.

With the proposed good-to-have anti-affinity rule the End User can request
nova to spread the instances in the same group as much as possible.

Proposed change
===============

This change would extend the existing server-group API extension with two new
policies, soft-affinity and soft-anti-affinity.

When an instance is booted into a group with the soft-affinity policy, the
scheduler will use a new weigher, AffinityWeight, to sort the available hosts
according to the number of instances running on them from the same
server-group, in descending order.

When an instance is booted into a group with the soft-anti-affinity policy,
the scheduler will use a new weigher, AntiAffinityWeight, to sort the
available hosts according to the number of instances running on them from the
same server-group, in ascending order.

The two new weighers will get the necessary information about the number of
instances per host through the weight_properties (filter_properties), in
a similar way as the GroupAntiAffinityFilter gets the list of hosts used by
a group via the filter_properties.

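A minimal sketch of such a weigher, assuming a hypothetical
``group_instances_per_host`` key carried in via weight_properties (the real
key name and plumbing are implementation details):

.. code:: python

    from nova.scheduler import weights

    class AffinityWeigher(weights.BaseHostWeigher):
        """Prefer hosts that already run members of the same server group."""

        def _weigh_object(self, host_state, weight_properties):
            # Hypothetical key: a mapping of host name to the number of
            # instances from the requested server group on that host,
            # carried in like group_hosts is for the filters.
            per_host = weight_properties.get(
                'group_instances_per_host') or {}
            # Higher count -> higher weight -> instances get stacked. A
            # soft-anti-affinity weigher would simply negate this value.
            return per_host.get(host_state.host, 0)
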
These new soft-affinity and soft-anti-affinity policies are mutually exclusive
with each other and with the other existing server-group policies. This means
that a server group cannot be created with more than one policy, as every
combination of the existing policies (affinity, anti-affinity, soft-affinity,
soft-anti-affinity) is contradictory.

If the scheduler sees a request which requires any of the new weigher classes
but those classes are not configured, then the scheduler will reject the
request with an exception, similarly to the case when the affinity policy is
requested but ServerGroupAffinityFilter is not configured.

Alternatives
------------

Alternatively, the End User can use a server-group with the affinity policy
and, if an instance cannot be scheduled because the host associated with the
group is full, the End User can create a new server-group for the subsequent
instances. However, with a large number of instances that occupy many hosts,
this manual process can become quite cumbersome.

Data model impact
-----------------

No schema change is needed.

There will be two new possible values, soft-affinity and soft-anti-affinity,
for the policy column of the instance_group_policy table.

REST API impact
---------------

POST: v2/{tenant-id}/os-server-groups

The value of the policy request parameter can be soft-affinity and
soft-anti-affinity as well. So the new JSON schema will be the following::

    {"type": "object",
     "properties": {
         "server_group": {
             "type": "object",
             "properties": {
                 "name": parameter_types.name,
                 "policies": {
                     "type": "array",
                     "items": [{"enum": ["anti-affinity", "affinity",
                                         "soft-anti-affinity",
                                         "soft-affinity"]}],
                     "uniqueItems": True,
                     "additionalItems": False}},
             "required": ["name", "policies"],
             "additionalProperties": False}},
     "required": ["server_group"],
     "additionalProperties": False}


For example the following POST request body will be valid::

    {"server_group": {
        "name": "test",
        "policies": [
            "soft-anti-affinity"]}}

And will be answered with the following response body::

    {"server_group": {
        "id": "5bbcc3c4-1da2-4437-a48a-66f15b1b13f9",
        "name": "test",
        "policies": [
            "soft-anti-affinity"
        ],
        "members": [],
        "metadata": {}}}

The above API change will be introduced in a new API microversion.

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  balazs-gibizer


Work Items
----------

* Add two new weighers to the filter scheduler. These weighers will
  sort the available hosts by the number of instances from the same
  server-group.
* Update FilterScheduler to reject the request if the new policy is
  requested but the related weigher is not configured.
* Update the server-group API extension to allow soft-affinity and
  soft-anti-affinity as the policy of a group.


Dependencies
============

None

Testing
=======

Unit test coverage will be provided.

The following functional test coverage will be provided:

* create groups with soft-affinity and soft-anti-affinity
* boot two servers with soft-affinity with enough resources on the same host.
  Nova shall boot both servers to the same host.
* boot two servers with soft-affinity but there is not enough resource to boot
  the second server to the same host as the first server. Nova shall boot the
  second server to a different host.
* boot two servers with soft-anti-affinity and two compute hosts are available
  with enough resources. Nova shall boot the two servers to two separate
  hosts.
* boot two servers with soft-anti-affinity but only a single compute host is
  available. Nova shall boot the two servers to the same host.
* Rebuild, migrate, evacuate server with soft-affinity
* Rebuild, migrate, evacuate server with soft-anti-affinity

Documentation Impact
====================

New weighers need to be described in filter_scheduler.rst.


References
==========

* instance-group-api-extension BP
  https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
* Group API wiki
  https://wiki.openstack.org/wiki/GroupApiExtension

@@ -0,0 +1,185 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

======================================
Split network plane for live migration
======================================

https://blueprints.launchpad.net/nova/+spec/split-network-plane-for-live-migration

This spec proposes splitting the network plane of live migration from the
management network, in order to avoid the network performance impact caused
by the data transfer generated by live migration.


Problem description
===================

When we do live migration with the QEMU/KVM driver, we use the hostname of
the target compute node as the target of live migration. So the RPC call and
live migration traffic will be in the same network plane. Live migration will
have an impact on network performance, and this impact is significant when
many live migrations occur concurrently, even if
CONF.libvirt.live_migration_bandwidth is set.


Use Cases
---------

The OpenStack deployer plans a specific network plane for live migration,
which is separated from the management network. As the data transfer of live
migration flows in this specific network plane, its impact on network
performance will be limited to this network plane and will have no impact on
the management network. The end user will not notice this change.


Proposed change
===============

Add a new option, CONF.my_live_migration_ip, to the configuration file, with
None as the default value. When pre_live_migration() executes on the
destination host, it sets the option into pre_migration_data if it is not
None. When driver.live_migration() executes on the source host, if this
option is present in pre_migration_data, the IP address is used instead of
CONF.libvirt.live_migration_uri as the URI for live migration; if it is None,
the mechanism remains as it is now.

This spec focuses on the QEMU/KVM driver; the implementations for other
drivers should be completed in separate blueprints.

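A minimal sketch of both sides under these assumptions (the dict key name is
illustrative, not the final implementation):

.. code:: python

    from oslo_config import cfg

    CONF = cfg.CONF

    def pre_live_migration(migrate_data):
        # Destination host: advertise the dedicated live-migration IP,
        # if one is configured (key name is illustrative).
        if CONF.my_live_migration_ip is not None:
            migrate_data['live_migration_ip'] = CONF.my_live_migration_ip
        return migrate_data

    def _live_migration_uri(migrate_data, dest_hostname):
        # Source host: prefer the advertised IP over the hostname-based
        # live_migration_uri template.
        ip = migrate_data.get('live_migration_ip')
        if ip is not None:
            return 'qemu+tcp://%s/system' % ip
        return CONF.libvirt.live_migration_uri % dest_hostname
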
Alternatives
------------

Configure the live migration URI, like this::

    live_migration_uri = "qemu+tcp://%s.INTERNAL/system"

Then modify the DNS configuration in the OpenStack deployment::

    target_hostname 192.168.1.5

    target_hostname.INTERNAL 172.150.1.5

But requiring such DNS changes in order to deploy and use OpenStack may not be
practical, due to procedural limitations at many organizations.

Data model impact
-----------------

None


REST API impact
---------------

None

Security impact
---------------

This feature has no negative impact on security. Splitting data transfer and
management will improve security somewhat by reducing the chance of a
management plane denial of service.


Notifications impact
--------------------

None

Other end user impact
---------------------

No impact on end users.


Performance Impact
------------------

With a specifically planned network plane for live migration, the impact of
its data transfer on the management network will no longer exist. The impact
of live migration on network performance will be limited to its own network
plane.


Other deployer impact
---------------------

The added configuration option CONF.my_live_migration_ip will be available
for all drivers; the default value is None. Thus, when OpenStack upgrades,
the existing live migration mechanism remains. If CONF.my_live_migration_ip
has been set, this option will be used for the live migration's target URI.
If deployers want to use this function, a separate network plane will have to
be planned in advance.


Developer impact
----------------

All drivers can implement this function using the same mechanism.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Rui Chen <chenrui.momo@gmail.com>

Other contributors:
  Zhenyu Zheng <zhengzhenyu@huawei.com>

Work Items
----------

* Add the new configuration option CONF.my_live_migration_ip to the [DEFAULT]
  group.

* Modify the existing implementation of live migration: when
  pre_live_migration() executes on the destination host, set the option into
  pre_migration_data if it is not None.
* In the QEMU/KVM driver, when driver.live_migration() executes on the source
  host, if this option is present in pre_migration_data, the IP address is
  used instead of CONF.libvirt.live_migration_uri as the URI for live
  migration; if it is None, the mechanism remains as it is now.


Dependencies
============

None


Testing
=======

Changes will be made for live migration, thus related unit tests will be
added.

Documentation Impact
====================

The instruction for the new configuration option CONF.my_live_migration_ip
will be added to the OpenStack Configuration Reference manual.

The operators can plan a specific network plane for live migration,
like 172.168.*.*, split from the management network (192.168.*.*), then add
the option to nova.conf on every nova-compute host according to the planned
IP addresses, like this: CONF.my_live_migration_ip=172.168.1.15.

The default value of the new option is None, so the live-migration workflow
is the same as the original by default.


References
==========

None


History
=======

None

287
specs/mitaka/implemented/sriov-physical-function-passthrough.rst
Normal file
@@ -0,0 +1,287 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

============================================================
Enable passthrough of SR-IOV physical functions to instances
============================================================

https://blueprints.launchpad.net/nova/+spec/sriov-physical-function-passthrough

Nova has supported passthrough of PCI devices with its libvirt driver for a
few releases already, during which time the code has seen some stabilization
and a few minor feature additions.

In the case of SR-IOV enabled cards, it is possible to treat any port on the
card either as a number of virtual devices (called VFs - virtual functions)
or as a full device (PF - physical function).

Nova's current handling exposes only virtual functions as resources that can
be requested by instances - and this is the most common use case by far.
However, with the rise of requirements to virtualize network applications,
it can be necessary to give instances full control over the port and not just
a single virtual function.

OpenStack is seen as one of the central bits of technology for the NFV
use-cases, and a lot of work has already gone into making OpenStack and
Nova NFV enabled, so we want to make sure that we close these small remaining
gaps.


Problem description
===================

Currently it is not possible to pass through a physical function to an
OpenStack instance, but some NFV applications need to have full control of
the port, while others are happy with using a VF of an SR-IOV enabled card.
It is beneficial to be able to do so with the same set of cards, as
pre-provisioning resources at a granularity smaller than compute hosts is
cumbersome to manage and goes against the goal of Nova to provide on-demand
resources. We want to be able to give certain instances unlimited access to
the port by assigning the PF to them, but revert back to using VFs when the
PF is not being used, so as to ensure on-demand provisioning of available
resources. This may not be possible with every SR-IOV card and its
respective Linux driver, in which case certain ports will need to be
pre-provisioned as either PFs or VFs by the administrator ahead of time.

This in turn means that Nova would have to keep track of which VFs belong to
particular PFs and make sure that this is reflected in the way resources are
tracked (so even a single VF being used means the related PF is unavailable
and vice versa; if a PF is being used, all of its VFs are marked as used).

PCI device management code in Nova currently filters out any
device that is a physical function (this is currently hard-coded). In
addition, modeling of PCI device resources in Nova currently assumes a flat
hierarchy, and resource tracking logic does not understand the relationship
between different PCI devices that can be exposed to Nova.


Use Cases
----------

Certain NFV workloads may need to have full control of the physical device,
in order to use some of the functionality not available to VFs, to bypass
some limitations certain cards impose on VFs, or to exclusively use the full
bandwidth of the port. However, due to the dynamic nature of the elastic
cloud, and the promise of Nova to deliver resources on demand, we do not wish
to have to pre-provision certain SR-IOV cards to be used as PFs, as this
defeats the promise of quick re-purposing of resources that Nova, as the
infrastructure management tool, brings.

Modern SR-IOV enabled cards along with their drivers usually allow for such
reconfiguration to be done on the fly, so once the passthrough of the PF is
no longer needed on a specific host (either the instance using it got moved
or deleted), the PF is bound back to its Linux driver, thus enabling the use
of VFs, provided that initialization steps (if any are needed) are done upon
handing the device back. It is not possible to
guarantee that this always works, however, due to the vast range of equipment
and drivers available on the market, so we want to make sure that there is a
way to tell Nova that a card is in a certain configuration and cannot be
assumed to be reconfigurable.

Additional use cases (that will require further work) will be enabled by
having the Nova data model usefully express the relationship between a PF and
its VFs. Some of them have been proposed as separate specs (see [1]_ and
[2]_).

Proposed change
===============

Two problems we need to solve are:

1) How to enable requesting a full physical device. This means extending the
   InstancePCIRequest data model to be able to hold this information. Since
   the whitelist parsing logic that builds up the Spec objects probes the
   system and has the information about whether a device is a PF or not, it
   is enough to add a physical_function field to the PCI alias schema and the
   PCIRequest object.

2) Enable scheduling and resource tracking based on the request that can now
   be for the whole device. This means extending the data model for
   PCIDevices to hold information about the relationship between physical and
   virtual functions (this relationship is already recorded, but not in a
   suitable format), and also extending the
   way we expose the aggregate data about PCI devices to the resource tracker
   (a.k.a. the PCIDeviceStats class) to be able to present PFs and their
   counts, and to make sure to track the corresponding VFs that become
   unavailable once the PF is claimed/used.

In addition to the above, we will want to make sure that the whitelist syntax
can support passing through PFs. It turns out this will require very few
changes. Currently, if a whitelist entry
specifies an address or a devname of a PF, the matching code will make sure
any of the VFs match. This behavior, combined with allowing a device that is
a PF to be tracked by nova (by removing the hard-coded check that skips any
PFs), should be sufficient to allow most of the flexibility administrators
need. As it is not sufficient for a device to be whitelisted to be
requestable by users (it also needs an alias that is specified on the
flavor), simply defaulting to whitelisting PFs along with all of their VFs if
a PF address is whitelisted gives us the flexibility we need, while keeping
backwards compatibility.

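For illustration only, requesting a PF might eventually combine the
whitelist, the extended alias (the ``physical_function`` key is the new,
proposed field), and a flavor key roughly as follows; the address, vendor and
product IDs, and alias name are made up::

    # nova.conf on the compute host: whitelisting the PF address also
    # makes its VFs available
    pci_passthrough_whitelist = {"address": "0000:86:00.0"}
    pci_alias = {"name": "full-port", "vendor_id": "8086",
                 "product_id": "10ed", "physical_function": true}

    # flavor requesting one whole physical function
    nova flavor-key nfv.large set "pci_passthrough:alias"="full-port:1"
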
As is the case with the current implementation, there is some initial
configuration that will be needed on hosts that have PCI devices that can be
passed through. In addition to the standard setup needed to enable SR-IOV and
configure the cards, and the
whitelist configuration setup that Nova requires, administrators may also
need to add an automated way (such as udev rules) to re-enable VFs, since,
depending on the driver and the card used, any existing
configuration may be lost once a VM is given full control of the port and the
device is unbound from the host driver.

In order for PFs to work as Neutron ports, some additional work that is
outside the scope of this blueprint will be needed. We aim to make the
internal Nova changes the focus here, and defer the integration work to a
future (possibly cross-project) blueprint. For the libvirt driver this means
that, since there will be no Neutron support
at first, the only way to assign such a device would be using the <hostdev>
element, and no support for <interface> is in scope for this blueprint.

Alternatives
------------

There are no real alternatives that cover all of the use cases. An
alternative that would cover only the bandwidth requirement would be to allow
reserving all VFs of a single PF by a single instance while using only a
single VF, effectively reserving the bandwidth. In addition to not being a
solution for all the applications, it also does not reduce the complexity of
the change much, as the relationship between VFs still needs to be modeled in
Nova.

Data model impact
|
||||
-----------------
|
||||
|
||||
Even though there is a way currently to figure out the PF a single VF belongs
|
||||
to (through the use of `extra_info` free-form field) it may be necessary to add
|
||||
a more "query friendly" relationship, that will allow us to answer the question
|
||||
"given a PCI device record that is a PF, which VF records does it contain".
|
||||
|
||||
It is likely to be implemented as a foreign key relationship to the same table,
|
||||
and objects support will be added, but the actual implementation discussion is
|
||||
better suited for the actual code proposal review.
|
||||
|
||||
It will also be necessary to be able to know relations between individual PFs
|
||||
and VFs in the aggregate view of the PCI device data used in scheduling, so
|
||||
changes to the way PciDeviceStats holds aggregate
|
||||
data. This will also result in changes to the filtering/cliaming logic, the
|
||||
extent of which may impact decisions about the data model so this is
|
||||
best discussed on actual implementation changes.
|
||||
|
||||
REST API impact
---------------

There are no API changes required. PCI devices are requested through flavor
extra-specs by specifying an alias of a device specification. Currently,
device specifications and their aliases are part of the Nova deployment
configuration, and thus are deployment specific.

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None - non-admin users will continue to use only what is exposed to them via
flavor extra-specs, which they cannot modify in any way.

Performance Impact
------------------

Scheduling of instances requiring PCI passthrough devices will be doing more
work, and on a bit more data than currently, in the case of PF requests. It
is unlikely that this will have any noticeable performance impact, however.

Other deployer impact
---------------------

The PCI alias syntax for enabling PCI devices will become more featureful, in
order to account for specifically requesting a PF.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Nikola Đipanov <ndipanov@redhat.com>

Other contributors:
  Vladik Romanovsky <vromanso@redhat.com>

Work Items
----------

* Re-work the DB models and corresponding objects to have an explicit
  relationship between the PF entry and its corresponding VFs. Update the
  claiming logic inside the PCI manager class so that claiming/assigning the
  PF claims all of its VFs and vice versa.

* Change the PciDeviceStats class to expose PFs in its pools, and change the
  claiming/consuming logic to claim appropriate amounts of VFs when a PF is
  consumed or claimed. Once this work item is complete, all of the scheduling
  and resource tracking logic will be aware of the PF constraint.

* Add support for specifying the PF requirement through the pci_alias
  configuration options, so that it can be requested through flavor
  extra-specs, as sketched below.

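As an illustrative sketch only (the ``device_type`` key and its value are
assumptions; the exact alias syntax is to be settled during the code review),
requesting a PF through the alias could look like:

::

    # nova.conf on the deployment - hypothetical syntax
    [DEFAULT]
    pci_alias = {"name": "a1", "vendor_id": "8086", "product_id": "154d",
                 "device_type": "type-PF"}

    # Requesting one such PF through flavor extra-specs
    nova flavor-key m1.large set "pci_passthrough:alias"="a1:1"
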
Dependencies
============

None

Testing
=======

Changes proposed here only extend existing functionality, so they will
require updating the current test suite to make sure the new functionality is
covered. It is expected that the tests currently in place will prevent any
regression to the existing functionality. No new test suites are required for
this functionality, only new test cases.

Documentation Impact
====================

Documentation for the PCI passthrough features in Nova will need to be
updated to reflect the above changes - that is to say, no impact out of the
ordinary.

References
==========

.. [1] https://review.openstack.org/#/c/182242/
.. [2] https://review.openstack.org/#/c/142094/
.. [3] https://blueprints.launchpad.net/nova/+spec/pci-passthrough-whitelist-regex

History
=======

Optional section for Mitaka intended to be used each time the spec is updated
to describe a new design, API or any database schema update. Useful to let
the reader understand what has happened over time.

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced

428
specs/mitaka/implemented/user-settable-server-description.rst
Normal file
@@ -0,0 +1,428 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=====================================================
Allow user to set and retrieve the server Description
=====================================================

The launchpad blueprint is located at:

https://blueprints.launchpad.net/nova/+spec/user-settable-server-description

Allow users to set the description of a server when it is created, rebuilt,
or updated. Allow users to get the server description.

Problem description
===================

Currently, when a server is created, the description is hardcoded to be the
server display name. The description cannot be set on a server rebuild.

Users cannot set the description on the server or retrieve the description.
Currently, they need to use other fields, such as the server name or
metadata, to provide a description. This overloads the name and metadata
fields in a way for which they were not designed. A better way to provide a
long human-readable description is to use a separate field. The description
can then be easily viewed in a server list display.

Use Cases
---------

* The End User wishes to provide a description when creating a server.
* The End User wishes to provide a description when rebuilding a server.
  If the user chooses to change the name, a new description may be needed
  to match the new name.
* The End User wishes to get the server's description.
* The End User wishes to change the server's description.

Proposed change
===============

* Nova REST API

  * Add an optional description parameter to the Create Server, Rebuild
    Server, and Update Server APIs.

    * No default description on Create Server (set to NULL in the database).
    * If a null description string is specified on the server update or
      rebuild, then the description is set to NULL in the database
      (the description is removed).
    * If the description parameter is not specified on the server update
      or rebuild, then the description is not changed.
    * An empty description string is allowed.

  * The Get Server Details API returns the description in the JSON response.
    This can be NULL.
  * The List Details for Servers API returns the description for each server.
    A description can be NULL.

* Nova V2 client

  * Add an optional description parameter to the server create method.
  * Add an optional description parameter to the server rebuild method.
  * Add new methods to set and clear the server description. These will
    implement a new CLI command "nova describe" with the following
    positional parameters (see the usage sketch at the end of this section):

    * server
    * description (Pass in "" to remove the description)

  * Return the description on the server show method. This can be null.
  * If detail is requested, return the description on each server
    returned by the server list method. A description can be null.

* OpenStack V2.1 compute client

  * NOTE: Changes to the OpenStack V2 compute client will be
    implemented under a bug report, and not under this spec.
  * Add an optional description parameter to CreateServer.
  * Add an optional description parameter to RebuildServer.
  * Add an optional description parameter to SetServer and
    UnsetServer.
  * Return the description on ShowServer. This can be null.
  * If detail is requested, return the description on each server
    returned by ListServer. A description can be null.

Note: A description field already exists in the database, so the change is
to add API/CLI support for setting and getting the description.

Other projects possibly impacted:

* Horizon could be changed to set and show the server description.

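For illustration, usage of the proposed command could look like the following
(a sketch only; the server name is made up and the final CLI surface is
subject to the novaclient review):

::

    nova describe myserver "Web tier instance for the test deployment"
    nova describe myserver ""   # passing "" removes the description
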
Alternatives
------------

None

Data model impact
-----------------

None. The database column for the description already exists as 255
characters, and is nullable.

REST API impact
---------------

Add the following parameter validation:

::

    valid_description_regex_base = '[%s]*'
    valid_description_regex = valid_description_regex_base % (
        re.escape(_get_printable()))

    description = {
        'type': ['string', 'null'], 'minLength': 0, 'maxLength': 255,
        'pattern': valid_description_regex,
    }

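As a rough illustration of how this schema behaves (using a plain-ASCII
stand-in for ``_get_printable()``; the real helper covers printable unicode):

::

    import re

    import jsonschema

    # Stand-in for the validation module's _get_printable(); assumption:
    # ASCII printable characters only.
    printable = ''.join(chr(c) for c in range(0x20, 0x7f))
    description = {
        'type': ['string', 'null'], 'minLength': 0, 'maxLength': 255,
        'pattern': '[%s]*' % re.escape(printable),
    }

    jsonschema.validate('a short description', description)  # passes
    jsonschema.validate(None, description)  # null is allowed
    jsonschema.validate('x' * 256, description)  # raises ValidationError
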
Change the following APIs under a new microversion:

`Create Server <http://developer.openstack.org/api-ref-compute-v2.1.html#createServer>`_
........................................................................................

New request parameter:

+---------------------+------+-------------+-----------------------+
|Parameter            |Style |Type         | Description           |
+=====================+======+=============+=======================+
|description(optional)|plain | csapi:string|The server description |
+---------------------+------+-------------+-----------------------+

Add the description to the json request schema definition:

::

    base_create = {
        'type': 'object',
        'properties': {
            'server': {
                'type': 'object',
                'properties': {
                    'name': parameter_types.hostname,
                    'description': parameter_types.description,
                    'imageRef': parameter_types.image_ref,
                    'flavorRef': parameter_types.flavor_ref,
                    'adminPass': parameter_types.admin_password,
                    'metadata': parameter_types.metadata,
                    'networks': {
                        'type': 'array',
                        'items': {
                            'type': 'object',
                            'properties': {
                                'fixed_ip': parameter_types.ip_address,
                                'port': {
                                    'type': ['string', 'null'],
                                    'format': 'uuid'
                                },
                                'uuid': {'type': 'string'},
                            },
                            'additionalProperties': False,
                        }
                    }
                },
                'required': ['name', 'flavorRef'],
                'additionalProperties': False,
            },
        },
        'required': ['server'],
        'additionalProperties': False,
    }

Error http response codes:

* 400 (BadRequest) if the description is invalid unicode,
  or longer than 255 characters.

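For illustration, a create request carrying the new parameter might look like
the following (all field values are made up for the example):

::

    POST /v2.1/servers

    {
        "server": {
            "name": "new-server-test",
            "description": "A test server for the web tier",
            "imageRef": "70a599e0-31e7-49b7-b260-868f441e862b",
            "flavorRef": "1"
        }
    }
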
`Rebuild Server <http://developer.openstack.org/api-ref-compute-v2.1.html#rebuildServer>`_
..........................................................................................

New request parameter:

+---------------------+------+-------------+-----------------------+
|Parameter            |Style |Type         | Description           |
+=====================+======+=============+=======================+
|description(optional)|plain | csapi:string|The server description |
+---------------------+------+-------------+-----------------------+

Add the description to the json request schema definition:

::

    base_rebuild = {
        'type': 'object',
        'properties': {
            'rebuild': {
                'type': 'object',
                'properties': {
                    'name': parameter_types.name,
                    'description': parameter_types.description,
                    'imageRef': parameter_types.image_ref,
                    'adminPass': parameter_types.admin_password,
                    'metadata': parameter_types.metadata,
                    'preserve_ephemeral': parameter_types.boolean,
                },
                'required': ['imageRef'],
                'additionalProperties': False,
            },
        },
        'required': ['rebuild'],
        'additionalProperties': False,
    }

Error http response codes:

* 400 (BadRequest) if the description is invalid unicode,
  or longer than 255 characters.

`Update Server <http://developer.openstack.org/api-ref-compute-v2.1.html#updateServer>`_
........................................................................................

New request parameter:

+---------------------+------+----------------------+-----------------------+
|Parameter            |Style |Type                  | Description           |
+=====================+======+======================+=======================+
|description(optional)|plain |csapi:ServerForUpdate |The server description |
+---------------------+------+----------------------+-----------------------+

Add the description to the json request schema definition:

::

    base_update = {
        'type': 'object',
        'properties': {
            'server': {
                'type': 'object',
                'properties': {
                    'name': parameter_types.name,
                    'description': parameter_types.description,
                },
                ...
            },
        },
    }

Response:

* The update API currently returns the details of the updated server. As part
  of this, the description will now be returned in the json response.

Error http response codes:

* 400 (BadRequest) if the description is invalid unicode,
  or longer than 255 characters.

`Get Server Details <http://developer.openstack.org/api-ref-compute-v2.1.html#getServer>`_
..........................................................................................

Add the description to the JSON response schema definition.

::

    server = {
        "server": {
            "id": instance["uuid"],
            "name": instance["display_name"],
            "description": instance["display_description"],
            "status": self._get_vm_status(instance),
            "tenant_id": instance.get("project_id") or "",
            "user_id": instance.get("user_id") or "",
            "metadata": self._get_metadata(instance),
            "hostId": self._get_host_id(instance) or "",
            "image": self._get_image(request, instance),
            "flavor": self._get_flavor(request, instance),
            "created": timeutils.isotime(instance["created_at"]),
            "updated": timeutils.isotime(instance["updated_at"]),
            "addresses": self._get_addresses(request, instance),
            "accessIPv4": str(ip_v4) if ip_v4 is not None else '',
            "accessIPv6": str(ip_v6) if ip_v6 is not None else '',
            "links": self._get_links(request,
                                     instance["uuid"],
                                     self._collection_name),
        },
    }

Security impact
---------------

None

Notifications impact
--------------------

The notification changes for this spec will be included as
part of the implementation of the Versioned Notification API spec:
https://review.openstack.org/#/c/224755/

* The new versioned notification on instance update will include
  the description.
* The new versioned notification on instance create will include
  the description.
* The new versioned notification on instance rebuild will include
  the description.

Other end user impact
---------------------

Changes to python-novaclient and python-openstackclient as described above.

Horizon can add the description to the GUI.

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  chuckcarmack75

Other contributors:
  none

Work Items
----------

1) Implement the nova API changes.
2) Implement the novaclient and openstackclient changes.

Dependencies
============

None

Testing
=======

* Nova functional tests

  * Add a description to the tests that use the API to create a server.

    * Check that the default description is NULL.

  * Add a description to the tests that use the API to rebuild a server.

    * Check that the description can be changed or removed.
    * Check that the description is unchanged if not specified on the API.

  * Add a description to the tests that use the API to update a server.

    * Check that the description can be changed or removed.
    * Check that the description is unchanged if not specified on the API.

  * Check that the description is returned as part of server details for
    an individual server or a server list.

* Python nova-client and openstack-client. For the client tests and
  the CLI tests:

  * Add a description to the tests that create a server.
  * Add a description to the tests that rebuild a server.
  * Set and remove the description on an existing server.
  * Check that the description is returned as part of server details for
    an individual server or a server list.

* Error cases:

  * The description passed to the API is longer than 255 characters.
  * The description passed to the API is not valid printable unicode.

* Edge cases:

  * The description passed to the API is an empty string. This is allowed.

Documentation Impact
====================

Documentation updates to:

* API spec: http://developer.openstack.org/api-ref-compute-v2.1.html
  including the API samples.
* Client: novaclient and openstackclient

References
==========

The request for this feature first surfaced on the mailing list:

http://lists.openstack.org/pipermail/openstack-dev/2015-August/073052.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced
   * - Mitaka
     - Re-submitted to add support for description on Rebuild.

781
specs/mitaka/implemented/versioned-notification-api.rst
Normal file
@@ -0,0 +1,781 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================
Versioned notification API
==========================

https://blueprints.launchpad.net/nova/+spec/versioned-notification-api

The notification interface of nova is not well defined and the current
notifications form a very inconsistent interface. There is no easy way to
see, from the notification consumer's point of view, what the format and
content of the notifications nova sends are.

Problem description
===================

This is the generic notification envelope format supported by oslo.messaging
[1]::

    {
        "priority": "INFO",
        "event_type": "compute.instance.update",
        "timestamp": "2015-09-02 09:13:31.895554",
        "publisher_id": "api.controller",
        "message_id": "06d9290b-b9b0-4bd5-9e76-ddf8968a70b4",
        "payload": {}
    }

The problematic fields are:

* priority
* event_type
* publisher_id
* payload

priority: Nova uses the info and error priorities in the current code base,
except in the nova.notification.notify_decorator code where the priority is
configurable with the notification_level configuration parameter. However,
this decorator is only used in the monkey_patch_modules configuration default
value.

event_type: oslo allows a raw string to be sent as the event_type; nova uses
the following event_type formats today:

* <service>.<object>.<action>.<phase> example: compute.instance.create.end
* <object>.<action>.<phase> example: aggregate.removehost.end
* <object>.<action> example: servergroup.create
* <service>.<action>.<phase> example: scheduler.select_destinations.end
* <action> example: snapshot_instance
* <module?>.<action> example: compute_task.build_instances

publisher_id: nova uses the following publisher_id formats today:

* <service>.controller examples: api.controller, compute.controller
* <object>.controller example: servergroup.controller
* <object>.<object_id> example: aggregate.<aggregate.name> and
  aggregate.<aggregate_id>. See: [2].

It seems that the content of publisher_id and event_type overlaps in some
cases.

payload: nova does not place any restriction on the payload field, which
leads to very many different formats. Sometimes it is a view of an existing
nova versioned object, e.g. in the case of the compute.instance.update
notification nova dumps the fields of the instance object into the
notification after some filtering. In other cases nova dumps the exception
object, or the args and kwargs of a function, into the payload. This complex
payload format seems to be the biggest problem for notification consumers.

Use Cases
---------

As a tool developer I want to consume nova notifications to implement my
requirements. I want to know the format of the notifications and I want to
have some way to detect and follow up on changes in the notification format
later on.

Proposed change
===============

This spec is created to agree on the format, content and meaning of the
fields in notifications sent by nova, and to propose a way to change the
existing notifications to the new format while giving time to the
notification consumers to adapt to the change. It also tries to give a
technical solution to keep the notification payload more stable and
versioned.

Current notifications are un-versioned. This spec proposes to transform the
un-versioned notifications to versioned notifications while keeping the
possibility to emit un-versioned notifications for a limited time to help
the transition for the notification consumers.

Versioned notifications will have a well defined format which is documented,
and notification samples will be provided similarly to nova api samples.
New versions of a versioned notification will be kept backward compatible.

To model and version the new notifications nova will use the oslo
versionedobjects module. To emit such notifications nova will continue to
use the notifier interface of the oslo.messaging module. To convert the
notification model to the format that can be fed into the notifier interface
nova will use the existing NovaObjectSerializer.

A single versioned notification will be modeled with a single oslo versioned
object, but that object can use other new or existing versioned objects as
its payload field.

However, some of today's notifications cannot really be converted to
versioned notifications. For example, the notify_decorator dumps the args and
kwargs of any function into the notification payload, therefore we cannot
create a single versioned model for every possible payload it generates. For
these notifications a generic, semi-managed, dict based payload can be
defined that formalizes as much as possible and leaves the rest of the
payload un-managed. Adding new semi-managed notifications shall be avoided in
the future.

We want to keep the notification envelope format defined by the notifier
interface in oslo.messaging, therefore versioned notifications will have the
same envelope on the wire as the un-versioned notifications,
which is the following::

    {
        "priority": "INFO",
        "event_type": "compute.instance.update",
        "timestamp": "2015-09-02 09:13:31.895554",
        "publisher_id": "api.controller",
        "message_id": "06d9290b-b9b0-4bd5-9e76-ddf8968a70b4",
        "payload": {}
    }

The main difference between the wire format of the versioned and un-versioned
notification is the format of the payload field. The versioned notification
wire format will use the serialized format of a versioned object as payload.

The versioned notification model will define versioned object fields for
every field the oslo.messaging notifier interface needs (priority,
event_type, publisher_id, payload) so that a single notification can be fully
modeled in nova code. However, only the payload field will use the default
versioned object serialization. The other fields in the envelope will be
filled with strings as in the example above.

The value of the event_type field of the envelope on the wire will be defined
by the name of the affected object, the name of the performed action emitting
the notification and the phase of the action. For example:
instance.create.end, aggregate.removehost.start,
filterscheduler.select_destinations.end.
The notification model will do basic validation on the content of the
event_type, e.g. an enum for valid phases will be created.

The value of the priority field of the envelope on the wire can be selected
from the predefined priorities in oslo.messaging (audit, debug, info, warn,
error, critical, sample) except 'warning' (use warn instead).
The notification model will do validation of the priority by providing an
enum with the valid priorities.

For concrete examples see the Data model impact section.

Backward compatibility
----------------------

The new notification model can be used to emit the current un-versioned
notifications as well, to provide backward compatibility while the
un-versioned notifications are deprecated. Nova might want to restrict adding
new un-versioned notifications after this spec is implemented.

A new version of a versioned notification has to be backward compatible with
the previous version. Nova will always emit the latest version of a versioned
notification and nova will not support pinning back the notification
versions.

Backward compatibility for pre-Mitaka notification consumers will be ensured
by emitting both the versioned and the un-versioned notification format on
the wire on separate topics. The new notification model will provide a way to
emit both old and new wire formats from the same notification object.
A configuration option will be provided to specify which version of the
notifications shall be emitted, but asking for the old format only will be
deprecated from the beginning. Emitting the un-versioned wire format of a
versioned notification will be deprecated, along with a proper deprecation
message, in Mitaka and will be removed in the N release.

Alternatives
------------

Version the whole wire format instead of only the payload:

There seem to be two main alternatives for how to generate the actual
notification message on the wire from the KeyPairNotification object defined
in the Data model impact section.

Use the current envelope structure defined by the notifier in oslo.messaging
[1] and use the versioning of the payload on the wire as proposed in the
Data model impact section.

Pros:

* No oslo.messaging change is required.
* Consumers only need to change the payload parsing code.
* Notification envelopes across the whole OpenStack ecosystem are the same.

Cons:

* The envelope on the wire is not versioned, just the payload field of
  it. However, the envelope structure is generic and well defined by
  oslo.messaging.

Or alternatively, create a new envelope structure in oslo.messaging that is
already a versioned object and use the serialized form of that object on the
wire. If we change oslo.messaging to provide an interface where an object
inheriting from the NotificationBase object can be passed in, and
oslo.messaging uses the serialized form of that object as the message
directly, then the KeyPair notification message on the wire would look like
the following::

    {
        "nova_object.version":"1.0",
        "nova_object.name":"KeyPairNotification",
        "nova_object.data":{
            "priority":"info",
            "publisher":{
                "nova_object.version":"1.19",
                "nova_object.name":"Service",
                "nova_object.data":{
                    "host":"controller",
                    "binary":"api"
                    ... # a lot of other fields from the Service object here
                },
                "nova_object.namespace":"nova"
            },
            "payload":{
                "nova_object.version":"1.3",
                "nova_object.name":"KeyPair",
                "nova_object.namespace":"nova",
                "nova_object.data":{
                    "id": 1,
                    "user_id":"21a75a650d6d4fb28858579849a72492",
                    "fingerprint": "e9:49:b2:ca:56:8c:25:77:ea:0d:d9:7c:89...",
                    "public_key": "ssh-rsa AAAAB3NzaC1yc2EAA...",
                    "type": "ssh",
                    "name": "mykey5"
                }
            },
            "event_type":{
                "nova_object.version":"1.0",
                "nova_object.name":"EventType",
                "nova_object.data":{
                    "action":"create",
                    "phase":"start",
                    "object":"keypair"
                },
                "nova_object.namespace":"nova"
            }
        },
        "nova_object.namespace":"nova"
    }

In this case the NotificationBase classes shall be provided by
oslo.messaging.

Pros:

* The whole message on the wire is versioned.

Cons:

* Needs extensive changes in oslo.messaging in the notification interface
  code as well as in the notification drivers, as today notification drivers
  depend on the current envelope structure.
* It would create a circular dependency between oslo.messaging and
  oslo.versionedobjects.
* Consumers need to adapt to the top level structure change as well.

Use a single global notification version:

The proposal is to use a separate version number per notification.
Alternatively, a single global notification version number could be defined
that is bumped every time a single notification is changed.

Data model impact
-----------------

The following base objects will be defined:

.. code-block:: python

    class NotificationPriorityType(Enum):
        AUDIT = 'audit'
        CRITICAL = 'critical'
        DEBUG = 'debug'
        INFO = 'info'
        ERROR = 'error'
        SAMPLE = 'sample'
        WARN = 'warn'

        ALL = (AUDIT, CRITICAL, DEBUG, INFO, ERROR, SAMPLE, WARN)

        def __init__(self):
            super(NotificationPriorityType, self).__init__(
                valid_values=NotificationPriorityType.ALL)


    class NotificationPriorityTypeField(BaseEnumField):
        AUTO_TYPE = NotificationPriorityType()


    @base.NovaObjectRegistry.register
    class EventType(base.NovaObject):
        # Version 1.0: Initial version
        VERSION = '1.0'

        fields = {
            'object': fields.StringField(),
            'action': fields.EventTypeActionField(),  # will be an enum
            'phase': fields.EventTypePhaseField(),  # will be an enum
        }


    @base.NovaObjectRegistry.register
    class NotificationBase(base.NovaObject):

        fields = {
            'priority': fields.NotificationPriorityTypeField(),
            'event_type': fields.ObjectField('EventType'),
            'publisher': fields.ObjectField('Service'),
        }

        def emit(self, context):
            """Send the notification."""

        def emit_legacy(self, context):
            """Send the legacy format of the notification."""

Note that the publisher field of the NotificationBase will be used to fill
the publisher_id field of the envelope in the wire format by extracting the
name of the service and the host the service runs on from the Service object.

Then here is a concrete example that uses the base object:

.. code-block:: python

    @base.NovaObjectRegistry.register
    class KeyPairNotification(notification.NotificationBase):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'payload': fields.ObjectField('KeyPair')
        }

Where the referred KeyPair object is an already existing versioned object in
nova. Then the current keypair notification sending code can be written like:

.. code-block:: python

    def _notify(self, context, keypair):
        event_type = notification.EventType(
            object='keypair',
            action=obj_fields.EventTypeActionField.CREATE,
            phase=obj_fields.EventTypePhaseField.START)
        publisher = utils.get_current_service()
        keypair_obj.KeyPairNotification(
            priority=obj_fields.NotificationPriorityType.INFO,
            event_type=event_type,
            publisher=publisher,
            payload=keypair).emit(context)

When defining the payload model for a versioned notification we will try to
reuse the existing nova versioned objects, as in the case of the KeyPair
example above. If that is not possible, a new versioned object for the
payload will be created.

The wire format of the above KeyPair notification will look like the
following::

    {
        "priority":"INFO",
        "event_type":"keypair.create.start",
        "timestamp":"2015-10-08 11:30:09.988504",
        "publisher_id":"api:controller",
        "payload":{
            "nova_object.version":"1.3",
            "nova_object.name":"KeyPair",
            "nova_object.namespace":"nova",
            "nova_object.data":{
                "id": 1,
                "user_id":"21a75a650d6d4fb28858579849a72492",
                "fingerprint": "e9:49:b2:ca:56:8c:25:77:ea:0d:d9:7c:89:35:36",
                "public_key": "ssh-rsa AAAAB3NzaC1yc2EAA...",
                "type": "ssh",
                "name": "mykey5"
            }
        },
        "message_id":"98f1221f-ded0-4153-b92d-3d67219353ee"
    }

For an alternative wire format see the Alternatives section.

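From the consumer side, the versioned payload makes compatibility checks
straightforward. A minimal sketch (the handler below is a hypothetical
consumer, not part of nova):

.. code-block:: python

    def handle_keypair_created(payload):
        # Guard on the major version of the versioned payload before
        # reading its fields; minor versions are backward compatible.
        major = int(payload['nova_object.version'].split('.')[0])
        if major != 1:
            raise ValueError('Unsupported KeyPair payload version: %s'
                             % payload['nova_object.version'])
        data = payload['nova_object.data']
        return data['name'], data['fingerprint']
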
Semi managed notification example
---------------------------------

The nova.exception.wrap_exception decorator is used to send a notification in
case an exception happens during the decorated function. Today this
notification has the following structure::

    {
        event_type: <the name of the decorated function>,
        publisher_id: <needs to be provided to the decorator via the notifier>,
        payload: {
            exception: <the exception object>
            args: <dict of the call args of the decorated function as gathered
                   by nova.safe_utils.getcallargs except the ones that have
                   '_pass' in their names>
        }
        timestamp: ...
        message_id: ...
    }

We can define the following semi managed notification object for it::

    @base.NovaObjectRegistry.register
    class Exception(base.NovaObject):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'message': fields.StringField(),
            'code': fields.IntegerField(),
        }


    @base.NovaObjectRegistry.register
    class ExceptionPayload(base.NovaObject):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'exception': fields.ObjectField('Exception'),
            'args': fields.ArgDictField(),
        }


    @base.NovaObjectRegistry.register
    class ExceptionNotification(notification.NotificationBase):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'payload': fields.ObjectField('ExceptionPayload')
        }

Here the ArgDictField takes any python object: it uses object serialization
when available, otherwise a primitive->json conversion, and if that fails, it
just stringifies the object. This field does not have a well defined wire
format, so this part of the notification will not really be versioned, hence
the semi-managed name.

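A rough sketch of the coercion order described above (an assumption;
ArgDictField does not exist yet and its final behavior is up to the
implementation):

.. code-block:: python

    import json

    def _coerce_arg(value):
        # Assumed ordering: object serialization when available, then a
        # primitive -> json conversion, then plain stringification.
        if hasattr(value, 'obj_to_primitive'):
            return value.obj_to_primitive()
        try:
            json.dumps(value)
            return value
        except (TypeError, ValueError):
            return str(value)
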
send_api_fault notification example
-----------------------------------

The nova.notifications.send_api_fault function is used to send a notification
in case of api faults. The current format of the notification is the
following::

    {
        event_type: "api.fault",
        publisher_id: "api.myhost",
        payload: {
            "url": <the request url>,
            "exception": <the stringified exception object>,
            "status": <http status code>
        }
        timestamp: ...
        message_id: ...
    }

We can define the following managed notification object for it::

    @base.NovaObjectRegistry.register
    class ApiFaultPayload(base.NovaObject):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'url': fields.UrlField(),
            'exception': fields.ObjectField('Exception'),
            'status': fields.IntegerField(),
        }


    @base.NovaObjectRegistry.register
    class ApiFaultNotification(notification.NotificationBase):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'payload': fields.ObjectField('ApiFaultPayload')
        }

instance update notification example
------------------------------------

The nova.notifications.send_update function is used today to send a
notification about changes to an instance. Here is an example of the current
notification format::

    {
        "priority":"INFO",
        "event_type":"compute.instance.update",
        "timestamp":"2015-10-12 14:33:45.704324",
        "publisher_id":"api.controller",
        "payload":{
            "instance_id":"0ab36db7-0770-47de-b34d-45adb17248e7",
            "user_id":"21a75a650d6d4fb28858579849a72492",
            "tenant_id":"8cd4a105ae504184ade871e23a2c6d07",
            "reservation_id":"r-epzg3dq2",
            "display_name":"vm1",
            "hostname":"vm1",
            "host":null,
            "node":null,
            "architecture":null,
            "os_type":null,
            "cell_name":"",
            "availability_zone":null,

            "instance_flavor_id":"42",
            "instance_type_id":6,
            "instance_type":"m1.nano",
            "memory_mb":64,
            "vcpus":1,
            "root_gb":0,
            "disk_gb":0,
            "ephemeral_gb":0,

            "image_ref_url":"http://192.168.200.200:9292/images/34d9b758-e9c8-4162-ba15-78e6ce05a350",
            "kernel_id":"7fc91b81-2ff1-4bd2-b79b-ec218463253a",
            "ramdisk_id":"25f19ee8-a350-4d8c-bb53-12d0f834d52f",
            "image_meta":{
                "kernel_id":"7fc91b81-2ff1-4bd2-b79b-ec218463253a",
                "container_format":"ami",
                "min_ram":"0",
                "ramdisk_id":"25f19ee8-a350-4d8c-bb53-12d0f834d52f",
                "disk_format":"ami",
                "min_disk":"0",
                "base_image_ref":"34d9b758-e9c8-4162-ba15-78e6ce05a350"
            },

            "created_at":"2015-10-12 14:33:45.662955+00:00",
            "launched_at":"",
            "terminated_at":"",
            "deleted_at":"",
            "new_task_state":"scheduling",
            "state":"building",
            "state_description":"scheduling",
            "old_state":"building",
            "old_task_state":"scheduling",
            "progress":"",

            "audit_period_beginning":"2015-10-12T14:00:00.000000",
            "audit_period_ending":"2015-10-12T14:33:45.699612",

            "access_ip_v6":null,
            "access_ip_v4":null,
            "bandwidth":{},
            "metadata":{}
        }
    }

We can define the following managed notification object for it::

    @base.NovaObjectRegistry.register
    class BwUsage(base.NovaObject):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'label': fields.StringField(),
            'bw_in': fields.IntegerField(),
            'bw_out': fields.IntegerField(),
        }


    @base.NovaObjectRegistry.register
    class FixedIp(base.NovaObject):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'label': fields.StringField(),
            'vif_mac': fields.StringField(),
            'meta': fields.DictOfStringsField(),
            'type': fields.StringField(),  # maybe an enum
            'version': fields.IntegerField(),  # maybe an enum
            'address': fields.IPAddressField(),
        }


    @base.NovaObjectRegistry.register
    class InstanceUpdatePayload(base.NovaObject):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'instance_id': fields.UUIDField(),
            'user_id': fields.StringField(),
            'tenant_id': fields.StringField(),
            'reservation_id': fields.StringField(),
            'display_name': fields.StringField(),
            'host_name': fields.StringField(),
            'host': fields.StringField(),
            'node': fields.StringField(),
            'os_type': fields.StringField(),
            'architecture': fields.StringField(),
            'cell_name': fields.StringField(),
            'availability_zone': fields.StringField(),

            'instance_flavor_id': fields.StringField(),
            'instance_type_id': fields.IntegerField(),
            'instance_type': fields.StringField(),
            'memory_mb': fields.IntegerField(),
            'vcpus': fields.IntegerField(),
            'root_gb': fields.IntegerField(),
            'disk_gb': fields.IntegerField(),
            'ephemeral_gb': fields.IntegerField(),
            'image_ref_url': fields.StringField(),

            'kernel_id': fields.StringField(),
            'ramdisk_id': fields.StringField(),
            'image_meta': fields.DictOfStringsField(),

            'created_at': fields.DateTimeField(),
            'launched_at': fields.DateTimeField(),
            'terminated_at': fields.DateTimeField(),
            'deleted_at': fields.DateTimeField(),

            'new_task_state': fields.StringField(),
            'state': fields.StringField(),
            'state_description': fields.StringField(),
            'old_state': fields.StringField(),
            'old_task_state': fields.StringField(),
            'progress': fields.IntegerField(),

            'audit_period_beginning': fields.DateTimeField(),
            'audit_period_ending': fields.DateTimeField(),

            'access_ip_v4': fields.IPV4AddressField(),
            'access_ip_v6': fields.IPV6AddressField(),
            'fixed_ips': fields.ListOfFixedIps(),

            'bandwidth': fields.ListOfBwUsages(),

            'metadata': fields.DictOfStringsField(),
        }


    @base.NovaObjectRegistry.register
    class InstanceUpdateNotification(notification.NotificationBase):
        # Version 1.0: Initial version
        VERSION = '1.0'
        fields = {
            'payload': fields.ObjectField('InstanceUpdatePayload')
        }

No db schema changes are foreseen.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

See the Proposed change and Data model impact sections.

Other end user impact
---------------------

None.

Performance Impact
------------------

Sending both the un-versioned and the versioned wire format of a
notification, due to keeping backward compatibility in Mitaka, will increase
the load on the message bus. A config option will be provided to specify
which version of the notifications shall be emitted to mitigate this. Also,
the deployer can use the NoOp notification driver to turn the interface off.

Other deployer impact
---------------------

Backward compatibility for pre-Mitaka notification consumers will be ensured
by emitting both the versioned and the un-versioned notification format on
the wire for every versioned notification, using the configured driver.
Emitting the un-versioned wire format of a versioned notification will be
deprecated, along with a proper deprecation message, in Mitaka and will be
removed in the N release.

A new config option ``notification_format`` will be introduced with three
possible values ``versioned``, ``un-versioned``, ``both`` to specify which
version of the notifications shall be emitted. The ``un-versioned`` value
will be deprecated from the beginning to encourage deployers to start
consuming versioned notifications. In Mitaka the default value of this config
option will be ``both``, as sketched below.

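A rough sketch of what such an option could look like (the help text and
option placement are illustrative assumptions, not the final definition):

.. code-block:: python

    from oslo_config import cfg

    notification_opts = [
        cfg.StrOpt('notification_format',
                   choices=['un-versioned', 'versioned', 'both'],
                   default='both',
                   help='Specifies which notification format shall be '
                        'emitted: the legacy un-versioned format, the new '
                        'versioned format, or both.'),
    ]

    cfg.CONF.register_opts(notification_opts)
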
Developer impact
----------------

Developers shall use the notification base classes when implementing a new
notification.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  * balazs-gibizer

Other contributors:
  * belliott
  * andrea-rosa-m

Work Items
----------

* Create the necessary base infrastructure, e.g. base classes, sample
  generation, basic test infrastructure, documentation
* Create a versioned notification for an easy old style notification
  (e.g. keypair notifications) to serve as an example
* Create a versioned notification for the instance.update notification
* Create versioned notifications for the nova.notifications.send_api_fault
  type of notifications

Dependencies
============

None

Testing
=======

Functional test coverage shall be provided for versioned notifications.

Documentation Impact
====================

* Notification samples shall be generated for versioned notifications.
* A new devref shall be created that describes how to add new versioned
  notifications to nova.

References
==========

* [1] http://docs.openstack.org/developer/oslo.messaging/notifier.html
* [2] https://github.com/openstack/nova/blob/master/nova/compute/utils.py#L320
* [3] https://github.com/openstack/nova/blob/bc6f30de953303604625e84ad2345cfb595170d2/nova/compute/api.py#L3769
* [4] The service status notification will be the first new notification
  using a versioned payload https://review.openstack.org/#/c/182350/ . That
  spec will add only a minimal infrastructure to emit the versioned payload.

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced

215
specs/mitaka/implemented/virt-driver-cpu-thread-pinning.rst
Normal file
@@ -0,0 +1,215 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===============================================
Virt driver pinning guest vCPU threads policies
===============================================

https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-thread-pinning

This feature aims to implement the remaining functionality of the
virt-driver-cpu-pinning spec. This entails implementing support for thread
policies.

Problem description
===================

Some applications must exhibit real-time or near real-time behavior. This is
generally possible by making use of processor affinity and binding vCPUs to
pCPUs. This functionality currently exists in Nova. However, it is also
necessary to consider thread affinity in the context of simultaneous
multithreading (SMT) enabled systems, such as those with Intel(R)
Hyper-Threading Technology. In these systems, competition for shared
resources can result in unpredictable behavior.

Use Cases
---------

Depending on the workload being executed, the end user or cloud admin may
wish to have control over how the guest uses hardware threads. To maximise
cache efficiency, the guest may wish to be pinned to thread siblings.
Conversely, the guest may wish to avoid thread siblings. This level of
control is of particular importance to Network Function Virtualization (NFV)
deployments, which care about maximizing cache efficiency of vCPUs.

Project Priority
----------------

None

Proposed change
===============

The flavor extra specs will be enhanced to support one new parameter:

* hw:cpu_thread_policy=prefer|isolate|require

This policy is an extension to the already implemented CPU policy parameter:

* hw:cpu_policy=shared|dedicated

The threads policy will control how the scheduler / virt driver places guests
with respect to CPU threads. It will only apply if the CPU policy is
'dedicated', i.e. guest vCPUs are being pinned to host pCPUs.

- prefer: The host may or may not have an SMT architecture. This retains the
  legacy behavior, whereby siblings are preferred when available. This is the
  default if no policy is specified.
- isolate: The host must not have an SMT architecture, or must emulate a
  non-SMT architecture. If the host does not have an SMT architecture, each
  vCPU will simply be placed on a different core as expected. If the host
  does have an SMT architecture (i.e. one or more cores have "thread
  siblings") then each vCPU will be placed on a different physical core
  and no vCPUs from other guests will be placed on the same core. As such,
  one thread sibling is always guaranteed to be unused.
- require: The host must have an SMT architecture. Each vCPU will be
  allocated on thread siblings. If the host does not have an SMT architecture
  then it will not be used. If the host has an SMT architecture, but not
  enough cores with free thread siblings are available, then scheduling
  will fail.

The image metadata properties will also allow specification of the threads
policy:

* hw_cpu_thread_policy=prefer|isolate|require

This will only be honored if the flavor specifies the 'prefer' policy, either
explicitly or implicitly as the default option. This ensures that the cloud
administrator can have absolute control over the threads policy if desired.
A usage sketch follows below.

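For illustration, a deployment could configure the policies like this (a
sketch; the flavor name is made up and the commands assume the existing
flavor extra-spec and image property interfaces):

::

    # Pin guest vCPUs and keep each one on its own physical core
    nova flavor-key nfv-flavor set hw:cpu_policy=dedicated \
                                   hw:cpu_thread_policy=isolate

    # Or request the policy via image metadata (honored only when the
    # flavor policy is 'prefer')
    glance image-update --property hw_cpu_thread_policy=require <image-id>
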
Alternatives
------------

None.

Data model impact
-----------------

None.

The necessary changes were already completed in the original spec.

REST API impact
---------------

No impact.

The existing APIs already support arbitrary data in the flavor extra specs.

Security impact
---------------

No impact.

Notifications impact
--------------------

No impact.

The notifications system is not used by this change.

Other end user impact
---------------------

No impact.

Support for flavor extra specs is already available in the Python clients.

Performance Impact
------------------

The scheduler will incur a small further overhead if a threads policy is set
on the image or flavor. This overhead will be negligible compared to that
implied by the enhancements to support NUMA policy and huge pages. It is
anticipated that dedicated CPU guests will typically be used in conjunction
with huge pages.

Other deployer impact
---------------------

The cloud administrator will gain the ability to define flavors with an
explicit threading policy. Although not required by this design, it is
expected that the administrator will commonly use the same host aggregates to
group hosts for both CPU pinning and large page usage, since these concepts
are complementary and expected to be used together. This will minimize the
administrative burden of configuring host aggregates.

Developer impact
----------------

It is expected that most hypervisors will have the ability to support the
required thread policies. The flavor parameter is simple enough that any Nova
driver would be able to support it.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  sfinucan

Work Items
----------

* Enhance the scheduler to take account of the threads policy when choosing
  which host to place the guest on.

* Enhance the scheduler to take account of the threads policy when mapping
  vCPUs to pCPUs.

Dependencies
============

None.

Testing
=======

It is not practical to test this feature using the gate and tempest at this
time, since effective testing will require that the guests running the tests
be provided with multiple NUMA nodes, each in turn with multiple CPUs.

These features will be validated using a third-party CI (Intel Compute CI).

Documentation Impact
====================

None.

The documentation changes were made in the previous change.

References
==========

Current "big picture" research and design for the topic of CPU and memory
resource utilization and placement. vCPU topology is a subset of this
work:

* https://wiki.openstack.org/wiki/VirtDriverGuestCPUMemoryPlacement

Current CPU pinning validation tests for Intel Compute CI:

* https://github.com/stackforge/intel-nfv-ci-tests

Existing CPU pinning spec:

* http://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/virt-driver-cpu-pinning.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Liberty
     - Introduced
   * - Mitaka
     - Revised to rework the policies, removing two, adding one and
       clarifying the remainder

219
specs/mitaka/implemented/vmware-limits.rst
Normal file
@@ -0,0 +1,219 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
VMware Limits, Shares and Reservations
==========================================

https://blueprints.launchpad.net/nova/+spec/vmware-limits-mitaka

VMware Virtual Center provides options to specify limits, reservations and
shares for CPU, memory, disks and network adapters.

In the Juno cycle support for CPU limits, reservations and shares was added.
This blueprint proposes a way of supporting memory, disk and network
limits, reservations and shares.

For limits the utilization will not exceed the limit. Reservations will be
guaranteed for the instance. Shares are used to determine relative allocation
between resource consumers. In general, a consumer with more shares gets
proportionally more of the resource, subject to certain other constraints.

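To make the proportionality concrete, a small worked example (the values are
hypothetical): under contention, two consumers holding 2000 and 1000 shares
receive two thirds and one third of the resource respectively::

    # Shares give proportional allocation between resource consumers.
    shares = {"vm_a": 2000, "vm_b": 1000}
    total = sum(shares.values())
    allocation = {vm: s / total for vm, s in shares.items()}
    print(allocation)  # {'vm_a': 0.666..., 'vm_b': 0.333...}
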
Problem description
===================

The VMware driver is only able to support CPU limits. Providing admins the
ability to set limits, reservations and shares for memory, disks and
network adapters will be a very useful tool for providing QoS to tenants.

Use Cases
----------

* This will enable a cloud provider to provide SLAs to customers

* It will allow tenants to be guaranteed performance

Proposed change
===============

Due to the different models of the various drivers and the APIs that the
backends expose, we are unable to leverage the same existing flavor
extra specs.

For example, for devices libvirt makes use of 'hw_rng:rate_bytes' and
'hw_rng:rate_period'.

In addition to this there are the following disk I/O options:

'disk_read_bytes_sec', 'disk_read_iops_sec', 'disk_write_bytes_sec',
'disk_write_iops_sec', 'disk_total_bytes_sec', and
'disk_total_iops_sec'.

For bandwidth limitations there is the 'rxtx_factor'. This will not enable
us to provide the limits, reservations and shares for vifs. This is used in
some cases to pass the information through to Neutron so that the backend
network can do the limitations. The following extra_specs can be configured
for bandwidth I/O for vifs:

'vif_inbound_average', 'vif_inbound_burst', 'vif_inbound_peak',
'vif_outbound_average', 'vif_outbound_burst' and 'vif_outbound_peak'.

None of the above are possible for the VMware driver due to the VC APIs. The
following additions are proposed:

Limits, reservations and shares will be exposed for the following:

* memory

* disks

* network adapters

The flavor extra specs for quotas have been extended to support the
following (an illustrative example follows the list):

* quota:memory_limit - The memory utilization of a virtual machine will not
  exceed this limit, even if there are available resources. This is
  typically used to ensure a consistent performance of virtual machines
  independent of available resources. Units are MB.

* quota:memory_reservation - guaranteed minimum reservation (MB)

* quota:memory_shares_level - the allocation level. This can be 'custom',
  'high', 'normal' or 'low'.

* quota:memory_shares_share - in the event that 'custom' is used, this is
  the number of shares.

* quota:disk_io_limit - The I/O utilization of a virtual machine will not
  exceed this limit. The unit is number of I/O per second.

* quota:disk_io_reservation - Reservation control is used to provide
  guaranteed allocation in terms of IOPS.

* quota:disk_io_shares_level - the allocation level. This can be 'custom',
  'high', 'normal' or 'low'.

* quota:disk_io_shares_share - in the event that 'custom' is used, this is
  the number of shares.

* quota:vif_limit - The bandwidth limit for the virtual network adapter.
  The utilization of the virtual network adapter will not exceed this limit,
  even if there are available resources. Units in Mbits/sec.

* quota:vif_reservation - Amount of network bandwidth that is guaranteed to
  the virtual network adapter. If utilization is less than reservation, the
  resource can be used by other virtual network adapters. Reservation is not
  allowed to exceed the value of limit if limit is set. Units in Mbits/sec.

* quota:vif_shares_level - the allocation level. This can be 'custom',
  'high', 'normal' or 'low'.

* quota:vif_shares_share - in the event that 'custom' is used, this is the
  number of shares.

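As an illustrative example only (all values are hypothetical and not
recommendations), a flavor using these extra specs might look like::

    # Hypothetical values for the extra specs listed above.
    vmware_quota_extra_specs = {
        "quota:memory_limit": 2048,             # MB; hard ceiling
        "quota:memory_reservation": 1024,       # MB; guaranteed minimum
        "quota:memory_shares_level": "custom",
        "quota:memory_shares_share": 1500,      # used only with 'custom'
        "quota:disk_io_limit": 500,             # IOPS ceiling
        "quota:disk_io_reservation": 100,       # guaranteed IOPS
        "quota:disk_io_shares_level": "normal",
        "quota:vif_limit": 100,                 # Mbits/sec ceiling
        "quota:vif_reservation": 10,            # Mbits/sec guaranteed
        "quota:vif_shares_level": "high",
    }
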
Alternatives
------------

The alternative is to create an abstract user concept that could help hide
the details of the differences from end users, and isolate the differences
to just the admin users.

This is really out of the scope of what is proposed and would take a huge
cross-driver effort. This would not only be relevant for flavors but maybe
for images too.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

Preventing instances from exhausting storage resources can have a significant
performance impact.

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  garyk

Work Items
----------

* common objects for limits, reservations and shares

* memory support

* disk support

* vif support

Dependencies
============

None

Testing
=======

This will be tested by the VMware CI. We will add tests to validate this.

Documentation Impact
====================

This should be documented in the VMware section.

References
==========

The vCenter APIs can be seen at the following links:

* Disk IO: http://goo.gl/uepivS

* Memory: http://goo.gl/6sHwIA

* Network Adapters: http://goo.gl/c2amhq

History
=======

None
171
specs/mitaka/implemented/vmware-opaque-network-support.rst
Normal file
@@ -0,0 +1,171 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
VMware: Expand Support for Opaque Networks
==========================================

https://blueprints.launchpad.net/nova/+spec/vmware-expand-opaque-support

An opaque network was introduced in the vSphere API in version 5.5. This is
a network that is managed by a control plane outside of vSphere. The identifier
and name of this network are made known to vSphere so that a host and virtual
machine Ethernet device can be connected to them.

The initial code was added to support the NSX-MH (multi hypervisor) Neutron
plugin. This was in commit 2d7520264a4610068630d7664eeff70fb5e8c681. That
support would require the configuration of a global integration bridge and
ensuring that the network was connected to that bridge. This approach is
similar to the way in which this is implemented in the libvirt VIF driver.

In the Liberty cycle, a new plugin called NSXv3 was added to the
openstack/vmware-nsx repository. This is to support a new NSX backend. This
is a multi-hypervisor plugin; the support for libvirt, Xen etc. already
exists.

This spec will deal with the compute integration for the VMware VC driver.

Problem description
===================

This spec will deal with the configuration of the opaque network for the NSXv3
Neutron driver.

Use Cases
----------

This is required for the NSXv3 plugin. Without it Nova will be unable to attach
an Ethernet device to a virtual machine.

Proposed change
===============

The change is self-contained within the VMware driver code and relates only to
how the Ethernet device backing is configured. This is only when the Neutron
virtual port is of the type 'ovs'. The NSXv3 plugin will ensure that the port
type is set to 'ovs'. The VC driver will need to handle this port type.

When the type is 'ovs' there are two different flows (a sketch of the
selection follows the list):

* If the configuration flag 'integration_bridge' is set. This is for the
  NSX-MH plugin. This requires that the backing value opaqueNetworkId be set
  as the 'integration_bridge'; the backing type opaqueNetworkType be set as
  'opaque'.

* If the flag is not set then this is the NSXv3 plugin. This requires that
  the backing value opaqueNetworkId be set as the neutron network UUID; the
  backing type opaqueNetworkType will have value 'nsx.LogicalSwitch'; and the
  backing externalId has the neutron port UUID.

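A minimal sketch of the selection between the two flows (the function and
dictionary shapes are invented for illustration; only the backing field
names and values come from this spec)::

    # Illustrative only: choose the opaque network backing for a VIF.
    def opaque_backing(vif, integration_bridge):
        if integration_bridge:
            # NSX-MH flow: back the device with the integration bridge.
            return {"opaqueNetworkId": integration_bridge,
                    "opaqueNetworkType": "opaque"}
        # NSXv3 flow: back the device with the Neutron network itself.
        return {"opaqueNetworkId": vif["network_id"],
                "opaqueNetworkType": "nsx.LogicalSwitch",
                "externalId": vif["port_id"]}
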
.. note::

   * The help for the configuration option 'integration_bridge' will be
     updated to reflect the values for the different plugins.

   * A log warning will appear if an invalid VC version is used.

   * The above should be done regardless of this support.

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

The NSXv3 support will be greenfield.

The NSX-MH plugin will be deprecated in favor of the NSXv3 plugin. As a
result of this we will set the default 'integration_bridge' value to None.
This means that a user running the existing NSX-MH will need to make sure
that this value is set. This is something that will be clearly documented.

Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  garyk

Work Items
----------

The implementation of the changes in Nova can be seen at:
https://review.openstack.org/#/c/165750/.


Dependencies
============

This code depends on the Neutron driver NSXv3 added in the Liberty cycle.
This code can be found at
https://github.com/openstack/vmware-nsx/blob/master/vmware_nsx/plugins/nsx_v3/plugin.py

Testing
=======

The code is tested as part of the Neutron CI testing.


Documentation Impact
====================

We will need to make sure that the release notes are updated to explain the
configuration of the CONF.vmware.integration_bridge option. As mentioned
above, that is only relevant to the NSX-MH plugin as the code will be
changed to support the NSXv3 plugin.


References
==========

* https://www.vmware.com/support/developer/converter-sdk/conv55_apireference/vim.OpaqueNetwork.html

* https://review.openstack.org/#/c/165750/


History
=======

None
191
specs/mitaka/implemented/volume-ops-when-shelved.rst
Normal file
@@ -0,0 +1,191 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==============================
Volume Operations When Shelved
==============================

https://blueprints.launchpad.net/nova/+spec/volume-ops-when-shelved

Currently attach, detach and swap volume operations are allowed when
an instance is paused, stopped and soft deleted, but are
not allowed when an instance has been shelved. These operations are
possible when an instance is shelved so we should enable them.

Problem description
===================

The attach, detach and swap volume operations are not allowed when an
instance is in the shelved or shelved_offloaded states. From a user's
perspective this is at odds with the fact that these operations can be
performed on instances in other inactive states.

Use Cases
---------

As a cloud user I want to be able to detach volumes from my shelved instance
and use them elsewhere, without having to unshelve the instance first.

As a cloud user I want to be able to perform all the volume operations on
a shelved instance that I can when it is stopped, paused or soft deleted.

Proposed change
===============

Shelved instances can be in one of two possible states: shelved and
shelved_offloaded (ignoring transitions during shelving and unshelving).
When in shelved the instance is still on a host but inactive. When in
shelved_offloaded the instance has been removed from the host and the
resources it was using there are released.

Volume operations on an instance in the shelved state are similar to those in
any other state in which the instance is on a host. The operations can be
enabled by allowing them at the compute API for this state. The existing
compute manager code does handle this case already; it is merely disabled in
the API.

The shelved_offloaded state is different. In this case the instance is not
on any host, so functions to attach and detach need to be implemented in
the API in the same way that the code to detach volumes for deletion is done.
These will only perform the steps to manage the block device mappings and
register with cinder. Any actual attachment to a host will be completed
when the instance is unshelved as usual.

The compute API attach volume code makes an RPC call to the hosting compute
manager to select a name for the device, which includes a call into the virt
driver. This cannot be done when the instance is offloaded
because it is not on a host.

In fact, device names are set when an instance is booted
and there is no guarantee that a name provided by the user will be
respected. So the new attach method for the shelved_offloaded state will
defer name selection until the instance is unshelved. This avoids the need
to call a compute manager at all, as sketched below.

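A toy model of the deferred naming (this is not nova code; the class is
invented purely to show the shape of the behaviour)::

    # Illustrative only: attaching while shelved_offloaded records the
    # block device mapping with no device name and makes no RPC call;
    # the name is chosen later, when the instance is unshelved.
    class FakeBDMTable(object):
        def __init__(self):
            self.rows = []

        def create(self, instance_uuid, volume_id):
            row = {"instance_uuid": instance_uuid,
                   "volume_id": volume_id,
                   "device_name": None}  # deferred until unshelve
            self.rows.append(row)
            return row

    bdms = FakeBDMTable()
    bdm = bdms.create("inst-1", "vol-1")
    print(bdm["device_name"])  # None until unshelve selects a name
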
Alternatives
------------

The only clear alternative is to not allow volumes to be attached or detached
when an instance is shelved.

Data model impact
-----------------

None.

REST API impact
---------------

The attach, detach and swap operations will
be allowed when the instance is in the shelved and shelved_offloaded states.
Instead of returning the existing HTTP error 409 (Conflict)
the return values will be the same as they are for other valid states.

This change will require an API microversion increment.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

Performance Impact
------------------

None.

Other deployer impact
---------------------

None.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  pmurray

Other contributors:
  andrea-rosa-m

Work Items
----------

The following changes will be required (a simplified sketch of the guard
change follows the list):

#. Change the guards on the attach, detach and swap functions in the compute
   API to allow them when the instance is in the shelved state.
#. Add functions to attach, detach and swap volumes that will be executed
   locally at the API when the instance is in the shelved offloaded state.
#. Add code to handle device names on unshelve (devices attached in
   shelved_offloaded will have had name selection deferred to unshelve).
#. Change the guards on the attach, detach and swap functions to allow them
   when the instance is in the shelved_offloaded state.

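A simplified model of the guard change in the first and last items (the
state names are abbreviated and nova's real guard mechanism is not
reproduced here)::

    # Illustrative only: widen the set of instance states in which the
    # volume operations are permitted.
    ALLOWED_VOLUME_OP_STATES = {
        "active", "paused", "stopped", "soft-delete",
        "shelved", "shelved_offloaded",  # the two states added here
    }

    def check_volume_op_allowed(vm_state):
        if vm_state not in ALLOWED_VOLUME_OP_STATES:
            raise Exception("409 Conflict: not allowed in %s" % vm_state)

    check_volume_op_allowed("shelved")  # no longer raises an error
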
Dependencies
============

This spec is a step towards allowing boot volumes to be attached and
detached when in the shelved_offloaded state (see [1]). But this spec
also provides useful functionality on its own.
This spec adds more opportunity for race conditions due to
conflicting parallel operations. It is important to note that those races
are not introduced by this change but already exist in nova, and they are
going to be addressed by a different change; please see [2] for more
information.

Testing
=======

Most of the attach and detach functionality can be tested with unit tests.
In particular the shelved state is the same as shutdown or stopped.

New unit tests will be needed for the new attach and detach functions in the
shelved offloaded state.

A tempest test will be added to check that the sequence of shelving,
detaching/attaching volumes and then unshelving leads to a running
instance with the expected volumes correctly attached.

Documentation Impact
====================

This spec will affect cloud users. They will now be able to perform volume
operations on shelved instances.

References
==========

[1] https://blueprints.launchpad.net/openstack/?searchtext=detach-boot-volume

[2] https://review.openstack.org/216578

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced