Move implemented mitaka specs

Note that resource-classes was already moved but the redirects
file wasn't updated, that's fixed here.

There are some partial blueprints that were marked completed in
mitaka and are still being worked in newton, like the config
option work. I've moved those to implemented here also.

Change-Id: I16f279b4794127cb7abc40ffc22cc237702d14ed
This commit is contained in:
Matt Riedemann
2016-03-29 20:34:31 -04:00
parent a5a097016c
commit 6f4faa9637
37 changed files with 37 additions and 0 deletions


@@ -0,0 +1,338 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
================================================
Provide a way to abort an ongoing live migration
================================================
Blueprint:
https://blueprints.launchpad.net/nova/+spec/abort-live-migration
At present, intervention at the hypervisor level is required to cancel
a live migration. This spec proposes adding a new operation on the
instance object to cancel a live migration of that instance.
Problem description
===================
It may be that an operator decides, after starting a live migration,
that they would like to cancel it. Effectively this would mean
rolling-back any partial migration that has happened and leaving the
instance on the source node. It may be that the migration is taking too
long, or some operational problem is discovered with the target node.
As the set of operations that can be performed on an instance during
live migration is restricted (only delete is currently allowed), it may
be that an instance owner has requested that their instance be
made available urgently.
Currently, aborting a live migration requires intervention at the
hypervisor level; Nova then recognises the aborted migration and resets the
instance state.
Use Cases
----------
As an operator of an OpenStack cloud, I would like the ability to
query, stop and roll back an ongoing live migration. This is required
for a number of reasons.
1. The migration may be failing to complete due to the instance's
workload. In some cases the solution to this issue may be to pause
the instance but in other cases the migration may need to be
abandoned or at least postponed.
2. The migration may be having an adverse impact on the instance,
i.e. the instance owner may be observing degraded performance of
their application and be requesting that the cloud operator address
this issue.
3. The instance migration may be taking too long due to the large
amount of data to be copied (i.e. the instance's ephemeral disk is
very full) and the cloud operator may have consulted with the
instance owner and decided to abandon the live migration and employ
a different strategy. For example, stop the instance, perform the
hypervisor maintenance, then restart the instance.
Proposed change
===============
New API operations on the instance object are proposed which can be used
to obtain details of migration operations on the instance and abort
an active operation. This will include a GET to obtain details of
migration operations. If the instance does not exist (or is not
visible to the tenant id being used) or has not been the subject of any
migrations the GET will return a 404 response code. If the GET
returns details of an active migration, a DELETE can be used to abort
the migration operation. Again, if the instance does not exist (as in
the case where it has been deleted since the GET call) or no migration
is in progress (i.e. it is ended since the GET call) the DELETE will
return a 404 response code. Otherwise it will return a 202 response
code.
Rolling back a live migration should be very quick, as the source host
is still active until the migration finishes. However this depends on
the approach implemented by the virtualization driver. For example Qemu
is planning to implement a 'post copy' feature -
https://www.redhat.com/archives/libvir-list/2014-December/msg00093.html
In this situation a cancellation request should be declined because
rolling back to the source node would be more work than completing the
migration. In fact it is probably impossible! Nova would need to be
involved in the switch from pre-copy to post-copy so that it could
switch the networking to the target host. Thus nova would know that the
instance has switched and decline any cancellation requests. If the
instance migration were to encounter difficulties completing during the
post copy the instance would need to be paused to allow the migration
to complete.
The GET /servers/{id}/migrations operation will entail the API server
verifying the existence and task state of the instance. If the
instance does not exist (or is not visible to the user invoking this
operation) a 404 response code will be returned. Otherwise the API
server will return details of all the running migration operations for
the instance. It will use a new method on the migration class called
get_by_instance_and_status specifying the instance uuid and status of
running. If no migration objects are returned an empty list will be
returned in the API response. If one or more migration objects are
returned then the new_instance_type_id and old_instance_type_id fields
will be used to retrieve flavor objects for the relevant flavors to
obtain the flavor id. These values will be included in the response
as new_flavor_id and old_flavor_id. This will mean that a user will be
able to use this information to obtain details of the flavors.
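A minimal sketch of how the GET handler described above could be wired up.
The controller shape, the ``get_by_instance_and_status`` query and the
``flavorid`` lookups follow the text above; the exact signatures are
assumptions, not the final implementation::

    import webob.exc

    from nova import exception
    from nova import objects


    class ServerMigrationsController(object):
        """Sketch of the proposed GET /servers/{id}/migrations handler."""

        def __init__(self, compute_api):
            self.compute_api = compute_api

        def index(self, req, server_id):
            context = req.environ['nova.context']
            try:
                instance = self.compute_api.get(context, server_id)
            except exception.InstanceNotFound:
                # 404 if the instance does not exist or is not visible.
                raise webob.exc.HTTPNotFound()
            # Proposed new query: only migrations with status 'running'.
            migrations = objects.MigrationList.get_by_instance_and_status(
                context, instance.uuid, 'running')
            results = []
            for migration in migrations:
                new_flavor = objects.Flavor.get_by_id(
                    context, migration.new_instance_type_id)
                old_flavor = objects.Flavor.get_by_id(
                    context, migration.old_instance_type_id)
                results.append({
                    'id': migration.id,
                    'server_uuid': instance.uuid,
                    'source_compute': migration.source_compute,
                    'dest_compute': migration.dest_compute,
                    'status': migration.status,
                    'new_flavor_id': new_flavor.flavorid,
                    'old_flavor_id': old_flavor.flavorid,
                    'created_at': migration.created_at,
                    'updated_at': migration.updated_at,
                })
            return {'migrations': results}
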
The DELETE /servers/{id}/migrations/{id} operation will entail the API
server calling the migration_get method on the migration class to
verify the existence of an ongoing live migration operation on the
instance. It will then call a method on the ServersController class
called live_migrate_abort.
If the invoking user does not have authority to perform the operation
(as defined in the policy.json file) then a 403 response code will be
returned. The policy.json file will be updated to define the
live_migrate_abort as accessible to cloud admin users only.
If the API server determines that the operation can proceed it will
send an async message to the compute manager and return a 202
response code to the user.
The compute manager will emit a notification message indicating that
the live_migrate_abort operation has started. It will then invoke a
method on the driver to abort the migration. If the driver is unable
to perform this operation a new exception called
'AbortMigrationNotSupported' will be returned.
The compute manager method invoked will be wrapped with the decorators
that cause it to generate instance action and notification events. The
exception generated here would be processed by those wrappers and thus
the user would be able to query the instance actions to discover the
outcome of the cancellation operation.
Note the instance task state will not be updated by the
live_migrate_abort operation. If the operator were to execute the
operation multiple times the subsequent invocations would simply fail.
In the case of the libvirt driver it will obtain the domain object for
the target instance and invoke job abort on it. If there is no job
active an error will be returned. This could occur if the instance
migration has recently finished or has completed the libvirt migration
and is executing the post migration phase. It could also occur if the
migration is still executing the pre migration phase. Finally, it could
mean that the libvirt job has failed but nova has not yet updated the
task state. In all of these cases an exception will be returned to the
compute manager to indicate that the operation was unsuccessful.
If the libvirt job abort operation succeeds then the thread performing
the live migration will receive an error from the libvirt driver and
perform the live migration rollback steps, including resetting the
instance's task state to none.
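A hedged sketch of the libvirt driver side of this flow, using libvirt's
existing ``abortJob()`` call; the ``live_migration_abort`` method name and
the ``_host.get_domain()`` helper are assumptions based on the text above::

    import libvirt

    from nova import exception


    class LibvirtDriverSketch(object):
        """Only the abort path is shown; the rest of the driver is omitted."""

        def __init__(self, host):
            self._host = host

        def live_migration_abort(self, instance):
            # Assumed helper returning the libvirt domain for the instance.
            dom = self._host.get_domain(instance)
            try:
                # Aborts the currently running job, i.e. the migration.
                dom.abortJob()
            except libvirt.libvirtError as ex:
                # No active job: the migration may have finished, may still
                # be in pre-migration, or may have already failed. Surface
                # this to the compute manager so the instance action records
                # the unsuccessful abort.
                raise exception.NovaException(
                    'Failed to abort live migration: %s' % ex)
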
Alternatives
------------
One alternative is not doing this, leaving it up to operators to roll
up their sleeves and get to work on the hypervisor.
The topic of cancelling an ongoing live migration has been mooted
before in Nova, and has been thought of as being suitable for a
"Tasks API" for managing long-running tasks [#]_. There is not
currently any Tasks API, but if one were to be added to Nova, it would
be suitable.
Data model impact
-----------------
None
REST API impact
---------------
To be added in a new microversion.
* Obtain details of live migration operations on an instance that have
a status of running. There should only be one migration per instance
in this state but the API call supports returning more than one.
The operation will return the id of the active migration operation
for the instance.
`GET /servers/{id}/migrations`
Body::
None
Normal http response code: `200 OK`
Body::
    {
        "migrations": [
            {
                "created_at": "2013-10-29T13:42:02.000000",
                "dest_compute": "compute3",
                "id": 6789,
                "server_uuid": "6ff1c9bf-09f7-4ce3-a56f-fb46745f3770",
                "new_flavor_id": 2,
                "old_flavor_id": 1,
                "source_compute": "compute2",
                "status": "running",
                "updated_at": "2013-10-29T14:42:02.000000"
            }
        ]
    }
Expected error http response code: `404 Not Found`
- the instance does not exist
Expected error http response code: `403 Forbidden`
- Policy violation if the caller is not granted access to
'os_compute_api:servers:migrations:index' in policy.json
* Stop an in-progress live migration
The operation will reset the instance task state to none.
`DELETE /servers/{id}/migrations/{id}`
Body::
None
Normal http response code: `202 Accepted`
No response body is needed
Expected error http response code: `404 Not Found`
- the instance does not exist
Expected error http response code: `403 Forbidden`
- Policy violation if the caller is not granted access to
'os_compute_api:servers:migrations:delete' in policy.json
Expected error http response code: `400 Bad Request`
- the instance state is invalid for cancellation, i.e. the task
state is not 'migrating' or the migration is not in a running
state and the type is 'live-migration'
Security impact
---------------
None
Notifications impact
--------------------
Emit notification messages indicating the start and outcome of the
migration cancellation operation.
Other end user impact
---------------------
A new python-novaclient command will be available, e.g.
nova live-migration-abort <instance>
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Paul Carlton (irc: paul-carlton2)
Other assignees:
Claudiu Belu
Work Items
----------
* python-novaclient 'nova live-migration-abort'
* Cancel live migration API operation
* Cancelling a live migration per hypervisor
* libvirt
* hyper-v
* vmware
Dependencies
============
None
Testing
=======
Unit tests will be added using a fake virt driver to simulate a live
migration. The fake driver implementation will simply wait for the
cancellation. We also want to test attempts to cancel a migration
during pre or post migration, which can be done using a fake
implementation of those steps that will also wait for an indication
that the cancel attempt has been performed.
The functional testing will utilize the new live migration CI job.
An instance with memory activity and a large disk will be used so we
can test all aspects of live migration, including aborting the live
migration.
Documentation Impact
====================
New API needs to be documented:
* Compute API extensions documentation
http://developer.openstack.org/api-ref-compute-v2.1.html
* nova.compute.api documentation
http://docs.openstack.org/developer/nova/api/nova.compute.api.html
References
==========
Some details of how this can be done with libvirt:
https://www.redhat.com/archives/libvirt-users/2014-January/msg00008.html
.. [#] http://lists.openstack.org/pipermail/openstack-dev/2015-February/055751.html
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced


@@ -0,0 +1,177 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=====================
Add os-win dependency
=====================
https://blueprints.launchpad.net/nova/+spec/add-os-win-library
Hyper-V is involved in many OpenStack components (nova, neutron, cinder,
ceilometer, etc.) and will be involved with other components in the future.
A common library has been created, named os-win, in order to reduce the code
duplication between all these components (utils classes, which interact
directly with Hyper-V through WMI), making it easier to maintain, review and
propose new changes to current and future components.
Problem description
===================
There are many Hyper-V utils modules duplicated across several projects,
which can be refactored into os-win, reducing the code duplication and making
it easier to maintain. Plus, the review process will be simplified, as
reviewers won't have to review Hyper-V related code, in which not everyone is
proficient.
Use Cases
---------
This blueprint impacts Developers and Reviewers.
Developers will be able to submit Hyper-V related commits directly to os-win.
Reviewers will not have to review low level Hyper-V related code. Thus, the
amount of code that needs to be reviewed will be reduced by approximately 50%.
Proposed change
===============
In order to implement this blueprint, minimal changes are necessary, as the
behaviour will stay the same.
The primary changes that need to be done on nova are as follows:
* add os-win in requirements.txt
* replace ``nova.virt.hyperv.vmutils.HyperVException`` references with
``os_win.HyperVException``
* replace all ``nova.virt.hyperv.utilsfactory`` imports used by the
`HyperVDriver` with ``os_win.utilsfactory``
* remove all utils modules and their unit tests in ``nova.virt.hyperv``, since
they will no longer be used.
* other trivial changes, which are to be seen in the implementation (a minimal
sketch of the import changes follows below).
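A minimal sketch of the import change described above; the
``list_instances()`` call is only an illustrative helper, and the exact
exception path in os-win may differ slightly from the
``os_win.HyperVException`` name used in this spec::

    # Previously: from nova.virt.hyperv import utilsfactory, vmutils
    # After the change the same helpers come from the shared library:
    from os_win import exceptions as os_win_exc
    from os_win import utilsfactory


    def list_hyperv_instances():
        """Illustrative helper showing the new import pattern only."""
        vmutils = utilsfactory.get_vmutils()
        try:
            return vmutils.list_instances()
        except os_win_exc.HyperVException:
            # Exceptions are now raised by os-win rather than by
            # nova.virt.hyperv.vmutils.
            raise
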
Changes that need to be done on other projects:
* add os-win in global-requirements.txt [1]
Alternatives
------------
Originally, os-win was planned to be part of Oslo. However, it was suggested
that os-win should be a standalone project, as otherwise the Oslo team would
also have to maintain it, and few (if any) Oslo contributors specialize in
Windows / Hyper-V related code.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
The os-win dependency will have to be installed in order for the HyperVDriver to be
used.
Developer impact
----------------
In a typical scenario, a blueprint implementation for the Hyper-V Driver will
require 2 parts:
* os-win commit, adding Hyper-V related utils required in order to implement
the blueprint.
* nova commit, implementing the blueprint and using the changes made in os-win.
If a nova commit requires a newer version of os-win, the patch to
global-requirements should be referenced with Depends-On in the commit message.
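For example, such a nova commit message could look like the following, where
the subject line and the referenced Change-Ids are placeholders only::

    Hyper-V: consume new os-win helper

    This change requires a newer os-win release than the current
    global-requirements minimum.

    Depends-On: <Change-Id of the global-requirements patch>
    Change-Id: <Change-Id of this nova patch>
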
For bugfixes, there are chances that they require 2 patches: one for nova and
one for os-win. The backported bugfix must be a squashed version of the 2
patches, referencing both commit IDs in the commit message::
    (cherry picked from commit <nova-commit-id>)
    (cherry picked from commit <os-win-commit-id>)
If the bugfix requires only one patch to either project, backporting will
proceed as before.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Claudiu Belu <cbelu@cloudbasesolutions.com>
Other contributors:
Lucian Petrut <lpetrut@cloudbasesolutions.com>
Work Items
----------
As described in the `Proposed change` section.
Dependencies
============
Adds os-win library as a dependency.
Testing
=======
* Unit tests
* Hyper-V CI
Documentation Impact
====================
The Hyper-V documentation page [3] will have to be updated to include os-win
as a dependency.
References
==========
[1] os-win added to global-requirements.txt:
https://review.openstack.org/#/c/230394/
[2] os-win repository:
https://github.com/openstack/os-win
[3] Hyper-V virtualization platform documentation page:
http://docs.openstack.org/liberty/config-reference/content/hyper-v-virtualization-platform.html
History
=======
Mitaka: Introduced


@@ -0,0 +1,241 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=======================================================================
Show the 'project_id' and 'user_id' information in os-server-groups API
=======================================================================
https://blueprints.launchpad.net/nova/+spec/add-project-id-and-user-id
Show the 'project_id' and 'user_id' information of the server
groups in the os-server-groups API. This change will allow admin users
to identify server groups more easily.
Problem description
===================
The os-server-groups API currently allows an admin user to list server
groups for all projects, but the response body doesn't contain the project
id information of each server group, so it is hard to identify which
server group belongs to which project in a multi-tenant environment.
Use Cases
---------
As a cloud administrator, I want to easily identify which server group
belongs to which project when sending GET request.
Proposed change
===============
Add a new API microversion to the os-server-groups API extension such that:
* If the version on the API 'list' request satisfies the minimum version, the
'project_id' and 'user_id' information of server groups is included in the
response data.
* If the version on the API 'show' request satisfies the minimum version, the
'project_id' and 'user_id' information of the server group is included in the
response data.
* If the version on the API 'create' request satisfies the minimum version,
the 'project_id' and 'user_id' information of the server group is included in
the response data (see the sketch below).
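A hedged sketch of the version gate in
``nova/api/openstack/compute/server_groups.py``; the helper name and the
``2.12`` microversion are illustrative only, matching the request examples
below::

    from nova.api.openstack import api_version_request

    # Placeholder: whichever new microversion this change lands in.
    MIN_VERSION = api_version_request.APIVersionRequest('2.12')


    def _format_server_group(req, group):
        """Build the API representation of a server group (sketch)."""
        server_group = {
            'id': group.uuid,
            'name': group.name,
            'policies': group.policies or [],
            'members': group.members or [],
            'metadata': {},
        }
        # Only expose the new fields when the client opted into the new
        # microversion; older clients keep the unchanged response.
        if req.api_version_request >= MIN_VERSION:
            server_group['project_id'] = group.project_id
            server_group['user_id'] = group.user_id
        return server_group
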
Alternatives
------------
None
Data model impact
-----------------
None
REST API impact
---------------
The proposed change updates the GET response data in the os-server-groups
API extension to include the 'project_id' and 'user_id' field if the request
has a minimum supported version.
The proposed change also updates the POST response data in the
os-server-groups API extension to include the 'project_id' and 'user_id'
field if the request has a minimum supported version.
* Modifications for the method
* Add project id information to the current response data.
* Add user id information to the current response data.
* GET requests response data will be affected.
* POST requests response data will be affected.
* Example use case:
Request:
GET --header "X-OpenStack-Nova-API-Version: 2.12" \
http://127.0.0.1:8774/v2.1/e0c1f4c0b9444fa086fa13881798144f/os-server-groups
Response:
::
    {
        "server_groups": [
            {
                "user_id": "ed64bccd0227444fa02dbd7695769a7d",
                "policies": [
                    "affinity"
                ],
                "name": "test1",
                "members": [],
                "project_id": "b8112a8d8227490eba99419b8a8c2555",
                "id": "e64b6ae1-4d05-4faa-9f53-72c71f8e6f1a",
                "metadata": {}
            },
            {
                "user_id": "9128b975e91846f882eb63dc35c2ffd8",
                "policies": [
                    "anti-affinity"
                ],
                "name": "test2",
                "members": [],
                "project_id": "b8112a8d8227490eba99419b8a8c2555",
                "id": "b1af831c-69b5-4d42-be44-d710f2b8954c",
                "metadata": {}
            }
        ]
    }
Request:
GET --header "X-OpenStack-Nova-API-Version: 2.12" \
http://127.0.0.1:8774/v2.1/e0c1f4c0b9444fa086fa13881798144f/os-server-groups/
e64b6ae1-4d05-4faa-9f53-72c71f8e6f1a
Response:
::
    {
        "user_id": "ed64bccd0227444fa02dbd7695769a7d",
        "policies": [
            "affinity"
        ],
        "name": "test1",
        "members": [],
        "project_id": "b8112a8d8227490eba99419b8a8c2555",
        "id": "e64b6ae1-4d05-4faa-9f53-72c71f8e6f1a",
        "metadata": {}
    }
Request:
POST --header "X-OpenStack-Nova-API-Version: 2.12" \
http://127.0.0.1:8774/v2.1/e0c1f4c0b9444fa086fa13881798144f/os-server-groups \
-d {"server_group": { "name": "test", "policies": [ "affinity" ] }}
Response:
::
    {
        "user_id": "ed64bccd0227444fa02dbd7695769a7d",
        "policies": [
            "affinity"
        ],
        "name": "test",
        "members": [],
        "project_id": "b8112a8d8227490eba99419b8a8c2555",
        "id": "e64b6ae1-4d05-4faa-9f53-72c71f8e6f1a",
        "metadata": {}
    }
* There should not be any impact to policy.json files for this change.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
* The python-novaclient server-group-list, server-group-show and
server-group-create commands will be updated to handle microversions
and show the 'project_id' and 'user_id' information in their output
if the requested microversion provides that information.
Performance Impact
------------------
None
Other deployer impact
---------------------
None; if a deployer is using the required minimum version of the API to get
the 'project_id' and 'user_id' data they can begin using it, otherwise they
won't see a change.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Zhenyu Zheng <zhengzhenyu@huawei.com>
Work Items
----------
* Add a new microversion and change
nova/api/openstack/compute/server_groups.py to use it to determine
if the 'project_id' and 'user_id' information of the server group
should be returned.
Dependencies
============
None
Testing
=======
* Unit tests and API samples functional tests in the nova tree.
* There are currently not any compute API microversions tested in Tempest
beyond v2.1. We could add support for testing the new version in Tempest
but so far the API is already at least at v2.10 without changes to Tempest.
Documentation Impact
====================
* nova/api/openstack/rest_api_version_history.rst document will be updated.
* api-ref at https://github.com/openstack/api-site will be updated.
References
==========
* Originally reported as a bug:
https://bugs.launchpad.net/python-novaclient/+bug/1481210


@@ -0,0 +1,172 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Boot From UEFI image
==========================================
https://blueprints.launchpad.net/nova/+spec/boot-from-uefi
The nova compute libvirt driver does not support booting from UEFI images.
This is a problem because there is a slow but steady trend for OSes to move
to the UEFI format and in some cases to make the UEFI format their only
format. Microsoft Windows is moving in this direction and Clear Linux is
already in this category. Given this, we propose enabling UEFI boot with
the libvirt driver. Additionally, we propose using the well tested and
battle hardened Open Virtual Machine Firmware (OVMF) as the VM firmware
for x86_64.
Unified Extensible Firmware Interface (UEFI) is a standard firmware designed
to replace BIOS. Booting a VM using UEFI/OVMF is supported by libvirt since
version 1.2.9.
OVMF is a port of Intel's tianocore firmware to the qemu virtual machine; in
other words, this project enables UEFI support for virtual machines.
Problem description
===================
Platform vendors have been increasingly adopting UEFI for the platform firmware
over traditional BIOS. This, in part, is leading to OS vendors also shifting to
support or provide UEFI images. However, as adoption of UEFI for OS images
increases, it has become apparent that OpenStack, through its Nova compute
libvirt driver, does not support UEFI image boot. This is problematic and needs
to be resolved.
Use Cases
----------
1. User wants to launch a VM with UEFI. In this case the user needs to be able
to tell Nova everything that is needed to launch the desired VM. The only
additional information that should be required is a new image property
indicating which firmware type will be used, uefi or bios.
Proposed change
===============
Add the missing elements when generating the XML definition in the libvirt
driver to support OVMF firmware. Also add a new image metadata value to
specify which firmware type will be used.
The following is the new metadata value.
* 'hw_firmware_type': fields.EnumField()
This indicates which firmware type will be used to boot the VM.
This property can be set to 'uefi' or 'bios'. 'uefi' will indicate that
UEFI firmware will be used. If the property is not set, 'bios' firmware
will be used (a sketch of how the libvirt driver could consume this
property follows below).
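A hedged sketch of how the libvirt driver might consume this property when
building the guest configuration; the OVMF loader path and the ``os_loader``
attribute names are assumptions, not the final implementation::

    # Typical location of the OVMF firmware image on the compute host.
    DEFAULT_UEFI_LOADER_PATH = '/usr/share/OVMF/OVMF_CODE.fd'


    def configure_firmware(guest, image_meta):
        """Select BIOS (default) or UEFI firmware for the guest config.

        'guest' stands for the libvirt guest config object being built;
        the attribute names below are illustrative.
        """
        firmware = image_meta.properties.get('hw_firmware_type', 'bios')
        if firmware == 'uefi':
            # Produces domain XML along the lines of:
            #   <os>
            #     <loader readonly='yes' type='pflash'>
            #       /usr/share/OVMF/OVMF_CODE.fd
            #     </loader>
            #   </os>
            guest.os_loader = DEFAULT_UEFI_LOADER_PATH
            guest.os_loader_type = 'pflash'
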
Alternatives
------------
None
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
The following packages should be added to the system:
* ovmf
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
qiaowei-ren
Other contributors:
Victor Morales <victor.morales@intel.com>
Xin Xiaohui <xiaohui.xin@intel.com>
Work Items
----------
The primary work items are
* Add the 'hw_firmware_type' field to the ImageMetaProps object
* Update the libvirt guest XML configuration when the UEFI image
property is present
Dependencies
============
This spec only implements UEFI boot for x86_64 and arm64. It depends
on the following libraries:
* libvirt >= 1.2.9
* OVMF from EDK2
Testing
=======
New unit tests will be needed. Without some kind of functional testing,
a warning will be emitted when this feature is used saying it is untested
and therefore considered experimental.
Documentation Impact
====================
Some minor additions for launching a UEFI image with Nova, a note on the
extra config option and metadata property, and operator / installation
information for the UEFI firmware. In addition, the hypervisor support
matrix should also be updated.
References
==========
* http://www.linux-kvm.org/downloads/lersek/ovmf-whitepaper-c770f8c.txt
* https://libvirt.org/formatdomain.html#elementsOSBIOS
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced


@@ -0,0 +1,160 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=======================================
Database connection switching for cells
=======================================
https://blueprints.launchpad.net/nova/+spec/cells-db-connection-switching
In order for Nova API to perform queries on cell databases, the database
connection information for the target cell must be used. Nova API must
pass the cell database connection information to the DB API layer.
Problem description
===================
In Cells v2, instead of using a nova-cells proxy, nova-api will interact
directly with the database and message queue of the cell for an instance.
Instance -> cell mappings are stored in a table in the API level database.
Each InstanceMapping refers to a CellMapping, and the CellMapping contains
the connection information for the cell. We need a way to communicate the
database connection information from the CellMapping to the DB layer, so
when we update an instance, it will be updated in the cell database where
the instance's data resides.
Use Cases
----------
* Operators want to partition their deployments into cells for scaling, failure
domain, and buildout reasons. When partitioned, we need a way to route
queries to the cell database for an instance.
Proposed change
===============
We propose to store the database connection information for a cell in the
RequestContext where it can be used by the DB API layer to interact with
the cell database. Currently, there are two databases that can be used at
the DB layer: 'main' and 'api', which are selected by the caller by method
name. We will want to consolidate the two methods into one that takes a
parameter to choose which EngineFacade to use. The field 'db_connection'
will be added to RequestContext to store the key to use for looking up the
EngineFacade.
When a request comes in, nova-api will look up the instance mapping in the
API database. It will get the database information from the instance's
CellMapping and store a key based on it in the RequestContext 'db_connection'
field. Then, the DB layer will look up the EngineFacade object for interacting
with the cell database using the 'db_connection' key stored in the
RequestContext.
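A hedged sketch of the pieces described above; the ``target_cell`` context
manager, the ``db_connection`` field and the facade cache are assumptions
that follow the text, not the final implementation::

    import contextlib

    from oslo_db.sqlalchemy import enginefacade

    # Cache of engine facades, keyed by the value stored in
    # RequestContext.db_connection.
    _CELL_FACADES = {}


    @contextlib.contextmanager
    def target_cell(context, cell_mapping):
        """Temporarily point a RequestContext at the given cell's database."""
        original = getattr(context, 'db_connection', None)
        context.db_connection = cell_mapping.database_connection
        try:
            yield context
        finally:
            context.db_connection = original


    def get_facade(context):
        """DB-layer lookup of the engine facade for the targeted cell."""
        key = getattr(context, 'db_connection', None) or 'default'
        if key not in _CELL_FACADES:
            facade = enginefacade.transaction_context()
            if key != 'default':
                facade.configure(connection=key)
            _CELL_FACADES[key] = facade
        return _CELL_FACADES[key]
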
Alternatives
------------
One alternative would be to add an argument to DB API methods to optionally
take database connection information to use instead of the configuration
setting and pass it when taking action on objects. This would require changing
the signatures of all the DB API methods to take the keyword argument or
otherwise finding a way to let all of the DB API methods derive from such an
interface. There is also precedent of allowing use of a field in the
RequestContext to communicate "read_deleted" to the DB API model_query.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
The database connection field in the RequestContext could contain sensitive
data.
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
This change on its own does not introduce a performance impact. The overall
design of keeping only mappings in the API DB and instance details in the
cell databases introduces an additional database lookup for the cell database
connection information. This can however be addressed by caching mappings.
Other deployer impact
---------------------
None
Developer impact
----------------
This change means that developers should be aware that cell database connection
information is contained in the RequestContext and be mindful that it could
contain sensitive data. Developers will need to use the interfaces for getting
database connection information from a CellMapping and setting it in a
RequestContext in order to query a cell database.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
melwitt
Other contributors:
dheeraj-gupta4
Work Items
----------
* Add a database connection field to RequestContext
* Add a context manager to nova.context that populates a RequestContext with
the database connection information given a CellMapping
* Modify nova.db.sqlalchemy.api get_session and get_engine to use the database
connection information from the context, if it's set
Dependencies
============
* https://blueprints.launchpad.net/nova/+spec/cells-v2-mapping
* https://blueprints.launchpad.net/nova/+spec/cells-instance-mapping
Testing
=======
Since no user visible changes will occur with this change, the current suite of
Tempest or functional tests should be sufficient.
Documentation Impact
====================
Developer documentation could be written to describe how to use the new
interfaces.
References
==========
* https://etherpad.openstack.org/p/kilo-nova-cells


@@ -0,0 +1,374 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=========================
Centralize Config Options
=========================
Include the URL of your launchpad blueprint:
https://blueprints.launchpad.net/nova/+spec/centralize-config-options
Nova has around 800 config options*. Those config options are the interface
to the cloud operators. Unfortunately they often lack good documentation
which
* explains their impact,
* shows their interdependency to other config options and
* explains which of the Nova services they influence.
This cloud operator interface needs to be consolidated and one way of doing
this is to move the config options from their declaration in multiple modules
to a few centrally managed modules. These centrally managed modules should
also provide the bigger picture of the configuration surface we expose. This
has already been discussed on the ML [1].
\* see the "nova.flagmappings" file which get generated in the
"openstack-manuals" project for the "configuration reference" manual.
Problem description
===================
Same as above
Use Cases
----------
* As an end user I'm not affected by this change and won't notice a difference.
* As a developer I will find all config options in one place and will add
further config options to that central place.
* As a cloud operator I will see more helpful descriptions on the config
options. The default values, names, sections won't change in any way and
my ``nova.conf`` files will work as before.
Proposed change
===============
The change consists of two views,
* a technical one, which describes how the refactoring is done in terms
of code placement
* and a quality view, which describes the standard a good config option
help text has to fulfill.
Technical View
--------------
There was a proof of concept in Gerrit which shows the intention [2]. The
steps are as follows:
#. There will be a new package called ``nova/conf``.
#. This package contains a module for each natural grouping (mostly the
section name) in the ``nova.conf`` file. For example:
* ``nova/conf/default.py``
* ``nova/conf/ssl.py``
* ``nova/conf/cells.py``
* ``nova/conf/libvirt.py``
* [...]
#. All ``CONF.import_opt(...)`` calls get removed from the functional modules
as they don't serve their purpose anymore. That's because after the import
of ``nova.conf``, all config options will be available.
#. All ``CONF.register_opts(...)`` calls get moved to the modules
``nova/conf/<module-name>.py``. By that these modules can control
themselves under which group name the options get registered. The module
``nova/conf/__init__.py`` imports those modules and triggers the
registration with ``<module-name>.register_opts(CONF)``. This allows the
usage of::
    import nova.conf
    CONF = nova.conf.CONF

    if CONF.<section>.<config-option>:
        # do something
This means that the normal functional code, which uses the config options,
doesn't need to be changed for this.
#. There will only be one ``nova/conf/opts.py`` module which is necessary to
build the ``nova.conf.sample`` file. This ``opts.py`` module is the single
point of entry for that. All other ``opts.py`` will be removed at the end,
for example the ``nova/virt/opts.py`` file.
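For example, the ``nova/conf/__init__.py`` described in step 4 could look
roughly like this (the imported section modules are illustrative)::

    from oslo_config import cfg

    from nova.conf import cells
    from nova.conf import default
    from nova.conf import libvirt
    from nova.conf import ssl

    CONF = cfg.CONF

    # Each section module knows its own option group and registers itself.
    cells.register_opts(CONF)
    default.register_opts(CONF)
    libvirt.register_opts(CONF)
    ssl.register_opts(CONF)
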
Quality View
------------
Operators will work with this interface, so the documentation has to be
precise and unambiguous. So let's have a look at some negative examples and
why I consider them insufficient. After that, the changed positive examples
should show which direction we should go. This section will close with a
generic template for config options which should be implemented during this
refactoring.
**Negative Examples:**
The following example is from the *serial console* feature::
    cfg.StrOpt('base_url',
               default='ws://127.0.0.1:6083/',
               help='Location of serial console proxy.'),
It lacks a description of which services use this, how one can decide to
use another port, and what impact this has.
Another example from the *image cache* feature::
    cfg.IntOpt('image_cache_manager_interval',
               default=2400,
               help='Number of seconds to wait between runs of the '
                    'image cache manager. Set to -1 to disable. '
                    'Setting this to 0 will run at the default rate.'),
On the plus side, it shows the possible values and their impact, but does
not describe which service consumes this or whether it has interdependencies
with other config options.
**Positive Example:**
Here is an example of how this could look for a config option of the
*serial console* feature::
    serial_opt_base_url = cfg.StrOpt('base_url',
        default='ws://127.0.0.1:6083/',
        help="""The token enriched URL which is
    returned to the end user to connect to the nova-serialproxy service.

    This URL is the handle an end user will get (enriched with a token at
    the end) to establish the connection to the console of a guest.

    Services which consume this:

    * ``nova-compute``

    Possible values:

    * A string which is a URL

    Related options:

    * The IP address must be identical to the address to which the
      ``nova-serialproxy`` service is listening (see option
      ``serialproxy_host`` in section ``[serial_console]``).
    * The port must be the same as in the option ``serialproxy_port``
      of section ``[serial_console]``.
    * If you choose to use a secured websocket connection, start this
      option with ``wss://`` instead of the unsecured ``ws://``.
      The options ``cert`` and ``key`` in the ``[DEFAULT]`` section
      have to be set for that."""),

    serial_console_group = cfg.OptGroup(name="serial_console",
        title="The serial console feature",
        help="""The serial console feature
    allows you to connect to a guest in case a graphical console like VNC or
    SPICE is not available.""")

    CONF.register_opt(serial_opt_base_url, group=serial_console_group)
Another example can be made for the *image cache* feature::
    cfg.IntOpt('image_cache_manager_interval',
        default=2400,
        min=-1,
        help="""Number of seconds to wait between runs of
    the image cache manager.

    The image cache manager is responsible for ensuring that local disk
    doesn't fill with backing images that aren't currently in use. It should
    be noted that if local disk is too full to start a new instance, and
    cleaning the image cache would free enough space to make the hypervisor
    node usable, then the hypervisor node won't be usable until the next run
    of the image cache manager. In other words, the cache manager is not run
    more frequently as a hypervisor node becomes resource constrained.

    Services which consume this:

    * ``nova-compute``

    Possible values:

    * ``-1`` Disables the cleaning of the image cache.
    * ``0`` Runs the cleaning at the default rate.
    * Other values greater than ``0`` describe the number of seconds
      between two cleanups.

    Related options:

    * None
    """),
**Generic Template**
Based on the positive example above, the generic template a config option
should fulfill to be descriptive to the operators would be::
help="""#A short description what it does. If it is a unit (e.g. timeout)
# describe the unit which is used (seconds, megabyte, mebibyte, ...)
# A long description what the impact and scope is. The operators should
# know the expected change in the behavior of Nova if they tweak this.
Services which consume this:
# A list of services which consume this option. Operators should not
# read code to know which one of the services will change its behavior.
# Nor should they set this in every ``nova.conf`` file to be sure.
Possible values:
# description of possible values. Especially if this is an option
# with numeric values (int, float), describe the edge cases (like the
# min value, max value, 0, -1).
Related options:
# Which other config options have to be considered when I change this
# one? If it stand solely on its own, use "None"
"""),
Alternatives
------------
The ML discussion [1] concluded that the following ideas wouldn't work for us:
#. *Move all of the config options into one single ``flags.py`` module.*
It was reasoned that this file would be huge and that merge
conflicts for the contributors would be unavoidable.
#. *Ship the config options in data files with the code rather than being*
*inside the Python code itself.* It was reasoned that this could cause a
missing update of the config options description if it was used in a
different way than before.
#. *Don't use config options directly in the functional code. Make a*
*dependency injection to the object which needs the configured value*
*and depend only on that object's attributes.* Yes, this is the one with
the most benefit in terms of testability, clean code, OOP practices and
so on. The outcome of this blueprint is also to get a feeling how that
approach could be done in the end. A first proof of concept [3] was a bit
cumbersome.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
#. It could also be that we want to deprecate options because they are not
used anymore.
#. Otherwise the deployer should get more and more happy about helpful texts
and descriptions.
Developer impact
----------------
#. Contributors which are actively working on config options could have merge
conflicts and need to rebase.
#. New config options should directly be added to the new central place at
``nova/conf/<section>.py``.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Markus Zoeller (markus_z)
https://launchpad.net/~mzoeller
Other contributors:
None (but highly welcome)
Work Items
----------
#. create folder ``nova/conf`` with modules for each ``nova.conf`` section
#. move options from a functional module to the section module from above
#. enhance the help texts from config options and option groups.
Dependencies
============
#. Depending on the outcome of the discussion of [4] which proposes to enrich
the config option object by interdependencies, we could use that. But this
blueprint doesn't have a hard dependency on that.
#. Depending on the outcome of the discussion of [5] which proposes to enrich
the config option object by allowing to format the help text with a markup
language, we could use that. But this blueprint doesn't have a hard
dependency on that.
Testing
=======
The ``nova.conf`` sample gets generated as part of the ``docs`` build.
If this fails we know that something went wrong.
Documentation Impact
====================
None
References
==========
[1] MailingList "openstack-dev"; July 2015; "Streamlining of config options
in nova":
http://lists.openstack.org/pipermail/openstack-dev/2015-July/070306.html
[2] Gerrit; PoC; "DO NOT MERGE: Example of config options reshuffle":
https://review.openstack.org/#/c/214581
[3] Gerrit; PoC; "DO NOT MERGE: replace global CONF access by object":
https://review.openstack.org/#/c/218319
[4] Launchpad; oslo.config; blueprint "option-interdependencies"
https://blueprints.launchpad.net/oslo.config/+spec/option-interdependencies
[5] Launchpad; oslo.config; blueprint "help-text-markup"
https://blueprints.launchpad.net/oslo.config/+spec/help-text-markup
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced


@@ -0,0 +1,315 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=======================================================
Check the destination host when migrating or evacuating
=======================================================
https://blueprints.launchpad.net/nova/+spec/check-destination-on-migrations
Provide a way to make sure that resource allocation is consistent for all
operations, even if a destination host is provided.
Problem description
===================
Live migrations and evacuations allow the possibility to either specify a
destination host or not. The former option totally bypasses the scheduler by
calling the destination Compute RPC API directly.
Unfortunately, there are some cases where migrating a VM breaks the
scheduler rules, which potentially breaks future boot requests due
to some constraints not being enforced when migrating/evacuating (like
allocation ratios).
We should modify that logic to explicitly call the Scheduler any time a move
(ie. either a live-migration or an evacuation) is requested (whether the
destination host is provided or not) so that the Scheduler would verify the
destination host through all the enabled filters and, if successful, consume the
instance usage from its internal HostState.
That said, we also understand that there are use cases where an
operator wants to move an instance manually and not call the scheduler, even
if the operator knows that they explicitly break scheduler rules (e.g. a
filter not passing, an affinity policy violated or an instance taking an
already allocated pCPU in the context of CPU pinning).
Use Cases
----------
Some of the normal use cases (verifying the destination) could be:
As an operator, I want to make sure that the destination host I'm providing
when live migrating a specific instance would be correct and wouldn't break my
internal cloud because of a discrepancy between how I calculate the destination
host capacity and how the scheduler is taking into account the memory
allocation ratio (see the References section below).
As an operator, I want to make sure that live-migrating an instance to a
specific destination wouldn't impact my existing instances running on that
destination host because of some affinity that I missed.
Proposed change
===============
This spec goes beyond the persist-request-spec blueprint [1] by making
sure that before each call to select_destinations(), the RequestSpec object is
read for the current instance to schedule, and that after
select_destinations() returns, the RequestSpec object will be persisted.
That way, we will be able to get the original RequestSpec from the
corresponding instance from the user creating the VM including the scheduler
hints. Given that, we propose to amend the RequestSpec object to include a new
field called ``requested_destination`` which would be a ComputeNode object (at
least having the host and hypervisor_hostname fields set) and would be set by
the conductor for each method (here live-migrate and rebuild_instance
respectively) accepting an optional destination host.
Note that this new field would have nothing in common with a migration object
or an Instance.host field, since it would just be a reference to an equivalent
scheduler hint saying 'I want to go there' (and not the ugly force_hosts
information passed as an Availability Zone hack...).
It will be the duty of the conductor (within the live_migrate and evacuate
methods) to get the RequestSpec related to the instance, add the
``requested_destination`` field, set the related Migration object to
``scheduled`` and call the scheduler's ``select_destinations`` method.
The last step would be of course to store the updated RequestSpec object.
If the requested destination is unacceptable for the scheduler, then the
conductor will change the Migration status to ``conflict``.
The idea behind that is that the Scheduler would check that field in the
_schedule() method of FilterScheduler and would then just call the filters only
for that destination.
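A hedged sketch of the conductor-side flow described above; the
``requested_destination`` field and the ``conflict`` status follow the text,
while the function name, the ``select_destinations`` signature and the error
handling are simplified assumptions::

    from nova import exception
    from nova import objects


    def check_requested_destination(context, instance, migration, host,
                                    scheduler_client):
        """Ask the scheduler to validate (and claim) the requested host."""
        # Re-read the RequestSpec persisted when the instance was created.
        spec = objects.RequestSpec.get_by_instance_uuid(context, instance.uuid)
        # New field proposed by this spec: "I want to go there".
        spec.requested_destination = objects.ComputeNode(
            host=host, hypervisor_hostname=host)
        try:
            # The FilterScheduler runs the enabled filters against this single
            # host and consumes the instance usage from its HostState
            # (signature shown here is simplified).
            dests = scheduler_client.select_destinations(context, spec)
        except exception.NoValidHost:
            # The requested host cannot honour the original constraints.
            migration.status = 'conflict'
            migration.save()
            raise
        spec.save()  # persist the updated RequestSpec right after the call
        return dests
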
As the RequestSpec object blueprint cares about backwards compatibility by
providing the legacy ``request_spec`` and ``filter_properties`` to the old
``select_destinations`` API method, we wouldn't pass the new
``requested_destination`` field as a key for the request_spec.
Since this BP also provides a way for operators to bypass the Scheduler, we
will amend the API for all migrations including a destination host by adding an
extra request body argument called ``force`` (accepting True or False,
defaulting to False) and the corresponding CLI methods will expose that
``force`` option. If the microversion asked by the client is older than the
version providing the field, then it won't be passed (neither True nor False;
the key simply won't exist) to the conductor, so the conductor won't call the
scheduler - to keep the existing behaviour (see the REST API section below for
further details).
In order to keep track of those forced calls, we propose to log as an instance
action the fact that the migration has been forced so that the operator could
potentially reschedule the instance later on if he wishes. For that, we propose
to add two new possible actions, called ``FORCED_MIGRATE`` (when
live-migrating) and ``FORCED_REBUILD`` (when evacuating).
This means that an operator can get all the instances having either
``FORCED_MIGRATE`` or ``FORCED_REBUILD`` just by calling the
/os-instance-actions API resource for each instance, and we could also later
add a new blueprint (out of that spec scope) for getting the list of instances
having the last specific action set to something (here FORCED_something).
Alternatives
------------
We could just provide a way to call the scheduler for having an answer if the
destination host is valid or not, but it wouldn't consume the instance usage
which is from our perspective the key problem with the existing design.
Data model impact
-----------------
None.
REST API impact
---------------
The proposed change just updates the POST request body for the
``os-migrateLive`` and ``evacuate`` actions to include the
optional ``force`` boolean field defaulted to False if the request has a
minimum version.
Depending on whether the ``host`` and ``force`` fields are set or null, the
actions and return codes are:
- If a host parameter is supplied in the request body, the scheduler will now
be asked to verify that the requested target compute node is actually able to
accommodate the request, including honouring all previously-used scheduler
hints. If the scheduler determines the request cannot be accommodated by the
requested target host node, the related Migration object will change the
``status`` field to ``conflict``.
- If a host parameter is supplied in the request body, a new --force parameter
may also be supplied in the request body. If present, the scheduler shall
**not** be consulted to determine if the target compute node can be
accommodated, and no Migration object will be updated.
- If the --force parameter is supplied in the request body but the host parameter
is either null (for live-migrate) or not provided (for evacuate), then an
HTTP 400 Bad Request will be served to the user.
Of course, since it's a new request body attribute, it will get a new API
microversion, meaning that if the attribute is not provided, the scheduler
won't be called by the conductor (to keep the existing behaviour where setting
a host bypasses the scheduler).
* JSON schema definition for the body data of ``os-migrateLive``:
::
    migrate_live = {
        'type': 'object',
        'properties': {
            'os-migrateLive': {
                'type': 'object',
                'properties': {
                    'block_migration': parameter_types.boolean,
                    'disk_over_commit': parameter_types.boolean,
                    'host': host,
                    'force': parameter_types.boolean
                },
                'required': ['block_migration', 'disk_over_commit', 'host'],
                'additionalProperties': False,
            },
        },
        'required': ['os-migrateLive'],
        'additionalProperties': False,
    }
* JSON schema definition for the body data of ``evacuate``:
::
    evacuate = {
        'type': 'object',
        'properties': {
            'evacuate': {
                'type': 'object',
                'properties': {
                    'host': parameter_types.hostname,
                    'force': parameter_types.boolean,
                    'onSharedStorage': parameter_types.boolean,
                    'adminPass': parameter_types.admin_password,
                },
                'required': ['onSharedStorage'],
                'additionalProperties': False,
            },
        },
        'required': ['evacuate'],
        'additionalProperties': False,
    }
* There should be no policy change as we're not changing the action by itself
but rather just providing a new option.
Security impact
---------------
None.
Notifications impact
--------------------
None.
Other end user impact
---------------------
Python-novaclient will accept a ``force`` option for the following methods:
- evacuate
- live-migrate
Performance Impact
------------------
A new RPC call will be done by default when migrating or evacuating
but it shouldn't really impact the performance since it's the normal behaviour
for a general migration. In order to leave that RPC asynchronous from the API
query, we won't give the result of the check within the original request, but
rather modify the Migration object status (see the REST API impact section
above).
Other deployer impact
---------------------
None.
Developer impact
----------------
None.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
sylvain-bauza
Work Items
----------
- Read any existing RequestSpec before calling ``select_destinations()`` in all
the conductor methods calling it
- Amend RequestSpec object with ``requested_destination`` field
- Modify conductor methods for evacuate and live_migrate to fill in
``requested_destination``, call ``scheduler_client.select_destinations()``
and persist the amended RequestSpec object right after the call.
- Modify FilterScheduler._schedule() to introspect ``requested_destination``
and call filters for only that host if so.
- Extend the API (and bump a new version) to add a ``force`` attribute for both
above API resources with the appropriate behaviours.
- Bypass the scheduler if the flag is set and log either ``FORCED_REBUILD`` or
``FORCED_MIGRATE`` action.
- Add a new ``force`` option to python-novaclient and expose it in CLI for both
``evacuate`` and ``live-migrate`` commands
Dependencies
============
As said above in the proposal, since scheduler hints are part of the request
and are not persisted yet, we need to depend on persisting the RequestSpec
object [1] before calling ``select_destinations()`` so that a future migration
would read that RequestSpec and provide it again.
Testing
=======
API samples will need to be updated and unittests will cover the behaviour.
In-tree functional tests will be amended to cover that option.
Documentation Impact
====================
As said, API samples will be modified to include the new attribute.
References
==========
[1] http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/persist-request-spec.html
Lots of bugs mention the caveat we described above. Below are the ones
I identified, which will be closed once the spec implementation lands:
- https://bugs.launchpad.net/nova/+bug/1451831
Specifying a destination node with nova live_migration does not take into
account overcommit setting (ram_allocation_ratio)
- https://bugs.launchpad.net/nova/+bug/1214943
Live migration should use the same memory over subscription logic as instance
boot
- https://bugs.launchpad.net/nova/+bug/1452568
nova allows to live-migrate instance from one availability zone to another


@@ -0,0 +1,194 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
============================================================
Add ability to support discard/unmap/trim for Cinder backend
============================================================
https://blueprints.launchpad.net/nova/+spec/cinder-backend-report-discard
Currently, libvirt/qemu has support for a discard option when attaching a
volume to an instance. With this feature, the unmap/trim command can be sent
from the guest to the physical storage device.
A cinder back-end will report a connection capability that Nova will use
in attaching a volume.
Problem description
===================
Currently there is no way for Nova to know if a Cinder back end supports
discard/trim/unmap functionality. Functionality is being added in Cinder
to supply this information. The spec seeks to add the ability to consume
that information.
Use Cases
---------
If a Cinder backend uses media that can make use of discard functionality,
there should be a way to enable it. This will improve the long-term
performance of such back ends.
Proposed change
===============
Code will be added to check for a 'discard' property returned to Nova from
the Cinder attach API. When present and set to True we will modify the config
returned by the libvirt volume driver to contain::
    driver_discard = "unmap"
This will only give the desired support if the instance is configured with an
interface and bus type that will support Trim/Unmap commands. In the case where
it is possible to detect that discard will not actually work for the instance
we will log a warning, but continue on with the attach anyway.
Currently the virtio-blk backend does not support discard.
There will be several ways to get an instance that will support discard, one
example is to use the virtio-scsi storage interface with a scsi bus type. To
create an instance with this support it must be booted from an image
configured with ``hw_scsi_model=virtio-scsi`` and ``hw_disk_bus=scsi``.
It is important to note that the nova.conf option hw_disk_discard is NOT read
for this feature. We rely entirely on Cinder to specify whether or not discard
should be used for the volume.
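A hedged sketch of the libvirt volume attach change; ``driver_discard`` maps
to the ``discard`` attribute of the guest disk XML, while the helper shape and
the virtio warning are assumptions based on the text above::

    import logging

    LOG = logging.getLogger(__name__)


    def apply_discard(conf, connection_info, disk_info):
        """Apply the Cinder-reported 'discard' hint to the disk config.

        'conf' stands for the disk config object returned by the libvirt
        volume driver; 'connection_info' is what Cinder's attach API returned.
        """
        if connection_info.get('data', {}).get('discard', False):
            # Renders as <driver ... discard='unmap'/> in the guest disk XML.
            conf.driver_discard = 'unmap'
            if disk_info.get('bus') == 'virtio':
                # virtio-blk cannot pass TRIM/UNMAP through; warn, but
                # continue with the attach as described above.
                LOG.warning('Volume requested discard support but the %s '
                            'bus may not honour it.', disk_info['bus'])
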
Alternatives
------------
Alternatives include adding discard for all drives if the operator has set
hw_disk_discard but it was decided this was not a good way to solve the
problem as you could not mix different underlying volume providers easily.
We could also hot-plug a SCSI controller that is capable of supporting discard
when attaching Cinder volumes. This would allow for mixing a non-trim boot
disk from an image and then attaching a Cinder volume that would get the
benefits. The risk is that the instance may not be able to actually support
doing UNMAP.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
There will be a performance gain for back ends that benefit from having
discard functionality.
See https://en.wikipedia.org/wiki/Trim_(computing) for more info.
Other deployer impact
---------------------
Deployers wanting to use this feature with their Cinder backend will need to
ensure the instances are configured with a disk interface and bus that support
discard. This includes IDE, AHCI, and Xen disks; virtio-blk is the only
backend missing this support.
A simple way to enable this is to modify Glance images to contain the
following properties::
hw_scsi_model=virtio-scsi
hw_disk_bus=scsi
In addition compute nodes will need to be using libvirt 1.0.6 or higher and
QEMU 1.6.0 or higher.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
* Patrick East
Work Items
----------
* Modify volume attach code in libvirt driver to check for the new Cinder
connection property.
* Add unit tests for new functionality, modify any existing as needed.
* Configure Pure Storage 3rd party CI system to enable the feature and
validate it as a Cinder CI. This configuration change will be made available
to any other 3rd party CI maintainer to allow additional systems to test with
this feature enabled.
Dependencies
============
Cinder Blueprint (Completed and released in Liberty):
https://blueprints.launchpad.net/cinder/+spec/cinder-backend-report-discard
Testing
=======
Unit tests need to include all permutations of the discard
flag from Cinder.
One of the Jenkins jobs could be configured to enable this; a natural
starting point would be the Ceph jobs. Potentially a Tempest test could
be added behind a config option to validate volume attachments do get the
correct discard settings.
Documentation Impact
====================
We may want to add documentation to the Cloud Administrator Guide on how to
utilize this feature.
References
==========
Cinder Blueprint:
https://blueprints.launchpad.net/cinder/+spec/cinder-backend-report-discard
Cinder Spec:
http://specs.openstack.org/openstack/cinder-specs/specs/liberty/cinder-backend-report-discard.html
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced

View File

@@ -0,0 +1,229 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Get valid server state
==========================================
https://blueprints.launchpad.net/nova/+spec/get-valid-server-state
When a compute service fails, the power states of the hosted VMs are not
updated. A normal user querying his or her VMs does not get any indication
about the failure. Also there is no indication about maintenance.
Problem description
===================
A VM query does not give the user the needed information about a compute host
that has failed or is unreachable, a nova-compute service that has failed or
been stopped, or a nova-compute service that is explicitly marked as failed or
disabled. The user should get information about the nova-compute state when
querying his or her VMs to better understand the situation.
Use Cases
---------
As a user I want accurate VM state information even when the compute service
fails or the host is down, so I can act quickly on my VMs. The failure
information is most critical to a user running HA-type VMs that need a quick
service switch-over. It also enables the user or admin to take action on the
VMs on the affected host. The action may be case and deployment specific, as
some admin actions can be automated by an external service and some are left
to the user. Normally the user can only delete or recreate a VM.
As a user I want to get information about maintenance, so I can act on my VMs.
When the user learns that a host is in maintenance (service disabled), he or
she can plan what to do with the VMs, as the host may be rebooted soon.
Proposed change
===============
A new ``host_status`` field will be added to the ``/servers/{server_id}`` and
``/servers/detail`` endpoints. ``host_status`` will be ``UP`` if nova-compute's
state is up, ``DOWN`` if nova-compute is forced_down, ``UNKNOWN`` if
nova-compute last_seen_up is not up-to-date and ``MAINTENANCE`` if
nova-compute's state is disabled. The needed information can be retrieved via
the host API and servicegroup API if the new policy allows. The forced_down
flag handling is
described in this spec:
http://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/mark-host-down.html
A new policy element will be added to control access to ``host_status``. This
can be used both to prevent this host-based data being disclosed as well as to
eliminate the performance impact of this feature.
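A minimal sketch of the mapping described above, assuming a service record with
``forced_down`` and ``disabled`` fields and a hypothetical ``service_is_up()``
helper::

    def host_status(service):
        # Sketch only; field and helper names are assumptions.
        if service.forced_down:
            return 'DOWN'
        if service.disabled:
            return 'MAINTENANCE'
        if not service_is_up(service):
            # last_seen_up is not recent enough.
            return 'UNKNOWN'
        return 'UP'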
Alternatives
------------
When returning the VM power_state, check the service status for the host. If
the service is ``forced_down``, return ``UNKNOWN`` instead. This would be an
API-only change, it is NOT proposed that we update the DB value to
``UNKNOWN``. This means we retain a record of the VM power state independent
of the service state, which may be interesting in case the host lost network
rather than power. Community feedback indicated that as the power_state is only
true for a point in time anyway, technically the state is always ``UNKNOWN``.
``os-services/force-down`` could mark all VMs managed by the affected service
as ``UNKNOWN`` in the DB. This would sometimes be wrong, as a VM can be up even
if its host is unreachable. It would also create a need to remove this state
data if the VM is evacuated to another compute node.
A possible extension is a host ``NEEDS_MAINTENANCE`` state, which would show
that maintenance is required soon. This would allow users who monitor this info
to prepare their VMs for downtime and enter maintenance at a time convenient
for them.
An extension could be added for filtering ``/servers`` and ``/servers/detail``
endpoints response message by ``host_status``.
Data model impact
-----------------
None
REST API impact
---------------
GET ``/v2.1/{tenant_id}/servers/{server_id}`` and ``/v2.1/{tenant_id}/servers/
detail`` will return the ``host_status`` field if the "os_compute_api:servers:show:
host_status" policy is defined for the user. This will require a microversion.
Case where nova-compute enabled and reporting normally::
GET /v2.1/{tenant_id}/servers/{server_id}
200 OK
{
"server": {
"host_status": "UP",
...
}
}
Case where nova-compute enabled, but not reporting normally::
GET /v2.1/{tenant_id}/servers/{server_id}
200 OK
{
"server": {
"host_status": "UNKNOWN",
...
}
}
Case where nova-compute enabled, but forced_down::
GET /v2.1/{tenant_id}/servers/{server_id}
200 OK
{
"server": {
"host_status": "DOWN",
...
}
}
Case where nova-compute disabled::
GET /v2.1/{tenant_id}/servers/{server_id}
200 OK
{
"server": {
"host_status": "MAINTENANCE",
...
}
}
This may be presented by python-novaclient as::
+-------+------+--------+------------+-------------+----------+-------------+
| ID | Name | Status | Task State | Power State | Networks | Host Status |
+-------+------+--------+------------+-------------+----------+-------------+
| 9a... | vm1 | ACTIVE | - | RUNNING | xnet=... | UP |
+-------+------+--------+------------+-------------+----------+-------------+
New policy element to be added to allow assigning permission to see
host_status:
::
"os_compute_api:servers:show:host_status": "rule:admin_api"
Security impact
---------------
Normal users may be able to correlate host states across multiple VMs to draw
conclusions about the cloud topology. This can be prevented by not granting the
policy.
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
An additional database query will be required to look up the service when a
server detail request is received.
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee: Tomi Juvonen
Other contributors: None
Work Items
----------
* Expose host_status as detailed.
* Update python-novaclient.
Dependencies
============
None
Testing
=======
Unit and functional test cases need to be added.
Documentation Impact
====================
API change needs to be documented:
* Compute API extensions documentation.
http://developer.openstack.org/api-ref-compute-v2.1.html
References
==========
* https://blueprints.launchpad.net/nova/+spec/mark-host-down
* OPNFV Doctor project: https://wiki.opnfv.org/doctor

View File

@@ -0,0 +1,387 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===========================
Nova Signature Verification
===========================
https://blueprints.launchpad.net/nova/+spec/nova-support-image-signing
OpenStack currently does not support signature validation of uploaded signed
images. Equipping Nova with the ability to validate image signatures will
provide end users with stronger assurances of the integrity of the image data
they are using to create servers. This change will use the same data model for
image metadata as the accompanying functionality in Glance, which will allow
the end user to sign images and verify these image signatures upon upload [1].
Problem description
===================
Currently, OpenStack's protection against unexpected modification of images is
limited to verifying an MD5 checksum. While this may be sufficient for
protecting against accidental modifications, MD5 is a hash function, not an
authentication primitive [2], and thus provides no protection against
deliberate, malicious modification of images. An image could potentially be
modified in transit, such as when it is uploaded to Glance or transferred to
Nova. An image that is modified could include malicious code. Providing
support for signature verification would allow Nova to verify the signature
before booting and alert the user of successful signature verification via a
future API change. This feature will secure OpenStack against the following
attack scenarios:
* Man-in-the-Middle Attack - An adversary with access to the network between
Nova and Glance is altering image data as Nova downloads the data from
Glance. The adversary is potentially incorporating malware into the image
and/or altering the image metadata.
* Untrusted Glance - In a hybrid cloud deployment, Glance is hosted on
machines which are located in a physically insecure location or is hosted by
a company with limited security infrastructure. Adversaries may be able to
compromise the integrity of Glance and/or the integrity of images stored by
Glance through physical access to the host machines or through poor network
security on the part of the company hosting Glance.
Please note that our threat model considers only threats to the integrity of
images while they are in transit between the end user and Glance, while they
are at rest in Glance and while they are in transit between Glance and Nova.
This threat model does not include, and this feature therefore does not
address, threats to the integrity, availability, or confidentiality of Nova.
Use Cases
---------
* A user wants a high degree of assurance that a customized image which they
have uploaded to Glance has not been accidentally or maliciously modified
prior to booting the image.
With this proposed change, Nova will verify the signature of a signed image
while downloading that image. If the image signature cannot be verified, then
Nova will not boot the image and instead place the instance into an error
state. The user will begin to use this feature by uploading the image and the
image signature metadata to Glance via the Glance API's image-create method.
The required image signature metadata properties are as follows:
* img_signature - A string representation of the base 64 encoding of the
signature of the image data.
* img_signature_hash_method - A string designating the hash method used for
signing. Currently, the supported values are SHA-224, SHA-256, SHA-384 and
SHA-512. MD5 and other cryptographically weak hash methods will not be
supported for this field. Any image signed with an unsupported hash
algorithm will not pass validation.
* img_signature_key_type - A string designating the signature scheme used to
generate the signature.
* img_signature_certificate_uuid - A string encoding the certificate
uuid used to retrieve the certificate from the key manager.
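For illustration only, a signed image's extra properties might look as follows;
all values are placeholders, and 'RSA-PSS' is just one possible key type::

    signed_image_properties = {
        'img_signature': '<base64-encoded signature of the image data>',
        'img_signature_hash_method': 'SHA-256',
        'img_signature_key_type': 'RSA-PSS',
        'img_signature_certificate_uuid': '<uuid of the signing certificate>',
    }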
The image verification functionality in Glance uses the signature_utils
module to verify this signature metadata before storing the image. If the
signature is not valid or the metadata is incomplete, this API method will
return a 400 error status and put the image into a "killed" state. Note that,
if the signature metadata is simply not present, the image will be stored as
it normally would be.
The user would then create an instance from this image using the Nova API's
boot method. If the verify_glance_signatures flag in nova.conf is set to
'True', Nova will call out to Glance for the image's properties, which include
the properties necessary for image signature verification. Nova will pass the
image data and image properties to the signature_utils module, which will
verify the signature. If signature verification fails, or if the image
signature metadata is either incomplete or absent, booting the instance will
fail and Nova will log an exception. If signature verification succeeds, Nova
will boot the instance and log a message indicating that image signature
verification succeeded along with detailed information about the signing
certificate.
Proposed change
===============
The first component in this change is the creation of a standalone module
responsible for the bulk of the functionality necessary for image signature
verification. This module will primarily consist of three public-facing
methods: an initializing method, an updating method, and a verifying method.
The initializing method will take the signing certificate uuid and the
specified hash method as inputs. This method will then fetch the signing
certificate by interfacing with the key manager through Castellan, extract the
public key, store the public key, certificate and hash method as attributes
and return an instance of the signature verification module. As the image's
data is downloaded, the signature verification module will be updated by
passing chunks of image data to the verifying module via the update method.
When all chunks of image data have been passed to the verifier, the service
desiring verification will call the verify method, passing it the image
signature. More specifically, this module will apply the public key to the
signature, and compare this result to the result of applying the hash
algorithm to the image data. This workflow is essentially a wrapped version of
the workflow by which signature verification occurs in pyca/cryptography.
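A minimal sketch of such a chunk-wise verifier using pyca/cryptography,
assuming an RSA signing key with PKCS#1 v1.5 padding; the real signature_utils
module supports several key types, and the class and method names here are
illustrative only::

    from cryptography import x509
    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, utils


    class SignatureVerifier(object):
        """Accumulate image chunks and verify the signature at the end."""

        def __init__(self, certificate_pem, hash_method=hashes.SHA256()):
            cert = x509.load_pem_x509_certificate(certificate_pem,
                                                  default_backend())
            self._public_key = cert.public_key()
            self._hash_method = hash_method
            self._hasher = hashes.Hash(hash_method, default_backend())

        def update(self, chunk):
            # Called for every chunk of image data as it is downloaded.
            self._hasher.update(chunk)

        def verify(self, signature):
            digest = self._hasher.finalize()
            # Raises InvalidSignature if the signature does not match.
            self._public_key.verify(signature, digest, padding.PKCS1v15(),
                                    utils.Prehashed(self._hash_method))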
We then propose an initial implementation by incorporating this module into
Nova's control flow for booting instances from images. Upon downloading an
image, Nova will check whether the verify_glance_signatures configuration flag
is set in nova.conf. If so, the module will perform image signature
verification using image properties passed to Nova by Glance. If this fails,
or if the image signature metadata is incomplete or missing, Nova will not
boot the image. Instead, Nova will throw an exception and log an error. If the
signature verification succeeds, Nova will proceed with booting the instance.
The next component will be to add functionality to the pyca/cryptography
library which will validate a given certificate chain against a pool of given
root certificates which are known to be trusted. This algorithm for validating
chains of certificates against a set of trusted root certificates is a
standard, and has been outlined in RFC 5280 [3].
Once the certificate validation functionality has been added to the
pyca/cryptography library, we will amend the signature_utils module by
incorporating certificate validation into the signature verification workflow.
We will implement functionality in the signature_utils module which will use
GET requests to dynamically fetch the certificate chain for a given
certificate. Any service using the signature_utils module will now call the
signature_utils module's initializing method with an additional parameter: a
list of references representing a pool of trusted root certificates. This
module will then use its certificate chain fetching functionality to build the
certificate chain for the signing certificate, fetch the root certificates
through Castellan, and will verify this chain against the trusted root
certificates using the functionality in the pyca/cryptography library. If the
chain fails validation, then an exception will be thrown and signature
verification will fail. Nova will retrieve the root certificate references
necessary to call the updated functionality of the signature_utils module by
reading the references in from a root_certificate_references configuration
option in nova.conf.
Future API changes are necessary to mitigate attacks that are possible when
Glance is untrusted; such as Glance returning a different signed image than the
image that was requested. Possible changes include the following extensions:
* Modify the REST API to accept a specific signature required to verify the
integrity of the image. If the specified signature cannot be verified, then
Nova refuses to boot the image and returns an appropriate error message to
the end user. This change builds upon a spec that allows overriding image
properties at boot time [4].
* Modify the REST API to provide metadata back to the end user for successful
boot requests. This metadata would include the signing certificate ownership
information and a base64 encoding of the signature. The user can use an out-
of-band mechanism to manually verify that the encoded version of the
signature matches the expected signature.
The first approach is preferred since it may be fully automated whereas the
second approach requires manual verification by the end user.
The certificate references will be used to access the certificates from a key
manager through the interface provided by Castellan.
Alternatives
------------
An alternative to signing the image's data directly is to support signatures
which are created by signing a hash of the image data. This introduces
unnecessary complexity to the feature by requiring an additional hashing stage
and an additional metadata option. Due to the Glance community's performance
concerns associated with hashing image data, we initially pursued an
implementation which produced the signature by signing an MD5 checksum which
was already computed by Glance. This approach was rejected by the Nova
community due to the security weaknesses of MD5 and the unnecessary complexity
of performing a hashing operation twice and maintaining information about both
hash algorithms.
An alternative to using pyca/cryptography for the hashing and signing
functionality is to use PyCrypto. We are electing to use pyca/cryptography
based on both the shift away from PyCrypto in OpenStack's requirements and the
recommendations of cryptographers reviewing the accompanying Glance spec [5].
An alternative to using certificates for signing and signature verification
would be to use a public key. However, this approach presents the significant
weakness that an attacker could generate their own public key in the key
manager, use this to sign a tampered image, and pass the reference to their
public key to Nova along with their signed image. Alternatively, the use of
certificates provides a means of attributing such attacks to the certificate
owner, and follows common cryptographic standards by placing the root of trust
at the certificate authority.
An alternative to using the verify_glance_signatures configuration flag to
specify that Nova should perform image signature verification is to use
"trusted" flavors to specify that individual instances should be created from
signed images. The user, when using the Nova CLI to boot an instance, would
specify one of these "trusted" flavors to indicate that image signature
verification should occur as part of the control flow for booting the
instance. This may be added in a later change, but will not be included in the
initial implementation. If added, the trusted flavors option will work
alongside the configuration option approach. In this case, Nova would perform
image signature verification if either the configuration flag is set, or if
the user has specified booting an instance of the "trusted" flavor.
Supporting the untrusted Glance use case requires future modifications to the
REST API as previously described. An alternative to the proposed approach uses
a "sign-the-hash" method for signatures instead of signing the image content
directly. In this case, Nova's REST API can be modified to allow the user to
specify a hash algorithm and expected hash value as part of the boot command.
If the actual hash value does not match, then Nova will not boot the image.
Signing the hash instead of the image directly is useful because hashes are
commonly provided for cloud images and users can obtain these hashes
out-of-band.
Data model impact
-----------------
The accompanying work in Glance introduced additional Glance image properties
necessary for image signing. The initial implementation in Nova will introduce
a configuration flag indicating whether Nova should perform image signature
verification before booting an image. The updated implementation which
includes certificate validation will introduce an additional configuration flag
for specifying the trusted root certificates.
REST API impact
---------------
A future change will modify the request or response to the boot command. This
change supports the untrusted Glance use cases by giving the user additional
assurance that the desired image has been booted.
Security impact
---------------
Nova currently lacks a mechanism to validate images prior to booting them. The
checksum included with an image protects against accidental modifications but
provides little protection against an adversary with access to Glance or to
the communication network between Nova and Glance. This feature facilitates
the creation of a logical trust boundary between Nova and Glance; this trust
boundary permits the end user to have high assurance that Nova is booting an
image signed by a trusted user.
Although Nova will use certificates to perform this task, the certificates
will be stored by a key manager and accessed via Castellan.
Notifications impact
--------------------
None
Other end user impact
---------------------
If the verification of a signature fails, then Nova will not boot an instance
from the image, and an error message will be logged. The user would then have
to edit the image's metadata through the Glance API, the Nova API, or the
Horizon interface; or reinitiate an upload of the image to Glance with the
correct signature metadata in order to boot the image.
Performance Impact
------------------
This feature will only be used if the verify_glance_signatures configuration
flag is set.
When signature verification occurs there will be latency as a result of
retrieving certificates from the key manager through the Castellan interface.
There will also be CPU overhead associated with hashing the image data and
decrypting a signature using a public key.
Other deployer impact
---------------------
In order to use this feature, a key manager must be deployed and configured.
Additionally, Nova must be configured to use a root certificate which has a
root of trust that can respond to an end user's certificate signing requests.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
dane-fichter
Other contributors:
brianna-poulos
joel-coffman
Reviewers
---------
Core reviewer(s):
None
Work Items
----------
The feature will be implemented in the following stages:
* Create standalone signature_utils module which handles interfacing with a
key manager through Castellan and verifying signatures.
* Add functionality to Nova which calls the standalone module when Nova
uploads a Glance image and the verify_glance_signatures configuration flag
is set.
* Add certificate validation functionality to the pyca/cryptography library.
* Add functionality to the signature_utils module which fetches certificate
chains. Incorporate this method, along with the pyca/cryptography library's
certificate validation functionality into the signature_utils module's
functionality for verifying image signatures.
* Amend the initial implementation in Nova to utilize this change by allowing
Nova to fetch root certificate references and pass them to the image
signature verification method.
* Implement a REST API change to respond to a successful boot request with
information relevant to the signing data and/or implement a REST API change
to allow the end user to specify the expected signature at boot time.
Dependencies
============
The pyca/cryptography library, which is already a Nova requirement, will be
used for hash creation and signature verification. The certificate validation
portion of this change is dependent upon adding certificate validation
functionality to the pyca/cryptography library.
In order to simplify the interaction with the key manager and allow multiple
key manager backends, this feature will use the Castellan library [6]. Since
Castellan currently only supports integration with Barbican, using Castellan
in this feature indirectly requires Barbican. In the future, as Castellan
supports a wider variety of key managers, our feature will require minimal
upkeep to support these key managers; we will simply update Nova's and
Glance's requirements to use the latest Castellan version.
Testing
=======
Unit tests will be sufficient to test the functionality implemented in Nova.
We will need to implement Tempest and functional tests to test the
interoperability of this feature with the accompanying functionality in
Glance.
Documentation Impact
====================
Instructions for how to use this functionality will need to be documented.
References
==========
Cryptography API: https://pypi.python.org/pypi/cryptography/0.2.2
[1] https://review.openstack.org/#/c/252462/
[2] https://en.wikipedia.org/wiki/MD5#Security
[3] https://tools.ietf.org/html/rfc5280#section-6.1
[4] https://review.openstack.org/#/c/230382/
[5] https://review.openstack.org/#/c/177948/
[6] http://git.openstack.org/cgit/openstack/castellan

View File

@@ -0,0 +1,245 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=========================================
Support triggering crash dump in a server
=========================================
https://blueprints.launchpad.net/nova/+spec/instance-crash-dump
This spec adds a new API to trigger a crash dump in a server (instance or
baremetal) by injecting a driver-specific signal into the server.
Problem description
===================
Currently, there is no way to trigger a crash dump in a server from Nova, but
users need this functionality for debugging purposes.
If the OS hits a bug (kernel panic), it triggers the kernel crash dump by
itself. But if the OS is *stalling*, we need to trigger the crash dump from
the hardware.
Different platforms can have different ways to trigger a crash dump in a
server, and Nova drivers need to implement them.
On the x86 platform, an NMI (Non-Maskable Interrupt) can trigger a crash dump
in the OS. The user should configure the OS to trigger a crash dump when it
receives an NMI. In Linux, this can be done by::
$ echo 1 > /proc/sys/kernel/panic_on_io_nmi
Many hypervisors support injecting NMI to instance.
* Libvirt supports the command "virsh inject-nmi" [1].
* Ipmitool supports the command "ipmitool chassis power diag" [2].
* Hyper-V Cmdlets supports the command
"Debug-VM -InjectNonMaskableInterrupt" [3].
So we should add a driver-level API to inject an NMI into a server. The libvirt
driver has already implemented such an API [6], and the ironic driver will do
the same for baremetal. A Nova API will then be added to trigger a crash dump
in a server.
This should be optional for drivers.
Use Cases
---------
An end user needs an interface to trigger a crash dump in his or her servers.
On the trigger, the kernel crash dump mechanism dumps the production memory
image as a dump file and reboots the kernel. After that, the end user can
retrieve the dump file from the server's disk and investigate the cause of the
problem based on the file.
This spec only implements the process of triggering a crash dump. Where the
dump file ends up depends on how the user configures the dump mechanism in his
or her server. Take Linux as an example:
* If the user configures kdump to store the dump file on the local disk, the
user needs to reboot the server and access the dump file on the local disk.
* If the user configures kdump to copy the dump file to NFS storage, the user
can find the dump file on the NFS storage without rebooting the server.
Proposed change
===============
* Add a libvirt driver API to inject NMI to an instance.
(Already merged in Liberty. [6])
* Add an ironic driver API to inject NMI to a baremetal.
* Add a Nova API to trigger a crash dump in a server using the driver API
introduced above (a sketch follows this list). If the hypervisor doesn't
support injecting an NMI, NotImplementedError will be raised. This method does
not modify the instance's task_state or vm_state.
* A new instance action will be introduced.
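A minimal sketch of the optional driver hook described above; the method name
``trigger_crash_dump`` and the ``get_guest`` helper are assumptions about the
eventual implementation::

    class ComputeDriver(object):
        """Base virt driver interface (only the new hook is shown)."""

        def trigger_crash_dump(self, instance):
            # Drivers that cannot inject an NMI simply do not override this.
            raise NotImplementedError()


    class LibvirtDriver(ComputeDriver):
        def trigger_crash_dump(self, instance):
            # Inject an NMI into the guest, the equivalent of
            # "virsh inject-nmi <domain>".
            guest = self._host.get_guest(instance)  # hypothetical helper
            guest.inject_nmi()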
Alternatives
------------
None
Data model impact
-----------------
None
REST API impact
---------------
* Specification for the method
* A description of what the method does suitable for use in user
documentation
* Trigger crash dump in a server.
* Method type
* POST
* Normal http response code
* 202: Accepted
* Expected error http response code
* badRequest(400)
* When RPC doesn't support this API, this error will be returned. If a
driver does not implement the API, the error is handled by the new
instance action because the API is asynchronous.
* itemNotFound(404)
* There is no instance or baremetal which has the specified uuid.
* conflictingRequest(409)
* The server status must be ACTIVE, PAUSED, RESCUED, RESIZED or ERROR.
If not, this code is returned.
* If the specified server is locked, this code is returned to a user
without administrator privileges. When using the kernel dump
mechanism, it causes a server reboot, so only administrators can
send an NMI to a locked server, as with other power actions.
* URL for the resource
* /v2.1/servers/{server_id}/action
* Parameters which can be passed via the url
* A server uuid is passed.
* JSON schema definition for the body data
::
{
"trigger_crash_dump": null
}
* JSON schema definition for the response data
* When the result is successful, no response body is returned.
* When an error occurs, the response data includes the error message [5].
* This REST API will require an API microversion.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
* A client API for this new API will be added to python-novaclient
* A CLI for the new API will be added to python-novaclient. ::
nova trigger-crash <server>
Performance Impact
------------------
None
Other deployer impact
---------------------
The default policy for this API allows admins and owners.
Developer impact
----------------
This spec will implement the new API in the libvirt driver, the ironic driver, and
nova itself.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Tang Chen (tangchen)
Other contributors:
shiina-horonori (hshiina)
Work Items
----------
* Add a new REST API.
* Add a new driver API.
* Implement the API in libvirt driver.
* Implement the API in ironic driver.
Dependencies
============
This spec is related to the blueprint in ironic.
* https://blueprints.launchpad.net/ironic/+spec/enhance-power-interface-for-soft-reboot-and-nmi
Testing
=======
Unit tests will be added.
Documentation Impact
====================
* The new API should be added to the documentation.
* The support matrix below will be updated because this functionality is
optional to drivers.
http://docs.openstack.org/developer/nova/support-matrix.html
References
==========
[1] http://linux.die.net/man/1/virsh
[2] http://linux.die.net/man/1/ipmitool
[3] https://technet.microsoft.com/en-us/library/dn464280.aspx
[4] https://review.openstack.org/#/c/183456/
[5] http://docs.openstack.org/developer/nova/v2/faults.html
[6] https://review.openstack.org/#/c/202380/
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Liberty
- Introduced
* - Mitaka
- Change API action name, and add ironic driver plan

View File

@@ -0,0 +1,265 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===============================
Neutron DNS Using Nova Hostname
===============================
https://blueprints.launchpad.net/nova/+spec/neutron-hostname-dns
Users of an OpenStack cloud would like to look up their instances by name in an
intuitive way using the Domain Name System (DNS).
Problem description
===================
Users boot an instance using Nova and they give that instance an "Instance
Name" as it is called in the Horizon interface. That name is used as the
foundation for the hostname from the perspective of the operating system
running in the instance. It is reasonable to expect some integration of this
name with DNS.
Neutron already enables DNS lookup for instances using an internal dnsmasq
instance. It generates a generic hostname based on the private IP address
assigned to the system. For example, if the instance is booted with
*10.224.36.4* then the hostname generated is *host-10-224-36-4.openstacklocal.*
The generated name from Neutron is not presented anywhere in the API and
therefore cannot be presented in any UI either.
Use Cases
----------
#. DNS has a name matching the hostname which is something that sudo looks for
each time it is run [#]_. Other software exists which wants to be able to
look up the hostname in DNS. Sudo still works but a number of people
complain about the warning generated::
$ sudo id
sudo: unable to resolve host vm-1
uid=0(root) gid=0(root) groups=0(root)
#. The End User has a way to know the DNS name of a new instance. These names
are often easier to use than the IP address.
#. Neutron can automatically share the DNS name with an external DNS system
[#]_ such as Designate. This isn't in the scope of this blueprint but is
something that cannot be done without it.
.. [#] https://bugs.launchpad.net/nova/+bug/1175211
.. [#] https://review.openstack.org/#/c/88624/
Proposed change
===============
This blueprint will reconcile the DNS name between Nova and Neutron. Nova will
pass the *hostname* to the Neutron API as part of any port create or update
using a new *dns_name* field in the Neutron API. Neutron DHCP offers will use
the instance's hostname. Neutron DNS will reply to queries for the new
hostname.
To handle existing installations, Neutron will fall back completely to the
current behavior in the event that a dns_name is not supplied on the port.
Nova will pass its sanitized hostname when it boots using an existing Neutron
port by updating the port with the dns_name field. This will be augmented in
the following ways:
#. Nova will pass the VM hostname using a new *dns_name* field in the port
rather than the *name* field on create or update. The handling of the
hostname will be consistent with cloud-init.
- If Nova is creating the port, or updating a port where dns_name is not
set, then it sets dns_name to the VM hostname.
- If an existing port is passed to Nova with dns_name set then Nova will
reject that as an invalid network configuration and fail the request,
unless the hostname and the port's dns_name match. Nova will not attempt
to adopt the name from the port. Adopting the name would be confusing to the
user and a source of errors if a port is reused between instances.
#. Nova will recognize an error from the Neutron API server and retry without
*dns_name* if it is received. This error will be returned if Neutron has
not been upgraded to handle the dns_name field. This check will be
well-documented in the code as a short-term issue and will be deprecated in
a following release. Adding this check will save deployers from having to
coordinate deployment of Nova and Neutron. A sketch of this retry behaviour
follows this list.
#. Neutron will check the dns_name passed to it for DNS label validity and
also for uniqueness within the scope of the configured domain name. If it
fails, then both the port create and the instance boot will fail. Neutron
will *only* begin to fail port creations *after* it has been upgraded with
the corresponding changes *and* the user has enabled DNS resolution on the
network by associating a domain name other than the default openstack.local.
This will avoid breaking existing work-flows that might use unacceptable DNS
names.
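A minimal sketch of the retry behaviour from item 2, assuming
python-neutronclient raises ``BadRequest`` when the server rejects the unknown
``dns_name`` attribute (the exact exception type is an assumption)::

    from neutronclient.common import exceptions as neutron_exc


    def create_port_with_dns_name(neutron, port_req_body, hostname):
        # Sketch only: try the new dns_name field first, then fall back to
        # the old request format if Neutron has not been upgraded yet.
        body = {'port': dict(port_req_body['port'], dns_name=hostname)}
        try:
            return neutron.create_port(body)
        except neutron_exc.BadRequest:
            return neutron.create_port(port_req_body)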
.. NOTE:: If the user updates the dns_name on the Neutron port after the VM has
already booted then there will be an inconsistency between the hostname in
DNS and the instance hostname. This blueprint will not do any special
handling of this case. The user should not be managing the hostname through
both Nova and Neutron. I don't see this as a big concern for user
experience.
Alternatives
------------
Move Validation to Nova
~~~~~~~~~~~~~~~~~~~~~~~
Duplicate name detection could be attempted in Nova. I've seen duplicate names
in the wild. Nova likely does not have the information necessary to check for
duplicate names within the appropriate scope. For example, I would like to
check duplicate names per domain across networks; this will be difficult for
Nova.
Move Port Creation Earlier
~~~~~~~~~~~~~~~~~~~~~~~~~~
It may be better if Nova could attempt port creation with Neutron before the
API operation completes so that the API operation will fail if the port
creation fails. In the current design, the Nova API call will succeed and the
port creation failure will cause the instance to go to an error state. I
believe the thing preventing this is the use case where a bare-metal instance
is being booted. In that use case, Nova must wait until the instance has been
scheduled before it can get the mac address of the interface to give to port
creation.
This change will make for a better user experience in the long run. However,
this work is out of the scope of this blueprint and can be done as follow up
work independently. One possibility that should be explored is to allow
updating the Neutron port with the mac address when it is known.
Send Neutron DNS name back to Nova
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I briefly considered a design where instead of returning an error to Nova,
Neutron would accept whatever Nova passed as the hostname. If it failed
validation then Neutron would fall back to its old behavior and generate a DNS
name based on the IP address. This generated name would've been fed back to Nova
through the existing port status notifications that Neutron already sends back
to Nova. It would then be written in to the Nova database so that it can be
shown to the user.
Feedback from the community told me that this would create a poor user
experience because the system would be making a decision to ignore user input
without a good mechanism for communicating that back to the user.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
This will provide a better user experience overall. With the hostname being
fed to Neutron, it will be available in the DNS in Neutron and optionally -- in
the future -- in DNSaaS externally, as specified in [#]_. This improves the
integration of these services from the user's point of view.
.. [#] https://review.openstack.org/#/c/88624/
Performance Impact
------------------
If the Nova upgrade is deployed before the corresponding Neutron upgrade then
there will be a period of time where Nova will make two calls to Neutron for
every port create. The first call will fail and then Nova will make a second
call without the *dns_name* field which will be expected to pass like before.
To avoid undue performance impact in situations where the Nova upgrade is
deployed but Neutron is not upgraded for a significant period of time, a
configuration option will be implemented to enable or disable the behavior
described in the previous paragraph. The default value will be disabled.
Other deployer impact
---------------------
This change was carefully designed to allow new Nova and Neutron code to be
deployed independently. The new feature will be available when both upgrades
are complete.
DNS names will only be passed for new instances after this feature is enabled:
after an upgrade, Nova will begin passing dns_name to Neutron only for newly
created instances.
If Neutron is upgraded before Nova, there is no problem because the dns_name
field is not required and behavior defaults to old behavior.
If Nova is upgraded before Neutron then Nova will see errors from the
Neutron API when it tries passing the dns_name field. Once again, Nova
should recognize this error and retry the operation without the dns_name.
The deployer should be aware of the `Performance Impact`_ discussed.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
`miguel-lavalle <https://launchpad.net/~minsel>`_
Other contributors:
`zack-feldstein <https://launchpad.net/~zack-feldstein>`_
Work Items
----------
#. Modify existing proposal to pass hostname using *dns_name* field rather
than *host*.
#. Handle expected errors by retrying without dns_name set.
Dependencies
============
In order for this to work end to end, the corresponding changes in Neutron,
which merged during the Liberty cycle, are required.
https://blueprints.launchpad.net/neutron/+spec/internal-dns-resolution
Testing
=======
Tempest tests should be added or modified for the following use cases:
- An instance created using the nova API can be looked up using the instance
name.
Documentation Impact
====================
Mention in the documentation that instance names will be used for DNS. Be
clear that it will be the Nova *hostname* that will be used. Also, detail the
scenarios where instance creation will fail.
#. It will only fail when DNS has been enabled for the Neutron network by
associating a domain other than openstack.local.
#. An invalid DNS label was given.
#. Duplicate names were found on the same domain.
References
==========
None

View File

@@ -0,0 +1,202 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Libvirt: AIO mode for disk devices
==========================================
https://blueprints.launchpad.net/nova/+spec/libvirt-aio-mode
Libvirt and qemu provide two different modes for asynchronous IO (AIO),
"native" and "threads". Right now nova always uses the default thread mode.
Depending on the disk type that is used for backing guest disks,
it can be beneficial to use the native IO mode instead.
Problem description
===================
Storage devices that are presented to instances can be backed by a variety
of different storage backends. The storage device can be an image residing
in the file system of the hypervisor, it can be a block device which
is passed to the guest or it can be provided via a network. Images can have
different formats (raw, qcow2 etc.) and block devices can be backed by
different hardware (ceph, iSCSI, fibre channel etc.).
These different image formats and block devices require different settings
in the hypervisor for optimizing IO performance. Libvirt/qemu offers a
configurable asynchronous IO mode which increases performance when it
is set correctly for the underlying image/block device type.
Right now nova sticks with the default setting, using userspace threads
for asynchronous IO.
Use Cases
----------
A deployer or operator wants to make sure that the users get the best
possible IO performance based on the hardware and software stack that is
used.
Users may have workloads that depend on optimal disk performance.
Both users and deployers would prefer that the nova libvirt driver
automatically picks the asynchronous IO mode that best fits the
underlying hardware and software.
Proposed change
===============
The goal is to enhance the nova libvirt driver to let it choose the disk
IO mode based on the knowledge it already has about the device in use.
For cinder volumes, different LibvirtVolumeDriver implementations exist
for the different storage types. A new interface will be added to let
the respective LibvirtVolumeDriver choose the AIO mode.
For ephemeral storage, the XML is generated by LibvirtConfigGuestDisk,
which also allows distinguishing between file, block and network
attachment of the guest disk.
Restrictions on when to use native AIO mode
-------------------------------------------
* Native AIO mode will not be enabled for sparse images as it can cause
Qemu threads to be blocked when filesystem metadata needs to be updated.
This issue is much less likely to appear when using preallocated
images. For the full discussion, see the IRC log in `[4]`_.
* AIO mode has no effect if using the in-qemu network clients (any disks
that use <disk type='network'>). It is only relevant if using the
in-kernel network drivers (source: danpb).
In the scenarios above, the default AIO mode (threads) will be used.
Cases where AIO mode is beneficial
----------------------------------
* Raw images and pre-allocated images in qcow2 format
* Cinder volumes that are located on iSCSI, NFS or FC devices.
* Quobyte (reported by Silvan Kaiser)
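A minimal sketch of the selection logic implied by the restrictions and cases
above; the predicate names are assumptions, not the proposed driver interface::

    def pick_aio_mode(is_network_disk, is_sparse_image):
        # Sketch only: value for the libvirt <driver io='...'> disk attribute.
        if is_network_disk:
            # In-qemu network clients ignore the AIO mode; keep the default.
            return 'threads'
        if is_sparse_image:
            # Native AIO can block qemu threads on filesystem metadata updates.
            return 'threads'
        # Raw/preallocated images and iSCSI/NFS/FC-backed volumes benefit
        # from native AIO.
        return 'native'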
Alternatives
------------
An alternative implementation would be to let the user specify the AIO mode
for disks, similar to the current configurable caching mode which allows
distinguishing between file and block devices. However, the AIO mode that
fits best for a given storage type does not depend on the workload
running in the guest, and it would be beneficial not to bother the operator
with additional configuration parameters.
Another option would be to stick with the current approach - using the
libvirt/qemu defaults. As there is no single AIO mode that fits best for
all storage types, this would leave many users with inefficient settings.
Data model impact
-----------------
No changes to the data model are expected, code changes would only impact the
libvirt/qemu driver and persistent data are not affected.
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
IO performance for instances that run on backends that can exploit the native
IO mode will be improved. There is no adverse effect on other components.
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
alexs-h
Work Items
----------
* Collect performance data for comparing AIO modes on different storage types
* Implement AIO mode selection for cinder volumes
* Implement AIO mode selection for ephemeral storage
Dependencies
============
None
Testing
=======
Unit tests will be provided that verify the libvirt XML changes generated
by this feature.
Also, CI systems that run libvirt/qemu would use the new AIO mode
configuration automatically.
Documentation Impact
====================
Wiki pages that cover IO configuration with libvirt/qemu as a hypervisor
should be updated.
References
==========
* _`[1]` General overview on AIO:
http://www.ibm.com/developerworks/library/l-async/
* _`[2]` Best practices: Asynchronous I/O model for KVM guests
https://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaat/liaatbpkvmasynchio.htm
* _`[3]` Libvirt and QEMU Performance Tweaks for KVM Guests
"http://wiki.mikejung.biz/KVM/_Xen#AIO_Modes"
* _`[4]` qemu irc log
http://paste.openstack.org/show/480498/
History
=======
None

View File

@@ -0,0 +1,329 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
======================================
Libvirt hardware policy from libosinfo
======================================
https://blueprints.launchpad.net/nova/+spec/libvirt-hardware-policy-from-libosinfo
When launching an instance Nova needs to make decisions about how to configure
the virtual hardware. Currently these decisions are often hardcoded, or driven
by nova.conf settings, and sometimes by Glance image properties. The goal of
this feature is to allow the user to specify the guest OS type and then drive
decisions from this fact, using the libosinfo database.
Problem description
===================
When launching an instance Nova needs to make decisions about how to configure
the virtual hardware in order to optimize operation of the guest OS. The right
decision inevitably varies depending on the type of operating system being
run. The right decision for a Linux guest, might be the wrong decision for a
Windows guest or vica-verca. The most important example is the choice of the
disk and network device models. All Linux guests want to use virtio, since it
offers by far the best performance, but this is not available out of the box in
Windows images so is a poor default for them. A second example is whether the
BIOS clock is initialized with UTC (preferred by UNIX) or localtime (preferred
by Windows). Related to the clock are various timer policy settings which
control behaviour when the hypervisor cannot keep up with the required
interrupt injection rate. The Nova defaults work for Linux and Windows, but
are not suitable for some other proprietary operating systems.
While it is possible to continue to allow config overrides via glance
image properties, this is not a particularly appealing approach. A number of
the settings are pretty low level and so not the kind of thing that a cloud
application should directly expose to users. The more hypervisor specific
settings are placed on a glance image, the harder it is for one image to be
used to boot VMs across multiple different hypervisors. It also creates a
burden on the user to remember a long list of settings they must place on the
images to obtain optimal operation.
Historically most virtualization applications have gone down the route of
creating a database of hardware defaults for each operating system. Typically
though, each project has tried to reinvent the wheel, duplicating each
other's work and leading to a plethora of incomplete and inconsistent databases.
The libosinfo project started as an attempt to provide a common solution for
virtualization applications to use when configuring virtual machines. It
provides a user extendible database of information about operating systems,
including facts such as the supported device types, minimum resource level
requirements, installation media and more. Around this database is a C API for
querying information, made accessible to non-C languages (including python) via
the magic of GObject Introspection. This is in use by the virt-manager and
GNOME Boxes applications for configuring KVM and Xen guests and is easily
consumable from Nova's libvirt driver.
Use Cases
----------
The core goal is to make it simpler for an end user to boot a disk image with
the optimal virtual hardware configuration for the guest operating system.
Consider that Nova is configured to use virtio disk & network devices by
default, so it optimizes performance for common Linux guests. In modern
Linux though, there is the option of using a better virtio SCSI driver.
Currently the user has to set properties like::
# glance image-update \
--property hw_disk_bus=scsi \
--property hw_scsi_model=virtio-scsi \
...other properties...
name-of-my-fedora21-image
There's a similar issue if the user wants to run guests which do not
support virtio drivers at all::
# glance image-update \
--property hw_disk_bus=ide \
--property hw_nic_model=e1000 \
...other properties...
name-of-my-windows-xp-image
We also wish to support per-OS timer drift policy settings and do not
wish to expose them as properties, since it would be even more onerous
on the user, e.g.::
# glance image-update \
--property hw_rtc_policy=catchup \
--property hw_pit_policy=delay \
...other properties...
name-of-my-random-os-image
With this feature, in the common case it will be sufficient to just inform
Nova of the operating system name::
# glance image-update \
--property os_name=fedora21 \
name-of-my-fedora-image
Project Priority
-----------------
None.
Proposed change
===============
There is an existing 'os_type' glance property that can be used to indicate
the overall operating system family (windows vs linux vs freebsd). This is too
coarse to be able to correctly configure all the different versions of these
operating systems, i.e. the right settings for Windows XP are not the same as the
right settings for Windows 2008. The intention is to declare support for a
new standard property 'os_name'. The acceptable values for this property will
be taken from the libosinfo database, either of these attributes:
* 'short-id' - the short name of the OS
e.g. fedora21, winxp, freebsd9.3
* 'id' - the unique URI identifier of the OS
e.g. http://fedoraproject.org/fedora/21, http://microsoft.com/win/xp,
http://freebsd.org/freebsd/9.3
For example the user can set one of::
# glance image-update \
--property os_name=fedora21 \
name-of-my-fedora-image
# glance image-update \
--property os_name=http://fedoraproject.org/fedora/21 \
name-of-my-fedora-image
When building the guest configuration, the Nova libvirt driver will look
for this 'os_name' property and query the libosinfo database to locate
the operating system records. It will then use this to choose the default
disk bus and network model. If available it will also lookup clock and
timer settings, but this requires further development in libosinfo before
it can be used.
In the case that libosinfo is not installed on the compute host, the
current Nova libvirt driver functionality will be unchanged.
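A rough sketch of how the driver might consult libosinfo through the GObject
introspection bindings; the exact query API and the fallback plumbing here are
assumptions::

    try:
        import gi
        gi.require_version('Libosinfo', '1.0')
        from gi.repository import Libosinfo as libosinfo
    except (ImportError, ValueError):
        libosinfo = None


    def lookup_os(os_name):
        # Return the libosinfo OS record for the given short-id, or None if
        # libosinfo is unavailable or the name is unknown (in which case the
        # existing fixed defaults apply).
        if libosinfo is None or not os_name:
            return None
        loader = libosinfo.Loader()
        loader.process_default_path()
        fltr = libosinfo.Filter.new()
        fltr.add_constraint('short-id', os_name)
        matches = loader.get_db().get_os_list().new_filtered(fltr)
        return matches.get_nth(0) if matches.get_length() > 0 else None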
It may be desirable to add a new nova.conf setting in the '[libvirt]'
section to turn on/off the use of libosinfo for hardware configuration.
This would make it easier for the cloud admin to control behaviour
without having to change which RPMs/packages are installed, e.g.::
[libvirt]
hardware_config=default|fixed|libosinfo
Where
* default - try to use libosinfo, otherwise fallback to fixed defaults
* fixed - always use fixed defaults even if libosinfo is installed
* libosinfo - always use libosinfo and abort if not installed
In the future it might be possible to automatically detect what operating
system is present inside a disk image using libguestfs. This would remove
the need to even set the 'os_name' image property, and thus allow people to
obtain optimal guest performance out of the box with no special config tasks
required. Such auto-detection is out of scope for this blueprint though.
Alternatives
------------
A 1st alternative would be for Nova to maintain its own database of preferred
hardware settings for each operating system. This is the trap most previous
virtualization applications have fallen into. It imposes a significant burden
because of the huge variety of operating systems in existence. It is
undesirable to try to reinvent the libosinfo wheel, which is already
mostly round in shape.
A 2nd alternative would be for Nova to expose glance image properties for
every single virtual hardware configuration aspect that needs to vary per
guest operating system type. This would mean the user is required to have a
lot of knowledge about low level hardware configuration which goes against
the general cloud paradigm. It is also a significant burden to remember to
set so many values.
Data model impact
-----------------
There will be no database schema changes.
There will be a new standard glance image property defined which will be stored
in the existing database tables, and should be considered a long term supported
setting.
REST API impact
---------------
There are no API changes required. The existing glance image property support
is sufficient to achieve the goals of this blueprint.
Security impact
---------------
Since this is simply about tuning the choice of virtual hardware settings there
should not be any impact on security of the host / cloud system.
Notifications impact
--------------------
No change.
Other end user impact
---------------------
The end user will need to know about the 'os_name' glance property and the list
of permissible values, as defined by the libosinfo project. This is primarily a
documentation task.
Performance Impact
------------------
Broadly speaking there should be no performance impact on the operation of the
OpenStack services themselves. Some choices of guest hardware, however, might
impose extra CPU overhead on the hypervisors. Since users already have the
ability to choose different disk/net models directly, this potential
performance impact is not a new (or significant) concern. It falls under the
general problem space of achieving strong separation between guest virtual
machines via resource utilization limits.
Other deployer impact
---------------------
There is likely to be a new configuration option in the nova.conf file under
the libvirt group. Most deployers can ignore this and leave it on its default
value which should just "do the right thing" in normal operation. It is there
as an override to force a specific usage policy.
Deployers may wish to install the libosinfo library on their compute nodes, in
order to allow Nova libvirt driver to use this new feature. If they do not
install the libosinfo library, operation of Nova will be unchanged vs previous
releases. Installation can be done with the normal distribution package
management tools. It is expected that OpenStack specific provisioning tools
will eventually choose to automate this during cloud deployment.
In the case of private cloud deployments, the cloud administrator may wish to
provide additional libosinfo database configuration files, to optimize any
custom operating systems their organization uses.
Developer impact
----------------
Maintainers of other virtualization drivers may wish to engage with the
libosinfo project to collaborate on extending its database to be suitable for
use with more virtualization technologies beyond KVM and Xen. This would
potentially enable its use with other virt drivers within Nova. It is
nonetheless expected that the non-libvirt virt drivers will simply ignore this
new feature in the short-to-medium term at least.
The new 'os_name' property might be useful for VMWare which has a mechanism for
telling the VMWare hypervisor what guest operating system is installed in a VM.
This would entail defining some mapping between libosinfo values and the VMWare
required values, which is a fairly straightforward task.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
vladikr
Other contributors:
berrange
Work Items
----------
* Integrate with libosinfo for setup of default disk/network device
models in the Nova libvirt driver
* Extend devstack to install the libosinfo & object introspection packages
* Work with libosinfo community to define metadata for clock and timer
preferences per OS type
* Extend Nova libvirt driver to configure clock/timer settings based on the
  libosinfo database
Dependencies
============
The Nova libvirt driver will gain an optional dependency on the libosinfo
project/library. This will be accessed by the GObject introspection Python
bindings. On Fedora / RHEL systems this will entail installation of the
'libosinfo' packages and either the 'pygobject2' or 'python3-gobject' packages
(yes, both Python 2 and 3 are supported). Other modern Linux distros also
have these packages commonly available.
Note that although the GObject Introspection framework was developed under the
umbrella of the GNOME project, it does not have any direct requirements for the
graphical desktop infrastructure. It is part of their low level gobject library
which is a reusable component leveraged by many non-desktop related projects
now.
Testing
=======
The unit tests will of course cover the new code.
To test in Tempest would need a gate job which has the suitable packages
installed. This can be achieved by updating devstack to install the necessary
bits. Some new tests would need to be created to set the new glance image
property and then verify that the guest virtual machine has received the
expected configuration changes.
Documentation Impact
====================
The new glance image property will need to be documented. It is also likely
that we will want to document the list of valid values for this property.
Alternatively document how the user can go about learning the valid values
defined by libosinfo.
References
==========
* http://libosinfo.org
* https://wiki.gnome.org/action/show/Projects/GObjectIntrospection
* https://live.gnome.org/PyGObject

View File

@@ -0,0 +1,352 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===========================
Libvirt real time instances
===========================
https://blueprints.launchpad.net/nova/+spec/libvirt-real-time
The CPU pinning feature added the ability to assign guest virtual CPUs
to dedicated host CPUs, providing guarantees for CPU time and improved worst
case latency for CPU scheduling. The real time feature builds on that work
to provide stronger guarantees for worst case scheduler latency for vCPUs.
Problem description
===================
The CPU pinning feature allowed guest vCPUs to be given dedicated access to
individual host pCPUs. This means virtual instances will no longer suffer
from "steal time" where their vCPU is pre-empted in order to run a vCPU
belonging to another guest. Removing overcommit eliminates the high level
cause of guest vCPU starvation, but guest vCPUs are still susceptible to
latency spikes from various areas in the kernel.
For example, there are various kernel tasks that run on host CPUs, such as
interrupt processing that can preempt guest vCPUs. QEMU itself has a number
of sources of latency, due to its big global mutex. Various device models
have sub-optimal characteristics that will cause latency spikes in QEMU,
as may underlying host hardware. Avoiding these problems requires that the
host kernel and operating system be configured in a particular manner, as
well as the careful choice of which QEMU features to exercise. It also
requires that suitable scheduler policies are configured for the guest
vCPUs.
Assigning huge pages to a guest ensures that guest RAM cannot be swapped out
on the host, but there are still other arbitrary memory allocations for the
QEMU emulator. If parts of QEMU get swapped out to disk, this can have an
impact on the performance of the realtime guest.
Enabling realtime is not without cost. In order to meet the strict worst
case requirements for CPU latency, overall throughput of the system must
necessarily be compromised. As such it is not reasonable to have the
real time feature unconditionally enabled for an OpenStack deployment.
It must be an opt-in that is used only in the case where the guest workload
actually demands it.
As an indication of the benefits and tradeoffs of realtime, it is useful
to consider some real performance numbers. With bare metal and dedicated
CPUs but non-realtime scheduler policy, worst case latency is on the order
of 150 microseconds, and mean latency is approx 2 microseconds. With KVM
and dedicated CPUs and a realtime scheduler policy, worst case latency
is 14 microseconds, and mean latency is < 10 microseconds. This shows
that while realtime brings significant benefits in worst case latency,
the mean latency is still significantly higher than that achieved on
bare metal with non-realtime policy. This serves to reinforce the point
that realtime is not something to unconditionally use, it is only
suitable for specific workloads that require latency guarantees. Many
apps will find dedicated CPUs alone to be sufficient for their needs.
Use Cases
---------
Tenants who wish to run workloads where CPU execution latency is important
need to have the guarantees offered by a real time KVM guest configuration.
The NFV appliances commonly deployed by members of the telco community are
one such use case, but there are plenty of other potential users. For example,
stock market trading applications greatly care about scheduling latency, as
may scientific processing workloads.
It is expected that this feature would predominantly be used in private
cloud deployments. As well as real-time compute guarantees, tenants will
usually need corresponding guarantees in the network layer between the
cloud and the service/system it is communicating with. Such networking
guarantees are largely impractical to achieve when using remote public
clouds across the internet.
Proposed change
===============
The intention is to build on the previous work done to enable use of NUMA
node placement policy, dedicated CPU pinning and huge page backed guest
RAM.
The primary requirement is to have a mechanism to indicate whether realtime
must be enabled for an instance. Since real time has strict pre-requisites
in terms of host OS setup, the cloud administrator will usually not wish
to allow arbitrary use of this feature. Realtime workloads are likely to
comprise a subset of the overall cloud usage, so it is anticipated that
there will be a mixture of compute hosts, only some of which provide a
realtime capability.
For this reason, an administrator will need to make use of host aggregates
to partition their compute hosts into those which support real time and
those which do not.
There will then need to be a property available on the flavor
* hw:cpu_realtime=yes|no
which will indicate whether instances booted with that flavor will be
run with a realtime policy. Flavors with this property set to 'yes'
will need to be associated with the host aggregate that contains hosts
supporting realtime.
A pre-requisite for enabling the realtime feature on a flavor is that
it must also have 'hw:cpu_policy' set to 'dedicated'. ie all real
time guests must have exclusive pCPUs assigned to them. You cannot give
a real time policy to vCPUs that are susceptible to overcommit, as that
would lead to starvation of the other guests on that pCPU, as well as
degrading the latency guarantees.
The precise actions that a hypervisor driver takes to configure a guest
when real time is enabled are implementation defined. Different hypervisors
will have different configuration steps, but the commonality is that all
of them will be providing vCPUs with an improved worst case latency
guarantee, as compared to non-realtime instances. The tenant user does
not need to know the details of how the requirements are met, merely
that the cloud can support the necessary latency guarantees.
In the case of the libvirt driver with the KVM hypervisor, it is expected
that setting the real time flavor will result in the following guest
configuration changes
* Entire QEMU and guest RAM will be locked into memory
* All vCPUs will be given a fixed realtime scheduler priority
As well as the vCPU workload, most hypervisors have one or more other
threads running in the control plane which do work on behalf of the
virtual machine. Most hypervisors hide this detail from users, but
the QEMU/KVM hypervisor exposes it via the concept of emulator
threads. With the initial support for dedicated CPUs, Nova was set
to confine the emulator threads to run on the same set of pCPUs
on which the guest's vCPUs are placed. This is highly undesirable in
the case of realtime, because these emulator threads will be
doing work that can impact latency guarantees. There is thus a
need to place emulator threads in a more precise fashion.
Most guest OSes will run with multiple vCPUs and have at least one of
their vCPUs dedicated to running non-realtime housekeeping tasks.
Given this, the intention is that the emulator threads be co-located
with the vCPU that is running non-realtime tasks. This will in turn
require another tunable, which can be set either on the flavor, or
on the image. This will indicate which vCPUs will have realtime policy
enabled:
* hw:cpu_realtime_mask=^0-1
This indicates that all vCPUs, except vCPUs 0 and 1 will have
a realtime policy. ie vCPUs 0 and 1 will remain non-realtime.
The vCPUs which have a non-realtime policy will also be used to
run the emulator thread(s). At least one vCPU must be reserved
for non-realtime workloads, it is an error to configure all
vCPUs to be realtime. If the property is not set, then the
default behaviour will be to reserve vCPU 0 for non-realtime
tasks. This property will be overridable on the image too via
the hw_cpu_realtime_mask property.
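To illustrate the intended semantics of the exclusion mask, here is a small
sketch (the helper name is hypothetical; this is not the proposed
implementation)::

    def realtime_vcpus(vcpu_count, mask='^0'):
        """Return the vCPU indexes that receive the realtime policy.

        '^0-1' means every vCPU except 0 and 1 is realtime; with no mask
        set, vCPU 0 is the default non-realtime vCPU.
        """
        excluded = set()
        for part in mask.lstrip('^').split(','):
            if '-' in part:
                lo, hi = (int(x) for x in part.split('-'))
                excluded.update(range(lo, hi + 1))
            else:
                excluded.add(int(part))
        realtime = set(range(vcpu_count)) - excluded
        if len(realtime) == vcpu_count:
            raise ValueError('at least one vCPU must remain non-realtime')
        return realtime

    # realtime_vcpus(4, '^0-1') -> {2, 3}; vCPUs 0 and 1 stay non-realtime
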
In the future it may be desirable to allow emulator threads to
be run on a host pCPU that is completely separate from those
running the vCPUs. This would, for example, allow for running
of guest OS, where all vCPUs must be real-time capable, and so
cannot reserve a vCPU for non-realtime tasks. This would require
the scheduler to treat the emulator threads as essentially being
a virtual CPU in their own right. Such an enhancement is considered
out of scope for this blueprint in order to remove any dependency
on scheduler modifications. It will be dealt with in a new blueprint
* https://blueprints.launchpad.net/nova/+spec/libvirt-emulator-threads-policy
A significant portion of the work required will be documenting the
required compute host and guest OS setup, as much of this cannot be
automatically performed by Nova itself. It is anticipated that the
developers of various OpenStack deployment tools will use the
documentation to extend their tools to be able to deploy realtime
enabled compute hosts. This is out of scope of this blueprint,
however, which will merely document the core requirements. Tenants
building disk images will also need to consume this documentation
to determine how to configure their guest OS.
Alternatives
------------
One option would be to always enable a real time scheduler policy when the
guest is using dedicated CPU pinning and always enable memory locking when
the guest has huge pages. As explained in the problem description, this is
highly undesirable as an approach. The real time guarantees are only achieved
by reducing the overall throughput of the system. So unconditionally enabling
realtime for hosts / guests which do not require it would significantly waste
potential compute resources. As a result it is considered mandatory to have
an opt-in mechanism for enabling real time.
Doing nothing is always an option. In the event of doing nothing, guests would
have to put up with the latencies inherent in non-real time scheduling, even
with dedicated pCPUs. Some of those latencies could be further mitigated by
careful host OS configuration, but extensive performance testing has shown that
even with carefully configured host and dedicated CPUs, worst case latencies
for a non-realtime task will be at least a factor of x10 worse than when
realtime is enabled. Thus not supporting realtime guests within OpenStack
will exclude Nova from use in a variety of scenarios, forcing users to
deploy alternative non-OpenStack solutions, or requiring OpenStack
vendors to fork the code and ship their own custom realtime solutions. Neither
of these are attractive options for OpenStack users or vendors in the long
term, as it would either lose user share, or balkanize the OpenStack
ecosystem.
Data model impact
-----------------
None required
REST API impact
---------------
None required
Security impact
---------------
The enablement of real time will only affect the pCPUs that are assigned to
the guest. Thus if the tenant is already permitted to use dedicated pCPUs
by the operator, enabling real time does not imply any further privileges.
Thus real time is not considered to introduce any new security concerns.
Notifications impact
--------------------
None
Other end user impact
---------------------
The tenant will have the ability to request real time via an image property.
They will need to carefully build their guest OS images to take advantage
of the realtime characteristics. They will need to obtain information from their
cloud provider as to the worst case latencies their deployment is capable
of satisfying, to ensure that it can achieve the requirements of their
workloads.
Performance Impact
------------------
There will be no new performance impact to Nova as a whole. This is building
on the existing CPU pinning and huge pages features, so the scheduler logic is
already in place. Likewise the impact on the host is restricted to pCPUs which
are already assigned to a guest.
Other deployer impact
---------------------
The operator will have the ability to define real time flavors by setting a
flavor extra spec property.
The operator will likely wish to make use of host aggregates to assign a
certain set of compute nodes for use in combination with huge pages and CPU
pinning. This is a pre-existing impact from those features, and real time does
not alter that.
Developer impact
----------------
Other virt drivers may wish to support the flavor/image properties for
enabling real time scheduling of their instances, if their hypervisor has
such a feature.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
sahid
Other contributors:
berrange
Work Items
----------
The primary work items are
* Add the 'hw_cpu_realtime_mask' field to the ImageMetaProps object
* Update the libvirt guest XML configuration when the real time flavor or
image properties are present
* Update the Nova deployment documentation to outline what host OS setup
steps are required in order to make best use of the real time feature
Dependencies
============
* The libvirt project needs to add support for the XML feature to enable
real time scheduler priority for guests. Merged as of 1.2.13
* The KVM/kernel project needs to produce recommendations for optimal
host OS setup. Partially done - see KVM Forum talks. Collaboration
will be ongoing during development to produce Nova documentation.
If the libvirt emulator threads policy blueprint is implemented, then
the restriction that real-time guests must be SMP can be lifted, to
allow for UP realtime guests. This is not a strict pre-requisite
though, merely a complementary piece of work to allow real-time to
be used in a broader range of scenarios.
* https://blueprints.launchpad.net/nova/+spec/libvirt-emulator-threads-policy
* https://review.openstack.org/225893
Testing
=======
None of the current OpenStack community test harnesses check the performance
characteristics of guests deployed by Nova, which is what would be needed to
validate this feature.
The key functional testing requirement is around correct operation of
the existing Nova CPU pinning and huge pages features and their
scheduler integration. This is outside the scope of this particular
blueprint.
Documentation Impact
====================
The deployment documentation will need to be updated to describe how to setup
hosts and guests to take advantage of real time scheduler prioritization.
Since this requires very detailed knowledge of the system, it is expected
that the feature developers will write the majority of the content for this
documentation, as the documentation team cannot be expected to learn the
details required.
References
==========
* KVM Forum 2015: Real-Time KVM (Rik van Riel)
* https://www.youtube.com/watch?v=cZ5aTHeDLDE
* http://events.linuxfoundation.org/sites/events/files/slides/kvmforum2015-realtimekvm.pdf
* KVM Forum 2015: Real-Time KVM for the Masses (Jan Kiszka)
* https://www.youtube.com/watch?v=SyhfctYqjc8
* http://events.linuxfoundation.org/sites/events/files/slides/KVM-Forum-2015-RT-OpenStack_0.pdf
* KVM Forum 2015: Realtime KVM (Paolo Bonzini)
* https://lwn.net/Articles/656807/
* Linux Kernel Realtime
* https://rt.wiki.kernel.org/index.php/Main_Page

View File

@@ -0,0 +1,413 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Report more live migration progress detail
==========================================
Blueprint:
https://blueprints.launchpad.net/nova/+spec/live-migration-progress-report
When live migrations take a long time, an operator might want to take some
action, such as pausing the VM being migrated, cancelling the live migration
operation, or doing some performance optimization.
All these actions need to be based on detailed knowledge of the migration's
progress. This spec proposes reporting more progress detail for live migration
in the os-migrations API.
Problem description
===================
Some busy enterprise workloads hosted on large sized VMs, such as SAP ERP
systems or VMs running memory-write-intensive workloads, may lead to a
migration that does not converge.
Currently nova cannot report details of migration statistics, such as how much
data has been transferred and how much data remains.
Without those details, the operator cannot decide what action to take next
on the migration.
Use Cases
----------
* As an operator of an OpenStack cloud, I would like to know the details of
  the migration, so that I can pause/cancel it or do some performance
  optimization.
* Some other projects, such as the watcher project, want to build a strategy
  to optimize performance dynamically during live migration. The strategy
  depends on detailed status of the migration.
Proposed change
===============
Extend os-migrations API. Some new fields will be added in migration DB
and os-migrations API response.
The new fields will be updated on the migration object in the
_live_migration_monitor method of the libvirt driver, so that the API call
just needs to retrieve the object from the db; as is traditional, API calls
do not block while sending a request to the compute node and waiting for a
reply.
New fields:
* memory_total: the total guest memory size.
* memory_processed: the amount of memory that has been transferred.
* memory_remaining: the amount of memory remaining to transfer.
* disk_total: the total disk size.
* disk_processed: the amount of disk data that has been transferred.
* disk_remaining: the amount of disk data remaining to transfer.
Note, the migration is always an unbounded job; memory_total may be less than
the final sum of memory_processed + memory_remaining in the event that the
hypervisor has to re-send some memory, such as pages dirtied during migration.
The same is true of the disk numbers. The disk fields will all be zero when not
block migrating.
For cold migration, only the disk fields will be populated; for drivers
that don't expose migration detail, the memory and disk fields will be null.
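A minimal sketch of where these values could come from in the libvirt
driver's monitoring loop, assuming the python-libvirt ``jobStats()`` call
(field names follow libvirt's job statistics; the helper shown is
illustrative)::

    def update_migration_progress(dom, migration):
        # dom is a libvirt virDomain; migration is the Migration object
        stats = dom.jobStats()
        for field in ('memory_total', 'memory_processed', 'memory_remaining',
                      'disk_total', 'disk_processed', 'disk_remaining'):
            # Disk fields are absent when not block migrating; report zero
            setattr(migration, field, stats.get(field, 0))
        migration.save()
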
Alternatives
------------
Add a new API to report the migration status details.
Data model impact
-----------------
The `nova.objects.migration.Migration` object would have 6 new fields.
For the database schema, the following table constructs would suffice ::
CREATE TABLE migrations(
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`deleted_at` datetime DEFAULT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
`source_compute` varchar(255) DEFAULT NULL,
`dest_compute` varchar(255) DEFAULT NULL,
`dest_host` varchar(255) DEFAULT NULL,
`status` varchar(255) DEFAULT NULL,
`instance_uuid` varchar(36) DEFAULT NULL,
`old_instance_type_id` int(11) DEFAULT NULL,
`new_instance_type_id` int(11) DEFAULT NULL,
`source_node` varchar(255) DEFAULT NULL,
`dest_node` varchar(255) DEFAULT NULL,
`deleted` int(11) DEFAULT NULL,
`migration_type` enum('migration','resize','live-migration',
'evacuation') DEFAULT NULL,
`hidden` tinyint(1) DEFAULT NULL,
`memory_total` bigint DEFAULT NULL,
`memory_processed` bigint DEFAULT NULL,
`memory_remaining` bigint DEFAULT NULL,
`disk_total` bigint DEFAULT NULL,
`disk_processed` bigint DEFAULT NULL,
`disk_remaining` bigint DEFAULT NULL,
index(`instance_uuid`),
index(`deleted`)
);
REST API impact
---------------
* Extend the migrations resource to get migration statistics in a new
  microversion. Then the user can get the progress details of live-migration.
* GET /servers/{id}/migrations
* JSON schema definition for new fields::
non_negative_integer_with_null = {
'type': ['integer', 'null'],
'minimum': 0
}
{
'type': 'object',
'properties': {
'migrations': {
'type': 'array',
'items': {
'type': 'object',
'properties': {
'memory_total': non_negative_integer_with_null,
'memory_processed': non_negative_integer_with_null,
'memory_remaining': non_negative_integer_with_null,
'disk_total': non_negative_integer_with_null,
'disk_processed': non_negative_integer_with_null,
'disk_remaining': non_negative_integer_with_null,
..{all existing fields}...
}
'additionalProperties': False,
'required': ['memory_total', 'memory_processed', 'memory_remaining',
'disk_total', 'disk_processed', 'disk_remaining',
..{all existing fields}...]
}
}
},
'additionalProperties': False,
'required': ['migrations']
}
* The example of response body::
{
"migrations": [
{
"created_at": "2012-10-29T13:42:02.000000",
"dest_compute": "compute2",
"id": 1234,
"server_uuid": "6ff1c9bf-09f7-4ce3-a56f-fb46745f3770",
"new_flavor_id": 2,
"old_flavor_id": 1,
"source_compute": "compute1",
"status": "running",
"updated_at": "2012-10-29T13:42:02.000000",
"memory_total": 1057024,
"memory_processed": 3720,
"memory_remaining": 1053304,
"disk_total": 20971520,
"disk_processed": 20880384,
"disk_remaining": 91136,
},
]
}
The old top-level resource `/os-migrations` won't be extended anymore; any
new features will go to `/servers/{id}/migrations`. The old top-level
resource `/os-migrations` is kept for admin queries and may be replaced by
`/servers/{id}/migrations` entirely in the future. So we should add
links in the old top-level resource `/os-migrations` to guide people to
the new details of the migration resource.
* Proposes adding a new method to get each migration resource
* GET /servers/{id}/migrations/{id}
* Normal http response code: 200
* Expected error http response code
* 404: the specific in-progress migration cannot be found.
* JSON schema definition for the response body::
{
'type': object,
'properties': {
...{all existing fields}...
}
'additionalProperties': False,
'required': [...{all existing fields}...]
}
* The example of response body::
{
"created_at": "2012-10-29T13:42:02.000000",
"dest_compute": "compute2",
"id": 1234,
"server_uuid": "6ff1c9bf-09f7-4ce3-a56f-fb46745f3770",
"new_flavor_id": 2,
"old_flavor_id": 1,
"source_compute": "compute1",
"status": "running",
"updated_at": "2012-10-29T13:42:02.000000",
"memory_total": 1057024,
"memory_processed": 3720,
"memory_remaining": 1053304,
"disk_total": 20971520,
"disk_processed": 20880384,
"disk_remaining": 91136,
}
* A new policy will be added,
'os_compute_api:servers:migrations:show', and the default permission is
admin only.
* Proposes adding a ref link to `/servers/{id}/migrations/{id}` in
`/os-migrations`
* GET /os-migrations
* JSON schema definition for the response body::
{
'type': 'object',
'properties': {
'migrations': {
'type': 'array',
'items': {
'type': 'object',
'properties': {
'links': {
'type': 'array',
'items': {
'type': 'object',
'properties': {
'href': {
'type': 'string',
'format': 'uri'
},
'rel': {
'type': 'string',
'enum': ['self', 'bookmark'],
}
}
'additionalProperties': False,
'required': ['href', 'rel']
}
},
...
},
'additionalProperties': False,
'required': ['links', ...]
}
}
},
'additionalProperties': False,
'required': ['migrations']
}
* The example of response body::
{
"migrations": [
{
"created_at": "2012-10-29T13:42:02.000000",
"dest_compute": "compute2",
"dest_host": "1.2.3.4",
"dest_node": "node2",
"id": 1234,
"instance_uuid": "instance_id_123",
"new_instance_type_id": 2,
"old_instance_type_id": 1,
"source_compute": "compute1",
"source_node": "node1",
"status": "done",
"updated_at": "2012-10-29T13:42:02.000000",
"links": [
{
'href': "http://openstack.example.com/v2.1/openstack/servers/0e44cc9c-e052-415d-afbf-469b0d384170/migrations/1234",
'ref': 'self'
},
{
'href': "http://openstack.example.com/openstack/servers/0e44cc9c-e052-415d-afbf-469b0d384170/migrations/1234"
'ref': 'bookmark'
}
]
},
{
"created_at": "2013-10-22T13:42:02.000000",
"dest_compute": "compute20",
"dest_host": "5.6.7.8",
"dest_node": "node20",
"id": 5678,
"instance_uuid": "instance_id_456",
"new_instance_type_id": 6,
"old_instance_type_id": 5,
"source_compute": "compute10",
"source_node": "node10",
"status": "done",
"updated_at": "2013-10-22T13:42:02.000000"
"links": [
{
'href': "http://openstack.example.com/v2.1/openstack/servers/0e44cc9c-e052-415d-afbf-469b0d384170/migrations/5678",
'ref': 'self'
},
{
'href': "http://openstack.example.com/openstack/servers/0e44cc9c-e052-415d-afbf-469b0d384170/migrations/5678"
'ref': 'bookmark'
}
]
},
]
}
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
New python-novaclient command will be available, e.g.
nova server-migration-list <instance>
nova server-migration-show <instance> <migration_id>
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
ShaoHe Feng <shaohe.feng@intel.com>
Other contributors:
Yuntong Jin <yuntong.jin@intel.com>
Work Items
----------
* Add migration progress detail fields in DB.
* Write migration progress detail fields to DB.
* Update the migration object in the _live_migration_monitor method of the
  libvirt driver.
* The API call to list os-migrations simply returns data about the migration
  objects, i.e. what is in the DB.
* Implement new commands 'server-migration-list' and 'server-migration-show' to
python-novaclient.
Dependencies
============
None
Testing
=======
Unit tests and functional tests in Nova
Documentation Impact
====================
Doc the API change in the API Reference:
http://developer.openstack.org/api-ref-compute-v2.1.html
References
==========
os-migrations-v2.1:
http://developer.openstack.org/api-ref-compute-v2.1.html#os-migrations-v2.1
History
=======
Mitaka: Introduced

View File

@@ -0,0 +1,285 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
======================================
Making the live-migration API friendly
======================================
https://blueprints.launchpad.net/nova/+spec/making-live-migration-api-friendly
The current live-migration API is difficult to use, so we need to make the API
more user-friendly and external system friendly.
Problem description
===================
The current live-migration API requires the user to specify whether block
migration should be used via the `block_migration` flag. Block migration
requires that the source and destination hosts aren't on shared storage.
Live migration without block migration requires that the source and
destination hosts are on the same shared storage.
There are two problems with this flag:
* For external systems and cloud operators, it is hard to know which value
should be used for a specific destination host. Before the user specifies the
value of the `block_migration` flag, the user needs to figure out whether the
source and destination host are on the same shared storage.
* When the user passes the `host` flag with value None, the scheduler will
choose a host for the user. If the scheduler selects a destination host which
is on the same shared storage as the source host, and the user specifies
`block_migration` as True, the request will fail. That means the scheduler
doesn't know the topology of the storage, so it can't select a reasonable
host.
For the `host` flag, a value of None means the scheduler should choose a host.
For ease of use, the 'host' flag can be optional.
The `disk_over_commit` flag is libvirt driver specific. If the value is True,
the libvirt virt driver will check the image's virtual size against the usable
disk size. If the value is False, the libvirt virt driver will check the
image's actual size against the usable disk size. The Nova API shouldn't
expose any hypervisor-specific detail. This flag confuses the user as well,
as normally the user only wants the same resource usage policy as the
scheduler already applies.
Use Cases
---------
* API Users and external systems can use the live-migration API without
having to manually determine the storage topology of the Nova deployment.
* API Users should be able to have the scheduler select the destination host.
* Users don't want to know whether disk overcommit is needed; Nova should just
do the right thing.
Proposed change
===============
Make the `block_migration` flag optional, with a default value of None. When
the value is None, Nova will detect whether the source and destination hosts
are on shared storage. If they are on shared storage, the live-migration won't
do block migration. If they aren't on shared storage, block migration will be
executed.
Make the `host` flag optional, and the default value is None. The behaviour
won't change.
Remove the `disk_over_commit` flag and remove the disk usage check from libvirt
virt driver.
Alternatives
------------
Ideally the live-migration API will be improved continuously. For the
`block_migration` flag, there are two opinions on this:
* When the `block_migration` flag is False, the scheduler will choose a host
which is on the same shared storage as the original host. When the value is
True, the scheduler will choose a host which isn't on shared storage with the
original host. This needs some work in Nova to track the shared storage so
that the scheduler can choose the right host.
* Remove the `block_migration` flag entirely; the API behaviour is always to
migrate the instance within one storage pool, which is what people want in
most cases.
Anyway the shared storage can be tracked once this BP is implemented:
https://blueprints.launchpad.net/nova/+spec/resource-providers
So that will be future work.
The logic for `disk_over_commit` does not match how the ResourceTracker does
resource counting. Ideally we should have the ResourceTracker consume disk
usage; that will be done by another bug fix or proposal.
Data model impact
-----------------
None
REST API impact
---------------
The block_migration and host flags will be optional, and the disk_over_commit
flag will be removed; the json-schema is as below::
boolean = {
'type': ['boolean', 'string', 'null'],
'enum': [True, 'True', 'TRUE', 'true', '1', 'ON', 'On', 'on',
'YES', 'Yes', 'yes',
False, 'False', 'FALSE', 'false', '0', 'OFF', 'Off', 'off',
'NO', 'No', 'no'],
}
{
'type': 'object',
'properties': {
'os-migrateLive': {
'type': 'object',
'properties': {
'block_migration': boolean,
'host': host,
},
'additionalProperties': False,
},
},
'required': ['os-migrateLive'],
'additionalProperties': False,
}
This change will need a new microversion, and the old version API will keep the
same behaviour as before.
For upgrades, if the user specifies a host which is running an old-version
compute node with the new API version, the API will return `HTTP BadRequest
400` when `block_migration` or `disk_over_commit` is None. If the user didn't
specify a host and an old-version node is selected, the scheduler will retry
to find another host until a new compute node is found or the maximum number
of retries is reached.
Currently the response body is empty, but the user needs to know whether nova
decided to do block migration. The proposed response body is::
{
'type': 'object',
'properties': {
'block_migration': parameter_types.boolean,
'host': host
}
'required': ['block_migration', 'host'],
'additionalProperties': False
}
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
The user needn't figure out whether the destination host is on the same shared
storage as the source host anymore before invoking the live-migration API. But
this may cause a block migration which will incur more load on the
live-migration network, which may be unexpected to the user. If the user
clearly doesn't want block migration, they may set block_migration to False
explicitly. This will be improved in the future.
Performance Impact
------------------
None
Other deployer impact
---------------------
The new REST API version won't work for old compute nodes when doing a rolling
upgrade. This is because `disk_over_commit` was removed, so there isn't a valid
value provided from the API anymore. Users can only use the old version
live-migration API with old compute nodes.
Developer impact
----------------
None
Implementation
==============
The detection of block_migration
--------------------------------
For the virt driver interface, there are two interfaces to check if the
destination and source hosts satisfy the migration conditions. They are
`check_can_live_migrate_destination` and `check_can_live_migrate_source`. After
the check, the virt driver will return `migrate_data` to nova conductor.
We propose that when a request is made with `block_migration` set to None,
those two driver interfaces will calculate the new value for `block_migration`
based on the shared storage checks implemented in the virt driver. The new
value of `block_migration` will be returned in the `migrate_data`.
Currently only three virt drivers implement live-migration. They are
libvirt driver, xenapi driver, and hyperv driver:
For the libvirt driver, it already implements the detection of shared storage.
The results of the checks are in the dict `dest_check_data`, in the values
`is_shared_block_storage` and `is_shared_instance_path`. So when
`block_migration` is None, the driver will set `block_migration` to False if
`is_shared_block_storage` or `is_shared_instance_path` is True; otherwise the
driver will set `block_migration` to True. Finally the new value of
`block_migration` will be returned in `migrate_data` (a sketch of this
decision follows the driver descriptions below).
For the xenapi driver, the shared storage check is based on aggregates. It is
required that the destination host be in the same aggregate / hypervisor_pool
as the source host. So `block_migration` will be False when the destination
host is in that aggregate, and True otherwise. Again the new value is passed
back in `migrate_data`.
For the hyperv driver, although it supports live-migration, there isn't any
code implementing the `block_migration` flag, so we won't implement this
detection until hyperv supports that flag.
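A sketch of the intended decision for drivers that can detect shared storage
(the helper name and dict access are illustrative)::

    def decide_block_migration(block_migration, dest_check_data):
        # The API passed an explicit value: honour it unchanged
        if block_migration is not None:
            return block_migration
        shared = (dest_check_data.get('is_shared_block_storage') or
                  dest_check_data.get('is_shared_instance_path'))
        # Shared storage -> plain live migration; otherwise block migrate
        return not shared
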
Remove the check of disk_over_commit
------------------------------------
The `disk_over_commit` flag still needs to work with older microversions. For
this proposal, we pass a None value when the request uses a newer
microversion. In the libvirt driver, if the value of `disk_over_commit` is
None, the driver won't do any disk usage check; otherwise the check will do
the same thing as before.
The upgrade concern
-------------------
This proposal adds a new value of `None` for `block_migration` and
`disk_over_commit`. When an OpenStack cluster is in the middle of a rolling
upgrade, the old-version compute nodes don't know about this new value. So a
check is added to the Compute RPC API: if the client can't send the new
version of the Compute RPC API, a fault will be returned.
Assignee(s)
-----------
Primary assignee:
Alex Xu <hejie.xu@intel.com>
Work Items
----------
* Implement the value detection of `block_migration` in the libvirt and xenapi
driver.
* Skip the check of disk usage when the `disk_over_commit` value is
  None
* Make `block_migration`, `host` flags optional, and remove `disk_over_commit`
flag in the API.
Dependencies
============
None
Testing
=======
Unit tests and functional tests in Nova
Documentation Impact
====================
Doc the API change in the API Reference:
http://developer.openstack.org/api-ref-compute-v2.1.html
References
==========
None
History
=======
Mitaka: Introduced

View File

@@ -0,0 +1,141 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===================
No more soft delete
===================
https://blueprints.launchpad.net/nova/+spec/no-more-soft-delete
There was widespread agreement at the YVR summit not to soft-delete any more
things. To codify this, we should remove the SoftDeleteMixin from NovaBase.
Problem description
===================
Soft deletion of rows imposes a management overhead to later delete or archive
those rows. It has also proved less necessary than initially imagined. We would
prefer additional soft-deletes were not added and so it does not make sense to
automatically inherit the `SoftDeleteMixin` when inheriting from NovaBase.
Use Cases
---------
As an operator, adding new soft deleted things means I need to extend my
manual cleanup to cover those things. If I don't, those tables will become
very slow to query.
As a developer, I don't want to tempt operators to read soft-deleted rows
directly. That risks turning the DB schema into an unofficial API.
As a developer/DBA, providing `deleted` and `deleted_at` columns on tables
which are not soft-deleted is confusing. One might also say it's confusing to
soft-delete from tables where deleted rows are never read.
Proposed change
===============
This spec proposes removing the `SoftDeleteMixin` from NovaBase and re-adding
it to all tables which currently inherit from NovaBase. The removal of
SoftDeleteMixin from those tables which don't need it will be left for future
work.
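A sketch of the resulting model definitions, assuming the oslo.db mixins keep
their current names (the table shown without soft delete is purely
hypothetical)::

    from oslo_db.sqlalchemy import models
    from sqlalchemy import Column, Integer
    from sqlalchemy.ext.declarative import declarative_base

    # NovaBase no longer carries SoftDeleteMixin
    BASE = declarative_base(cls=models.ModelBase)

    class Instance(BASE, models.TimestampMixin, models.SoftDeleteMixin):
        # Existing tables opt back in to soft delete explicitly
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)

    class SomeNewTable(BASE, models.TimestampMixin):
        # New tables simply never gain deleted/deleted_at columns
        __tablename__ = 'some_new_table'
        id = Column(Integer, primary_key=True)
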
Alternatives
------------
We could not do this. This means we need an extra two columns on new tables
and it makes it slightly easier to start soft-deleting new tables.
Data model impact
-----------------
None.
REST API impact
---------------
None.
Security impact
---------------
None.
Notifications impact
--------------------
None.
Other end user impact
---------------------
None.
Performance Impact
------------------
None.
Other deployer impact
---------------------
None.
Developer impact
----------------
None.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
alexisl
Other contributors:
None
Work Items
----------
* Remove `SoftDeleteMixin` from NovaBase.
* Add it to all models which inherited from NovaBase.
Dependencies
============
None.
Testing
=======
None.
Documentation Impact
====================
None.
References
==========
None.
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Liberty
- Introduced
* - Mitaka
- Simplified and re-proposed

View File

@@ -0,0 +1,177 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Remove shared storage flag in evacuate API
==========================================
https://blueprints.launchpad.net/nova/+spec/remove-shared-storage-flag-in-evacuate-api
Today the evacuate API expects an onSharedStorage flag to be provided by the
admin; however, this information can be detected by the virt driver as well.
To ease the work of the admin and to allow easier automation of the evacuation
tasks, this spec proposes to remove the onSharedStorage flag from the API in a
new microversion.
Problem description
===================
Use Cases
---------
When an instance needs to be evacuated from a failed host the admin has to
check if the instance was stored on shared storage or not to issue the evacuate
command properly. The admin wants to rely on the virt driver to detect if
the instance data is available on the target host and use it if possible for
the evacuation.
An external automatic evacuation engine also wants to let nova decide
if the instance can be evacuated without rebuilding it on the target host.
Proposed change
===============
In the compute manager, the on_shared_storage flag of the rebuild_instance
function was made optional by a previous spec, so the onSharedStorage
parameter can now be removed from the evacuate API.
The evacuate API supports providing a new admin password optionally. This
makes the solution a bit more complicated.
Nova can only decide if the instance is on shared storage once the target host
of the evacuation is known, which means only after the scheduler has selected
the new host, because nova needs to check if the disk of the instance is
visible from the target host. However the evacuation API call returns the
new admin password in the response. This logic cannot be fully kept if the
onSharedStorage flag is removed.
There are two cases to consider if the onSharedStorage flag is removed (a
short sketch follows the list):
* Client doesn't provide admin password. Nova will generate a new password.
If nova finds that the instance is on shared storage then
the instance will be rebooted and will use the same admin password as before.
If nova finds that the instance is not on shared storage then the instance
will be recreated and the newly generated admin password will be used.
* Client provides admin password.
If nova finds that the instance is on shared storage then
the password the client provided will be silently ignored. If nova finds
that the instance is not on shared storage then the provided password will
be injected to the recreated instance.
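The two cases can be summarised with a small sketch (the helper and the
``generate_password`` callable are illustrative only)::

    def pick_admin_password(requested, on_shared_storage, current,
                            generate_password):
        if on_shared_storage:
            # Instance is rebooted in place; any requested password is
            # silently ignored and the existing one is kept.
            return current
        # Instance is recreated on the target host: use the requested
        # password if given, otherwise a freshly generated one.
        return requested if requested else generate_password()
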
This spec proposes to
* Remove the onSharedStorage parameter of the
/v2.1/{tenant_id}/servers/{server_id}/action API
* Remove adminPass from the response body of the API call. Admin user can still
access the generated password via
/v2.1/{tenant_id}/servers/{server_id}/os-server-password API
Alternatives
------------
For the automation use case the alternative would be to reimplement, in the
theoretical external evacuation engine, the check of whether the instance's
disk is available on the target host. However this would be a clear code
duplication as nova already contains this check in the virt driver.
Data model impact
-----------------
None
REST API impact
---------------
The onSharedStorage parameter of the
/v2.1/{tenant_id}/servers/{server_id}/action API will be removed.
So the related JSON schema would be changed to the following::
{
'type': 'object',
'properties': {
'evacuate': {
'type': 'object',
'properties': {
'host': parameter_types.hostname,
'adminPass': parameter_types.admin_password,
},
'required': [],
'additionalProperties': False,
},
},
'required': ['evacuate'],
'additionalProperties': False,
}
Also adminPass will be removed from the response body.
This would make the response body empty, therefore the API
will not return a response body at all.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
balazs-gibizer
Work Items
----------
* Remove onSharedStorage from the evacuate REST API
* Remove adminPass and therefore the whole response body of the evacuate API
Dependencies
============
None
Testing
=======
Unit and functional test coverage will be provided.
Documentation Impact
====================
Admin guide needs to be updated with the new behavior of the evacuate
function.
References
==========
[1] The bp that made the on_shared_storage optional in compute manager in
Liberty https://blueprints.launchpad.net/nova/+spec/optional-on-shared-storage-flag-in-rebuild-instance
[2] The code that made the on_shared_storage optional in compute manager in
Liberty https://review.openstack.org/#/c/197951/
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced

View File

@@ -0,0 +1,192 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===============================================
Make os-instance-actions read deleted instances
===============================================
https://blueprints.launchpad.net/nova/+spec/os-instance-actions-read-deleted-instances
Change the os-instance-actions API to read deleted instances so the owner can
see the actions performed on their deleted instance.
Problem description
===================
The os-instance-actions API currently does not read deleted instances [#f1]_.
Also, instance_actions are not soft deleted when an instance is deleted, so
we can still read them out of the DB without needing the read_deleted='yes'
flag.
The point of instance actions is auditing, and in the case of a post-mortem
when an instance is deleted, instance_actions would be used for this, but
because of the API limitation, you can't get those out of the API using the
deleted instance.
Use Cases
---------
#. Multiple users are in the same project/tenant.
#. User A deletes a shared instance.
#. User B wants to know what happened to it (or who deleted it).
User B should be able to lookup the instance actions on the instance since they
are in the same project as user A.
Proposed change
===============
Add a microversion change to the os-instance-actions API so that we mutate the
context and set the read_deleted='yes' attribute when looking up the instance
by uuid.
Alternatives
------------
* We can assume that operators are listening for nova notifications and storing
those off for later lookup in the case that they need to determine who
deleted an instance. This is not a great assumption since it relies on an
external monitoring system being setup outside of nova, which is optional.
* Operators can query the database directly to get the instance actions for a
deleted instance, but then they have to know the nova data model. And only
operators can do that, it doesn't allow for tenant users to do this lookup
themselves (so they'd have to open a support ticket to the operator to do
the lookup for them).
Data model impact
-----------------
None.
REST API impact
---------------
Impacted API: os-instance-actions
Impacted methods: GET
The os-instance-actions API only has two GET requests:
#. index: list the instance actions by instance uuid
#. show: show details on an instance action by instance uuid and request id
including, if authorized, the related instance action events.
The request and response values do not change in the API. The expected response
codes do not change - there is still a 404 returned if the instance or instance
action is not found.
The only change is that when looking up the instance, we set the
read_deleted='yes' flag on the context. This will be done within a conditional
block based on the microversion in the request.
Security impact
---------------
None.
Notifications impact
--------------------
None.
Other end user impact
---------------------
We can bump the max supported API version in python-novaclient automatically for
this change since it's self-contained in the server side API code, the client
does not have to do anything except opt into the microversion.
Performance Impact
------------------
None.
Other deployer impact
---------------------
None.
Developer impact
----------------
None.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Matt Riedemann <mriedem@us.ibm.com>
Other contributors:
None
Work Items
----------
* If the microversion in the request satisfies the minimum version required,
temporarily mutate the context when reading the instance by uuid from the
database. For example:
::
with utils.temporary_mutation(context, read_deleted='yes'):
instance = common.get_instance(self.compute_api, context, server_id)
Dependencies
============
None.
Testing
=======
#. Unit tests will be updated.
#. Functional tests (API sample tests) will be provided for the microversion
change. The scenarios are basically:
* Delete an instance and try to get its instance actions where the
microversion requested does not meet the minimum requirement and assert
that nothing is returned.
* Delete an instance and try to get its instance actions where the
microversion requested does meet the minimum requirement and assert that
the related instance actions are returned.
Documentation Impact
====================
* http://docs.openstack.org/developer/nova/api_microversion_history.html will
be updated.
* http://developer.openstack.org/api-ref-compute-v2.1.html will be updated to
point out the microversion change.
References
==========
* Mailing list: http://lists.openstack.org/pipermail/openstack-dev/2015-November/080039.html
.. [#f1] API: https://github.com/openstack/nova/blob/12.0.0/nova/api/openstack/compute/instance_actions.py#L56
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced

View File

@@ -0,0 +1,233 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=====================================
Use the new enginefacade from oslo_db
=====================================
https://blueprints.launchpad.net/nova/+spec/new-oslodb-enginefacade
Implement the new oslo.db enginefacade interface described here:
https://blueprints.launchpad.net/oslo.db/+spec/make-enginefacade-a-facade
Problem description
===================
The linked oslo.db spec contains the details of the proposal, including its
general advantages to all projects. In summary, we transparently track database
transactions using the RequestContext object. This means that if there is
already a transaction in progress we will use it by default, only creating a
separate transaction if explicitly requested.
Use Cases
----------
These changes will only affect developers.
* Allow a class of database races to be fixed
Nova currently only exposes database transactions in nova/db/sqlalchemy/api.py,
which means that every db api call is in its own transaction. Although this
will remain the same initially, the new interface allows a caller to extend a
transaction across several db api calls if they wish. This will enable callers
who need these to be atomic to achieve this, which includes the save operation
on several Nova objects.
* Reduce connection load on the database
Many database api calls currently create several separate database connections,
which increases load on the database. By reducing these to a single connection,
load on the db will be decreased.
* Improve atomicity of API calls
By ensuring that database api calls use a single transaction, we fix a class of
bug where failure can leave a partial result.
* Make greater use of slave databases for read-only transactions
The new api marks sections of code as either readers or writers, and enforces
this separation. This allows us to automatically use a slave database
connection for all read-only transactions. It is currently only used when
explicitly requested in code.
Proposed change
===============
Code changes
------------
* Decorate the RequestContext class
nova.RequestContext is annotated with the
@enginefacade.transaction_context_provider decorator. This adds several code
hooks which provide access to the transaction context via the RequestContext
object.
* Update database apis incrementally
Database apis will be updated in batches, by function. For example, Service
apis, quota apis, instance apis. Individual calls will be annotated as either
readers or writers. Existing transaction management will be replaced. Calls
into apis which have not been upgraded yet will continue to explicitly pass
the session or connection object (a usage sketch follows this list).
* Remove uses of use_slave wherever possible
The use_slave parameter will be removed from all upgraded database apis, which
will involve updating call sites and tests. Where the caller no longer uses the
use_slave parameter anywhere, the removal will be propagated as far as
possible. The exception will be external interfaces. All uses of use_slave
will be removed. External interfaces will continue to accept it, but will not
use it.
* Cells 'api' database calls
get_api_engine() and get_api_session() will be replaced by a context manager
which changes the current transaction manager.
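A hedged sketch of the intended usage pattern (the decorator names come from
the oslo.db enginefacade API; the db api functions shown are illustrative)::

    from oslo_db.sqlalchemy import enginefacade

    from nova.db.sqlalchemy import models


    @enginefacade.transaction_context_provider
    class RequestContext(object):
        """Nova's context gains transaction tracking hooks."""


    @enginefacade.reader
    def service_get(context, service_id):
        # context.session is supplied by the decorator and may point at a
        # slave connection, since this block is declared read-only
        return context.session.query(models.Service).get(service_id)


    @enginefacade.writer
    def service_update(context, service_id, values):
        # Runs in a writer transaction; nesting inside an outer transaction
        # started by the caller is handled transparently
        context.session.query(models.Service).filter_by(
            id=service_id).update(values)
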
Alternatives
------------
Alternatives were examined during the design of the oslo.db code. The goal of
this change is to implement a solution which is common across OpenStack
projects.
Data model impact
-----------------
None.
REST API impact
---------------
None.
This change obsoletes the use_slave parameter everywhere it is used, which
includes several apis with external interfaces. We remove it from all internal
interfaces. For external interfaces we leave it in place, but ignore it. Slave
connections will be used everywhere automatically, whenever possible.
Security impact
---------------
Nothing obvious.
Notifications impact
--------------------
None.
Other end user impact
---------------------
None.
Performance Impact
------------------
By reducing connection load on the database, the change is expected to provide
a small performance improvement. However, the primary purpose is correctness.
Other deployer impact
---------------------
None.
Developer impact
----------------
The initial phase of this work will be to implement the new engine facade in
nova/db/sqlalchemy/api.py only, and the couple of cells callers which access
the database outside this module. There will be some minor changes to function
signatures in this module due to removing use_slave, but all callers will be
updated as part of this work. Callers will not have to consider transaction
context if they do not currently do so, as it will be created and destroyed
automatically.
This change will allow developers to explicitly extend database transaction
context to cover several database calls. This allows the caller to make
multiple database changes atomically.
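For example, a caller could extend a single transaction across two db api
calls with something like the following sketch; the quota helper functions are
hypothetical, used only to illustrate the pattern.
.. code-block:: python

    from oslo_db.sqlalchemy import enginefacade


    def reserve_and_update_quota(context, project_id, resource, delta):
        # Both hypothetical db api calls below share one session and commit
        # (or roll back) together, instead of each running in its own
        # transaction.
        with enginefacade.writer.using(context):
            _quota_reserve(context, project_id, resource, delta)
            _quota_usage_update(context, project_id, resource, delta)
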
Implementation
==============
Assignee(s)
-----------
Primary assignee:
mbooth-9
Work Items
----------
* Enable use of the new api in Nova
* Migrate api bundles along functional lines:
* Service
* ComputeNode
* Certificate
* FloatingIP
* DNSDomain
* FixedIP
* VIF
* Instance, InstanceInfoCache, InstanceExtra, InstanceMetadata,
InstanceSystemMetadata, InstanceFault, InstanceGroup, InstanceTag
* KeyPair
* Network
* Quota
* EC2
* BDM
* SecurityGroup
* ProviderFWRule
* Migration
* ConsolePool
* Flavor
* Cells
* Agent
* Bandwidth
* Volume
* S3
* Aggregate
* Action
* Task
* PCIDevice
Dependencies
============
A version of oslo.db including the new enginefacade api:
https://review.openstack.org/#/c/138215/
Testing
=======
This change is intended to have no immediate functional impact. The current
tests should continue to pass, except where:
* An internal API is modified to remove use_slave
* The change exposes a bug
* The tests assumed implementation details which have changed
Documentation Impact
====================
None.
References
==========
https://blueprints.launchpad.net/oslo.db/+spec/make-enginefacade-a-facade

@@ -0,0 +1,197 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===============================================
Provide a way to pause VM during live migration
===============================================
Blueprint:
https://blueprints.launchpad.net/nova/+spec/pause-vm-during-live-migration
When using live migration, an operator might want the ability to
increase the chance of the migration succeeding, even at the cost of longer VM
downtime. This spec proposes a new nova API for pausing a VM during live
migration.
Problem description
===================
The most common use case of live migration is host maintenance for different
purposes. It might be, e.g., OpenStack upgrade to newer version or even
hardware upgrade. Hypervisors have some features such as CPU throttling or
memory compression to make it possible to live migrate every VM to other hosts.
However, a VM might run a workload that will prevent live migration from
finishing. In such a case the operator might want to pause the VM during live
migration to stop memory writes on the VM.
Another use case is imminent host failure where live migration duration might
be crucial to keep VMs running regardless of VMs downtime during transition to
destination host.
Currently, to pause a VM during live migration the operator needs to pause the
VM through libvirt/the hypervisor. This pause is transparent to Nova, as it is
the same thing that happens during the 'pause-and-copy' step of live migration.
Use Cases
----------
As an operator of an OpenStack cloud, I would like the ability to pause a VM
during live migration. This operation prevents the VM from dirtying memory and
therefore forces the live migration to complete.
Proposed change
===============
A new API method for pausing a VM during live migration. This will make an
asynchronous RPC call to the compute node to pause the VM through libvirt.
This will also introduce a new instance action, 'live-migration-paused-vm'.
The Migration object and MigrationList object will be used to establish which
migrations exist, with additional optional data provided by the compute driver.
This will need an increment to the rpcapi version too.
Alternatives
------------
An alternative is to not do this and let the operator pause the VM manually
through the hypervisor.
Another alternative is to reuse existing pause operation in nova. However, it
might bring some confusion to operators. Libvirt preserves VM state that was
in effect when live migration started. When live migration completes
libvirt reverts VM state to preserved one. Example workflow:
* VM is active
* Operator starts live migration
* Libvirt preserves active state of a VM
* Operator pauses VM during transition (e.g., nova pause VM)
* LM finishes
* Libvirt reverts VM state to preserved one - in this case to active.
Because of such behavior it is not recommended to reuse existing pause
operation. It might be confusing for operators that single operation is used
for two different purposes.
Also in the future there might be multiple methods to force end of live
migration. This API can be extended to give hints to do things other than
pause the VM during live migration.
This also will be suitable for Tasks API.
Data model impact
-----------------
None. The Migration objects used are already created and tracked by nova.
REST API impact
---------------
To be added in a new microversion.
* Force live migration to complete by pausing VM
`POST /servers/{id}/migrations/{id}/action`
Body::
{
"force_complete": null
}
Normal http response code: `202 Accepted`
No response body is needed
Expected error http response code: `400 Bad Request`
- the instance state is invalid for forcing live migration to complete,
i.e., the task state is not 'migrating', or the migration is not a
'live-migration' in the 'running' state. Also returned when a live
migration cancel action is in progress.
Expected error http response code: `403 Forbidden`
- Policy violation if the caller is not granted access to
'os_compute_api:servers:migrations:force_complete' in policy.json
Expected error http response code: `404 Not Found`
- the instance does not exist
Because this is an asynchronous call, there might be errors that are not
exposed through the API, for instance when the hypervisor does not support
pausing a VM during live migration. Such errors will be logged by the compute
service.
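For illustration only, a client might drive this call roughly as in the
following sketch using python-requests; the endpoint URL, token, microversion
value and identifiers are placeholders, not part of this spec.
.. code-block:: python

    import requests

    NOVA = 'http://controller:8774/v2.1'          # placeholder endpoint
    HEADERS = {
        'X-Auth-Token': '<token>',                # placeholder auth token
        # The exact microversion is assigned when the change merges.
        'X-OpenStack-Nova-API-Version': '<microversion>',
    }
    server_id = '<server uuid>'                   # placeholder identifiers
    migration_id = '<migration id>'

    resp = requests.post(
        '{}/servers/{}/migrations/{}/action'.format(
            NOVA, server_id, migration_id),
        json={'force_complete': None},
        headers=HEADERS)
    # 202 means accepted; the VM pause happens asynchronously on the compute.
    assert resp.status_code == 202
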
Security impact
---------------
None
Notifications impact
--------------------
There will be a new notification to indicate the start and outcome of pausing
a VM during an ongoing live migration.
Other end user impact
---------------------
python-novaclient will be extended with a new operation to force an ongoing
live migration to complete by pausing the VM during the transition to the
destination host.
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Pawel Koniszewski (irc: pkoniszewski)
Work Items
----------
* Pausing VM during live migration through libvirt
* python-novaclient 'nova live-migration-force-complete'
Dependencies
============
None
Testing
=======
* Unit and Functional tests in Nova
* Tempest tests if possible to slow down live migration or start never-ending
live migration
Documentation Impact
====================
New API needs to be documented:
* Compute API extensions documentation
http://developer.openstack.org/api-ref-compute-v2.1.html
* nova.compute.api documentation
http://docs.openstack.org/developer/nova/api/nova.compute.api.html
References
==========
None

@@ -0,0 +1,138 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================
Persist RequestSpec object
==========================
https://blueprints.launchpad.net/nova/+spec/persist-request-spec
Persist the RequestSpec object used for scheduling an instance.
Problem description
===================
There are a few times that it would be useful to have the RequestSpec used for
originally scheduling an instance where it is not currently available, such as
during a resize/migrate. In order to have later scheduling requests operate
under the same constraints as the original we should retain the RequestSpec for
these later scheduling calls.
Going forward with cells it will be necessary to store a RequestSpec before an
instance is created so that the API can return details on the instance before
it has been scheduled.
Use Cases
---------
* Operators/users want to move an instance through a migration or resize and
want the destination to satisfy the same requirements as the source.
Proposed change
===============
A save() method will be added to the RequestSpec object. This will store the
RequestSpec in the database. Since this is also a part of the cells effort it
will be possible to store it in both the api and regular nova database. Which
database it's stored in on save() will be determined by the context used.
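A minimal sketch of the intended call site follows; the factory method name
and its arguments are assumptions for illustration, not settled API.
.. code-block:: python

    from nova import objects

    # Build the spec once at the start of a boot/move operation, persist it,
    # and reuse it for later scheduling calls. context, request_spec and
    # filter_properties are the existing primitives passed around today.
    spec = objects.RequestSpec.from_primitives(context, request_spec,
                                               filter_properties)
    spec.save()  # stored in the api or cell database depending on the context
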
Alternatives
------------
Parts of it could be put into the instance_extra table. Because later this
will be persisted in the api database before scheduling and then moved to the
cell database after scheduling it is beneficial to just store it in a table
that can exist in both.
Data model impact
-----------------
A new database table will be added to both the api and cell database. The
schema will match what is necessary for the RequestSpec object to be stored.
Since it is not yet implemented it's of little use to finalize the design here.
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None here, but this will allow for resizes to be scheduled like the original
boot request.
Performance Impact
------------------
An additional database write will be incurred.
Other deployer impact
---------------------
Same as for users, nothing here but this opens up future changes.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
alaski
Work Items
----------
* Add a new table to the api and cell/current database
* Add the save() method to the RequestSpec object
* Call the save() method in the code at the appropriate place
Dependencies
============
https://blueprints.launchpad.net/nova/+spec/request-spec-object
Testing
=======
New unit tests will be added. This is not externally facing in a way that
Tempest can test.
Documentation Impact
====================
Devref documentation will be added explaining the existence of this data for
use in scheduling.
References
==========
None

@@ -0,0 +1,281 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
======================
RBD Instance Snapshots
======================
https://blueprints.launchpad.net/nova/+spec/rbd-instance-snapshots
When using RBD as storage for glance and nova, instance snapshots are
slow and inefficient, resulting in poor end user experience. Using
local disk for the upload increases operator costs for supporting
instance snapshots.
As background reading, the following link provides an overview of the
snapshotting capabilities available in ceph.
http://docs.ceph.com/docs/master/rbd/rbd-snapshot/
Problem description
===================
RBD is often used to back glance images and nova disks. When using rbd
for nova's disks, nova 'snapshots' are slow, since they create full
copies by downloading data from rbd to a local file, uploading it to
glance, and putting it back into rbd. Since raw images are normally
used with rbd to enable copy-on-write clones, this process removes any
sparseness in the data uploaded to glance. This is a problem of user
experience, since this slow, inefficient process takes much longer
than necessary to let users customize images.
For operators, this is also a problem of efficiency and cost. For
rbd-backed nova deployments, this is the last part that uses
significant local disk space.
Use Cases
----------
This allows end users to quickly iterate on images, for example to
customize or update them, and start using the snapshots far more
quickly.
For operators, this eliminates any need for large local disks on
compute nodes, since instance data in rbd stays in rbd. It also
prevents lots of wasted space.
Project Priority
-----------------
None
Proposed change
===============
Instead of copying all the data to local disk, keep it in RBD by
taking an RBD snapshot in Nova and cloning it into Glance. Rather
than uploading the data, just tell Glance about its location in
RBD. This way data stays in the Ceph cluster, and the snapshot is
far more rapidly usable by the end user.
In broad strokes, the workflow is as follows:
1. Create an RBD snapshot of the ephemeral disk via Nova in
the ceph pool Nova is configured to use.
2. Clone the RBD snapshot into Glance's RBD pool. [7]
3. To keep from having to manage dependencies between snapshots
and clones, deep-flatten the RBD clone in Glance's RBD pool and
detach it from the Nova RBD snapshot in ceph. [7]
4. Remove the RBD snapshot from ceph created in (1) as it is no
longer needed.
5. Update Glance with the location of the RBD clone created and
flattened in (2) and (3).
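At the librbd level this flow corresponds roughly to the following sketch
using the python rbd bindings; the pool, image and snapshot names are
illustrative assumptions.
.. code-block:: python

    import rados
    import rbd

    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        with cluster.open_ioctx('vms') as nova_pool, \
                cluster.open_ioctx('images') as glance_pool:
            # (1) Snapshot the instance's ephemeral disk in nova's pool.
            with rbd.Image(nova_pool, 'instance-0001_disk') as disk:
                disk.create_snap('to-glance')
                disk.protect_snap('to-glance')
            # (2) Clone the snapshot into glance's pool.
            rbd.RBD().clone(nova_pool, 'instance-0001_disk', 'to-glance',
                            glance_pool, 'image-uuid',
                            features=rbd.RBD_FEATURE_LAYERING)
            # (3) Flatten the clone so it no longer depends on the snapshot.
            with rbd.Image(glance_pool, 'image-uuid') as image:
                image.flatten()
            # (4) Remove the now-unneeded snapshot from nova's pool.
            with rbd.Image(nova_pool, 'instance-0001_disk') as disk:
                disk.unprotect_snap('to-glance')
                disk.remove_snap('to-glance')
            # (5) Updating glance with the clone's location is done via the
            # glance API and is not shown here.
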
This is the reverse of how images are cloned into nova instance disks
when both are on rbd [0].
If any of these steps fail, clean up any partial state and fall back
to the current full copy method. Failure of the RBD snapshot method
will be quick and usually transient in nature. The cloud admin can
monitor for these failures and address the underlying CEPH issues
causing the RBD snapshot to fail.
Failures will be reported in the form of stack traces in the nova
compute logs.
There are a few reasons for falling back to full copies instead of
bailing out if efficient snapshots fail:
* It makes upgrades graceful, since nova snapshots still work
before glance has enough permissions for efficient snapshots
(see Security Impact for glance permission details).
* Nova snapshots still work when efficient snapshots are not
possible due to architecture choices, such as not using rbd as
a glance backend, or using different ceph clusters for glance
and nova.
* This is consistent with existing rbd behavior in nova and cinder.
If cloning from a glance image fails, both projects fall back
to full copies when creating volumes or instance disks.
Alternatives
------------
The clone flatten step could be handled as a background task in a
green thread, or completely asynchronously as a periodic task. This
would increase user-facing performance, as the snapshots would be
available for use immediately, but it would also introduce
race-condition-like issues around deleting dependent images.
The flatten step could be omitted completely, and glance could be
made responsible for tracking the various image dependencies. At
the rbd level, an instance snapshot would consist of three things
for each disk. This is true of any instance, regardless of whether
it was created from a snapshot itself, or is just created from a
usual image. In rbd, there would be:
1. a snapshot of the instance disk
2. a clone of the instance disk
3. a snapshot of the clone
(3) is exposed through glance's backend location.
(2) is an internal detail of glance.
(1) is an internal detail that nova and glance handle.
At the rbd level, a disk with snapshots can't be deleted. Hide this
from the user if they delete an instance with snapshots by making
glance responsible for their eventual deletion, once their dependent
snapshots are deleted. Nova does this by renaming instance disks that
it deletes in rbd, so glance is aware that they can be deleted.
When a glance snapshot is deleted, it deletes (3), then (2), and
(1). If nova has renamed its parent in rbd with a preset suffix, the
instance has been destroyed already, so glance tries to delete the
original instance disk. The original instance disk will be
successfully deleted when the last snapshot is removed.
If glance snapshots are created but deleted before the instance is
destroyed, nova will delete the instance disks as usual.
The mechanism nova uses to let glance know it needs to clean up the
original disk could be different. It could use an image property with
certain restrictions which aren't possible in the current glance api:
* it must be writeable only once
* to avoid exposing backend details, it would need to be hidden
from end users
Storing this state in ceph is much easier to keep consistent with
ceph, rather than an external database which could become out of sync.
It would also be an odd abstraction leak in the glance_store api, when
upper layers don't need to be aware of it at all.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
Glance will need to be configured with direct_url support enabled
in order for Nova to determine what and where to clone the image
from. Depending on system configuration, this could leak backend
credentials [5]. Devstack has already been updated to switch
behaviors when Ceph support is requested [6].
Documentation has typically recommended using different ceph pools
for glance and nova, with different access to each. Since nova
would need to be able to create the snapshot in the pool used by
glance, it would need write access to this pool as well.
Notifications impact
--------------------
None
Performance Impact
------------------
Snapshots of RBD-backed instances would be significantly faster.
Other end user impact
---------------------
Snapshots of RBD-backed instances would be significantly faster.
Other deployer impact
---------------------
To use this in an existing installation with cephx authentication, adding
'allow rwx pool=images' to nova's ceph user capabilities is necessary. The
'ceph auth caps' command can be used for this [1]. If these permissions
are not updated, nova will continue using the existing full copy
mechanism for instance snapshots, because the fast snapshot will fail
and nova compute will fall back to the full copy method.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
nic
Other contributors:
jdurgin
pbrady
nagyz
cfb-n/cburgess
Work Items
----------
Implementation: [4]
The libvirt imagebackend does not currently recognize AMI images
as raw (and therefore cloneable) for whatever reason, so this
proposed change is of limited utility with a very popular image
format. This should be addressed in a separate change.
Dependencies
============
You need a Havana or newer version of glance as direct URL was added in
Havana.
Testing
=======
The existing tempest tests with ceph in the gate cover instance
snapshots generically. As fast snapshots are enabled automatically, there
is no need to change the tempest tests. Additionally, unit tests in nova
will verify error handling (falling back to full copies if the process
fails), and make sure that when configured correctly rbd snapshots and
clones are used rather than full copies.
Documentation Impact
====================
See the security and other deployer impact sections above.
References
==========
[0] http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/rbd-clone-image-handler.html
[1] Ceph authentication docs: http://ceph.com/docs/master/rados/operations/user-management/#modify-user-capabilities
[2] Alternative: Glance cleanup patch: https://review.openstack.org/127397
[3] Alternative: Nova patch: https://review.openstack.org/125963
[4] Nova patch: https://review.openstack.org/205282
[5] https://bugs.launchpad.net/glance/+bug/880910
[6] https://review.openstack.org/206039
[7] http://docs.ceph.com/docs/master/dev/rbd-layering/

@@ -0,0 +1,239 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=========================
Create RequestSpec Object
=========================
https://blueprints.launchpad.net/nova/+spec/request-spec-object-mitaka
Add a structured, documented object that represents a specification for
launching multiple instances in a cloud. This spec is a follow-up from the
previously approved and partially implemented request-spec-object spec.
Problem description
===================
The main interface into the scheduler, the `select_destinations()` method,
accepts a `request_spec` parameter that is a nested dict. This nested dict is
constructed in `nova.scheduler.utils.build_request_spec()`, however the
structure of the request spec is not documented anywhere and the filters in the
scheduler seem to take a laisse faire approach to querying the object during
scheduling as well as modifying the `request_spec` object during loops of the
`nova.scheduler.host_manager.HostStateManager.get_filtered_hosts()` method,
which calls the filter object's `host_passes` object, supplying a
`filter_properties` parameter, which itself has a key called `request_spec`
that contains the aforementioned nested dict.
This situation makes it very difficult to understand exactly what is going on
in the scheduler, and cleaning up this parameter in the scheduler interface is
a pre-requisite to making a properly-versioned and properly-documented
interface in preparation for a split-out of the scheduler code.
Use Cases
----------
This is a pure refactoring effort for cleaning up all the interfaces in between
Nova and the scheduler so the scheduler could be split out by the next cycle.
Proposed change
===============
A new class called `RequestSpec` will be created that models a request to
launch multiple virtual machine instances. The first version of the
`RequestSpec` object will simply be an objectified version of the current
dictionary parameter. The scheduler will construct this `RequestSpec` object
from the `request_spec` dictionary itself.
The existing
`nova.scheduler.utils.build_request_spec` method will be removed in favor of a
factory method on `nova.objects.request_spec.RequestSpec` that will construct
a `RequestSpec` from the existing key/value pairs in the `request_spec`
parameter supplied to `select_destinations`.
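A hedged sketch of what the conversion call might look like on the
conductor/scheduler boundary; the factory method name `from_primitives` is an
assumption used only for illustration.
.. code-block:: python

    from nova import objects

    # Build the versioned object once from the legacy primitives instead of
    # passing the nested request_spec/filter_properties dicts around.
    spec = objects.RequestSpec.from_primitives(context, request_spec,
                                               filter_properties)
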
Alternatives
------------
None.
Data model impact
-----------------
This spec is not focusing on persisting the RequestSpec object but another
blueprint (and a spec) will be proposed with this one as dependency for
providing a save() method to the RequestSpec object which would allow it to be
persisted in (probably) instance_extra DB table.
REST API impact
---------------
None.
Security impact
---------------
None.
Notifications impact
--------------------
None.
Other end user impact
---------------------
None.
Performance Impact
------------------
None.
Other deployer impact
---------------------
None.
Developer impact
----------------
None, besides making the scheduler call interfaces gradually easier to read
and understand.
Implementation
==============
The `request_spec` dictionary is currently constructed by the nova-conductor
when it calls the `nova.scheduler.utils.build_request_spec()` function, which
looks like this:
.. code:: python
def build_request_spec(ctxt, image, instances, instance_type=None):
"""Build a request_spec for the scheduler.
The request_spec assumes that all instances to be scheduled are the same
type.
"""
instance = instances[0]
if isinstance(instance, obj_base.NovaObject):
instance = obj_base.obj_to_primitive(instance)
if instance_type is None:
instance_type = flavors.extract_flavor(instance)
# NOTE(comstud): This is a bit ugly, but will get cleaned up when
# we're passing an InstanceType internal object.
extra_specs = db.flavor_extra_specs_get(ctxt, instance_type['flavorid'])
instance_type['extra_specs'] = extra_specs
request_spec = {
'image': image or {},
'instance_properties': instance,
'instance_type': instance_type,
'num_instances': len(instances),
# NOTE(alaski): This should be removed as logic moves from the
# scheduler to conductor. Provides backwards compatibility now.
'instance_uuids': [inst['uuid'] for inst in instances]}
return jsonutils.to_primitive(request_spec)
As the filter_properties dictionary is hydrated with the request_spec
dictionary, this proposal is merging both dictionaries into a single object.
A possible first version of a class interface for the `RequestSpec`
class would look like this, in order to be as close to a straight conversion
from the nested dict's keys to object attribute notation:
.. code:: python
class RequestSpec(base.NovaObject):
"""Models the request to launch one or more instances in the cloud."""
VERSION = '1.0'
fields = {
'image': fields.ObjectField('ImageMeta', nullable=False),
'root_gb': fields.IntegerField(nullable=False),
'ephemeral_gb': fields.IntegerField(nullable=False),
'memory_mb': fields.IntegerField(nullable=False),
'vcpus': fields.IntegerField(nullable=False),
'numa_topology': fields.ObjectField('InstanceNUMATopology',
nullable=True),
'project_id': fields.StringField(nullable=True),
'os_type': fields.StringField(nullable=True),
'availability_zone': fields.StringField(nullable=True),
'instance_type': fields.ObjectField('Flavor', nullable=False),
'num_instances': fields.IntegerField(default=1),
'force_hosts': fields.StringField(nullable=True),
'force_nodes': fields.StringField(nullable=True),
'pci_requests': fields.ListOfObjectsField('PCIRequest', nullable=True),
'retry': fields.ObjectField('Retry', nullable=True),
'limits': fields.ObjectField('Limits', nullable=True),
'group': fields.ObjectField('GroupInfo', nullable=True),
'scheduler_hints': fields.DictOfStringsField(nullable=True)
}
This blueprint aims to provide a new Scheduler API method which would only
accept RequestSpec objects, replacing select_destinations(), which would
be deprecated and removed in a later cycle.
That RPC API method could have the following signature:
.. code:: python
def select_nodes(RequestSpec):
# ...
As noted above in the data model impact section, this blueprint does not aim
to persist this object at the moment.
Assignee(s)
-----------
Primary assignee:
bauzas
Other contributors:
None
Work Items
----------
- Convert all filter classes to operate against the `RequestSpec` object
instead of the nested `request_spec` dictionary.
- Change the Scheduler RPC API to accept a Spec object for select_destinations
- Modify conductor methods to directly hydrate a Spec object
- Add developer reference documentation for what the request spec models.
Dependencies
============
None.
Testing
=======
The existing unit tests of the scheduler filters will be modified to access
the `RequestSpec` object in the `filter_properties` dictionary.
Documentation Impact
====================
Update any developer reference material that might be referencing the old
dictionary accesses.
References
==========
This blueprint is part of an overall effort to clean up, version, and stabilize
the interfaces between the nova-api, nova-scheduler, nova-conductor and
nova-compute daemons that involve scheduling and resource decisions.

@@ -0,0 +1,230 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=========================================================
Add notification for administrative service status change
=========================================================
https://blueprints.launchpad.net/nova/+spec/service-status-notification
Today an external system cannot get notification-based information about the
nova service status. Nova service status can be changed administratively via
the os-services/disable API.
Having such a notification helps to measure the length of maintenance windows
or indirectly notify users about maintenance actions that possibly affect the
operation of the infrastructure.
Problem description
===================
Use Cases
---------
Deployer wants to measure the time certain nova services were disabled
administratively due to troubleshooting or maintenance actions as this
information might be part of the agreement between Deployer and End User.
Deployer wants to measure the time certain nova services were forced down due
to an externally detected error as this information might be part of the
agreement between Deployer and End User.
Proposed change
===============
An easy solution for the problem above is to add oslo.messaging notification
for the following actions:
* /v2/{tenant_id}/os-services/disable
* /v2/{tenant_id}/os-services/enable
* /v2/{tenant_id}/os-services/disable-log-reason
* /v2/{tenant_id}/os-services/force-down
Then ceilometer can receive these notifications and the length of the
maintenance window can be calculated via ceilometer queries.
Alternatively other third party tools like StackTach can receive the new
notifications via AMQP.
Alternatives
------------
The only alternative is to poll /v2/{tenant_id}/os-services/ API periodically
however it means slower information flow and creates load on the nova API
and DB services.
Data model impact
-----------------
No database schema change is foreseen.
The following new objects will be added to nova:
.. code-block:: python
@base.NovaObjectRegistry.register
class ServiceStatusNotification(notification.NotificationBase):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'payload': fields.ObjectField('ServiceStatusPayload')
}
@base.NovaObjectRegistry.register
class ServiceStatusPayload(base.NovaObject):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'service': fields.ObjectField('Service')
}
The definition of NotificationBase can be found in the Versioned notification
spec [3].
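For illustration, emitting the notification from the service update path might
look roughly like this minimal sketch; the emit() helper and the exact call
site are assumptions based on [3], not settled API.
.. code-block:: python

    # Hypothetical call site inside Service.save(), after detecting a change
    # to the disabled, disabled_reason or forced_down field.
    payload = ServiceStatusPayload(service=self)
    ServiceStatusNotification(payload=payload).emit(self._context)
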
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
A new notification service.status.update will be introduced with INFO priority
and the payload of the notification will be the serialized form of the already
existing Service versioned object. This notification will be the first that
uses a versioned object as its payload, but there is an initiative to
use versioned objects as the notification payload for every nova notification
[3]. This new notification will not support emitting the legacy format.
During the implementation of this spec we will provide the minimum
infrastructure to emit versioned notifications based on [3], but all the
advanced things like sample and doc generation will be done during the
implementation of [3].
For example after the following API call::
PUT /v2/{tenant_id}/os-services/disable-log-reason
{"host": "Devstack",
"binary": "nova-compute",
"disabled_reason": "my reason"}
The notification would contain the following payload::
    {
        "nova_object.version": "1.0",
        "nova_object.name": "ServiceStatusPayload",
        "nova_object.namespace": "nova",
        "nova_object.data": {
            "service": {
                "nova_object.version": "1.19",
                "nova_object.name": "Service",
                "nova_object.namespace": "nova",
                "nova_object.data": {
                    "id": 1,
                    "host": "Devstack",
                    "binary": "nova-compute",
                    "topic": "compute",
                    "report_count": 32011,
                    "disabled": true,
                    "disabled_reason": "my reason",
                    "availability_zone": "nova",
                    "last_seen_up": "2015-10-15 07:29:13",
                    "forced_down": false,
                    "version": 2
                },
                "nova_object.changes": [
                    "disabled",
                    "disabled_reason"
                ]
            }
        }
    }
Please note that the compute_node field will not be serialized into the
notification payload as that will bring in a lot of additional data not needed
here.
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
balazs-gibizer
Work Items
----------
* Send a new notification if the disabled, disabled_reason or forced_down
field of the Service object is updated
Dependencies
============
This work is part of the Versioned notification API [3] work, but it does not
directly depend on it. At the summit we agreed to add this new notification as
the first step of the versioned notification API work, to serve as a carrot
motivating operators to start consuming the new versioned notifications.
Testing
=======
Besides unit tests, new functional test cases will be added to cover the
new notification.
Documentation Impact
====================
None
References
==========
[1] This idea has already been discussed on ML
http://lists.openstack.org/pipermail/openstack-dev/2015-April/060645.html
[2] This work is related to but does not depend on the bp mark-host-down
https://blueprints.launchpad.net/nova/+spec/mark-host-down
[3] Versioned notification spec https://review.openstack.org/#/c/224755/
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced

@@ -0,0 +1,196 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
================================
Service Version Behavior Changes
================================
https://blueprints.launchpad.net/nova/+spec/service-version-behavior
There are a lot of situations where operators may have multiple
versions of nova code running in a single deployment, either
intentionally or accidentally. There are several things we can do in code to
make this safer and smoother, and the operator's life easier.
Problem description
===================
When running multiple versions of Nova code, care must be taken to
avoid sending RPC messages that are too new (or too old) for some of
the services to understand, as well as avoid accessing the database
with object models that are not able to handle the potential schema
skew.
Right now, during an upgrade, operators must calculate and set version
pins on the relevant RPC interfaces so that newer services (conductor,
api, etc) can speak to older services (compute) while a mix of
versions are present. This involves a lot of steps, config tweaking,
and service restarting. The potential for incorrectly executed or
missed steps is high.
Further, during normal operation, an older compute host that may have
been offlined for an extended period of time could be restarted and
attempt to join the system after compatibility code (or
configurations) have been removed.
In both of these cases, nova should be able to help identify, avoid,
and automate complex tasks that ultimately boil down to just a logical
decision based on reported versions.
Use Cases
----------
As an operator, I want live upgrades to be easier with fewer required
steps and more forgiving behavior from nova.
As an operator, I want more automated checks preventing an ancient
compute node from trying to rejoin after an extended hiatus.
Proposed change
===============
In Liberty, we landed a global service version counter. This records
each service's version in the database, and provides some historical
information (such as the compute rpc version at each global version
bump). In Mitaka, we should take advantage of this to automate some
tasks.
The first thing we will automate is the compute RPC version
selection. Right now, operators set the version pin in the config file
during a live upgrade and remove it after the upgrade is complete. We
will add an option to set this to "auto", which will select the
compute RPC version based on the reported service versions in the
database. By looking up the minimum service version, we can consult
the SERVICE_VERSION_HISTORY structure to determine what compute RPC
version is supported by the oldest nodes. We can make this transparent
to other code by doing the lookup in the compute_rpcapi module once at
startup, and again on signals like SIGHUP.
This will only be done if the version pin is set to "auto", requiring
operators to opt-in to this new behavior while it is smoke tested. In
the case where we choose the version automatically, the decision (and
whether it is the latest, or a backlevel version) will be logged for
audit purposes.
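A minimal sketch of the intended selection logic follows; the helper and
structure names (get_minimum_version(), SERVICE_VERSION_HISTORY) build on the
Liberty groundwork, but the exact code here is an assumption.
.. code-block:: python

    from nova import objects
    from nova.objects import service as service_obj


    def get_compute_rpc_version(context, pin):
        if pin != 'auto':
            return pin
        # Find the oldest nova-compute service version reported in the DB and
        # map it to the compute RPC version those nodes understand.
        minimum = objects.Service.get_minimum_version(context, 'nova-compute')
        return service_obj.SERVICE_VERSION_HISTORY[minimum]['compute_rpc']
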
The second thing we will automate is checking of the minimum
service version during service record create/update. This will prevent
ancient services from joining the deployment if they are too old. This
will be done in the Service object, and it will compare its own
version to the minimum version of other services in the database. If
it is older than all the other nodes, then it will refuse to start. If
we refuse to start, we'll log the versions involved and the reason for
the refusal visibly to make it clear what happened and what needs
fixing.
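The startup check could be as simple as the following sketch; the helper
function and the exception name are assumptions for illustration only.
.. code-block:: python

    from nova import exception
    from nova import objects
    from nova.objects.service import SERVICE_VERSION


    def assert_not_too_old(context, binary='nova-compute'):
        # Refuse to start if this node is older than everything else running.
        minimum = objects.Service.get_minimum_version(context, binary)
        if minimum and SERVICE_VERSION < minimum:
            # Hypothetical exception; the real name is an implementation
            # detail to be settled in review.
            raise exception.ServiceTooOld(thisver=SERVICE_VERSION,
                                          minver=minimum)
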
Alternatives
------------
We could continue to document both of these procedures and require
manual steps for the operators.
Data model impact
-----------------
There are no data(base) model impacts prescribed by the work here, as
those were added preemptively in Liberty.
The Service object will gain at least one remotable method for
determining the minimum service version.
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
Checking the minimum version in the database on compute_rpcapi module
startup will incur a small performance penalty and additional database
load. This will only happen once per startup (or signal) and is
expected to be massively less impactful than the effort required to
manually perform the steps being automated.
It would also be trivial for conductor to cache the minimum versions
for some TTL in order to avoid hitting the database during a storm of
services starting up.
Other deployer impact
---------------------
Deployer impact should be entirely positive. One of the behaviors will
be opt-in only initially, and the other is purely intended to prevent
the operators from shooting themselves in their feet.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
danms
Work Items
----------
* Add a minimum version query to the Service object
* Automate selection of the compute RPC version when the pin is set to auto
* Automate service failure on startup when the service version is too old
* Hook re-checking of the minimum version to receiving a SIGHUP
Dependencies
============
None
Testing
=======
As with all things that affect nova service startup, unit tests will
be the only way to test that the service fails to start up when the
version is too old.
The compute RPC pin selection can and will be tested by configuring
grenade's partial-ncpu job to use "auto" instead of an explicit
pin. This will verify that the correct version is selected by the fact
that tempest continues to pass with nova configured in that way.
Documentation Impact
====================
A bit of documentation will be required for each change, merely to
explain the newly-allowed value for the compute_rpc version pin and
the potential new behavior of starting an older service.
References
==========
* https://review.openstack.org/#/c/201733/
* http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/service-version-number.html

@@ -0,0 +1,224 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Add soft affinity support for server group
==========================================
https://blueprints.launchpad.net/nova/+spec/soft-affinity-for-server-group
As a tenant I would like to schedule instances on the same host if possible,
so that I can achieve collocation. However if it is not possible to schedule
some instances to the same host then I still want the subsequent
instances to be scheduled together on another host. In this way I can express
a good-to-have relationship between a group of instances.
As a tenant I would like to schedule instances on different hosts if possible.
However if it is not possible I still want my instances to be scheduled even
if it means that some of them are placed on the same host.
Problem description
===================
Use Cases
---------
End User might want to have a less strict affinity and anti-affinity
rule than what is today available in server-group API extension.
With the proposed good-to-have affinity rule the End User can request nova
to schedule the instance to the same host (i.e. stack them) if possible.
However if it is not possible (e.g. due to resource limitations) then the End
User still wants to keep the instances on a small number of different hosts.
With the proposed good-to-have anti-affinity rule the End User can request
nova to spread the instances in the same group as much as possible.
Proposed change
===============
This change would extend the existing server-group API extension with two new
policies soft-affinity and soft-anti-affinity.
When an instance is booted into a group with the soft-affinity policy the
scheduler will use a new weight, AffinityWeight, to sort the available hosts
according to the number of instances from the same server-group running on
them, in descending order.
When an instance is booted into a group with the soft-anti-affinity policy the
scheduler will use a new weight, AntiAffinityWeight, to sort the available
hosts according to the number of instances from the same server-group running
on them, in ascending order.
The two new weights will get the necessary information about the number of
instances per host through the weight_properties (filter_properties) in
a similar way as the GroupAntiAffinityFilter gets the list of hosts used by
a group via the filter_properties.
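A hedged sketch of what such a weigher might look like; the class name and the
assumption that weight_properties carries the group's host list, with one
entry per group instance, are illustrative, not settled implementation.
.. code-block:: python

    from nova.scheduler import weights


    class AffinityWeigher(weights.BaseHostWeigher):
        def _weigh_object(self, host_state, weight_properties):
            # The more members of the same server group already run on this
            # host, the higher the weight, so the scheduler prefers stacking.
            group_hosts = weight_properties.get('group_hosts') or []
            return float(list(group_hosts).count(host_state.host))
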
These new soft-affinity and soft-anti-affinity policies are mutually exclusive
with each other and with the other existing server-group policies. This means
that a server group cannot be created with more than one policy, as every
combination of the existing policies (affinity, anti-affinity, soft-affinity,
soft-anti-affinity) is contradictory.
If the scheduler sees a request which requires any of the new weigher classes
but those classes are not configured then the scheduler will reject the request
with an exception similarly to the case when affinity policy is requested but
ServerGroupAffinityFilter is not configured.
Alternatives
------------
Alternatively End User can use the server-group with affinity policy and if
the instance cannot be scheduled because the host associated to the group is
full then End User can create a new server-group for the subsequent instances.
However with large amount of instances that occupy many hosts this manual
process can become quite cumbersome.
Data model impact
-----------------
No schema change is needed.
There will be two new possible values soft-affinity and soft-anti-affinity for
the policy column of the instance_group_policy table.
REST API impact
---------------
POST: v2/{tenant-id}/os-server-groups
The value of the policy request parameter can be soft-affinity and
soft-anti-affinity as well. So the new JSON schema will be the following::
{"type": "object",
"properties": {
"server_group": {
"type": "object",
"properties": {
"name": parameter_types.name,
"policies": {
"type": "array",
"items": [{"enum": ["anti-affinity", "affinity",
"soft-anti-affinity",
"soft-affinity"]}],
"uniqueItems": True,
"additionalItems": False}},
"required": ["name", "policies"],
"additionalProperties": False}},
"required": ["server_group"],
"additionalProperties": False}
For example the following POST request body will be valid::
{"server_group": {
"name": "test",
"policies": [
"soft-anti-affinity"]}}
And will be answered with the following response body::
{"server_group": {
"id": "5bbcc3c4-1da2-4437-a48a-66f15b1b13f9",
"name": "test",
"policies": [
"soft-anti-affinity"
],
"members": [],
"metadata": {}}}
The above API change will be introduced in a new API microversion.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
balazs-gibizer
Work Items
----------
* Add two new weighers to the filter scheduler. These weights will
sort the available hosts by the number of instances from the same
server-group.
* Update FilterScheduler to reject the request if the new policy is
requested but the related weigher is not configured.
* Update the server-group API extension to allow soft-affinity and
soft-anti-affinity as the policy of a group.
Dependencies
============
None
Testing
=======
Unit test coverage will be provided.
The following functional test coverage will be provided:
* create groups with soft-affinity and soft-anti-affinity
* boot two servers with soft-affinity with enough resource on the same host.
Nova shall boot both server to the same host.
* boot two servers with soft-affinity but there is not enough resource to boot
the second server to the same host as the first server. Nova shall boot the
second server to a different host.
* boot two servers with soft-anti-affinity and two compute hosts are available
with enough resources. Nova shall boot the two servers to two separate hosts.
* boot two servers with soft-anti-affinity but only a single compute host is
available. Nova shall boot the two servers to the same host.
* Rebuild, migrate, evacuate server with soft-affinity
* Rebuild, migrate, evacuate server with soft-anti-affinity
Documentation Impact
====================
New weights need to be described in filter_scheduler.rst.
References
==========
* instance-group-api-extension BP
https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
* Group API wiki
https://wiki.openstack.org/wiki/GroupApiExtension

@@ -0,0 +1,185 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
======================================
Split network plane for live migration
======================================
https://blueprints.launchpad.net/nova/+spec/split-network-plane-for-live-migration
This spec is proposed to split the network plane of live migration from
management network, in order to avoid the network performance impact caused by
data transfer generated by live migration.
Problem description
===================
When we do live migration with QEMU/KVM driver, we use hostname of target
compute node as the target of live migration. So the RPC call and live
migration traffic will be in same network plane. Live migration will have
impact on network performance, and this impact is significant when lots of live
migration occurs concurrently, even if CONF.libvirt.live_migration_bandwidth
is set.
Use Cases
---------
The OpenStack deployer plans a specific network plane for live migration,
which is separated from the management network. As the data transfer of live
migration flows in this specific network plane, its impact on network
performance will be limited to this network plane and will have no impact on
the management network. The end user will not notice this change.
Proposed change
===============
Add a new option, CONF.my_live_migration_ip, to the configuration file, with
None as the default value. When pre_live_migration() executes on the
destination host, set the option into pre_migration_data if it is not None.
When driver.live_migration() executes on the source host, if this option is
present in pre_migration_data, the ip address is used instead of
CONF.libvirt.live_migration_uri as the uri for live migration; if it is None,
then the mechanism remains as it is now.
This spec focuses on the QEMU/KVM driver; the implementations for other
drivers should be completed in separate blueprints.
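A minimal sketch of the intended behaviour on the source host; the variable
and field names are illustrative assumptions, not the final implementation.
.. code-block:: python

    # In the libvirt driver's live migration path on the source host,
    # pre_migration_data and dest_hostname come from the existing flow.
    migrate_ip = pre_migration_data.get('my_live_migration_ip')
    if migrate_ip:
        # Use the dedicated migration network address as the target.
        dest_uri = 'qemu+tcp://%s/system' % migrate_ip
    else:
        # Fall back to the existing hostname-based mechanism.
        dest_uri = CONF.libvirt.live_migration_uri % dest_hostname
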
Alternatives
------------
Config live migration uri, like this::
live_migration_uri = "qemu+tcp://%s.INTERNAL/system"
Then modify the DNS configuration in the OpenStack deployment::
target_hostname 192.168.1.5
target_hostname.INTERNAL 172.150.1.5
But requiring such DNS changes in order to deploy and use OpenStack may not be
practical due to organizational procedure limitations at many organizations.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
This feature has no negative impact on security. Splitting data transfer from
the management network will improve security somewhat by reducing the chance
of a management plane denial of service.
Notifications impact
--------------------
None
Other end user impact
---------------------
No impact on end user.
Performance Impact
------------------
With a specifically planned network plane for live migration, the data
transfer no longer affects the management network; the impact of live
migration on network performance will be limited to its own network plane.
Other deployer impact
---------------------
The added configuration option CONF.my_live_migration_ip will be available
for all drivers; the default value is None. Thus, when OpenStack is upgraded,
the existing live migration mechanism remains. If CONF.my_live_migration_ip
has been set, this option will be used for the live migration target uri. If
deployers want to use this function, a separate network plane will have to be
planned in advance.
Developer impact
----------------
All drivers can implement this function using the same mechanism.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Rui Chen <chenrui.momo@gmail.com>
Other contributors:
Zhenyu Zheng <zhengzhenyu@huawei.com>
Work Items
----------
* Add new configuration option CONF.my_live_migration_ip into [DEFAULT] group.
* Modify the existing implementation of live migration: when
pre_live_migration() executes on the destination host, set the option into
pre_migration_data if it is not None.
* In the QEMU/KVM driver, when driver.live_migration() executes on the source
host, if this option is present in pre_migration_data, the ip address is used
instead of CONF.libvirt.live_migration_uri as the uri for live migration; if
it is None, then the mechanism remains as it is now.
Dependencies
============
None
Testing
=======
Changes will be made for live migration, thus related unit tests will be added.
Documentation Impact
====================
The instruction for a new configuration option CONF.my_live_migration_ip will
be added to the OpenStack Configuration Reference manual.
The operators can plan a specific network plane for live migration,
e.g. 172.168.*.*, split from the management network (192.168.*.*), then add
the option to nova.conf on every nova-compute host according to the planned IP
addresses, like this: CONF.my_live_migration_ip=172.168.1.15.
The default value of the new option is None, so the live-migration workflow is
the same as the original by default.
References
==========
None
History
=======
None

@@ -0,0 +1,287 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
============================================================
Enable passthrough of SR-IOV physical functions to instances
============================================================
https://blueprints.launchpad.net/nova/+spec/sriov-physical-function-passthrough
Nova has supported passthrough of PCI devices with its libvirt driver for a
few releases already, during which time the code has seen some stabilization
and a few minor feature additions.
In the case of SR-IOV enabled cards, it is possible to treat any port on the
card either as a number of virtual devices (called VFs - virtual functions) or
as a full device (PF - physical function).
Nova's current handling exposes only virtual functions as resources that can
be requested by instances - and this is the most common use case by far.
However with the rise of the requirements to virtualize network applications,
it can be necessary to give instances full control over the port and not just a
single virtual function.
OpenStack is seen as one of the central bits of technology for the NFV
use-cases, and a lot of the work has already gone into making OpenStack and
Nova NFV enabled, so we want to make sure that we close these small remaining
gaps.
Problem description
===================
Currently it is not possible to pass through a physical function to an
OpenStack instance, but some NFV applications need to have full control of the
port, while others are happy with using a VF of an SR-IOV enabled card. It is
beneficial to be able to do so with the same set of cards, as pre-provisioning
resources at a granularity smaller than compute hosts is cumbersome
to manage and goes against the goal of Nova to provide on demand
resources. We want to be able to give certain instances unlimited access to the
port by assigning the PF to it, but revert back to using VFs when the PF is not
being used, so as to ensure on-demand provisioning of available resources. This
may not be possible with every SR-IOV card and their respective Linux drivers,
in which case certain ports will need to be pre-provisioned as either PFs or
VFs by the administrator ahead of time.
This in turn means that Nova would have to keep track of which VFs belong to
particular PFs and make sure that this is reflected in the way resources are
tracked (so even a single VF being used means the related PF is unavailable and
vice versa, if a PF is being used, all of its VFs are marked as used).
PCI device management code in Nova currently filters out any
device that is a physical function (this is currently hard-coded). In
addition, modeling of PCI device resources in Nova currently assumes flat
hierarchy and resource tracking logic does not understand the relationship
between different PCI devices that can be exposed to Nova.
Use Cases
----------
Certain NFV workloads may need to have the full control of the physical device,
in order to use some of the functionality not available to VFs, to bypass some
limitations certain cards impose on VFs, or to exclusively use the full
bandwidth of the port. However, due to the dynamic nature of the elastic cloud,
and the promise of Nova to deliver resources on demand, we do not wish to have
to pre-provision certain SR-IOV cards to be used as PFs as this defeats the
promise of the infrastructure management tool that allows for quick
re-purposing of resources that Nova brings.
Modern SR-IOV enabled cards along with their drivers usually allow for such
reconfiguration to be done on the fly, so once the passthrough of the PF is no
longer needed on a specific host (either the instance using it got moved or
deleted), the PF is bound back to its Linux driver, thus enabling the use of
VFs provided that initialization steps (if any are needed) are done upon
handing the device back. It is not possible to
guarantee that this always works however, due to the vast range of equipment
and drivers available on the market, so we want to make sure that there is a
way to tell Nova that a card is in a certain configuration and cannot be
assumed
to be reconfigurable.
Additional use cases (that will require further work) will be enabled by having
the Nova data model usefully express the relationship between PF and its VFs.
Some of them have been proposed as separate specs (see [1]_ and [2]_).
Proposed change
===============
Two problems we need to solve are:
1) How to enable requesting a full physical device. This means extending the
InstancePCIRequest data model to be able to hold this information. Since
the whitelist parsing logic that builds up the Spec objects probes the
system and has the information about whether a device is a PF or not, it is
enough to add a physical_function field to the PCI alias schema and the
PCIRequest object.
2) Enable scheduling and resource tracking based on the request that can now
be for the whole device. This means extending the data model for PCIDevices
to hold information about relationship between physical and virtual
functions (this relationship is already recorded but not in a suitable
format), and also extending the
way we expose the aggregate data about PCI devices to the resource tracker
(a.k.a. the PCIDeviceStats class) to be able to present PFs and their
counts, and to make sure to track the corresponding VFs that become
unavailable once the PF is claimed/used.
In addition to the above, we will want to make sure that the whitelist syntax
can support passing through PFs. It turns out this will require very few
changes.
Currently if a whitelist entry
specifies an address or a devname of a PF, the matching code will make sure
any of the VFs match. This behavior, combined with allowing a device that is a
PF to be tracked by nova (by removing the hard-coded check that skips any PFs)
should be sufficient to allow most of the flexibility administrators need.
As whitelisting alone is not sufficient to make a device requestable by users
(it also needs an alias that is specified on the flavor), simply defaulting to
whitelisting PFs along with all of their VFs when a PF address is whitelisted
gives us the flexibility we need, while keeping backwards compatibility.
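As a purely illustrative example (the option values and the exact name of the
new alias key are hypothetical and subject to the implementation review), a
deployer could whitelist a PF by address, give it an alias that marks it as a
physical function, and then request it from a flavor::

    # nova.conf (illustrative values only)
    pci_passthrough_whitelist = {"address": "0000:06:00.0"}
    pci_alias = {"name": "pf_nic", "vendor_id": "8086",
                 "product_id": "10fb", "physical_function": true}

    # flavor extra spec requesting one full physical function
    nova flavor-key pf.large set "pci_passthrough:alias"="pf_nic:1"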
As is the case with the current implementation, there is some initial
configuration that will be needed on hosts that have PCI devices that can be
passed through. In addition to the standard setup needed to enable SR-IOV and
configure the cards, and the whitelist configuration that Nova requires,
administrators may also need
to add an automated way (such as udev rules) to re-enable VFs, since
depending on the driver and the card used, any existing
configuration may be lost once a VM is given full control of the port, and the
device is unbound from the host driver.
In order for PFs to work as Neutron ports, some additional work that is outside
the scope of this blueprint will be needed. We aim to make the required
internal Nova changes the focus here and defer the integration work to a
future (possibly cross-project) blueprint. For the libvirt driver, this means
that,
since there will be no Neutron support
at first, the only way to assign such a device would be using the <hostdev>
element, and no support for <interface> is in scope for this blueprint.
Alternatives
------------
There are no real alternatives that cover all of the use cases. An alternative
that would cover only the requirement for bandwidth would be to allow for
reserving of all VFs of a single PF by a single instance while using only a
single VF, effectively reserving the bandwidth. In addition to not being a
solution for all the applications, it also does not reduce the complexity of
the change much as the relationship between VFs still needs to be modeled in
Nova.
Data model impact
-----------------
Even though there is currently a way to figure out the PF a single VF belongs
to (through the `extra_info` free-form field), it may be necessary to add
a more "query friendly" relationship that will allow us to answer the question
"given a PCI device record that is a PF, which VF records does it contain?".
It is likely to be implemented as a foreign key relationship to the same table,
and objects support will be added, but the actual implementation discussion is
better suited for the actual code proposal review.
It will also be necessary to know the relationship between individual PFs
and VFs in the aggregate view of the PCI device data used in scheduling, so
changes to the way PciDeviceStats holds aggregate data will be needed. This
will also result in changes to the filtering/claiming logic, the extent of
which may impact decisions about the data model, so this is best discussed on
the actual implementation changes.
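Purely as an illustration of the kind of self-referential relationship being
discussed (the column names and exact shape are hypothetical and deliberately
left to the code review), the ``pci_devices`` table could gain something along
these lines::

    # Hypothetical sketch only, not the agreed schema.
    import sqlalchemy as sa
    from sqlalchemy import orm
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class PciDevice(Base):
        __tablename__ = 'pci_devices'
        id = sa.Column(sa.Integer, primary_key=True)
        address = sa.Column(sa.String(12))
        dev_type = sa.Column(sa.String(8))     # e.g. 'PF' or 'VF'
        # A VF points at the PF it belongs to; PFs have parent_id = NULL.
        parent_id = sa.Column(sa.Integer, sa.ForeignKey('pci_devices.id'),
                              nullable=True)
        virtual_functions = orm.relationship(
            'PciDevice',
            backref=orm.backref('physical_function', remote_side=[id]))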
REST API impact
---------------
There are no API changes required. PCI devices are requested through flavor
extra-specs by specifying an alias of a device specification. Currently,
device specifications and their aliases are part of the Nova deployment
configuration, and thus are deployment specific.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None - non-admin users will continue to use only things exposed to them via
flavor extra-specs, which they cannot modify in any way.
Performance Impact
------------------
Scheduling of instances requiring PCI passthrough devices will do more work,
and on a bit more data than it does currently, in the case of PF requests. It
is unlikely that this will have any noticeable performance impact, however.
Other deployer impact
---------------------
PCI alias syntax for enabling PCI devices will become more expressive, in
order to account for specifically requesting a PF.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Nikola Đipanov <ndipanov@redhat.com>
Other contributors:
Vladik Romanovsky <vromanso@redhat.com>
Work Items
----------
* Re-work the DB models and corresponding objects to have an explicit
  relationship between the PF entry and its corresponding VFs. Update the
  claiming logic inside the PCI manager class so that claiming/assigning the
  PF claims all of its VFs and vice versa.
* Change the PCIDeviceStats class to expose PFs in its pools, and change the
  claiming/consuming logic to claim appropriate amounts of VFs when a PF is
  consumed or claimed. Once this work item is complete, all of the scheduling
  and resource tracking logic will be aware of the PF constraint.
* Add support for specifying the PF requirement through the pci_alias
configuration options, so that it can be requested through flavor
extra-specs.
Dependencies
============
None
Testing
=======
Changes proposed here only extend existing functionality, so they will require
updating the current test suite to make sure new functionality is covered.
It is expected that the tests currently in place will prevent any regression
in the existing functionality. No new test suites are required to be added for
this functionality, only new test cases.
Documentation Impact
====================
Documentation for the PCI passthrough features in Nova will need to be updated
to reflect the above changes - that is to say - no impact out of the ordinary.
References
==========
.. [1] https://review.openstack.org/#/c/182242/
.. [2] https://review.openstack.org/#/c/142094/
.. [3] https://blueprints.launchpad.net/nova/+spec/pci-passthrough-whitelist-regex
History
=======
Optional section for Mitaka intended to be used each time the spec
is updated to describe new design, API or any database schema
updated. Useful to let reader understand what's happened along the
time.
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced
@@ -0,0 +1,428 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=====================================================
Allow user to set and retrieve the server Description
=====================================================
The launchpad blueprint is located at:
https://blueprints.launchpad.net/nova/+spec/user-settable-server-description
Allow users to set the description of a server when it is created, rebuilt,
or updated. Allow users to get the server description.
Problem description
===================
Currently, when a server is created, the description is hardcoded to be the
server display name. The description cannot be set on a server rebuild.
Users cannot set the description on the server or retrieve the description.
Currently, they need to use other fields, such as the server name or meta-data,
to provide a description. These are overloading the name and meta-data
fields in a way for which they were not designed. A better way to provide
a long human-readable description is to use a separate field. The description
can be easily viewed in a server list display.
Use Cases
----------
* The End User wishes to provide a description when creating a server.
* The End User wishes to provide a description when rebuilding a server.
If the user chooses to change the name, a new description may be needed
to match the new name.
* The End User wishes to get the server's description.
* The End User wishes to change the server's description.
Proposed change
===============
* Nova REST API
* Add an optional description parameter to the Create Server, Rebuild Server,
and Update Server APIs.
* No default description on Create Server (set to NULL in the database).
* If a null description string is specified on the server update or
rebuild, then the description is set to NULL in the database
(description is removed)
* If the description parameter is not specified on the server update
or rebuild, then the description is not changed.
* An empty description string is allowed.
* The Get Server Details API returns the description in the JSON response.
This can be NULL.
* The List Details for Servers API returns the description for each server.
A description can be NULL.
* Nova V2 client
* Add an optional description parameter to the server create method.
* Add an optional description parameter to the server rebuild method.
* Add new methods to set and clear the server description. These will
  implement a new CLI command "nova describe" with the following
  positional parameters (a usage sketch appears at the end of this section):
* server
* description (Pass in "" to remove the description)
* Return the description on server show method. This can be null.
* If detail is requested, return the description on each server
returned by the server list method. A description can be null.
* Openstack V2.1 compute client
* NOTE: Changes to the Openstack V2 compute client will be
implemented under a bug report, and not under this spec.
* Add an optional description parameter to CreateServer.
* Add an optional description parameter to RebuildServer.
* Add an optional description parameter to SetServer and
UnsetServer.
* Return the description on ShowServer. This can be null.
* If detail is requested, return the description on each server
returned by the ListServer. A description can be null.
Note: A description field already exists in the database, so the change is
to add API/CLI support for setting and getting the description.
Other projects possibly impacted:
* Horizon could be changed to set and show the server description.
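To summarize the client-facing proposal, a hypothetical usage sketch follows.
The microversion, method signatures and keyword arguments shown here are
illustrative assumptions; the final python-novaclient interface will be
settled during implementation::

    # Hypothetical sketch of the proposed python-novaclient support; the
    # credential, image and flavor variables are assumed to be defined
    # elsewhere, and '2.X' stands in for the new microversion.
    from novaclient import client

    nova = client.Client('2.X', username, password, project_name, auth_url)

    server = nova.servers.create(name='web01', image=image, flavor=flavor,
                                 description='Front-end web server')
    # Update the description, then clear it (per the proposal an empty or
    # null description removes it).
    nova.servers.update(server, description='Front-end web server (blue)')
    nova.servers.update(server, description='')
    print(nova.servers.get(server.id).description)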
Alternatives
------------
None
Data model impact
-----------------
None. The database column for description already exists as 255 characters,
and is nullable.
REST API impact
---------------
Add the following parameter validation:
::
valid_description_regex_base = '[%s]*'
valid_description_regex = valid_description_regex_base % (
re.escape(_get_printable()))
description = {
'type': ['string', 'null'], 'minLength': 0, 'maxLength': 255,
'pattern': valid_description_regex,
}
Change the following APIs under a new microversion:
`Create Server <http://developer.openstack.org/api-ref-compute-v2.1.html#createServer>`_
........................................................................................
New request parameter:
+---------------------+------+-------------+-----------------------+
|Parameter |Style |Type | Description |
+=====================+======+=============+=======================+
|description(optional)|plain | csapi:string|The server description |
+---------------------+------+-------------+-----------------------+
Add the description to the json request schema definition:
::
base_create = {
'type': 'object',
'properties': {
'server': {
'type': 'object',
'properties': {
'name': parameter_types.hostname,
'description': parameter_types.description,
'imageRef': parameter_types.image_ref,
'flavorRef': parameter_types.flavor_ref,
'adminPass': parameter_types.admin_password,
'metadata': parameter_types.metadata,
'networks': {
'type': 'array',
'items': {
'type': 'object',
'properties': {
'fixed_ip': parameter_types.ip_address,
'port': {
'type': ['string', 'null'],
'format': 'uuid'
},
'uuid': {'type': 'string'},
},
'additionalProperties': False,
}
}
},
'required': ['name', 'flavorRef'],
'additionalProperties': False,
},
},
'required': ['server'],
'additionalProperties': False,
}
Error http response codes:
* 400 (BadRequest) if the description is invalid unicode,
or longer than 255 characters.
`Rebuild Server <http://developer.openstack.org/api-ref-compute-v2.1.html#rebuildServer>`_
..........................................................................................
New request parameter:
+---------------------+------+-------------+-----------------------+
|Parameter |Style |Type | Description |
+=====================+======+=============+=======================+
|description(optional)|plain | csapi:string|The server description |
+---------------------+------+-------------+-----------------------+
Add the description to the json request schema definition:
::
base_rebuild = {
'type': 'object',
'properties': {
'rebuild': {
'type': 'object',
'properties': {
'name': parameter_types.name,
'description': parameter_types.description,
'imageRef': parameter_types.image_ref,
'adminPass': parameter_types.admin_password,
'metadata': parameter_types.metadata,
'preserve_ephemeral': parameter_types.boolean,
},
'required': ['imageRef'],
'additionalProperties': False,
},
},
'required': ['rebuild'],
'additionalProperties': False,
}
Error http response codes:
* 400 (BadRequest) if the description is invalid unicode,
or longer than 255 characters.
`Update Server <http://developer.openstack.org/api-ref-compute-v2.1.html#updateServer>`_
........................................................................................
New request parameter:
+---------------------+------+----------------------+-----------------------+
|Parameter |Style |Type | Description |
+=====================+======+======================+=======================+
|description(optional)|plain |csapi:ServerForUpdate |The server description |
+---------------------+------+----------------------+-----------------------+
Add the description to the json request schema definition:
::
base_update = {
'type': 'object',
'properties': {
'server': {
'type': 'object',
'properties': {
'name': parameter_types.name,
'description': parameter_types.description,
},
Response:
* The update API currently returns the details of the updated server. As part
of this, the description will now be returned in the json response.
Error http response codes:
* 400 (BadRequest) if the description is invalid unicode,
or longer than 255 characters.
`Get Server Details <http://developer.openstack.org/api-ref-compute-v2.1.html#getServer>`_
..........................................................................................
Add the description to the JSON response schema definition.
::
server = {
"server": {
"id": instance["uuid"],
"name": instance["display_name"],
"description": instance["display_description"],
"status": self._get_vm_status(instance),
"tenant_id": instance.get("project_id") or "",
"user_id": instance.get("user_id") or "",
"metadata": self._get_metadata(instance),
"hostId": self._get_host_id(instance) or "",
"image": self._get_image(request, instance),
"flavor": self._get_flavor(request, instance),
"created": timeutils.isotime(instance["created_at"]),
"updated": timeutils.isotime(instance["updated_at"]),
"addresses": self._get_addresses(request, instance),
"accessIPv4": str(ip_v4) if ip_v4 is not None else '',
"accessIPv6": str(ip_v6) if ip_v6 is not None else '',
"links": self._get_links(request,
instance["uuid"],
self._collection_name),
},
Security impact
---------------
None
Notifications impact
--------------------
The notification changes for this spec will be included as
part of the implementation of the Versioned Notification API spec:
https://review.openstack.org/#/c/224755/
* The new versioned notification on instance update will include
the description.
* The new versioned notification on instance create will include
the description.
* The new versioned notification on instance rebuild will include
the description.
Other end user impact
---------------------
Changes to python-novaclient and python-openstackclient as described above.
Horizon can add the description to the GUI.
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
chuckcarmack75
Other contributors:
none
Work Items
----------
1) Implement the nova API changes.
2) Implement the novaclient and openstackclient changes.
Dependencies
============
None
Testing
=======
* Nova functional tests
* Add a description to the tests that use the API to create a server.
* Check that the default description is NULL.
* Add a description to the tests that use the API to rebuild a server.
* Check that the description can be changed or removed.
* Check that the description is unchanged if not specified on the API.
* Add a description to the tests that use the API to update a server.
* Check that the description can be changed or removed.
* Check that the description is unchanged if not specified on the API.
* Check that the description is returned as part of server details for
an individual server or a server list.
* Python nova-client and openstack-client. For the client tests and
the CLI tests:
* Add a description to the tests that create a server.
* Add a description to the tests that rebuild a server.
* Set and remove the description on an existing server.
* Check that the description is returned as part of server details for
an individual server or a server list.
* Error cases:
* The description passed to the API is longer than 255 characters.
* The description passed to the API is not valid printable unicode.
* Edge cases:
* The description passed to the API is an empty string. This is allowed.
Documentation Impact
====================
Documentation updates to:
* API spec: http://developer.openstack.org/api-ref-compute-v2.1.html
including the API samples.
* Client: novaclient and openstackclient
References
==========
The request for this feature first surfaced in the ML:
http://lists.openstack.org/pipermail/openstack-dev/2015-August/073052.html
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced
* - Mitaka
- Re-submitted to add support for description on Rebuild.
@@ -0,0 +1,781 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================
Versioned notification API
==========================
https://blueprints.launchpad.net/nova/+spec/versioned-notification-api
The notification interface of nova is not well defined and the current
notifications define a very inconsistent interface. There is no easy way,
from the notification consumer's point of view, to see the format and content
of the notifications nova sends.
Problem description
===================
This is the generic notification envelope format supported by oslo.messaging
[1]::
{
"priority": "INFO",
"event_type": "compute.instance.update",
"timestamp": "2015-09-02 09:13:31.895554",
"publisher_id": "api.controller",
"message_id": "06d9290b-b9b0-4bd5-9e76-ddf8968a70b4",
"payload": {}
}
The problematic fields are:
* priority
* event_type
* publisher_id
* payload
priority: Nova uses info and error priorities in the current code base except
in case of the nova.notification.notify_decorator code where the priority is
configurable with the notification_level configuration parameter. However this
decorator is only used in the monkey_patch_modules configuration default value.
event_type: oslo allows a raw string to be sent as event_type, nova uses the
following event_type formats today:
* <service>.<object>.<action>.<phase> example: compute.instance.create.end
* <object>.<action>.<phase> example: aggregate.removehost.end
* <object>.<action> example: servergroup.create
* <service>.<action>.<phase> example: scheduler.select_destinations.end
* <action> example: snapshot_instance
* <module?>.<action> example: compute_task.build_instances
publisher_id: nova uses the following publisher_id formats today:
* <service>.controller examples: api.controller, compute.controller
* <object>.controller example: servergroup.controller
* <object>.<object_id> example: aggregate.<aggregate.name> and
aggregate.<aggregate_id>. See: [2].
It seems that the content of publisher_id and event_type overlaps in some
cases.
payload: nova does not have any restriction on the payload field which
leads to very many different formats. Sometimes it is a view of an existing
nova versioned object e.g. in case of compute.instance.update notification
nova dumps the fields of the instance object into the notification after some
filtering. In other cases nova dumps the exception object or dumps the args and
kwargs of a function into the payload. This complex payload format seems to be
the biggest problem for notification consumers.
Use Cases
---------
As a tool developer I want to consume nova notifications to implement my
requirements. I want to know the format of the notifications and I want to
have some way to detect and follow up on changes to the notification format
later on.
Proposed change
===============
This spec is created to agree on the format, content and meaning of the fields
in notifications sent by nova and to propose a way to change the existing
notifications to the new format while giving time to the notification
consumers to adapt to the change. Also it tries to give a technical solution to
keep the notification payload more stable and versioned.
Current notifications are un-versioned. This spec proposes to transform the
un-versioned notification to versioned notifications while keeping the
possibility to emit un-versioned notifications for a limited time to help the
notification consumers transition.
Versioned notifications will have a well defined format which is documented and
notification samples will be provided similarly to nova api samples.
New versions of a versioned notification will be kept backward compatible.
To model and version the new notifications nova will use the oslo
versionedobject module. To emit such notification nova will continue to use
the notifier interface of oslo.messaging module. To convert the notification
model to the format that can be fed into the notifier interface nova will use
the existing NovaObjectSerializer.
A single versioned notification will be modeled with a single oslo versioned
object but that object can use other new or existing versioned object as
payload field.
However some of today's notifications cannot really be converted to
versioned notifications. For example the notify_decorator dumps the args and
kwargs of any function into the notification payload therefore we cannot create
a single versioned model for every possible payload it generates. For these
notifications a generic, semi-managed, dict based payload can be defined
that formulates as much as possible and leaves the rest of the payload
un-managed. Adding new semi-managed notifications shall be avoided in the
future.
We want to keep the notification envelope format defined by the notifier
interface in oslo.messaging, therefore versioned notifications will have the
same envelope on the wire as the un-versioned notifications.
Which is the following::
{
"priority": "INFO",
"event_type": "compute.instance.update",
"timestamp": "2015-09-02 09:13:31.895554",
"publisher_id": "api.controller",
"message_id": "06d9290b-b9b0-4bd5-9e76-ddf8968a70b4",
"payload": {}
}
The main difference between the wire format of the versioned and un-versioned
notification is the format of the payload field. The versioned notification
wire format will use the serialized format of a versioned object as payload.
The versioned notification model will define versioned object fields for every
fields oslo.messaging notifier interface needs (priority, event_type,
publisher_id, payload) so that a single notification can be fully modeled in
nova code. However only the payload field will use the default versioned object
serialization. The other fields in the envelope will be filled with strings as
in the example above.
The value of the event_type field of the envelope on the wire will be defined
by the name of the affected object, the name of the performed action emitting
the notification and the phase of the action. For example: instance.create.end,
aggregate.removehost.start, filterscheduler.select_destinations.end.
The notification model will do basic validation on the content of the
event_type e.g. enum for valid phases will be created.
The value of the priority field of the envelope on the wire can be selected
from the predefined priorities in oslo.messaging (audit, debug, info, warn,
error, critical, sample) except 'warning' (use warn instead).
The notification model will do validation of the priority by providing an enum
with the valid priorities.
For concrete examples see the Data model impact section.
Backward compatibility
----------------------
The new notification model can be used to emit the current un-versioned
notifications as well, to provide backward compatibility while the
un-versioned notifications are deprecated. Nova might want to restrict adding
new un-versioned notifications after this spec is implemented.
A new version of a versioned notification has to be backward compatible with
the previous version. Nova will always emit the latest version of a versioned
notification and nova will not support pinning back the notification versions.
Backward compatibility for pre Mitaka notification consumers will be ensured
by emitting both the versioned and the un-versioned notification format on the
wire on separate topics. The new notification model will provide
a way to emit both the old and the new wire format from the same notification
object.
A configuration option will be provided to specify which version of the
notifications shall be emitted but asking for the old format only will be
deprecated from the beginning. Emitting the un-versioned wire format of a
versioned notification will be deprecated along with a proper deprecation
message in Mitaka and will be removed in N release.
Alternatives
------------
Version the whole wire format instead of only the payload:
There seem to be two main alternatives for how to generate the actual
notification message on the wire from the KeyPairNotification object defined
in the Data model impact section.
Use the current envelope structure defined by the notifier in oslo.messaging
[1] and use the versioning of the payload on the wire as proposed in the
Data model impact section.
Pros:
* No oslo.messaging change is required.
* Consumers only need to change the payload parsing code.
* Notification envelope in the whole OpenStack ecosystem are the same.
Cons:
* The envelope on the wire is not versioned just the payload field of
it. However the envelope structure is generic and well defined by
oslo.messaging.
Or alternatively create a new envelope structure in oslo.messaging that is
already a versioned object and use the serialized form of that object on the
wire.
If we change oslo.messaging to provide an interface where an object inheriting
from the NotificationBase object can be passed in and oslo.messaging uses the
serialized form of that object as the message directly, then the KeyPair
notification message on the wire would look like the following::
{
"nova_object.version":"1.0",
"nova_object.name":"KeyPairNotification",
"nova_object.data":{
"priority":"info",
"publisher":{
"nova_object.version":"1.19",
"nova_object.name":"Service",
"nova_object.data":{
"host":"controller",
"binary":"api"
... # a lot of other fields from the Service object here
},
"nova_object.namespace":"nova"
},
"payload":{
"nova_object.version":"1.3",
"nova_object.name":"KeyPair",
"nova_object.namespace":"nova",
"nova_object.data":{
"id": 1,
"user_id":"21a75a650d6d4fb28858579849a72492",
"fingerprint": "e9:49:b2:ca:56:8c:25:77:ea:0d:d9:7c:89..."
"public_key": "ssh-rsa AAAAB3NzaC1yc2EAA...",
"type": "ssh",
"name": "mykey5"
}
},
"event_type":{
"nova_object.version":"1.0",
"nova_object.name":"EventType",
"nova_object.data":{
"action":"create",
"phase":"start",
"object":"keypair"
},
"nova_object.namespace":"nova"
}
},
"nova_object.namespace":"nova"
}
In this case the NotificationBase classes shall be provided by
oslo.messaging.
Pros:
* The whole message on the wire is versioned.
Cons:
* Needs extensive changes in oslo.messaging in the notification interface code
as well as in the notification drivers as today notification drivers depend
on the current envelope structure.
* It would create a circular dependency between oslo.messaging and
  oslo.versionedobjects.
* Consumers need to adapt to the top level structure change as well.
Use a single global notification version:
The proposal is to use a separate version number per notification.
Alternatively a single global notification version number can be defined that
is bumped every time any single notification is changed.
Data model impact
-----------------
The following base objects will be defined:
.. code-block:: python
class NotificationPriorityType(Enum):
AUDIT = 'audit'
CRITICAL = 'critical'
DEBUG = 'debug'
INFO = 'info'
ERROR = 'error'
SAMPLE = 'sample'
WARN = 'warn'
ALL = (AUDIT, CRITICAL, DEBUG, INFO, ERROR, SAMPLE, WARN)
def __init__(self):
super(NotificationPriorityType, self).__init__(
valid_values=NotificationPriorityType.ALL)
class NotificationPriorityTypeField(BaseEnumField):
AUTO_TYPE = NotificationPriorityType()
@base.NovaObjectRegistry.register
class EventType(base.NovaObject):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'object': fields.StringField(),
'action': fields.EventTypeActionField(), # will be an enum
'phase': fields.EventTypePhaseField(), # will be an enum
}
@base.NovaObjectRegistry.register
class NotificationBase(base.NovaObject):
fields = {
'priority': fields.NotificationPriorityTypeField(),
'event_type': fields.ObjectField('EventType'),
'publisher': fields.ObjectField('Service'),
}
def emit(self, context):
"""Send the notification. """
def emit_legacy(self, context):
"""Send the legacy format of the notification. """
Note that the publisher field of the NotificationBase will be used to fill the
publisher_id field of the envelope in the wire format by extracting the name of
the service and the host the service runs on from the Service object.
Then here is a concrete example that uses the base object:
.. code-block:: python
@base.NovaObjectRegistry.register
class KeyPairNotification(notification.NotificationBase):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'payload': fields.ObjectField('KeyPair')
}
Where the referred KeyPair object is an already existing versioned object in
nova. Then the current keypair notification sending code can be written like:
.. code-block:: python
def _notify(self, context, keypair):
event_type = notification.EventType(
object='keypair',
action=obj_fields.EventTypeActionField.CREATE,
phase=obj_fields.EventTypePhaseField.START)
publisher = utils.get_current_service()
keypair_obj.KeyPairNotification(
priority=obj_fields.NotificationPriorityType.INFO,
event_type=event_type,
publisher=publisher,
payload=keypair).emit(context)
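To make the plumbing concrete, here is a rough sketch of what ``emit()`` could
do with the existing NovaObjectSerializer and the oslo.messaging notifier
interface. This is illustrative only; ``rpc.get_notifier`` and
``NovaObjectSerializer`` are existing nova helpers, everything else is an
assumption about the eventual implementation.

.. code-block:: python

    def emit(self, context):
        # Illustrative sketch, not the final implementation.
        # Build the dotted event_type string used in the envelope.
        event_type = '%s.%s.%s' % (self.event_type.object,
                                   self.event_type.action,
                                   self.event_type.phase)
        # Only the payload uses versioned object serialization; the other
        # envelope fields stay plain strings.
        payload = base.NovaObjectSerializer().serialize_entity(
            context, self.payload)
        notifier = rpc.get_notifier(service=self.publisher.binary,
                                    host=self.publisher.host)
        # oslo.messaging exposes one notifier method per priority
        # (info, error, warn, ...).
        getattr(notifier, self.priority)(context, event_type, payload)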
When defining the payload model for a versioned notification we will try to
reuse the existing nova versioned objects, as in the KeyPair example
above. If that is not possible a new versioned object for the payload will be
created.
The wire format of the above KeyPair notification will look like the
following::
{
"priority":"INFO",
"event_type":"keypair.create.start",
"timestamp":"2015-10-08 11:30:09.988504",
"publisher_id":"api:controller",
"payload":{
"nova_object.version":"1.3",
"nova_object.name":"KeyPair",
"nova_object.namespace":"nova",
"nova_object.data":{
"id": 1,
"user_id":"21a75a650d6d4fb28858579849a72492",
"fingerprint": "e9:49:b2:ca:56:8c:25:77:ea:0d:d9:7c:89:35:36"
"public_key": "ssh-rsa AAAAB3NzaC1yc2EAA...",
"type": "ssh",
"name": "mykey5"
}
},
"message_id":"98f1221f-ded0-4153-b92d-3d67219353ee"
}
For an alternative wire format see the Alternatives section.
Semi managed notification example
---------------------------------
The nova.exceptions.wrap_exception decorator is used to send notification in
case an exception happens during the decorated function. Today this
notification has the following structure::
{
    event_type: <the name of the decorated function>,
publisher_id: <needs to be provided to the decorator via the notifier>,
payload: {
exception: <the exception object>
args: <dict of the call args of the decorated function as gathered
            by nova.safe_utils.getcallargs except the ones that have
'_pass' in their names>
}
timestamp: ...
message_id: ...
}
We can define a following semi managed notification object for it::
@base.NovaObjectRegistry.register
class Exception(base.NovaObject):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'message': fields.StringField(),
'code': fields.IntegerField(),
}
@base.NovaObjectRegistry.register
class ExceptionPayload(base.NovaObject):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'exception': fields.ObjectField('Exception'),
'args': fields.ArgDictField(),
}
@base.NovaObjectRegistry.register
class ExceptionNotification(notification.NotificationBase):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'payload': fields.ObjectField('ExceptionPayload')
}
The ArgDictField takes any python object: it uses object serialisation
when available, otherwise a primitive-to-json conversion, and if that fails it
just stringifies the object.
This field does not have a well defined wire format, so this part of the
notification will not really be versioned, hence the semi-managed name.
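A minimal, self-contained sketch of the kind of best-effort coercion such a
field could perform (purely illustrative; the real field will be defined
alongside the other notification fields):

.. code-block:: python

    import json

    def coerce_arg(value):
        """Best-effort conversion for the hypothetical ArgDictField."""
        if hasattr(value, 'obj_to_primitive'):
            # versioned objects know how to serialize themselves
            return value.obj_to_primitive()
        try:
            json.dumps(value)
            return value            # already json friendly
        except (TypeError, ValueError):
            return str(value)       # fall back to stringifying

    assert coerce_arg({'a': 1}) == {'a': 1}
    assert coerce_arg(object()).startswith('<object')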
send_api_fault notification example
-----------------------------------
The nova.notifications.send_api_fault function is used to send notification in
case of api faults. The current format of the notification is the following::
{
event_type: "api.fault",
publisher_id: "api.myhost",
payload: {
"url": <the request url>,
"exception": <the stringified exception object>,
"status": <http status code>
}
timestamp: ...
message_id: ...
}
We can define the following managed notification object for it::
@base.NovaObjectRegistry.register
class ApiFaultPayload(base.NovaObject):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'url': fields.UrlField(),
'exception': fields.ObjectField('Exception'),
'status': fields.IntegerField(),
}
@base.NovaObjectRegistry.register
class ApiFaultNotification(notification.NotificationBase):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'payload': fields.ObjectField('ApiFaultPayload')
}
instance update notification example
------------------------------------
The nova.notifications.send_update function is used today to send notification
about the change of the instance. Here is an example of the current
notification format::
{
"priority":"INFO",
"event_type":"compute.instance.update",
"timestamp":"2015-10-12 14:33:45.704324",
"publisher_id":"api.controller",
"payload":{
"instance_id":"0ab36db7-0770-47de-b34d-45adb17248e7",
"user_id":"21a75a650d6d4fb28858579849a72492",
"tenant_id":"8cd4a105ae504184ade871e23a2c6d07",
"reservation_id":"r-epzg3dq2",
"display_name":"vm1",
"hostname":"vm1",
"host":null,
"node":null,
"architecture":null,
"os_type":null,
"cell_name":"",
"availability_zone":null,
"instance_flavor_id":"42"
"instance_type_id":6,
"instance_type":"m1.nano",
"memory_mb":64,
"vcpus":1,
"root_gb":0,
"disk_gb":0,
"ephemeral_gb":0,
"image_ref_url":"http://192.168.200.200:9292/images/34d9b758-e9c8-4162-ba15-78e6ce05a350",
"kernel_id":"7fc91b81-2ff1-4bd2-b79b-ec218463253a",
"ramdisk_id":"25f19ee8-a350-4d8c-bb53-12d0f834d52f",
"image_meta":{
"kernel_id":"7fc91b81-2ff1-4bd2-b79b-ec218463253a",
"container_format":"ami",
"min_ram":"0",
"ramdisk_id":"25f19ee8-a350-4d8c-bb53-12d0f834d52f",
"disk_format":"ami",
"min_disk":"0",
"base_image_ref":"34d9b758-e9c8-4162-ba15-78e6ce05a350"
},
"created_at":"2015-10-12 14:33:45.662955+00:00",
"launched_at":"",
"terminated_at":"",
"deleted_at":"",
"new_task_state":"scheduling",
"state":"building",
"state_description":"scheduling",
"old_state":"building",
"old_task_state":"scheduling",
"progress":"",
"audit_period_beginning":"2015-10-12T14:00:00.000000",
"audit_period_ending":"2015-10-12T14:33:45.699612",
"access_ip_v6":null,
"access_ip_v4":null,
"bandwidth":{
},
"metadata":{
},
}
}
We can define the following managed notification object for it::
@base.NovaObjectRegistry.register
class BwUsage(base.NovaObject):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'label': fields.StringField(),
'bw_in': fields.IntegerField(),
'bw_out': fields.IntegerField(),
}
@base.NovaObjectRegistry.register
class FixedIp(base.NovaObject):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'label': fields.StringField(),
'vif_mac': fields.StringField(),
'meta': fields.DictOfStringsField(),
'type': fields.StringField(), # maybe an enum
'version': fields.IntegerField(), # maybe an enum
'address': fields.IPAddress()
}
@base.NovaObjectRegistry.register
class InstanceUpdatePayload(base.NovaObject):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'instance_id': fields.UUIDField(),
'user_id': fields.StringField(),
'tenant_id': fields.StringField(),
'reservation_id': fields.StringField(),
'display_name': fields.StringField(),
'host_name': fields.StringField(),
'host': fields.StringField(),
'node': fields.StringField(),
'os_type': fields.StringField(),
'architecture': fields.StringField(),
'cell_name': fields.StringField(),
'availability_zone': fields.StringField(),
'instance_flavor_id': fields.StringField(),
'instance_type_id': fields.IntegerField(),
'instance_type': fields.StringField(),
'memory_mb': fields.IntegerField(),
'vcpus': fields.IntegerField(),
'root_gb': fields.IntegerField(),
'disk_gb': fields.IntegerField(),
'ephemeral_gb': fields.IntegerField(),
'image_ref_url': fields.StringField(),
'kernel_id': fields.StringField(),
'ramdisk_id': fields.StringField(),
'image_meta': fields.DictOfStringField(),
'created_at': fields.DateTimeField(),
'launched_at': fields.DateTimeField(),
'terminated_at': fields.DateTimeField(),
'deleted_at': fields.DateTimeField(),
'new_task_state': fields.StringField(),
        'state': fields.StringField(),
'state_description': fields.StringField(),
'old_state': fields.StringField(),
'old_task_state': fields.StringField(),
'progress': fields.IntegerField(),
"audit_period_beginning": fields.DateTimeField(),
"audit_period_ending": fields.DateTimeField(),
'access_ip_v4': fields.IPV4AddressField(),
'access_ip_v6': fields.IPV6AddressField(),
'fixed_ips': fields.ListOfFixedIps(),
        'bandwidth': fields.ListOfBwUsages(),
'metadata': fields.DictOfStringField(),
}
@base.NovaObjectRegistry.register
class InstanceUpdateNotification(notification.NotificationBase):
# Version 1.0: Initial version
VERSION = '1.0'
fields = {
'payload': fields.ObjectField('InstanceUpdatePayload')
}
No db schema changes are foreseen.
REST API impact
---------------
None.
Security impact
---------------
None.
Notifications impact
--------------------
See the Proposed change and Data model section.
Other end user impact
---------------------
None.
Performance Impact
------------------
Sending both un-versioned and versioned wire format for a notification due to
keeping backward compatibility in Mitaka will increase the load on the message
bus. A config option will be provided to specify which version of the
notifications shall be emitted to mitigate this. Also the deployer can use the
NoOp notification driver to turn the interface off.
Other deployer impact
---------------------
Backward compatibility for pre Mitaka notification consumers will be ensured
by emitting both the versioned and the un-versioned notification format on the
wire for every versioned notification using the configured driver. Emitting the
un-versioned wire format of a versioned notification will be deprecated along
with a proper deprecation message in Mitaka and will be removed in N release.
A new config option ``notification_format`` will be introduced with three
possible values ``versioned``, ``un-versioned``, ``both`` to specify which
version of the notifications shall be emitted. The ``un-versioned`` value will
be deprecated from the beginning to encourage deployers to start consuming
versioned notifications. In Mitaka the default version of this config option
will be ``both``.
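A sketch of how the option could be declared with oslo.config follows; the
option name and values come from this spec, while the exact help text and
registration location are illustrative assumptions.

.. code-block:: python

    from oslo_config import cfg

    notification_opts = [
        cfg.StrOpt('notification_format',
                   choices=['versioned', 'un-versioned', 'both'],
                   default='both',
                   help='Specifies which notification format shall be '
                        'emitted by nova.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(notification_opts)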
Developer impact
----------------
Developers shall use the notification base classes when implementing a new
notification.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
* balazs-gibizer
Other contributors:
* belliott
* andrea-rosa-m
Work Items
----------
* Create the necessary base infrastructure, e.g. base classes, sample
  generation, basic test infrastructure and documentation
* Create versioned notifications for an easy old-style notification
  (e.g. keypair notifications) to serve as an example
* Create versioned notification for instance.update notification
* Create versioned notifications for nova.notification.send_api_fault type of
notifications
Dependencies
============
None
Testing
=======
Functional test coverage shall be provided for versioned notifications.
Documentation Impact
====================
* Notification samples shall be generated for versioned notifications.
* A new devref shall be created that describes how to add new versioned
  notifications to nova
References
==========
* [1] http://docs.openstack.org/developer/oslo.messaging/notifier.html
* [2] https://github.com/openstack/nova/blob/master/nova/compute/utils.py#L320
* [3] https://github.com/openstack/nova/blob/bc6f30de953303604625e84ad2345cfb595170d2/nova/compute/api.py#L3769
* [4] The service status notification will be the first new notification using
  a versioned payload https://review.openstack.org/#/c/182350/ . That spec
will add only a minimal infrastructure to emit the versioned payload.
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Mitaka
- Introduced
@@ -0,0 +1,215 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===============================================
Virt driver pinning guest vCPU threads policies
===============================================
https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-thread-pinning
This feature aims to implement the remaining functionality of the
virt-driver-cpu-pinning spec. This entails implementing support for thread
policies.
Problem description
===================
Some applications must exhibit real-time or near real-time behavior. This
is generally possible by making use of processor affinity and binding vCPUs to
pCPUs. This functionality currently exists in Nova. However, it is also
necessary to consider thread affinity in the context of simultaneous
multithreading (SMT) enabled systems, such as those with Intel(R)
Hyper-Threading Technology. In these systems, competition for shared resources
can result in unpredictable behavior.
Use Cases
----------
Depending on the workload being executed the end user or cloud admin may wish
to have control over how the guest uses hardware threads. To maximise cache
efficiency, the guest may wish to be pinned to thread siblings. Conversely
the guest may wish to avoid thread siblings. This level of control is of
particular importance to Network Function Virtualization (NFV) deployments,
which care about maximizing cache efficiency of vCPUs.
Project Priority
-----------------
None
Proposed change
===============
The flavor extra specs will be enhanced to support one new parameter:
* hw:cpu_thread_policy=prefer|isolate|require
This policy is an extension to the already implemented CPU policy parameter:
* hw:cpu_policy=shared|dedicated
The threads policy will control how the scheduler / virt driver places guests
with respect to CPU threads. It will only apply if the CPU policy is
'dedicated', i.e. guest vCPUs are being pinned to host pCPUs. A simplified
placement sketch follows the policy list below.
- prefer: The host may or may not have an SMT architecture. This retains the
  legacy behavior, whereby siblings are preferred when available. This is the
default if no policy is specified.
- isolate: The host must not have an SMT architecture, or must emulate a
non-SMT architecture. If the host does not have an SMT architecture, each
vCPU will simply be placed on a different core as expected. If the host
does have an SMT architecture (i.e. one or more cores have "thread
siblings") then each vCPU will be placed on a different physical core
and no vCPUs from other guests will be placed on the same core. As such,
  one thread sibling is always guaranteed to be unused.
- require: The host must have an SMT architecture. Each vCPU will be
allocated on thread siblings. If the host does not have an SMT architecture
then it will not be used. If the host has an SMT architecture, but not
enough cores with free thread siblings are available, then scheduling
will fail.
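To make the isolate and require constraints concrete, here is a small,
self-contained sketch. It is not the nova scheduler code; the host topology is
reduced to a list of thread-sibling sets and only feasibility is checked::

    def host_fits(policy, sibling_sets, vcpus):
        """Illustrative feasibility check for a dedicated-CPU guest."""
        smt_siblings = [s for s in sibling_sets if len(s) > 1]
        if policy == 'require':
            # SMT is mandatory and vCPUs are packed onto thread siblings.
            return bool(smt_siblings) and sum(map(len, smt_siblings)) >= vcpus
        if policy == 'isolate':
            # One vCPU per physical core; sibling threads stay unused.
            return len(sibling_sets) >= vcpus
        # 'prefer': any free pCPUs will do.
        return sum(map(len, sibling_sets)) >= vcpus

    # A host with 4 cores and 2 hardware threads per core.
    siblings = [{0, 4}, {1, 5}, {2, 6}, {3, 7}]
    assert host_fits('isolate', siblings, 4)
    assert not host_fits('isolate', siblings, 5)
    assert host_fits('require', siblings, 8)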
The image metadata properties will also allow specification of the threads
policy:
* hw_cpu_thread_policy=prefer|isolate|require
This will only be honored if the flavor specifies the 'prefer' policy, either
explicitly or implicitly as the default option. This ensures that the cloud
administrator can have absolute control over threads policy if desired.
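A minimal sketch of that precedence rule (illustrative only)::

    def effective_thread_policy(flavor_policy, image_policy):
        # The image-supplied policy is only honored when the flavor keeps
        # the default 'prefer' policy, explicitly or implicitly.
        flavor_policy = flavor_policy or 'prefer'
        if flavor_policy != 'prefer':
            return flavor_policy
        return image_policy or 'prefer'

    assert effective_thread_policy(None, 'isolate') == 'isolate'
    assert effective_thread_policy('require', 'isolate') == 'require'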
Alternatives
------------
None.
Data model impact
-----------------
None.
The necessary changes were already completed in the original spec.
REST API impact
---------------
No impact.
The existing APIs already support arbitrary data in the flavor extra specs.
Security impact
---------------
No impact.
Notifications impact
--------------------
No impact.
The notifications system is not used by this change.
Other end user impact
---------------------
No impact.
Support for flavor extra specs is already available in the Python clients.
Performance Impact
------------------
The scheduler will incur a small further overhead if a threads policy is set
on the image or flavor. This overhead will be negligible compared to that
implied by the enhancements to support NUMA policy and huge pages. It is
anticipated that dedicated CPU guests will typically be used in conjunction
with huge pages.
Other deployer impact
---------------------
The cloud administrator will gain the ability to define flavors with explicit
threading policy. Although not required by this design, it is expected that
the administrator will commonly use the same host aggregates to group hosts
for both CPU pinning and large page usage, since these concepts are
complementary and expected to be used together. This will minimize the
administrative burden of configuring host aggregates.
Developer impact
----------------
It is expected that most hypervisors will have the ability to support the
required thread policies. The flavor parameter is simple enough that any Nova
driver would be able to support it.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
sfinucan
Work Items
----------
* Enhance the scheduler to take account of threads policy when choosing
which host to place the guest on.
* Enhance the scheduler to take account of threads policy when mapping
vCPUs to pCPUs
Dependencies
============
None.
Testing
=======
It is not practical to test this feature using the gate and tempest at this
time, since effective testing will require that the guests running the test
be provided with multiple NUMA nodes, each in turn with multiple CPUs.
These features will be validated using a third-party CI (Intel Compute CI).
Documentation Impact
====================
None.
The documentation changes were made in the previous change.
References
==========
Current "big picture" research and design for the topic of CPU and memory
resource utilization and placement. vCPU topology is a subset of this
work:
* https://wiki.openstack.org/wiki/VirtDriverGuestCPUMemoryPlacement
Current CPU pinning validation tests for Intel Compute CI:
* https://github.com/stackforge/intel-nfv-ci-tests
Existing CPU Pinning spec:
* http://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/virt-driver-cpu-pinning.html
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Liberty
- Introduced
* - Mitaka
- Revised to include rework policies, removing two, adding one and
clarifying the remainder
@@ -0,0 +1,219 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
VMware Limits, Shares and Reservations
==========================================
https://blueprints.launchpad.net/nova/+spec/vmware-limits-mitaka
VMware Virtual Center provides options to specify limits, reservations and
shares for CPU, memory, disks and network adapters.
In the Juno cycle support for CPU limits, reservation and shares was added.
This blueprint proposes a way of supporting memory, disk and network
limits, reservations and shares.
For limits, the utilization will not exceed the limit. Reservations will be
guaranteed for the instance. Shares are used to determine relative allocation
between resource consumers. In general, a consumer with more shares gets
proportionally more of the resource, subject to certain other constraints.
Problem description
===================
The VMware driver is currently only able to support CPU limits. Giving admins
the ability to set limits, reservations and shares for memory, disks and
network adapters will be a very useful tool for providing QoS to tenants.
Use Cases
----------
* This will enable a cloud provider to offer SLAs to customers
* It will allow tenants to be guaranteed performance
Proposed change
===============
Due to the different models used by different drivers and the APIs that
the backends expose, we are unable to leverage the same existing flavor
extra specs.
For example for devices libvirt makes use of: 'hw_rng:rate_bytes',
'hw_rng:rate_period'.
In addition to this there are the following disk I/O options:
'disk_read_bytes_sec', 'disk_read_iops_sec', 'disk_write_bytes_sec',
'disk_write_iops_sec', 'disk_total_bytes_sec', and
'disk_total_iops_sec'.
For bandwidth limitations there is the 'rxtx_factor'. This will not enable
us to provide the limits, reservations and shares for vifs. It is used in
some cases to pass the information through to Neutron so that the backend
network can do the limiting. The following extra_specs can be configured
for bandwidth I/O for vifs:
'vif_inbound_average', 'vif_inbound_burst', 'vif_inbound_peak',
'vif_outbound_average', 'vif_outbound_burst' and 'vif_outbound_peak'.
None of the above is possible for the VMware driver due to the VC APIs. The
following additions are proposed:
Limits, reservations and shares will be exposed for the following:
* memory
* disks
* network adapters
The flavor extra specs for quotas will be extended to support the following
(an illustrative flavor definition follows the list):
* quota:memory_limit - The memory utilization of a virtual machine will not
exceed this limit, even if there are available resources. This is
typically used to ensure a consistent performance of virtual machines
independent of available resources. Units are MB.
* quota:memory_reservation - guaranteed minimum reservation (MB)
* quota:memory_shares_level - the allocation level. This can be 'custom',
  'high', 'normal' or 'low'.
* quota:memory_shares_share - in the event that 'custom' is used, this is
the number of shares.
* quota:disk_io_limit - The I/O utilization of a virtual machine will not
exceed this limit. The unit is number of I/O per second.
* quota:disk_io_reservation - Reservation control is used to provide guaranteed
allocation in terms of IOPS
* quota:disk_io_shares_level - the allocation level. This can be 'custom',
  'high', 'normal' or 'low'.
* quota:disk_io_shares_share - in the event that 'custom' is used, this is
the number of shares.
* quota:vif_limit - The bandwidth limit for the virtual network adapter.
The utilization of the virtual network adapter will not exceed this limit,
even if there are available resources. Units in Mbits/sec.
* quota:vif_reservation - Amount of network bandwidth that is guaranteed to
the virtual network adapter. If utilization is less than reservation, the
resource can be used by other virtual network adapters. Reservation is not
allowed to exceed the value of limit if limit is set. Units in Mbits/sec.
* quota:vif_shares_level - the allocation level. This can be 'custom',
  'high', 'normal' or 'low'.
* quota:vif_shares_share - in the event that 'custom' is used, this is the
number of shares.
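For illustration only, a flavor carrying these keys might look like the
following (the values are arbitrary examples, not recommendations)::

    # Hypothetical flavor extra specs using the keys proposed above.
    vmware_qos_extra_specs = {
        'quota:memory_limit': '2048',          # MB
        'quota:memory_reservation': '1024',    # MB
        'quota:memory_shares_level': 'normal',
        'quota:disk_io_limit': '1000',         # IOPS
        'quota:disk_io_reservation': '200',    # IOPS
        'quota:disk_io_shares_level': 'custom',
        'quota:disk_io_shares_share': '500',
        'quota:vif_limit': '100',              # Mbits/sec
        'quota:vif_reservation': '10',         # Mbits/sec
        'quota:vif_shares_level': 'high',
    }
    # e.g. applied with: nova flavor-key <flavor> set quota:vif_limit=100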
Alternatives
------------
The alternative is to create an abstract user concept that could help hide
the details of the differences from end users, and isolate the differences
to just the admin users.
This is really out of the scope of what is proposed and will take a huge
cross driver effort. This will not only be relevant for flavors but maybe for
images too.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
Preventing instances from exhausting storage resources can have a significant
positive impact on overall performance.
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
garyk
Work Items
----------
* common objects for limits, reservation and shares
* memory support
* disk support
* vif support
Dependencies
============
None
Testing
=======
This will be tested by the VMware CI. We will add tests to validate this.
Documentation Impact
====================
This should be documented in the VMware section.
References
==========
The vCenter APIs can be seen at the following links:
* Disk IO: http://goo.gl/uepivS
* Memory: http://goo.gl/6sHwIA
* Network Adapters: http://goo.gl/c2amhq
History
=======
None
@@ -0,0 +1,171 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
VMware: Expand Support for Opaque Networks
==========================================
https://blueprints.launchpad.net/nova/+spec/vmware-expand-opaque-support
An opaque network was introduced in the vSphere API in version 5.5. This is
a network that is managed by a control plane outside of vSphere. The identifier
and name of this network are made known to vSphere so that host and virtual
machine ethernet devices can be connected to it.
The initial code was added to support the NSX-MH (multi hypervisor) Neutron
plugin. This was in commit 2d7520264a4610068630d7664eeff70fb5e8c681. That
support would require the configuration of a global integration bridge and
ensuring that the network was connected to that bridge. This approach is
similar to the way in which this is implemented in the libvirt VIF driver.
In the Liberty cycle, a new plugin called NSXv3 was added to the
openstack/vmware-nsx repository to support a new NSX backend. This is a
multi-hypervisor plugin; support for libvirt, Xen etc. already exists.
This spec will deal with the compute integration for the VMware VC driver.
Problem description
===================
This spec will deal with the configuration of the Opaque network for the NSXv3
Neutron driver.
Use Cases
----------
This is required for the NSXv3 plugin. Without it Nova will be unable to attach
an ethernet device to a virtual machine.
Proposed change
===============
The change is self-contained within the VMware driver code and relates only to
how the ethernet device backing is configured. This applies only when the
Neutron virtual port is of type 'ovs'. The NSXv3 plugin will ensure that the
port type is set to 'ovs', and the VC driver will need to handle this port
type. When the type is 'ovs' there are two different flows (a sketch follows
the list):
* If the configuration flag 'integration_bridge' is set. This is for the
  NSX-MH plugin. This requires that the backing opaqueNetworkId be set to
  the 'integration_bridge' value and the backing opaqueNetworkType be set to
  'opaque'.
* If the flag is not set then this is the NSXv3 plugin. This requires that
the backing value opaqueNetworkId be set as the neutron network UUID; the
backing type opaqueNetworkType will have value 'nsx.LogicalSwitch'; and the
backing externalId has the neutron port UUID.
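A rough sketch of that selection logic follows. It is illustrative only: the
``client_factory`` call and the backing class name stand in for the objects
the driver actually builds via the vSphere API, and ``vif`` stands in for the
Neutron VIF dict::

    def _get_opaque_backing(client_factory, vif, integration_bridge):
        # Illustrative sketch, not the actual driver code.
        backing = client_factory.create(
            'ns0:VirtualEthernetCardOpaqueNetworkBackingInfo')
        if integration_bridge:
            # NSX-MH: attach to the globally configured integration bridge.
            backing.opaqueNetworkId = integration_bridge
            backing.opaqueNetworkType = 'opaque'
            external_id = None
        else:
            # NSXv3: attach directly to the Neutron logical switch.
            backing.opaqueNetworkId = vif['network']['id']
            backing.opaqueNetworkType = 'nsx.LogicalSwitch'
            external_id = vif['id']   # later set as the device externalId
        return backing, external_id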
.. note::
* The help for the configuration option 'integration_bridge' will be updated
to reflect the values for the different plugins.
* A log warning will appear if an unsupported VC version is used.
* The above should be done regardless of this support.
Alternatives
------------
None
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
The NSXv3 support will be greenfield.
The NSX-MH will be deprecated in favor of the NSXv3 plugin. As a result of
this we will set the default 'integration_bridge' value to None. This means
that a user running the existing NSX-MH will need to make sure that this value
is set. This is something that will be clearly documented.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
garyk
Work Items
----------
The implementation of the changes in Nova can be seen at:
https://review.openstack.org/#/c/165750/.
Dependencies
============
This code depends on the Neutron driver NSXv3 added in the Liberty cycle.
This code can be found at https://github.com/openstack/vmware-nsx/blob/master/vmware_nsx/plugins/nsx_v3/plugin.py
Testing
=======
The code is tested as part of the Neutron CI testing.
Documentation Impact
====================
We will need to make sure that the release notes are updated to explain the
configuration of the CONF.vmware.integration_bridge option. As mentioned
above, that option is only relevant to NSX-MH once the code is changed to
support NSXv3.
References
==========
* https://www.vmware.com/support/developer/converter-sdk/conv55_apireference/vim.OpaqueNetwork.html
* https://review.openstack.org/#/c/165750/
History
=======
None
@@ -0,0 +1,191 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==============================
Volume Operations When Shelved
==============================
https://blueprints.launchpad.net/nova/+spec/volume-ops-when-shelved
Currently the attach, detach and swap volume operations are allowed when an
instance is paused, stopped or soft deleted, but are not allowed when an
instance has been shelved. These operations are possible when an instance is
shelved, so we should enable them.
Problem description
===================
The attach, detach and swap volume operations are not allowed when an
instance is in the shelved or shelved_offloaded states. From a user's
perspective this is at odds with the fact that these operations can be
performed on instances in other inactive states.
Use Cases
---------
As a cloud user I want to be able to detach volumes from my shelved instance
and use them elsewhere, without having to unshelve the instance first.
As a cloud user I want to be able to perform all the volume operations on
a shelved instance that I can when it is stopped, paused or soft deleted.
Proposed change
===============
Shelved instances can be in one of two possible states: shelved and
shelved_offloaded (ignoring transitions during shelving and unshelving).
When in shelved the instance is still on a host but inactive. When in
shelved_offloaded the instance has been removed from the host and the
resources it was using there are released.
Volume operations on an instance in the shelved state are similar to those in
any other state where the instance is still on a host. The operations can be
enabled by allowing them at the compute API for this state. The existing
compute manager code already handles this case; it is merely disabled in the
API.
The shelved_offloaded state is different. In this case the instance is not on
any host, so functions to attach and detach need to be implemented in the API,
in the same way that volumes are already detached locally when an offloaded
instance is deleted. These functions will only perform the steps to manage the
block device mappings and register the attachment with Cinder. Any actual
attachment to a host will be completed when the instance is unshelved as usual.
The compute API attach volume code makes an RPC call to the hosting compute
manager to select a name for the device, which includes a call into the virt
driver. This cannot be done when the instance is offloaded because it is not
on a host. In practice, device names are set when an instance is booted and
there is no guarantee that a name provided by the user will be respected, so
the new attach method for the shelved_offloaded state will defer name
selection until the instance is unshelved. This avoids the need to call a
compute manager at all.
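The sketch below illustrates the intended branching at the API level. It is
only a rough outline under stated assumptions: the helper functions are
stand-ins for the real block device mapping, Cinder and RPC code, not actual
Nova APIs.

.. code-block:: python

    SHELVED_OFFLOADED = 'shelved_offloaded'


    def _record_bdm(instance, volume_id):
        # Stand-in for creating the block device mapping; the device name is
        # left unset here and chosen later, at unshelve time.
        return {'instance_uuid': instance['uuid'], 'volume_id': volume_id,
                'device_name': None}


    def _reserve_volume(volume_id):
        # Stand-in for registering the attachment with Cinder.
        return {'volume_id': volume_id, 'status': 'attaching'}


    def _rpc_attach(instance, volume_id, device):
        # Stand-in for the existing RPC to the hosting compute manager, which
        # asks the virt driver to pick a device name.
        return {'instance_uuid': instance['uuid'], 'volume_id': volume_id,
                'device_name': device or '/dev/vdb'}


    def attach_volume(instance, volume_id, device=None):
        if instance['vm_state'] == SHELVED_OFFLOADED:
            # No host to ask, so do the bookkeeping locally in the API and
            # defer device-name selection until the instance is unshelved.
            _reserve_volume(volume_id)
            return _record_bdm(instance, volume_id)
        # A shelved (still on a host) instance behaves like a stopped or
        # paused one: reuse the normal path through the compute manager.
        return _rpc_attach(instance, volume_id, device)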
Alternatives
------------
The only clear alternative is to not allow volumes to be attached or detached
when an instance is shelved.
Data model impact
-----------------
None.
REST API impact
---------------
The attach, detach and swap operations will
be allowed when the instance is in the shelved and shelved_offloaded states.
Instead of returning the existing HTTP 409 (Conflict) error, the return values
will be the same as they are for other valid states.
This change will require an API microversion increment.
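For illustration, the request below exercises the existing
os-volume_attachments API against a shelved instance; the endpoint, token and
UUIDs are placeholders, and the microversion is requested as ``latest``
because the exact value is only assigned when the change merges.

.. code-block:: python

    import requests

    NOVA = 'http://controller:8774/v2.1'  # placeholder endpoint
    HEADERS = {
        'X-Auth-Token': 'PLACEHOLDER_TOKEN',
        # Ask for a microversion new enough to include this change.
        'X-OpenStack-Nova-API-Version': 'latest',
    }
    SERVER = 'SERVER_UUID'  # a shelved or shelved_offloaded instance
    VOLUME = 'VOLUME_UUID'

    # Attach: previously this returned 409 (Conflict) for shelved instances.
    resp = requests.post(
        '%s/servers/%s/os-volume_attachments' % (NOVA, SERVER),
        json={'volumeAttachment': {'volumeId': VOLUME}},
        headers=HEADERS)
    print(resp.status_code, resp.text)

    # Detach the same volume.
    resp = requests.delete(
        '%s/servers/%s/os-volume_attachments/%s' % (NOVA, SERVER, VOLUME),
        headers=HEADERS)
    print(resp.status_code)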
Security impact
---------------
None.
Notifications impact
--------------------
None.
Other end user impact
---------------------
None.
Performance Impact
------------------
None.
Other deployer impact
---------------------
None.
Developer impact
----------------
None.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
pmurray
Other contributors:
andrea-rosa-m
Work Items
----------
The following changes will be required:
#. Change the guards on the attach, detach and swap functions in the compute
   API to allow them when the instance is in the shelved state.
#. Add functions to attach, detach and swap volumes that are executed
   locally at the API when the instance is in the shelved_offloaded state.
#. Add code to handle device names on unshelve (devices attached in
   shelved_offloaded will have had name selection deferred to unshelve).
#. Change the guards on the attach, detach and swap functions to allow them
   when the instance is in the shelved_offloaded state.
Dependencies
============
This spec is a step towards allowing boot volumes to be attached and
detached when in the shelved_offloaded state (see [1]). But this spec
also provides useful functionality on its own.
This spec adds more opportunity for race conditions due to conflicting
parallel operations. It is important to note that those races are not
introduced by this change; they already exist in Nova and are going to be
addressed by a different change (see [2] for more information).
Testing
=======
Most of the attach and detach functionality can be tested with unit tests.
In particular, the shelved state behaves the same as the shutdown or stopped
states.
New unit tests will be needed for the new attach and detach functions in the
shelved offloaded state.
A tempest test will be added to check that the sequence of shelving,
detaching/attaching volumes and then unshelving leads to a running
instance with the expected volumes correctly attached.
Documentation Impact
====================
This spec will affect cloud users. They will now be able to perform volume
operations on shelved instances.
References
==========
[1] https://blueprints.launchpad.net/openstack/?searchtext=detach-boot-volume
[2] https://review.openstack.org/216578
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Mitaka
     - Introduced