nova-specs/specs/pike/approved/list-instances-using-searchlight.rst
Matt Riedemann 533a5ebcb5 Clarify wording in Searchlight spec - error conditions
There was some confusing / conflicting wording in the error
conditions part of the spec (do we log error or warning?). This
clarifies that to be consistent about saying we just log a warning
if configured to use Searchlight but it's not available (which is
what we do if Placement isn't available).

Related to blueprint list-instances-using-searchlight

Change-Id: I1331d685e82f100a1965bd835e89147c10cbcda1
2017-05-08 14:22:49 -04:00


..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================================
List instances using Searchlight
================================
`<https://blueprints.launchpad.net/nova/+spec/list-instances-using-searchlight>`_
To support efficiently listing instances across multiple cells, Nova plans to
integrate support for using `Searchlight`_. This will be an optional feature
as we prove out the effectiveness of this approach.
.. _Searchlight: https://docs.openstack.org/developer/searchlight/
Problem description
===================
Listing instances across multiple cells will be inefficient in a large
deployment since the compute API will have to query each cell and apply filters
and then merge sort the results in Python. It will be more efficient to use a
single global data store like an ElasticSearch (ES) cluster fronted by
Searchlight.
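
To make the cost of the per-cell approach concrete, here is a minimal sketch of querying each cell, filtering, and merge sorting in Python. It is not Nova's actual code: cells are modeled as plain lists of dicts, and ``list_across_cells`` and the field names are hypothetical.

```python
import heapq
import itertools
from operator import itemgetter


def list_across_cells(cells, sort_key, limit, filters=None):
    """Query every cell, filter and sort each result, then merge in Python."""
    filters = filters or {}
    per_cell = []
    for instances in cells.values():
        # Each cell is filtered and sorted independently.
        matched = [
            inst for inst in instances
            if all(inst.get(k) == v for k, v in filters.items())
        ]
        per_cell.append(sorted(matched, key=itemgetter(sort_key)))
    # heapq.merge lazily merges the already-sorted per-cell lists.
    merged = heapq.merge(*per_cell, key=itemgetter(sort_key))
    return list(itertools.islice(merged, limit))


# Hypothetical data: two cells holding instances of two projects.
cells = {
    "cell1": [
        {"uuid": "a", "created_at": 3, "project_id": "p1"},
        {"uuid": "b", "created_at": 1, "project_id": "p1"},
    ],
    "cell2": [
        {"uuid": "c", "created_at": 2, "project_id": "p1"},
        {"uuid": "d", "created_at": 4, "project_id": "p2"},
    ],
}
page = list_across_cells(cells, "created_at", limit=2,
                         filters={"project_id": "p1"})
```

Every cell is queried even when only one page of results is requested. A single global data store avoids that cost.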
Use Cases
---------
As a user in a multi-cell OpenStack environment, it's important that I can
quickly get a view of all my instances. I want to be able to filter and sort
them on the server, and have a predictable sort order when new instances are
created in multiple cells.
Proposed change
===============
Add a configuration option to Nova which will toggle whether or not the compute
API will iterate the cells to list instances and then merge sort the results,
or query Searchlight and translate those results for a proper compute API
response.
This will be configurable and disabled by default because an existing
deployment may not be set up to emit versioned notifications or to use
Searchlight, so initially there would be no data to give back in the response.

.. note:: When using Searchlight to service a ``GET /servers`` or
   ``GET /servers/detail`` request we will get all of the necessary information
   from Searchlight. There will not be any additional calls to the nova
   database, otherwise that would defeat the purpose of this change. We will
   not use Searchlight for ``GET /servers/{server_id}`` because when we have a
   specific server ID we can look up which cell it's in using the
   InstanceMapping record in the Nova API database.

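
The dispatch described above could look roughly like the following sketch. The option name ``use_searchlight`` and the helper ``choose_backend`` are assumptions made for illustration; this spec does not name the configuration option.

```python
from dataclasses import dataclass

# Only the list requests are candidates for Searchlight.
LIST_PATHS = ("/servers", "/servers/detail")


@dataclass
class ApiConfig:
    # Hypothetical option name; disabled by default as proposed above.
    use_searchlight: bool = False


def choose_backend(conf, path):
    """Return which data source would serve a GET request for this path."""
    if path in LIST_PATHS:
        return "searchlight" if conf.use_searchlight else "cells"
    # GET /servers/{server_id}: the InstanceMapping record already tells us
    # which cell to query, so Searchlight is never used here.
    return "instance_mapping"
```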
Error conditions
----------------
* If Nova is configured to use Searchlight but it is not available in the
  service catalog, or Nova does not have access to it, we will fall back to
  the default path, which means iterating the cells to list instances and
  merge sorting the results. A warning will be logged in this case, but the
  API request should not fail with a 500. We will also set a flag so that we
  do not continue to check until the service is restarted, similar to how we
  handled the placement API in
  ``nova.scheduler.client.report.SchedulerReportClient`` with the
  ``@safe_connect`` decorator in Newton.
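
A minimal sketch of this fallback behavior follows. It does not reproduce ``@safe_connect``. ``InstanceLister`` and ``SearchlightUnavailable`` are hypothetical names, and the two query callables stand in for the real Searchlight client and the per-cell listing code.

```python
import logging

LOG = logging.getLogger(__name__)


class SearchlightUnavailable(Exception):
    """Hypothetical error: endpoint missing from the catalog or no access."""


class InstanceLister:
    def __init__(self, query_searchlight, query_cells):
        self._query_searchlight = query_searchlight
        self._query_cells = query_cells
        # Once set, Searchlight is not tried again until the service restarts.
        self._searchlight_disabled = False

    def list_instances(self, filters):
        if not self._searchlight_disabled:
            try:
                return self._query_searchlight(filters)
            except SearchlightUnavailable as exc:
                # Log a warning rather than failing the request with a 500.
                LOG.warning("Searchlight is not available (%s); falling back "
                            "to listing instances from the cells.", exc)
                self._searchlight_disabled = True
        return self._query_cells(filters)
```

Keeping the flag on the object means the availability check costs at most one failed call per service lifetime.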
Known issues
------------
* It is currently possible for an administrator to list deleted instances in
  the compute REST API. This is due to the fact that when an instance is
  "deleted" in Nova, it is not actually deleted from the database. It is
  considered "soft deleted", meaning ``instances.deleted != 0`` in the
  database. That is not to be confused with the ``SOFT_DELETED`` status in the
  REST API which is based on the ``reclaim_instance_interval`` configuration
  option. There are two ways to remove a (soft) deleted instance from the REST
  API:

  1. Run the ``nova-manage db archive_deleted_rows`` command which will move
     the (soft) deleted instances to the ``shadow_instances`` table.

  2. Purge the deleted instances from the database directly. While not a
     supported operation in Nova directly, there are publicly available
     scripts for operators to use for purging the database.

  The issue this presents with using Searchlight is that currently Searchlight
  will delete an index entry for the instance once it processes the
  ``compute.instance.delete.end`` notification. This effectively means that
  with the existing behavior, if using Searchlight you will not be able to list
  deleted instances since they will not be stored in Searchlight. This includes
  the ``changes-since`` query parameter no longer returning deleted instances,
  which it does today.

  Having noted this, we should mention that there is no guarantee today that
  you can list deleted instances from the compute REST API, since that depends
  on the data retention/archive/purge policy of the given cloud provider. For
  example, if the cloud provider has a policy to archive or purge all deleted
  instances after 30 days, then they already cannot list instances that were
  deleted more than 30 days ago.

  We will have to sort this limitation out with the Searchlight team. It might
  be possible, for example, to add a configuration option to Searchlight to
  control how long an index can be stored for a deleted instance before it is
  finally removed. It is worth noting that ElasticSearch used to have a concept
  of a ``_ttl`` field but that was `removed in 5.0`_.

  Another alternative is that if the ``deleted`` or ``changes-since`` query
  parameters are specified, we do not use Searchlight and instead iterate
  across cells. This would not be ideal as it would mean we still have to
  maintain two code paths for listing instances, but we will probably have to
  do that for a couple of releases anyway until we can make Searchlight
  required, which gives us some time to find better solutions with Searchlight.

* When making changes to the compute REST API server response, developers will
  have to also mirror those changes in the versioned notification
  `InstancePayload`_. This also poses an issue between microversions in the
  REST API and the versions on the InstancePayload object. Microversions in the
  REST API are opt-in by the client and Nova will continue to honor older
  microversions. However, the versioned notifications are pushed out at the
  latest available version.

  As an example, say we remove a field 'foo' from the server response in
  microversion 2.53. The compute API will still return the 'foo' field in
  requests with microversion before 2.53. The InstancePayload object cannot
  remove the 'foo' field without a major version bump, and even then it would
  indirectly break the compute API contract if we were using Searchlight since
  Searchlight would not give back a server response with the 'foo' field if it
  were removed from the InstancePayload object. This essentially means we
  cannot drop fields from the InstancePayload for versioned notifications
  unless we have also raised the minimum required microversion in the compute
  REST API to the point that we are also dropping fields from the server
  response.

.. _removed in 5.0: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_mapping_changes.html#_literal__timestamp_literal_and_literal__ttl_literal
.. _InstancePayload: https://github.com/openstack/nova/blob/15.0.0/nova/notifications/objects/instance.py#L19
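
The alternative of bypassing Searchlight for queries that can return deleted instances could be as simple as the check sketched below. ``can_use_searchlight`` is a hypothetical helper, and the query parameters are assumed to arrive as a dict keyed by parameter name.

```python
# Query parameters that, under this alternative, force the per-cell path.
CELL_ONLY_PARAMS = frozenset({"deleted", "changes-since"})


def can_use_searchlight(query_params):
    """Return False if the request may need to return deleted instances."""
    # isdisjoint() iterates the dict's keys, i.e. the parameter names.
    return CELL_ONLY_PARAMS.isdisjoint(query_params)
```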
Data migrations
---------------
A deployment that is not using Searchlight will have none of the necessary
information at first to start using this change for serving compute REST API
requests.
Once Searchlight is deployed and consuming versioned notifications from Nova,
new instance operations will be indexed. However, any existing instance data
will need to be transferred to Searchlight.
Therefore before configuring nova-api to use Searchlight, the deployer must
perform a bulk index of the existing instances from Nova into Searchlight. This
can be performed by issuing::

    searchlight-manage index sync --type OS::Nova::Server

That will tell Searchlight to call the compute REST API to list existing
instances and populate the server indexes using the results. See the
Searchlight documentation for more details on `bulk indexing`_.
.. _bulk indexing: https://docs.openstack.org/developer/searchlight/indexingservice.html#bulk-indexing
Alternatives
------------
None.
Data model impact
-----------------
None.
REST API impact
---------------
While this will change how ``GET /servers`` and ``GET /servers/detail``
responses are generated on the backend, there should be no user-visible changes
to the contract on those APIs. This will be enforced via Tempest testing.
It should also be noted that ElasticSearch supports `pagination`_ and
Searchlight is largely compatible with ElasticSearch, so it supports paging by
page/size. You could also do it with the OpenStack 'marker' method by ordering
on id.
.. _pagination: https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
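
As a rough illustration of the two paging styles, the sketch below builds ElasticSearch query bodies as plain dicts. It assumes zero-based page numbers and that the indexed server documents expose a sortable ``id`` field. Both are assumptions, and the helper names are hypothetical.

```python
def page_query(page, size):
    """Page/size paging expressed with ElasticSearch's from/size."""
    return {"from": page * size, "size": size, "sort": [{"id": "asc"}]}


def marker_query(marker, limit):
    """OpenStack-style marker paging: only ids strictly after the marker."""
    body = {"size": limit, "sort": [{"id": "asc"}]}
    if marker is not None:
        # Ordering on id makes "everything after the marker" a range query.
        body["query"] = {"range": {"id": {"gt": marker}}}
    return body
```

The marker form does not depend on an offset, so results stay stable when instances are created or deleted between page requests.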
Security impact
---------------
This would require deploying an ElasticSearch cluster and fronting it with
Searchlight, which means another endpoint in the service catalog and
potentially another service user. The ES cluster will need to have proper
access controls in place. This also means enabling notifications in the
deployment such that Nova versioned notifications can be fed into the
Searchlight ES cluster.
Notifications impact
--------------------
None. While this solution depends on using versioned notifications in Nova,
there are no changes proposed for notifications themselves.
Other end user impact
---------------------
None. This change should be transparent to the end user.
Performance Impact
------------------
The intent of this change is to improve performance when listing instances
across a multi-cell deployment. However, the actual performance will depend on
how well the ElasticSearch cluster performs.
Other deployer impact
---------------------
* Configure Nova to emit versioned notifications.
* Set up Searchlight, including any service user and endpoint required for the
service catalog along with the backing data store, e.g. ElasticSearch.
* Existing deployments would need a certain amount of time to feed existing
instance data into Searchlight before switching the compute API over to using
it. See the `Data migrations`_ section above for more details.
Developer impact
----------------
Developers will have to ensure that any changes to the compute REST API which
require returning new fields in a response will have those new fields also in
versioned notifications sent to Searchlight.
Depending on how Searchlight implements support for versioned notifications,
developers may also need to update index mappings to expose the new fields. We
might be able to automate that in Searchlight, however, using the work done in
the `json-schema-for-versioned-notifications blueprint`_. If we cannot or do
not end up using the versioned notification schema in Searchlight, then that
would create an install/upgrade order dependency such that Searchlight must be
installed/upgraded before nova-api.
Let's run through a scenario of what this might entail when one is adding a new
field in the compute REST API response. We also need to put that field in the
versioned notification payload so Searchlight gets it. The point about the
schema is that if the notification also carries its schema, Searchlight can use
that schema dynamically; otherwise Searchlight has to be updated statically to
know about the new field.
Taking the static case, if one is adding a new field to the server
response in the compute API, and let's assume it's not in the instances table
(it's a new column in the DB), then the steps are:
1. Add column to instances table in nova DB.
2. Add field to Instance object.
3. Add field to InstancePayload object.
4. Add schema change to Searchlight for the new field.
5. Add the new field to the compute REST API response via microversion.
This of course means that you have to upgrade Searchlight before you upgrade
nova-api to get the new field out of the REST API.
.. _json-schema-for-versioned-notifications blueprint: https://blueprints.launchpad.net/nova/+spec/json-schema-for-versioned-notifications
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Zhenyu (Kevin) Zheng (Kevin_Zheng)
Other contributors:
Matt Riedemann (mriedem)
Work Items
----------
* Get a working development environment where Searchlight is regularly running
with Nova and consuming notifications.
* Add the conditional path to the compute API ``get_all`` flow where we query
Searchlight for data if Nova is configured to do so.
* There will likely need to be some kind of translation utility code in place
  to convert the Searchlight response to a ``nova.objects.InstanceList``
  object which will be returned to the REST API handler.
* Integrate Searchlight and configure Nova to emit versioned notifications in
the ``gate-tempest-dsvm-neutron-nova-next-full-ubuntu-xenial-nv`` job for
testing.
* Install guide changes to explain the setup of Searchlight with Nova.
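
The translation utility mentioned in the work items might look something like the sketch below. It assumes the response has the ElasticSearch search shape (``hits.hits[]._source``). The ``Instance`` dataclass is only a stand-in for ``nova.objects.Instance``, and the document field names ``id``, ``name`` and ``status`` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Stand-in for nova.objects.Instance with only a few fields."""
    uuid: str
    display_name: str
    status: str


def instances_from_search(response):
    """Convert an ElasticSearch-style search response into Instance objects."""
    instances = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        instances.append(Instance(uuid=source["id"],
                                  display_name=source["name"],
                                  status=source["status"]))
    return instances
```

A real implementation would have to populate every field the REST API handler reads, since no follow-up database query is made.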
Dependencies
============
* For parity with the existing compute REST API, this change depends on
blueprint `additional-notification-fields-for-searchlight`_ for getting the
needed information into Searchlight.
* This change also depends on Searchlight adding support for nova versioned
notifications which is tracked in `blueprint nova-versioned-notifications`_.
.. _additional-notification-fields-for-searchlight: https://blueprints.launchpad.net/nova/+spec/additional-notification-fields-for-searchlight
.. _blueprint nova-versioned-notifications: https://blueprints.launchpad.net/searchlight/+spec/nova-versioned-notifications
Testing
=======
* Unit tests for the changes in the compute API.
* The majority of the test effort for this change will be integrating
Searchlight into the
``gate-tempest-dsvm-neutron-nova-next-full-ubuntu-xenial-nv`` job, enabling
versioned notifications and then using Searchlight as described in this spec
for listing instances. A full Tempest run on that job will show if we have
parity with the API responses.
* When we have a multi-cell CI job set up, we will probably also make the
  same changes to that job for efficient instance listing operations.
Documentation Impact
====================
The `compute admin guide`_ will need to be updated to discuss how to enable
this feature. The install, operations and architecture guides may also need to
be updated.
.. _compute admin guide: https://docs.openstack.org/admin-guide/compute.html
References
==========
None.
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Pike
     - Introduced