glance-specs/specs/kilo/catalog-index-service.rst
Lakshmi N Sampath 58f8f92af4 Catalog Index Service
Implements: blueprint catalog-index-service

This is intended to improve performance of Glance API services while
dramatically improving search capabilities.

It will improve performance by offloading user search queries from existing
API servers. In addition, we are working on numerous improvements in Horizon
which will include improvements to image, snapshot, artifact details and
searching. The desired user experience is greatly dependent upon a rich,
dynamic, near real time faceted and aggregated search capability with a strong
query language.

DocImpact
APIImpact
SecurityImpact

Change-Id: I94428288cb6851ae74bd80b02b761f3509a188fc
Co-Authored-By: Lakshmi N Sampath <lakshmi.sampath@hp.com>
Co-Authored-By: Travis Tripp <travis.tripp@hp.com>
Co-Authored-By: Murali Sundar <murali.sundar@intel.com>
2015-03-25 07:51:43 -07:00


..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=====================
Catalog Index Service
=====================
https://blueprints.launchpad.net/glance/+spec/catalog-index-service
This is intended to improve performance of Glance API services while
dramatically improving search capabilities.
It will improve performance by offloading user search queries from existing
API servers. In addition, we are working on numerous improvements in Horizon
which will include improvements to image, snapshot, artifact details and
searching. The desired user experience is greatly dependent upon a rich,
dynamic, near real time faceted and aggregated search capability with a strong
query language.
This will initially be considered an "Experimental API". The API is intended to
be as close to final as possible, but we reserve the right to abandon it
completely or to change it in backwards-incompatible ways.
Problem description
===================
Glance has metadata for all images and provides a listing of them. If you want
to search for images based on criteria, the API is limited and somewhat
inflexible. Search is currently limited to intersection ("AND" operations)
over certain hard-coded attributes. There is no support for union ("OR"
operations).
Searching does not include searching descriptions and has limited support for
property based searches. Full text searching of descriptions is typically slow
with a traditional relational database. Adding full text indexing to the
database can degrade overall performance of the database (affects inserts).
With the addition of metadata definitions in the Juno release, the new search
API should allow users to specify search criteria using the tags and
property type definitions in addition to ad hoc string search. They should
ideally get auto-completion of possible properties and property values with
near real time response.
Artifact definitions will support storing dynamic properties based on the
artifact type. However, proper indexing and search is not possible with a
traditional RDBMS. You end up with full table scans across tables that are not
applicable for a given query. In addition, a query engine built on top of the
database is difficult to write, understand, and maintain, especially after the
original authors move on.
Adding new properties with constraints for highly performant searching is
difficult to achieve in a relational database.
A search engine needs to be easily customizable so that the data being
collected can be changed dynamically and the way it is indexed and searched
can be easily modified without requiring source data migration. (e.g. a new
type of object to group namespaces together called "category")
Typical search interfaces should also provide facilities like auto-completion
and search suggestions with near real time performance. This is not possible
with a traditional database.
Adding more attributes to a search query over time should be easy to extend
and maintain without exponentially increasing the complexity of the search
logic.
Users should be able to combine search queries across resources (images,
metadata, artifacts) easily.
A search engine should not put additional load on the normal functions of the
primary service and should easily accommodate more users by distributing the
load on separate server instances. For example, a search query from Horizon
should not impact Nova.
Proposed change
===============
We are proposing a new catalog search service for Glance that will improve
performance of Glance API services while dramatically improving search
capabilities. The following subsections detail the concepts.
Block Diagram:
https://wiki.openstack.org/w/images/7/74/Index-service-block-diagram.png
The service will be based on Elasticsearch. Elasticsearch is a search server
based on Lucene. It provides a distributed, scalable, near real-time, faceted,
multitenant-capable full-text search engine with a RESTful web interface and
schema-free JSON documents. Elasticsearch is developed and released as open
source under the terms of the Apache License. Notable users of Elasticsearch
include Wikimedia, StumbleUpon, Mozilla, Quora, Foursquare, Etsy, SoundCloud,
GitHub, FDA, CERN, and Stack Exchange.
(Source: http://en.wikipedia.org/wiki/Elasticsearch)
The elastic-recheck project also uses Elasticsearch (and kibana) to classify
and track OpenStack gate failures.
(Source: http://status.openstack.org/elastic-recheck)
**Indexing**
*Index*
This will serve as the cache for all search requests. It will be backed by
Elasticsearch.
*Index Loaders*
Index loaders define the data mappings for indexing and load the data from the
source. They are called during initialization of the service and later on
demand, whenever everything needs to be re-indexed. They ensure that all
appropriate RBAC information is included in the index to facilitate
appropriate authorization on search responses.
The index loaders will attempt to maintain native API format as best as
possible with as much direct pass through as possible so that data manipulation
and maintenance is kept to a minimum.
An example Glance Image data mapping would be::

    {
        'dynamic': True,
        'properties': {
            'id': {'type': 'string'},
            'name': {'type': 'string'},
            'description': {'type': 'string'},
            'tags': {'type': 'string'},
            'disk_format': {'type': 'string'},
            'container_format': {'type': 'string'},
            'size': {'type': 'long'},
            'virtual_size': {'type': 'long'},
            'status': {'type': 'string'},
            'visibility': {'type': 'string'},
            'checksum': {'type': 'string'},
            'min_disk': {'type': 'long'},
            'min_ram': {'type': 'long'},
            'owner': {'type': 'string'},
            'protected': {'type': 'boolean'},
        },
    }

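
As a rough illustration of the "direct pass through" idea, an image loader could copy the mapped fields unchanged and let the dynamic mapping absorb any custom properties. This is a minimal sketch, not the spec's implementation: ``IMAGE_FIELDS``, ``serialize_image`` and the input shape (a dict resembling a Glance v2 image) are assumptions made for the example.

```python
# Hypothetical sketch: the field list mirrors the example mapping; the names
# IMAGE_FIELDS and serialize_image are assumptions, not part of the spec.
IMAGE_FIELDS = (
    "id", "name", "description", "tags", "disk_format", "container_format",
    "size", "virtual_size", "status", "visibility", "checksum",
    "min_disk", "min_ram", "owner", "protected",
)


def serialize_image(image):
    """Build an index document from a dict shaped like a Glance v2 image."""
    # Pass mapped fields straight through so the document stays close to the
    # native API format; fields missing from the source become None.
    document = {field: image.get(field) for field in IMAGE_FIELDS}
    # With 'dynamic': True in the mapping, custom properties can be copied
    # as-is and the search engine infers their types.
    for key, value in image.items():
        if key not in document:
            document[key] = value
    return document


image = {
    "id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd",
    "name": "cirros",
    "visibility": "public",
    "owner": "f72690e85b2a4ff095f50b7fad99429a",
    "hw_watchdog_action": "poweroff",  # custom property, not in the mapping
}
document = serialize_image(image)
```

The owner and visibility fields are kept in the document because the RBAC query filters need them at search time.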
*Index Updates*
Once the index is initialized it needs to be constantly updated to keep it in
sync with the data source. Update clients will listen for notifications from
data sources to re-index the data for specific resources (e.g. an image or
artifact).
For Glance, the update client would listen on the message topic for
notifications (image.create, image.update, etc.) and re-index the affected image metadata.
(More info at http://docs.openstack.org/developer/glance/notifications.html)
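
The update flow can be sketched as a small dispatcher keyed on the notification event type. This is only an illustration under stated assumptions: a plain dict stands in for the Elasticsearch index, ``handle_notification`` is a hypothetical name, and the payload is assumed to carry the image ``id``.

```python
# Hypothetical sketch: an in-memory dict stands in for the search index and
# handle_notification is an assumed name, not part of the spec.
index = {}


def handle_notification(event_type, payload):
    """Apply a Glance image notification to the index."""
    image_id = payload["id"]
    if event_type in ("image.create", "image.update"):
        # Re-index the whole document for the affected image.
        index[image_id] = dict(payload)
    elif event_type == "image.delete":
        # Drop the document; ignore images that were never indexed.
        index.pop(image_id, None)
    # Any other event type is ignored in this sketch.


handle_notification("image.create", {"id": "abc", "name": "cirros"})
handle_notification("image.update", {"id": "abc", "name": "cirros x86"})
```

Re-indexing the full document on every update keeps the handler simple at the cost of some redundant writes.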
*Index Management API*
Allows for CRUD management of loading, updating and deleting data in the index.
Indexing is allowed only for admin users.
Default policy.json will be::

    {
        "catalog_index": "role:admin",
        "catalog_search": ""
    }

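
To make the two default rules concrete, the sketch below evaluates them by hand: an empty rule allows every caller, while ``role:admin`` requires the admin role. A real deployment would rely on the OpenStack policy engine; ``POLICY`` and ``is_allowed`` are hypothetical names used only for illustration.

```python
# Hypothetical sketch of the two default rules; a real service would use the
# OpenStack policy engine rather than this hand-rolled check.
POLICY = {
    "catalog_index": "role:admin",
    "catalog_search": "",
}


def is_allowed(action, roles):
    """Return True if a user holding `roles` may perform `action`."""
    rule = POLICY[action]
    if rule == "":
        # An empty rule places no restriction on the caller.
        return True
    kind, _, value = rule.partition(":")
    # Only the simple "role:<name>" form is understood by this sketch.
    return kind == "role" and value in roles
```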
**Searching**
The search API allows users to execute a search query and get back search hits
that match the query. The query can either be provided using a simple query
string as a parameter, or using a request body.
.. note:: The search query is not parsed; it is passed "as-is" to the Elasticsearch engine, except that filters are added. The response from the search engine may be filtered based on the plugin implementation of the document type.
All search APIs can be applied across multiple types within an index, and
across multiple indices with support for the multi index syntax.
The search API will also allow for search phrase completion as well as search
suggestions (such as handling misspellings).
The search will have two levels of RBAC.
1. API level policy checks using policy.json files. This will allow coarse
grained RBAC support for simple deny / allow on API usage.
2. RBAC query filters. These will be defined in conjunction with index loaders.
When a request comes in, the type(s) of resource(s) being requested will map
to an RBAC query filter.
The RBAC query filter will add any appropriate filters to the request being
sent into the elastic search service, such that only specific results that
the user is allowed to view will be returned.
For example, the image index loader will index owner information
and visibility information. The RBAC filter will examine the incoming request
and add filters to it so that the results don't include non-shared /
non-public images from a project other than that of the user making the request.
Property protected fields will be read from the config file and will be added
as "source filtering" field(s) in the Elasticsearch query, which will keep or remove the
protected fields in the search output based on the authorization of the user.
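
The second RBAC level can be sketched as a function that wraps the user's query with owner / visibility filters and adds source filtering for protected fields. This is a minimal sketch under several assumptions: ``apply_rbac`` and its parameters are hypothetical, image sharing (members) is left out, and the body uses the ``bool`` / ``filter`` and ``_source`` ``excludes`` syntax of later Elasticsearch releases, whereas the 1.x releases current when this spec was written expressed the same idea with a ``filtered`` query.

```python
import copy


def apply_rbac(request_body, project_id, is_admin=False, protected_fields=()):
    """Return a copy of the request restricted to what the user may see."""
    body = copy.deepcopy(request_body)
    if is_admin:
        # Assumption: admins see every document and every field.
        return body
    # A document is visible if the user's project owns it or it is public.
    visible = {
        "bool": {
            "should": [
                {"term": {"owner": project_id}},
                {"term": {"visibility": "public"}},
            ],
            "minimum_should_match": 1,
        }
    }
    # The user's own query is passed through untouched inside "must".
    user_query = body.get("query", {"match_all": {}})
    body["query"] = {"bool": {"must": [user_query], "filter": [visible]}}
    if protected_fields:
        # Source filtering strips protected fields from each returned hit.
        body["_source"] = {"excludes": list(protected_fields)}
    return body


restricted = apply_rbac(
    {"query": {"query_string": {"query": "cirros"}}},
    project_id="f72690e85b2a4ff095f50b7fad99429a",
    protected_fields=["checksum"],
)
```

Because the filter is added on the server side, a user cannot widen the result set by crafting the query string.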
Alternatives
------------
Searching data could also be achieved by writing SQL queries on the Glance
database but there are several factors which do not make it an ideal
solution:
* Joins across multiple tables in real time will make the response time very
slow
* Full text searching of descriptions is typically slow with a traditional
relational database. Adding full text indexing to the database can degrade
overall performance of the database
* Property types can be added dynamically using metadefs and proper indexing
in relational databases is not possible
* Search queries will run against the same database used by Glance core
functions, inadvertently affecting their response time.
* Adding more attributes to search query over time should be easy to extend
and maintain without exponentially increasing the complexity of the search
logic
Users should be able to combine search queries across resources (images,
artifacts, etc.), and the search engine should not be tightly integrated with
any specific module.
Another alternative would be for clients to load the entire data set and
search within the client. This means every user gets all the data every
time the user loads the page and has to keep it in sync with server side data.
This increases the load and burden on the core OpenStack service providing
the data and is slower since the client has to load the entire dataset across
the network. In addition, the client has to recreate the logic for things like
search suggestions and complex queries with AND / OR logic.
It should be noted that none of these options provides the ability to score
search results and return only those above a configurable threshold
(something Elasticsearch provides).
Data model impact
-----------------
The data being indexed will be stored outside the Glance SQL database and
therefore we don't expect any data model changes in Glance.
REST API impact
---------------
Common Response Codes
* Create Success: 201 Created
* Modify Success: 200 OK
* Delete Success: 204 No Content
* Failure: 400 Bad Request with details
* Forbidden: 403 Forbidden
* Not found: 404 Not Found e.g. if specific entity not found
* Method Not Allowed: 405 Method Not Allowed e.g. if trying to delete on a list resource
* Not Implemented: 501 Not Implemented e.g. HEAD not implemented
This is an experimental API
**API Version**
Search images supports both GET and POST.
Elasticsearch supports GET with query parameters, but only for a limited subset of the query DSL.
GET is implemented here with a request body to make use of all the available query options.
Please refer to the following URI for the Query DSL
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
Search images(GET)::
GET /v2/search
Example Request Body::
{
"index": ["glance"],
"type": ["image"],
"query": {
"query_string": {
"query": "cirros"
}
}
}
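
A request like the one above could be prepared from a client as follows. This is a sketch only: the endpoint ``http://localhost:9393``, the token value and the helper name ``build_search_request`` are assumptions, and the request is built but not sent. It uses ``urllib.request.Request`` with an explicit ``method`` so that the JSON body travels with a GET.

```python
import json
import urllib.request


def build_search_request(endpoint, token, query_text, method="GET"):
    """Build (but do not send) a search request carrying a JSON body."""
    body = {
        "index": ["glance"],
        "type": ["image"],
        "query": {"query_string": {"query": query_text}},
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v2/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        # Without an explicit method, a request with data defaults to POST.
        method=method,
    )


# Hypothetical endpoint and token, for illustration only.
request = build_search_request("http://localhost:9393", "TOKEN", "cirros")
```

Passing ``method="POST"`` yields the POST variant of the same search; some HTTP intermediaries drop bodies on GET, which is one reason to offer both.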
Example Response Body::
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.40409642,
"hits": [
{
"_index": "search",
"_type": "image",
"_id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd",
"_score": 0.40409642,
"_source": {
"status": "active",
"virtual_size": null,
"name": "cirros-0.3.2-x86_64-uec-ramdisk",
"property": [],
"container_format": "ari",
"min_ram": 0,
"disk_format": "ari",
"properties": [],
"owner": "f72690e85b2a4ff095f50b7fad99429a",
"protected": false,
"checksum": "68085af2609d03e51c7662395b5b6e4b",
"min_disk": 0,
"is_public": true,
"size": 3723817,
"id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd",
"description": ""
}
},
{
"_index": "search",
"_type": "image",
"_id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce",
"_score": 0.23091224,
"_source": {
"status": "active",
"virtual_size": null,
"name": "cirros-0.3.2-x86_64-uec",
"property": [
"kernel_id_d00ea383-a1fa-48d3-b56c-880093730b53",
"ramdisk_id_75fbdd4c-3e5b-4552-8950-9bb5262babcd",
"hypervisor_type_uml",
"hw_watchdog_action_poweroff"
],
"container_format": "ami",
"min_ram": 0,
"disk_format": "ami",
"properties": [
{
"name": "kernel_id",
"value": "d00ea383-a1fa-48d3-b56c-880093730b53"
},
{
"name": "ramdisk_id",
"value": "75fbdd4c-3e5b-4552-8950-9bb5262babcd"
},
{
"name": "hypervisor_type",
"value": "uml"
},
{
"name": "hw_watchdog_action",
"value": "poweroff"
}
],
"owner": "f72690e85b2a4ff095f50b7fad99429a",
"protected": false,
"checksum": "4eada48c2843d2a262c814ddc92ecf2c",
"min_disk": 0,
"is_public": true,
"size": 25165824,
"id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce",
"description": ""
}
},
{
"_index": "search",
"_type": "image",
"_id": "d00ea383-a1fa-48d3-b56c-880093730b53",
"_score": 0.067124054,
"_source": {
"status": "active",
"virtual_size": null,
"name": "cirros-0.3.2-x86_64-uec-kernel",
"property": [],
"container_format": "aki",
"min_ram": 0,
"disk_format": "aki",
"properties": [],
"owner": "f72690e85b2a4ff095f50b7fad99429a",
"protected": false,
"checksum": "836c69cbcd1dc4f225daedbab6edc7c7",
"min_disk": 0,
"is_public": true,
"size": 4969360,
"id": "d00ea383-a1fa-48d3-b56c-880093730b53",
"description": ""
}
}
]
}
}
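
A client consuming a response of this shape might rank the hits and drop weak matches by score. The helper below is a hypothetical sketch (``summarize_hits`` and ``min_score`` are names chosen for the example); it only assumes the ``hits.hits[]._score`` and ``_source.name`` fields shown in the response above.

```python
def summarize_hits(response, min_score=0.0):
    """Return (name, score) pairs, best match first, at or above min_score."""
    hits = response.get("hits", {}).get("hits", [])
    # Keep only hits that reach the caller's score threshold.
    kept = [hit for hit in hits if hit["_score"] >= min_score]
    ranked = sorted(kept, key=lambda hit: hit["_score"], reverse=True)
    return [(hit["_source"]["name"], hit["_score"]) for hit in ranked]


# Trimmed-down response using names and scores from the example above.
response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_score": 0.23091224,
             "_source": {"name": "cirros-0.3.2-x86_64-uec"}},
            {"_score": 0.40409642,
             "_source": {"name": "cirros-0.3.2-x86_64-uec-ramdisk"}},
            {"_score": 0.067124054,
             "_source": {"name": "cirros-0.3.2-x86_64-uec-kernel"}},
        ],
    }
}
best = summarize_hits(response, min_score=0.1)
```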
Search images(POST)::
POST /v2/search
Please refer to the following URI for the Query DSL
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
Example Request Body::
{
"index": ["glance"],
"type": ["image"],
"query": {
"query_string": {
"query": "cirros"
}
}
}
Example Response Body::
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.40409642,
"hits": [
{
"_index": "search",
"_type": "image",
"_id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd",
"_score": 0.40409642,
"_source": {
"status": "active",
"virtual_size": null,
"name": "cirros-0.3.2-x86_64-uec-ramdisk",
"property": [],
"container_format": "ari",
"min_ram": 0,
"disk_format": "ari",
"properties": [],
"owner": "f72690e85b2a4ff095f50b7fad99429a",
"protected": false,
"checksum": "68085af2609d03e51c7662395b5b6e4b",
"min_disk": 0,
"is_public": true,
"size": 3723817,
"id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd",
"description": ""
}
},
{
"_index": "search",
"_type": "image",
"_id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce",
"_score": 0.23091224,
"_source": {
"status": "active",
"virtual_size": null,
"name": "cirros-0.3.2-x86_64-uec",
"property": [
"kernel_id_d00ea383-a1fa-48d3-b56c-880093730b53",
"ramdisk_id_75fbdd4c-3e5b-4552-8950-9bb5262babcd",
"hypervisor_type_uml",
"hw_watchdog_action_poweroff"
],
"container_format": "ami",
"min_ram": 0,
"disk_format": "ami",
"properties": [
{
"name": "kernel_id",
"value": "d00ea383-a1fa-48d3-b56c-880093730b53"
},
{
"name": "ramdisk_id",
"value": "75fbdd4c-3e5b-4552-8950-9bb5262babcd"
},
{
"name": "hypervisor_type",
"value": "uml"
},
{
"name": "hw_watchdog_action",
"value": "poweroff"
}
],
"owner": "f72690e85b2a4ff095f50b7fad99429a",
"protected": false,
"checksum": "4eada48c2843d2a262c814ddc92ecf2c",
"min_disk": 0,
"is_public": true,
"size": 25165824,
"id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce",
"description": ""
}
},
{
"_index": "search",
"_type": "image",
"_id": "d00ea383-a1fa-48d3-b56c-880093730b53",
"_score": 0.067124054,
"_source": {
"status": "active",
"virtual_size": null,
"name": "cirros-0.3.2-x86_64-uec-kernel",
"property": [],
"container_format": "aki",
"min_ram": 0,
"disk_format": "aki",
"properties": [],
"owner": "f72690e85b2a4ff095f50b7fad99429a",
"protected": false,
"checksum": "836c69cbcd1dc4f225daedbab6edc7c7",
"min_disk": 0,
"is_public": true,
"size": 4969360,
"id": "d00ea383-a1fa-48d3-b56c-880093730b53",
"description": ""
}
}
]
}
}
Index images: index, create, update and delete data::
POST /v2/index
Indexing is allowed only for admin users.
Supported actions are index, create, update and delete
Example Request Body::
{
"default_index": "search",
"default_type": "image",
"actions": [
{
"action": "create",
"index": "search",
"type": "image",
"id": "d00ea383-a1fa-48d3-b56c-880093730b54",
"data": {
"status": "active",
"virtual_size": null,
"name": "cirros-0.3.3-x86_64-uec-kernel",
"property": [],
"container_format": "aki",
"min_ram": 0,
"disk_format": "aki",
"properties": [],
"owner": "f72690e85b2a4ff095f50b7fad99429a",
"protected": false,
"checksum": "836c69cbcd1dc4f225daedbab6edc7c7",
"min_disk": 0,
"is_public": false,
"size": 4969360,
"id": "d00ea383-a1fa-48d3-b56c-880093730b54",
"description": ""
}
},
{
"action": "update",
"index": "search",
"type": "image",
"id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd",
"data": {
"name": "cirros x86",
"status": "inactive"
}
},
{
"action": "delete",
"index": "search",
"type": "image",
"id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce"
}
]
}
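
The action names in this request (index, create, update, delete) match those of the Elasticsearch bulk API, so one plausible implementation translates the body into bulk lines. The sketch below is an assumption about how that translation could look, not a statement of how the service does it; ``to_bulk_lines`` is a hypothetical name, and the ``_type`` key reflects the Elasticsearch releases of the time.

```python
import json


def to_bulk_lines(request_body):
    """Translate an index request body into newline-delimited bulk lines."""
    lines = []
    for action in request_body["actions"]:
        name = action["action"]
        meta = {
            # Fall back to the request-level defaults when not overridden.
            "_index": action.get("index", request_body.get("default_index")),
            "_type": action.get("type", request_body.get("default_type")),
            "_id": action["id"],
        }
        lines.append(json.dumps({name: meta}))
        if name == "update":
            # Partial updates wrap the changed fields in a "doc" object.
            lines.append(json.dumps({"doc": action["data"]}))
        elif name in ("index", "create"):
            lines.append(json.dumps(action["data"]))
        # A "delete" action has no document line.
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


bulk = to_bulk_lines({
    "default_index": "search",
    "default_type": "image",
    "actions": [
        {"action": "create", "id": "a", "data": {"name": "cirros"}},
        {"action": "update", "id": "b", "data": {"status": "inactive"}},
        {"action": "delete", "id": "c"},
    ],
})
```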
Security impact
---------------
None to existing Glance API.
Search queries will apply filters to return data that the user is authorized
to see. See description.
Notifications impact
--------------------
None to existing notifications. The service will only consume notifications.
Metadef notifications need to be added to the Glance service.
Other end user impact
---------------------
Update python-glanceclient as needed
Performance Impact
------------------
No changes to existing API or code
Data from the Glance DB will be read once during initialization to index it
inside the search engine.
This is intended to improve performance of Glance API services while
dramatically improving search capabilities. It will improve performance
by offloading user search queries from existing API servers.
Other deployer impact
---------------------
Glance Catalog Index service will be installed as a separate service
with its own port and endpoint.
This will initially be considered an "Experimental API". The API is intended to be
as close to final as possible, but we reserve the right to abandon it completely
or to change it in backwards-incompatible ways.
glance-manage will have new commands for indexing image, metadef, and artifact
data
The deployment will be targeted as a single-region service. In the future, if
required, an "Aggregate search" that can search across all regions
could be provided.
Developer impact
----------------
These are new APIs and will not impact any existing APIs.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
lakshmi-sampath, kamil-rykowski
Other contributors:
wayne-okuma, travis-tripp
Reviewers
---------
Core reviewer(s):
nikhil-komawar zhiyan
Other reviewer(s):
icordasc
Work Items
----------
* Installation of Elastic Search in Glance environment (single node)
* Index Dictionary data in ElasticSearch
* Write a tool to index all metadata objects (namespaces objects,
properties) from database into elasticsearch
* Write a tool to index all images from database into elasticsearch
* Merge used properties from Glance(optional)
* Listen to notifications/events from Glance on Image CRUD (optional) for
continuous indexing of new/old data
* Create Glance Search API - Interface to backend ElasticSearch
* Make Policy checks on requests
* Filter request based on RBAC with user token
* Search Images
* List all the results by given search query string
* Create Glance Index API
* Policy checks
* Discuss with OpenStack/Infra
* Test environment for elasticsearch
* Devstack integration of single node elastic search.
* Metadef notifications
* Generate and Listen to metadef notifications
* Calls the tools (loaders)
* Documentation update
* Update glance client
* Update glance manage
Dependencies
============
* Depends on elasticsearch for search engine
Testing
=======
Unit tests will be added for all possible code with a goal of being able to
isolate functionality as much as possible.
Tempest tests will be added wherever possible.
Documentation Impact
====================
Docs needed for new service and usage.
All document changes will indicate this as "Experimental API"
References
==========
* Elasticsearch Query DSL
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html