diff --git a/specs/kilo/catalog-index-service.rst b/specs/kilo/catalog-index-service.rst new file mode 100644 index 00000000..1ee4fe6c --- /dev/null +++ b/specs/kilo/catalog-index-service.rst @@ -0,0 +1,723 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +===================== +Catalog Index Service +===================== + +https://blueprints.launchpad.net/glance/+spec/catalog-index-service + +This is intended to improve performance of Glance API services while +dramatically improving search capabilities. + +It will improve performance by offloading user search queries from existing +API servers. In addition, we are working on numerous improvements in Horizon +which will include improvements to image, snapshot, artifact details and +searching. The desired user experience is greatly dependent upon a rich, +dynamic, near real time faceted and aggregated search capability with a strong +query language. + +This will initially be considered "Experimental API". The API is intended to +be as close to final as possible, but we reserve the right to completely +abandon this or change it making it backwards in-compatible. + +Problem description +=================== + +Glance has metadata for all images and provides listing of them. If you want +to search for images based on criteria, the API is limited and somewhat +inflexible. Search currently is limited to union ("AND" operations) searching +of certain hard coded attributes. There is no support for intersection ( "OR" +operations). + +Searching does not include searching descriptions and has limited support for +property based searches. Full text searching of descriptions is typically slow +with a traditional relational database. Adding full text indexing to the +database can degrade overall performance of the database (affects inserts). + +With the addition of metadata definitions in Juno release, the new search API +should allow the users to specify the search criteria using the tags and +property type definition in addition to adhoc string search. They should +ideally get auto-completion of possible properties and property values with +near real time response. + +Artifact definitions will support storing dynamic properties based on the +artifact type. However, proper indexing and search is not possible with a +traditional RDBMS. You end up with full table scans across tables that are not +applicable for a given query. In addition, writing and understanding the +query engine is difficult to understand and maintain, especially after the +original authors move on. + +Adding new properties with constraints for highly performant searching is +difficult to achieve in a relational database. + +A search engine needs to be easily customizable so that the data being +collected can be changed dynamically and the way it is indexed and searched +can be easily modified without requiring source data migration. (e.g. a new +type of object to group namespaces together called "category") + +Typical search interfaces should also provide facilities like auto-completion +and search suggestions with near real time performance. This is not possible +with a traditional database. + +Adding more attributes to a search query over time should be easy to extend +and maintain without exponentially increasing the complexity of the search +logic. + +User should be able to combine the search query across resources (images, +metadata, artifacts) easily. + +A search engine should not put additional load on the normal functions of the +primary service and should easily accommodate more users by distributing the +load on separate server instances. For example, a search query from Horizon +should not impact Nova. + +Proposed change +=============== + +We are proposing a new catalog search service for Glance that will improve +performance of Glance API services while dramatically improving search +capabilities. The following subsections detail the concepts. + +Block Diagram: + +https://wiki.openstack.org/w/images/7/74/Index-service-block-diagram.png + +The service will be based on Elasticsearch. Elasticsearch is a search server +based on Lucene. It provides a distributed, scalable, near real-time, faceted, +multitenant-capable full-text search engine with a RESTful web interface and +schema-free JSON documents. Elasticsearch is developed and released as open +source under the terms of the Apache License. Notable users of Elasticsearch +include Wikimedia, StumbleUpon, Mozilla, Quora, Foursquare, Etsy, SoundCloud, +GitHub, FDA, CERN, and Stack Exchange. +(Source: http://en.wikipedia.org/wiki/Elasticsearch) + +The elastic-recheck project also uses Elasticsearch (and kibana) to classify +and track OpenStack gate failures. +(Source: http://status.openstack.org/elastic-recheck) + +**Indexing** + +*Index* + +This will serve as the cache for all search requests. It will be backed by +Elasticsearch. + +*Index Loaders* + +Index loaders define the data mappings for indexing and load the data from the +source. They are called during initialization of service and on-demand later +when required to index everything. They ensure that all appropriate RBAC +information is included in the index to facilitate appropriate authorization +on search responses. + +The index loaders will attempt to maintain native API format as best as +possible with as much direct pass through as possible so that data manipulation +and maintenance is kept to a minimum. + +An example Glance Image data mapping would be:: + + { + 'dynamic': True, + 'properties': { + 'id': {'type': 'string'}, + 'name': {'type': 'string'}, + 'description': {'type': 'string'}, + 'tags': {'type': 'string'}, + 'disk_format': {'type': 'string'}, + 'container_format': {'type': 'string'}, + 'size': {'type': 'long'}, + 'virtual_size': {'type': 'long'}, + 'status': {'type': 'string'}, + 'visibility': {'type': 'string'}, + 'checksum': {'type': 'string'}, + 'min_disk': {'type': 'long'}, + 'min_ram': {'type': 'long'}, + 'owner': {'type': 'string'}, + 'protected': {'type': 'boolean'}, + }, + } + +*Index Updates* + +Once the index is initialized it needs to be constantly updated to keep it in +sync with the data source. Update clients will listen for notifications from +data sources to re-index the data for specific resources (e.g. an image or +artifact). + +For glance, it would listen on message Topic for notifications like (image.create, +image.update, etc) and reindex for the effected image metadata. +(More info at http://docs.openstack.org/developer/glance/notifications.html) + +*Index Management API* + +Allows for CRUD management of loading, updating and deleting data in the index. +Indexing is allowed only for admin users. + +Default policy.json will be:: + + { + "catalog_index": "role:admin", + "catalog_search": "" + } + + +**Searching** + +The search API allows users to execute a search query and get back search hits +that match the query. The query can either be provided using a simple query +string as a parameter, or using a request body. + +.. note:: Search query is not parsed and passed "as-is" to elastic search engine except for adding filters. Response from search engine could be filtered based on the plugin implementation of document type. + +All search APIs can be applied across multiple types within an index, and +across multiple indices with support for the multi index syntax. + +This will allow for search phrase completion as well as search suggestions( +such as handling misspellings) + +The search will have two levels of RBAC. + +1. API level policy checks using policy.json files. This will allow coarse +grained RBAC support for simple deny / allow on API usage. + +2. RBAC query filters. These will be defined in conjunction with index loaders. +When a request comes in, the type(s) of resource(s) being requested will map +to an RBAC query filter. + +The RBAC query filter will add any appropriate filters to the request being +sent into the elastic search service, such that only specific results that +the user is allowed to view will be returned. + +For example, the image index loader will include indexing owner information +and visibility information. The RBAC filter will examine the incoming request +and adds filters to the request so that the results don't include non-shared / +non-public images from a different project than the user making the request. + +Property protected fields will be read from the config file and will be added +as "source filtering" field(s) in elasticsearch query which will keep/remove the +protected fields from the search output based on the authorization of the user. + +Alternatives +------------ + +Searching data could also be achieved by writing SQL queries on the Glance +database but there are several factors which do not make it an ideal +solution: + +* Joins across multiple tables in real time will make the response time very + slow +* Full text searching of descriptions is typically slow with a traditional + relational database. Adding full text indexing to the database can degrade + overall performance of the database +* Property types can be added dynamically using metadefs and proper indexing + in relational databases is not possible +* Search queries will be running against the same database used by Glance core + functions and inadvertently effecting their response time. +* Adding more attributes to search query over time should be easy to extend + and maintain without exponentially increasing the complexity of the search + logic + +User should be able to combine the search query across resources (images, +artifacts) etc. and the search engine should not be tightly integrated with +any specific module. + +Another alternative would be for clients to load the entire data set and +search within the client. This means every user gets all the data every +time the user loads the page and has to keep it in sync with server side data. +This is increases the load and burden on the core OpenStack service providing +the data and is slower since the client has to load the entire dataset across +the network. In addition, the client has to recreate the logic for things like +search suggestions and complex queries with AND / OR logic. + +It should be noted that NONE of these options also include an ability to do things +like get search request scoring of results returned with a configurable threshold +for results (something elastic search provides). + +Data model impact +----------------- + +The data being indexed will be stored outside the Glance SQL database and +therefore we don't expect any data model changes in Glance. + +REST API impact +--------------- + +Common Response Codes + +* Create Success: 201 Created +* Modify Success: 200 OK +* Delete Success: 204 No Content +* Failure: 400 Bad Request with details. +* Forbidden: 403 Forbidden +* Not found: 404 Not found e.g. if specific entity not found +* Method Not Allowed: 405 Not allowed e.g. if trying to delete on a list resource +* Not Implemented: 501 Not Implemented e.g. HEAD not implemented + +This is an experimental API + +**API Version** + +Search images supports both GET and POST. +Elasticsearch supports GET with query params but its a limited subset of query DSL. +GET is implemented here with a request body to make use of all the available query options + +Please refer to the following URI for the Query DSL +http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html + +Search images(GET):: + + GET /v2/search + +Example Request Body:: + + { + "index": ["glance"], + "type": ["image"], + "query": { + "query_string": { + "query": "cirros" + } + } + } + +Example Response Body:: + + { + "took": 5, + "timed_out": false, + "_shards": { + "total": 10, + "successful": 10, + "failed": 0 + }, + "hits": { + "total": 3, + "max_score": 0.40409642, + "hits": [ + { + "_index": "search", + "_type": "image", + "_id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd", + "_score": 0.40409642, + "_source": { + "status": "active", + "virtual_size": null, + "name": "cirros-0.3.2-x86_64-uec-ramdisk", + "property": [], + "container_format": "ari", + "min_ram": 0, + "disk_format": "ari", + "properties": [], + "owner": "f72690e85b2a4ff095f50b7fad99429a", + "protected": false, + "checksum": "68085af2609d03e51c7662395b5b6e4b", + "min_disk": 0, + "is_public": true, + "size": 3723817, + "id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd", + "description": "" + } + }, + { + "_index": "search", + "_type": "image", + "_id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce", + "_score": 0.23091224, + "_source": { + "status": "active", + "virtual_size": null, + "name": "cirros-0.3.2-x86_64-uec", + "property": [ + "kernel_id_d00ea383-a1fa-48d3-b56c-880093730b53", + "ramdisk_id_75fbdd4c-3e5b-4552-8950-9bb5262babcd", + "hypervisor_type_uml", + "hw_watchdog_action_poweroff" + ], + "container_format": "ami", + "min_ram": 0, + "disk_format": "ami", + "properties": [ + { + "name": "kernel_id", + "value": "d00ea383-a1fa-48d3-b56c-880093730b53" + }, + { + "name": "ramdisk_id", + "value": "75fbdd4c-3e5b-4552-8950-9bb5262babcd" + }, + { + "name": "hypervisor_type", + "value": "uml" + }, + { + "name": "hw_watchdog_action", + "value": "poweroff" + } + ], + "owner": "f72690e85b2a4ff095f50b7fad99429a", + "protected": false, + "checksum": "4eada48c2843d2a262c814ddc92ecf2c", + "min_disk": 0, + "is_public": true, + "size": 25165824, + "id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce", + "description": "" + } + }, + { + "_index": "search", + "_type": "image", + "_id": "d00ea383-a1fa-48d3-b56c-880093730b53", + "_score": 0.067124054, + "_source": { + "status": "active", + "virtual_size": null, + "name": "cirros-0.3.2-x86_64-uec-kernel", + "property": [], + "container_format": "aki", + "min_ram": 0, + "disk_format": "aki", + "properties": [], + "owner": "f72690e85b2a4ff095f50b7fad99429a", + "protected": false, + "checksum": "836c69cbcd1dc4f225daedbab6edc7c7", + "min_disk": 0, + "is_public": true, + "size": 4969360, + "id": "d00ea383-a1fa-48d3-b56c-880093730b53", + "description": "" + } + } + ] + } + } + +Search images(POST):: + + POST /v2/search + +Please refer to the following URI for the Query DSL +http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html + +Example Request Body:: + + { + "index": ["glance"], + "type": ["image"], + "query": { + "query_string": { + "query": "cirros" + } + } + } + +Example Response Body:: + + { + "took": 5, + "timed_out": false, + "_shards": { + "total": 10, + "successful": 10, + "failed": 0 + }, + "hits": { + "total": 3, + "max_score": 0.40409642, + "hits": [ + { + "_index": "search", + "_type": "image", + "_id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd", + "_score": 0.40409642, + "_source": { + "status": "active", + "virtual_size": null, + "name": "cirros-0.3.2-x86_64-uec-ramdisk", + "property": [], + "container_format": "ari", + "min_ram": 0, + "disk_format": "ari", + "properties": [], + "owner": "f72690e85b2a4ff095f50b7fad99429a", + "protected": false, + "checksum": "68085af2609d03e51c7662395b5b6e4b", + "min_disk": 0, + "is_public": true, + "size": 3723817, + "id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd", + "description": "" + } + }, + { + "_index": "search", + "_type": "image", + "_id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce", + "_score": 0.23091224, + "_source": { + "status": "active", + "virtual_size": null, + "name": "cirros-0.3.2-x86_64-uec", + "property": [ + "kernel_id_d00ea383-a1fa-48d3-b56c-880093730b53", + "ramdisk_id_75fbdd4c-3e5b-4552-8950-9bb5262babcd", + "hypervisor_type_uml", + "hw_watchdog_action_poweroff" + ], + "container_format": "ami", + "min_ram": 0, + "disk_format": "ami", + "properties": [ + { + "name": "kernel_id", + "value": "d00ea383-a1fa-48d3-b56c-880093730b53" + }, + { + "name": "ramdisk_id", + "value": "75fbdd4c-3e5b-4552-8950-9bb5262babcd" + }, + { + "name": "hypervisor_type", + "value": "uml" + }, + { + "name": "hw_watchdog_action", + "value": "poweroff" + } + ], + "owner": "f72690e85b2a4ff095f50b7fad99429a", + "protected": false, + "checksum": "4eada48c2843d2a262c814ddc92ecf2c", + "min_disk": 0, + "is_public": true, + "size": 25165824, + "id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce", + "description": "" + } + }, + { + "_index": "search", + "_type": "image", + "_id": "d00ea383-a1fa-48d3-b56c-880093730b53", + "_score": 0.067124054, + "_source": { + "status": "active", + "virtual_size": null, + "name": "cirros-0.3.2-x86_64-uec-kernel", + "property": [], + "container_format": "aki", + "min_ram": 0, + "disk_format": "aki", + "properties": [], + "owner": "f72690e85b2a4ff095f50b7fad99429a", + "protected": false, + "checksum": "836c69cbcd1dc4f225daedbab6edc7c7", + "min_disk": 0, + "is_public": true, + "size": 4969360, + "id": "d00ea383-a1fa-48d3-b56c-880093730b53", + "description": "" + } + } + ] + } + } + +Index images: index, create, update and delete data:: + + POST /v2/index + +Indexing is allowed only for admin users. +Supported actions are index, create, update and delete + +Example Request Body:: + + { + "default_index": "search", + "default_type": "image", + "actions": [ + { + "action": "create", + "index": "search", + "type": "image", + "id": "d00ea383-a1fa-48d3-b56c-880093730b54", + "data": { + "status": "active", + "virtual_size": null, + "name": "cirros-0.3.3-x86_64-uec-kernel", + "property": [], + "container_format": "aki", + "min_ram": 0, + "disk_format": "aki", + "properties": [], + "owner": "f72690e85b2a4ff095f50b7fad99429a", + "protected": false, + "checksum": "836c69cbcd1dc4f225daedbab6edc7c7", + "min_disk": 0, + "is_public": false, + "size": 4969360, + "id": "d00ea383-a1fa-48d3-b56c-880093730b54", + "description": "" + } + }, + { + "action": "update", + "index": "search", + "type": "image", + "id": "75fbdd4c-3e5b-4552-8950-9bb5262babcd", + "data": { + "name": "cirros x86", + "status": "inactive" + } + }, + { + "action": "delete", + "index": "search", + "type": "image", + "id": "95467ea8-dd34-4bdd-8a6a-f52e47ee9bce" + } + ] + } + + +Security impact +--------------- + +None to existing Glance API. +Search queries will apply filters to return data that the user is authorized +to see. See description. + +Notifications impact +-------------------- + +None to existing notifications. Will only consume notifications +Need to add metadef notifications to Glance service. + +Other end user impact +--------------------- + +Update python-glanceclient as needed + +Performance Impact +------------------ + +No changes to existing API or code +Data from Glance DB will read once during initialization to index it inside +search engine. + +This is intended to improve performance of Glance API services while +dramatically improving search capabilities. It will improve performance +by offloading user search queries from existing API servers. + +Other deployer impact +--------------------- + +Glance Catalog Index service will be installed as a separate service +with its own port and endpoint. + +This will initially be considered "Experimental API". The API is intended to be +as close to final as possible, but we reserve the right to completely abandon +this or change it making it backwards in-compatible. + +glance-manage will have new commands for indexing image, metadef, and artifact +data + +The deployment will be targeted as a single region service. In future if required +an "Aggregate search" of all regions which can search across all the regions +could be provided. + +Developer impact +---------------- + +These are new API's and will not impact any existing API's. + + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + lakshmi-sampath, kamil-rykowski, + +Other contributors: + wayne-okuma, travis-tripp + +Reviewers +--------- + +Core reviewer(s): + nikhil-komawar zhiyan + +Other reviewer(s): + icordasc + +Work Items +---------- + +* Installation of Elastic Search in Glance environment (single node) +* Index Dictionary data in ElasticSearch + * Write a tool to index all metadata objects (namespaces objects, + properties) from database into elasticsearch + * Write a tool to index all images from database into elasticsearch + * Merge used properties from Glance(optional) + * Listen to notifications/events form Glance on Image CRUD(optional) for + continuous indexing of new/old data +* Create Glance Search API - Interface to backend ElasticSearch + * Make Policy checks on requests + * Filter request based on RBAC with user token +* Search Images + * List all the results by given search query string +* Create Glance Index API + * Policy checks +* Discuss with Openstack/Infra + * Test environment for elasticsearch +* Devstack integration of single node elastic search. +* Metadef notifications + * Generate and Listen to metadef notifications +* Calls the tools (loaders) +* Documentation update +* Update glance client +* Update glance manage + + +Dependencies +============ + +* Depends on elasticsearch for search engine + + +Testing +======= + +Unit tests will be added for all possible code with a goal of being able to +isolate functionality as much as possible. + +Tempest tests will be added wherever possible. + + +Documentation Impact +==================== + +Docs needed for new service and usage. + +All document changes will indicate this as "Experimental API" + + +References +========== + +* Elasticsearch Query DSL + http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html + + + + + + +