..
    Copyright (c) 2015 Hewlett-Packard Development Company, L.P.
    All Rights Reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may
    not use this file except in compliance with the License. You may obtain
    a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
    WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
    License for the specific language governing permissions and limitations
    under the License.

Searchlight Indexing
====================
In order for the Searchlight API service to return results, information
must be indexed. The two primary mechanisms by which this happens are indexing
from the source (which allows a complete index rebuild) and incrementally
updating the index based on information received via notifications.
The information indexed is determined by a plugin model.

Search plugins
--------------

The search service determines the type of information that is indexed and
searchable via a plugin mechanism.

See :ref:`searchlight-plugins` for plugin installation and general
configuration information.

See the pages below for detailed information about each plugin:

.. toctree::
   :maxdepth: 1
   :glob:

   plugins/*

.. _Indexing-Model:

Indexing model
--------------
The Mitaka Searchlight release introduced the ability to continue executing
search requests while reindexing operations are running. This feature is called
*zero-downtime reindexing*. In order to implement zero-downtime reindexing, the
concept of a *resource group* was introduced.

A *resource group* is a collection of plugins that share an Elasticsearch
index. Since each plugin represents a *resource type*, you can think of a
resource group as a collection of resource types.

For each resource group, Searchlight creates an index whose name consists of
the resource group name followed by a timestamp. Each resource group is
referred to by a pair of Elasticsearch aliases_. One alias is used for
searching by the API (the *search alias*), and the other (the *listener alias*)
is used to index incoming events.

During reindexing, a new index is created, and the listener alias is pointed at
both the old and new indices. Incoming events are therefore indexed into both
indices. The search alias is left pointing at the old index. Once indexing is
finished, both aliases are pointed solely at the new index and the old index
is deleted.

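The alias handling described above can be sketched in Python. This is a
minimal illustration, not Searchlight's actual implementation: the function
names are hypothetical, the index and alias naming is inferred from the sample
``index aliases`` output in this document, and the returned dictionaries are
request bodies in the format accepted by the Elasticsearch aliases API.

```python
from datetime import datetime, timezone


def new_index_name(group, now=None):
    """Resource group name followed by a timestamp."""
    now = now or datetime.now(timezone.utc)
    return "%s-%s" % (group, now.strftime("%Y_%m_%d_%H_%M_%S"))


def start_reindex_actions(group, new_index):
    """Add the new index to the listener alias (the old index stays on it)."""
    return {"actions": [
        {"add": {"index": new_index, "alias": group + "-listener"}},
    ]}


def finish_reindex_actions(group, old_index, new_index):
    """Point both aliases solely at the new index in one aliases request."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": group + "-listener"}},
        {"remove": {"index": old_index, "alias": group + "-search"}},
        {"add": {"index": new_index, "alias": group + "-search"}},
    ]}
```

Elasticsearch applies all actions in a single aliases request atomically, so
searches never see a moment in which the search alias points at no index. The
old index would be deleted in a separate step afterwards.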
In order to improve the performance of reindexing, index refresh is disabled
on the new index during reindexing and turned back on once reindexing is done.
As a consequence, documents synced to the new index are not searchable until
the index is refreshed, but document retrieval by ID still works, because the
GET operation in Elasticsearch is real-time.

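As a rough sketch of what disabling and re-enabling refresh looks like, the
bodies below use the ``refresh_interval`` index setting of the Elasticsearch
index settings API. The helper name is hypothetical, and the interval that
Searchlight restores is an assumption (``1s`` is the Elasticsearch default).

```python
import json

# Settings body that disables periodic refresh on an index.
DISABLE_REFRESH = {"index": {"refresh_interval": "-1"}}


def restore_refresh(interval="1s"):
    """Settings body that turns periodic refresh back on.

    The default of "1s" is the Elasticsearch default, assumed here.
    """
    return {"index": {"refresh_interval": interval}}


print(json.dumps(DISABLE_REFRESH))
print(json.dumps(restore_refresh()))
```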
Zero-downtime reindexing requires that **all** plugins in a resource group
are indexed together. When only an individual resource type needs to be
indexed, an optimization copies the existing data for the other resource types
directly from the old index to the new one, which avoids re-harvesting the data
from each service API.

.. note::

   Due to limitations discovered during the Mitaka release, indexing into
   multiple indices (multiple plugin resource groups) was disabled in that
   release. The Newton release implemented full support for specifying
   different resource groups for different resource types.

.. _using-resource-groups:

Sorting across resource groups
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using multiple resource groups affects sort behavior when a search sorts
across resource types and not all of those resource types have the requested
sort field. Follow the Elasticsearch guidelines below to avoid errors:

* Ignoring unmapped fields:
  https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_ignoring_unmapped_fields
* Missing values:
  https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_missing_values

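The two Elasticsearch options referenced above can be combined in a single
sort clause. The sketch below builds such a search body in Python; the helper
name and the ``created_at`` field are hypothetical, while ``unmapped_type`` and
``missing`` are the sort options described in the linked Elasticsearch
documentation.

```python
import json


def sort_clause(field, unmapped_type, order="asc", missing="_last"):
    """Sort on a field that some resource types may not map or populate."""
    return {field: {
        "order": order,
        # Type to assume in indices where the field has no mapping.
        "unmapped_type": unmapped_type,
        # Where to place documents that have no value for the field.
        "missing": missing,
    }}


body = {
    "query": {"match_all": {}},
    "sort": [sort_clause("created_at", "date", order="desc")],
}
print(json.dumps(body, indent=2))
```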
.. _aliases: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

.. _ES-Bulk-Indexing:

Bulk indexing
-------------
To initially create the catalog index (or add new resource types to it later),
run the following command::

    $ searchlight-manage index sync

This will iterate through all registered and enabled search plugins and
request that they perform a full indexing of all data that's available to them.

It is also possible to index just a single resource type, or all resource
types belonging to a resource group. For instance, to index all Glance images::

    $ searchlight-manage index sync --type OS::Glance::Image

As described above, this will create a new index for all plugins that share a
resource group with ``OS::Glance::Image``. The management command will retrieve
up-to-date information from the Glance API. Data for other plugins will be
bulk-copied from a preexisting index into the new one using the scroll_ and
bulk_ features of Elasticsearch.

You can use the wildcard character ``*`` at the *end* of the ``--type``
argument. For instance, the following will match all Cinder plugins::

    $ searchlight-manage index sync --type OS::Cinder::*

Wildcard characters are only allowed at the end of the argument; they will not
be matched anywhere else.

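The trailing-wildcard rule above amounts to a prefix match. The following
Python sketch illustrates the rule only; the function name is hypothetical and
this is not the code ``searchlight-manage`` uses.

```python
def type_matches(type_arg, resource_type):
    """Return True if resource_type is selected by a --type style argument.

    A '*' is honoured only as the last character; anywhere else it is
    compared literally and therefore never matches a real resource type.
    """
    if type_arg.endswith("*"):
        return resource_type.startswith(type_arg[:-1])
    return resource_type == type_arg


plugins = ["OS::Cinder::Volume", "OS::Cinder::Snapshot", "OS::Glance::Image"]
selected = [p for p in plugins if type_matches("OS::Cinder::*", p)]
print(selected)
```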
To index all resources in the ``searchlight`` resource group::

    $ searchlight-manage index sync --index searchlight

You will be prompted to confirm unless ``--force`` is provided.

The ``searchlight-manage index sync`` command may be re-run at any time to
perform a full re-index of the data. As described above, there should be
little or no impact on search requests during this process.

.. _scroll: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
.. _bulk: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

Parent/child relationships
--------------------------
Some plugins contain multiple resources with parent/child relationships;
the Designate plugins are an example. Because parent data cannot be reindexed
independently of its child documents without orphaning them, it is not
possible to request indexing of a child resource type on its own::

    $ searchlight-manage index sync --type OS::Designate::RecordSet

    'OS::Designate::RecordSet' is a child of 'OS::Designate::Zone' and cannot be indexed separately.
    Indexing 'OS::Designate::Zone' will re-index all child resource types.

You can see parent/child relationships in the list of resources presented prior
to indexing::

    $ searchlight-manage index sync --type OS::Designate::Zone

    Resource types (and indices) matching selection:
      OS::Designate::Zone (designate)
        ----> OS::Designate::RecordSet

Child plugins inherit their resource group name from their parent. Any
``resource_group_name`` setting in a child plugin's configuration is ignored.

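The rule above can be illustrated with a small Python sketch. The mapping and
function below are hypothetical and only mirror the Designate example; they
are not taken from the Searchlight code base.

```python
# Hypothetical child -> parent mapping, based on the example above.
PARENTS = {"OS::Designate::RecordSet": "OS::Designate::Zone"}


def types_to_index(resource_type, parents=PARENTS):
    """Return the requested type plus its children, rejecting child types."""
    parent = parents.get(resource_type)
    if parent is not None:
        raise ValueError(
            "'%s' is a child of '%s' and cannot be indexed separately."
            % (resource_type, parent))
    children = sorted(c for c, p in parents.items() if p == resource_type)
    return [resource_type] + children


print(types_to_index("OS::Designate::Zone"))
```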
Incremental Updates
-------------------

Once a resource has been indexed, you will typically only need to consume
incremental updates rather than re-index the entire data set. The preferred
method is to set up notification listening.

Notifications
^^^^^^^^^^^^^
Many services publish notifications when there are changes to the resources
they own. The Searchlight listener consumes these notifications and performs
incremental updates to the index based on them.

To start this service, run the following command::

    $ searchlight-listener

This typically requires that notifications are configured properly for the
service that owns the resource. For example, the Glance service owns images
and metadata definitions. Please check the plugin documentation for each
service's specific configuration requirements.

Multi-Thread Support
--------------------
The Newton Searchlight release introduced multi-threaded indexing.
Previously, when the ``searchlight-manage index sync`` command was executed,
all indexing occurred in a single thread. To improve indexing performance,
each resource type is now indexed in its own thread, and multiple indexing
threads run concurrently.

By default, the maximum number of simultaneous threads is 3. This limit
can be modified in the Searchlight configuration file. The setting is
called ``workers`` and lives under ``[manage]``. For example, to
increase the maximum number of threads to 6, add the following
to the Searchlight configuration file::

    [manage]
    workers=6

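The effect of the ``workers`` limit can be pictured with a bounded thread
pool. The sketch below uses Python's standard ``concurrent.futures`` module
with a placeholder indexing function; the names are hypothetical and the
actual threading code in Searchlight may be structured differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def index_resource_type(resource_type):
    """Placeholder for one plugin's full indexing run."""
    return resource_type, threading.current_thread().name


def index_all(resource_types, workers=3):
    """Index each resource type in its own task, at most `workers` at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the types were submitted.
        return list(pool.map(index_resource_type, resource_types))


for resource_type, thread_name in index_all(
        ["OS::Glance::Image", "OS::Cinder::Volume", "OS::Designate::Zone"]):
    print(resource_type, "indexed by", thread_name)
```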
The use of threads can also affect the parsing of the log files. By default,
log messages include the process ID but no thread-specific information. This
can be changed by modifying the format string settings in the Searchlight
configuration file. To add the thread ID to a message, add ``%(thread)d``. To
add the thread name, add ``%(threadName)s``. For example, to add the thread ID
and the thread name after the process ID in each log message, add the
following setting to the Searchlight configuration file::

    logging_default_format_string = %(asctime)s.%(msecs)03d %(process)d %(thread)d %(threadName)s %(levelname)s %(name)s [-] %(instance)s%(message)s

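``%(thread)d`` and ``%(threadName)s`` are standard attributes of Python's
``logging`` module, so their effect can be tried outside Searchlight. The
sketch below uses only the standard library; it omits ``%(instance)s``, which
is not a standard ``logging`` attribute, and the logger name is made up for
the example.

```python
import io
import logging

# Same layout as the setting above, minus the non-standard %(instance)s.
FORMAT = ("%(asctime)s.%(msecs)03d %(process)d %(thread)d %(threadName)s "
          "%(levelname)s %(name)s [-] %(message)s")


def make_logger(stream):
    """Return a logger that writes records to `stream` using FORMAT."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(FORMAT, datefmt="%Y-%m-%d %H:%M:%S"))
    logger = logging.getLogger("searchlight.sketch")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buf = io.StringIO()
make_logger(buf).info("indexing started")
print(buf.getvalue(), end="")
```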
Force Elasticsearch indexing
----------------------------
The Newton Searchlight release introduced the ability to reindex
from Elasticsearch only, bypassing the plugin APIs altogether.
This is useful if there has been a change to the mapping
definitions or the index settings. This functionality is enabled
with the ``--apply-mapping-changes`` option of the ``index sync`` command.

A sample usage would be::

    $ searchlight-manage index sync --apply-mapping-changes

The ``--type`` option is not compatible with the ``--apply-mapping-changes``
option. Specifying both options on the command line will result in an error.

.. warning::

   Do not change the resource group when using this option.
   Changing the resource group changes the underlying index, and the
   result will be an empty index.

.. _ES-Index-Cleanup:

Elasticsearch Index Cleanup
===========================

In some cases, there may be orphaned Searchlight indices in Elasticsearch.
An orphaned index is one that is no longer used by Searchlight, either
directly or through an alias.

To help detect which Searchlight-related indices may be orphaned in
Elasticsearch, the ``searchlight-manage`` command can display all indices
that are currently being used by Searchlight. Use the ``aliases``
option of the ``index`` command::

    $ searchlight-manage index aliases

This command outputs a listing of all indices that are used by
Searchlight (based on the current configuration file). The aliases
associated with each index are also shown. Sample output looks
like this::

    $ searchlight-manage index aliases
    List of Elasticsearch indices (and their associated aliases) used by Searchlight.

    Note:
        The indices are based on the current config file.
        To view indices used by other Searchlight config files, use the --config-file option.

    Indices are denoted with a '*'
    Aliases are denoted with a '+'

        * searchlight-2016_07_13_17_09_27
            + searchlight-listener
            + searchlight-search
        * sl-swift-2016_07_13_17_09_26
            + sl-swift-listener
            + sl-swift-search

The example shows that Searchlight is using two indices in Elasticsearch:
``searchlight-2016_07_13_17_09_27`` and ``sl-swift-2016_07_13_17_09_26``.
The index ``searchlight-2016_07_13_17_09_27`` has two aliases: ``searchlight-listener``
and ``searchlight-search``. The index ``sl-swift-2016_07_13_17_09_26`` has
two aliases: ``sl-swift-listener`` and ``sl-swift-search``.

Any other indices or aliases in Elasticsearch are not used by this specific
Searchlight configuration. Note that if other Searchlight instances are
running with a different configuration, their indices and aliases
will not be displayed by this command. You will need to rerun the
``index aliases`` command using those other configuration files.
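One possible way to use this output when looking for orphaned indices is
sketched below in Python. Both functions are hypothetical helpers, not part of
Searchlight: the first parses text in the sample format shown above, and the
second subtracts the indices in use from a list of index names that you have
obtained from Elasticsearch yourself. The result is only a list of candidates,
since indices belonging to other Searchlight configurations or to other
applications will also appear in it.

```python
def parse_aliases_output(text):
    """Map each index ('*' line) to the aliases ('+' lines) listed under it."""
    used = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("* "):
            current = line[2:].strip()
            used[current] = []
        elif line.startswith("+ ") and current is not None:
            used[current].append(line[2:].strip())
    return used


def orphan_candidates(all_indices, used):
    """Indices present in Elasticsearch but not used by this configuration."""
    return sorted(set(all_indices) - set(used))


sample = """
Indices are denoted with a '*'
Aliases are denoted with a '+'

    * searchlight-2016_07_13_17_09_27
        + searchlight-listener
        + searchlight-search
    * sl-swift-2016_07_13_17_09_26
        + sl-swift-listener
        + sl-swift-search
"""
used = parse_aliases_output(sample)
# The third index name below is made up for the example.
print(orphan_candidates(list(used) + ["searchlight-2016_07_01_09_00_00"], used))
```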