Zero Downtime Reindexing Proposal
This patch provides the initial proposal for zero downtime reindexing. Change-Id: I110a6ffde79d3dd7fdff3b3a368efddf3a430955 Partially-Implements: blueprint zero-downtime-reindexing Co-Authored-By: Rick Aulino <rick.aulino@hp.com>
This commit is contained in:
parent
e9b441ea22
commit
dec4404ece
Binary file not shown.
Binary file not shown.
After Width: | Height: | Size: 38 KiB |
Binary file not shown.
After Width: | Height: | Size: 46 KiB |
Binary file not shown.
After Width: | Height: | Size: 47 KiB |
Binary file not shown.
After Width: | Height: | Size: 46 KiB |
|
@ -0,0 +1,487 @@
|
|||
|
||||
..
|
||||
c) Copyright 2015-2016 Hewlett-Packard Development Company, L.P.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
not use this file except in compliance with the License. You may obtain
|
||||
a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
License for the specific language governing permissions and limitations
|
||||
under the License.
|
||||
|
||||
================================================
|
||||
Zero Downtime Re-indexing
|
||||
================================================
|
||||
|
||||
https://blueprints.launchpad.net/searchlight/+spec/zero-downtime-reindexing
|
||||
|
||||
This feature enables seamless zero downtime re-indexing of resource data from
|
||||
an API user's point of view.
|
||||
|
||||
Problem Description
|
||||
===================
|
||||
|
||||
As a user of the searchlight API, we expect the following traits:
|
||||
* The index is up to date and coherent with the source data
|
||||
* The index is available
|
||||
* That we are not affected by updates and upgrades to the searchlight service
|
||||
|
||||
As a deployer, we expect the following:
|
||||
* That we can roll out service upgrades and update the index with new data
|
||||
* That we can bring the index back into coherency without downtime
|
||||
* That we can tune the service deployment according to performance needs
|
||||
* That we can have easy deployment of new / patched plugins
|
||||
* That we can change data mappings and re-index the data
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
ElasticSearch documents are stored and indexed into an "index" (imagine that).
|
||||
The index is a logical namespace which points to primary and replica shards
|
||||
where the document is replicated. A shard is a single Apache Lucene instance.
|
||||
The shards are distributed amongst nodes in a cluster. API users only interact
|
||||
with the index and are not exposed to the internals, which ElasticSearch
|
||||
manages based on configuration inputs from the administrator.
|
||||
|
||||
Certain actions can only be done at index creation time, such as changing
|
||||
shard counts, changing the way data is indexed, etc. In addition to changing
|
||||
the data, re-populating an index that has lost coherency with source service
|
||||
data is much easier to do from scratch rather than determining what differences
|
||||
there are in the data. Due to this the data and indexes should be designed so
|
||||
that it is possible to re-index at any time without disruption to API users.
|
||||
The re-indexing happens while the services are in use, still indexing new
|
||||
documents in ElasticSearch.
|
||||
|
||||
In Searchlight 0.1.0, we allowed for each plugin to specify the index where
|
||||
the data should be stored via configuration in the searchlight-api.conf file.
|
||||
By default, all plugins store their data in the "searchlight" index. This was
|
||||
simply chosen as a starting point, because the amount of total data indexed
|
||||
for resource instance data is believed to be quite small in comparison to
|
||||
typical log based indexing for small deployments, but this may differ
|
||||
dramatically based on the resource type being indexed and the size of the
|
||||
deployment.
|
||||
|
||||
To reiterate, all resource types in Searchlight 0.1.0 (either the plug-in or
|
||||
the searches) have the ElasticSearch index hard-coded into them. This
|
||||
hard-coded functionality prevents Searchlight from doing smart things
|
||||
internally with ElasticSearch. Exposing indexes directly to the users is
|
||||
generally not recommended by the user community or by ElasticSearch. Instead,
|
||||
they recommend using aliases. API users can use an alias in exactly the same
|
||||
way as an index, but it can be changed to point to different index(es)
|
||||
transparently to the user. This allows for seamless migration between
|
||||
indexes, allowing for all of the above use cases to be fulfilled.
|
||||
|
||||
The concept of aliases is described in depth in the ElasticSearch guides [1].
|
||||
|
||||
Proposed Change
|
||||
===============
|
||||
|
||||
With this blueprint, we will divorce the plug-ins and searches from knowing
|
||||
about physical ElasticSearch indexes. Instead we will introduce the concept
|
||||
of a "Resource Type Group". A Resource Type Group is a collection of Resource
|
||||
Types that are treated as a single unit within ElasticSearch. All users of
|
||||
Searchlight will deal with Resource Type Groups, instead of low-level
|
||||
ElasticSearch details like an index or alias. A Resource Type Group will
|
||||
correspond to ElasticSearch aliases that are created and controlled by
|
||||
Searchlight.
|
||||
|
||||
The plug-in configuration in the searchlight-api.conf file will no longer
|
||||
specify the index name. Instead the plug-in will specify the Resource Type
|
||||
Group it chooses to be a member of. It is important for a plug-in to know which
|
||||
Resource Type Group it belongs. When some operations are undertaken by one
|
||||
member of a Resource Type Group, it will need to be done to all members in
|
||||
the group. There will be more details on this later.
|
||||
|
||||
Now that the users are removed from the internals of ElasticSearch, we
|
||||
can handle zero downtime re-indexing. The basic idea is to create new
|
||||
indexes on demand, populate them, but use ElasticSearch aliases inside of
|
||||
Searchlight in a way that makes the actual indexes being used transparent
|
||||
to both API users and Searchlight listener processes.
|
||||
|
||||
We will not directly expose the alias to API users. We will use resource
|
||||
type mapping to transparently direct API requests to the correct alias.
|
||||
When implementing this blueprint, we may choose to still expose an "index"
|
||||
through the plug-in API. Exposing an "index" may allow other open-source
|
||||
ElasticSearch libraries (which are index-based) to still work. Currently
|
||||
we are not using any of these libraries, but we may not want to exclude
|
||||
their usage in the future.
|
||||
|
||||
Searchlight will internally manage two aliases per Resource Type Group.
|
||||
Note: Having these two aliases is the key change enabling zero downtime
|
||||
indexing.
|
||||
|
||||
* API alias
|
||||
* Listener/Sync alias
|
||||
|
||||
The names of the aliases will be derived from the Resource Type
|
||||
Group name in the configuration file. Exactly how this is handled will
|
||||
be left to the implementation. For example, we can append "-listener" and
|
||||
/Sync"-search" to the Resource Group Type name for the two aliases.
|
||||
|
||||
The API alias will point to a single "live" index and only be switched once
|
||||
the index is completely ready to serve data to the API users. Completely
|
||||
ready means that the new index is a superset of the old index. This allows
|
||||
for transparently switching the incoming requests to the new index without
|
||||
disruption to the API end user.
|
||||
|
||||
The listener alias will point to 1...* indexes at a time. The listener
|
||||
simply knows that it must perform CRUD operations on the provided alias. The
|
||||
fact that it might be updating more than one index at a time is
|
||||
transparent to the listener. The benefit to this is that the listeners do
|
||||
not have to provide any additional management API as ElasticSearch handles
|
||||
this for us automatically.
|
||||
|
||||
The algorithm for searchlight-manage index sync will be changed to the following:
|
||||
|
||||
* Create a new index in ElasticSearch. Any mapping changes to the index are done
|
||||
now, before the index is used.
|
||||
* Add the new index to the listener(s) alias. At this point, the listener’s alias is
|
||||
pointing to multiple indexes. The new index is now “live” and receiving data. Any
|
||||
data received by the listener(s) will be sent to both indexes.
|
||||
|
||||
* There is an issue with indexing an alias with multiple indexes [2]. The
|
||||
issue is that this case is not allowed! In this case we will catch the
|
||||
exception and write to both indexes individually in this step. For more
|
||||
details, refer to the "Implementation Notes" subsection below.
|
||||
* Bulk dump of data from each Resource Type associated with the old index to the
|
||||
new index in ElasticSearch.
|
||||
|
||||
* The same issue with multiple indexes mentioned above applies here also.
|
||||
* Atomically switch the aliases for the API alias to point to the new index.
|
||||
* We will use the actions command with remove/add commands in the same actions API call.
|
||||
ElasticSearch treats this as an atomic operation. [2]::
|
||||
|
||||
{ "actions" : [ { "remove" : { ...} }, { "add" : " {...} } ] }
|
||||
|
||||
* Remove the old index from the listener(s) alias.
|
||||
* Delete the old index from ElasticSearch. We do not want the index to hang around
|
||||
forever. We can figure out when the index is no longer being used and then delete
|
||||
it (asynchronous task, a type of internal reference count, etc). If this turns out
|
||||
to be too unwieldy we can revisit this action.
|
||||
|
||||
Notes:
|
||||
* This algorithm assumes that we can handle out of order events. See below for more details.
|
||||
* During the re-syncing process, the listener(s) will be adding any new documents to both indexes.
|
||||
* The listeners will always keep the ElasticSearch index associated with the API alias up to date.
|
||||
* The listeners will keep the old index up to date after the API alias has switched over to minimize any race conditions.
|
||||
|
||||
|
||||
A critical aspect to all of this is that the batch indexer and all
|
||||
notification handlers MUST only update documents if they have the most
|
||||
recent data. This is being handled by a separate bug [3]. In addition,
|
||||
Searchlight listeners and index must start setting the TTL field in deleted
|
||||
documents instead of deleting them right away. This functionality is covered
|
||||
in the ES deletion journal blueprint [4].
|
||||
|
||||
We are operating on a Resource Type Group as a whole. We need to make sure
|
||||
that the entire Resource Type Group is re-indexed instead of just a single
|
||||
Resource Type within the group. For example, consider the case where a
|
||||
Resource Type Group consists of Glance and Nova. When Searchlight gets a
|
||||
command to re-index Glance, Searchlight needs to also re-index Nova. Otherwise
|
||||
the new index will not have the previous Nova objects in it. If Nova did not
|
||||
re-index, the new index will not be a superset of the old index. When the
|
||||
alias switches to this new index it will be incomplete.
|
||||
|
||||
The CLI must support manual searchlight-manage commands as well as automated
|
||||
switchover. For example:
|
||||
|
||||
* Delete the specified or current index / alias for a specific resource type group.
|
||||
* Create a new index for the specified resource type group.
|
||||
* Switch API and listener aliases automatically when complete (default - yes).
|
||||
* Delete old index automatically when complete (default - yes).
|
||||
* Provide a status command so that progress can be seen.
|
||||
* List all aliases and indexes by resource type with their status
|
||||
* Can be used from a GUI or a separate CLI concurrently to monitor progress.
|
||||
|
||||
This change affects:
|
||||
|
||||
* The plugins API which lists plugins
|
||||
* The API
|
||||
* The Listener
|
||||
* The bulk indexer
|
||||
* The CLI
|
||||
|
||||
Illustrated Example
|
||||
-------------------
|
||||
|
||||
To further illuminate the blueprint we will turn to a series of images and save
|
||||
ourselves thousands of words. The images shows the state of Searchlight during
|
||||
sequence of operations.
|
||||
|
||||
For this example we have three resource types: Glance, Nova and Swift. There are
|
||||
two Resource Type Groups. The first group, RTG1, contains Glance and Nova. The
|
||||
two aliases associated with RTG1 are "RTG1-sync" for the plug-in listeners and
|
||||
"RTG1-query" for the plug-in searches. The second group, RTG2, contains Swift.
|
||||
The two aliases associated with RTG2 are "RTG2-sync" for the plug-in listener
|
||||
and "RTG2-query" for the plug-in search.
|
||||
|
||||
Figure 1: The initial State
|
||||
|
||||
.. image:: ../../images/ZeroFig1.png
|
||||
|
||||
First Searchlight will create the ElasticSearch index "Index1" for use by RTG1.
|
||||
The ElasticSearch aliases "RTG1-sync" and "RTG1-query" are created and will both be
|
||||
associated with the index "index1". Next Searchlight will create the
|
||||
ElasticSearch index "Index2" for use by RTG2. The ElasticSearch aliases
|
||||
"RTG2-sync" and "RTG2-query" are created and will both be associated with the index
|
||||
"Index2".
|
||||
|
||||
Glance has now created two documents "Glance ObjA" and "Glance ObjB". Nova has
|
||||
created two documents "Nova ObjC" and "Nova ObjD". These four new documents for
|
||||
the first Resource Type Group are now indexed. They will be indexed against
|
||||
alias "RTG1-sync" and end up in index "Index1".
|
||||
|
||||
Swift has now created two new documents "Swift ObjE" and "Swift ObjF". These two
|
||||
new documents for the second Resource Type Group are now indexed. They will be
|
||||
indexed against alias "RTG2-sync" and end up in index "Index2".
|
||||
|
||||
Figure 1 shows the current state of Searchlight.
|
||||
|
||||
A Glance search will be made against "RTG1-query". Going to "Index1" it will return
|
||||
"Glance ObjA", "Glance ObjB", "Nova ObjC" and "Nova ObjD". A Swift search will
|
||||
be made against "RTG2-query". Going to "index2" it will return "Swift ObjE" and
|
||||
"Swift ObjF".
|
||||
|
||||
Figure 2: Explicit Glance Re-sync
|
||||
|
||||
.. image:: ../../images/ZeroFig2.png
|
||||
|
||||
All of the changes from Image 1 are highlighted in red.
|
||||
|
||||
Searchlight receives a re-index command for Glance. After the re-sync, Glance
|
||||
creates two new documents "Glance ObjG" and "Glance ObjH". Nova creates one new
|
||||
document "Nova ObjI". Swift creates two new documents "Swift ObjJ" and "Swift
|
||||
ObjK".
|
||||
|
||||
Searchlight will create a new ElasticSearch index "Index3". Since Glance is
|
||||
re-syncing, the new index is associated with RTG1. Searchlight now associates
|
||||
both "Index1" and "Index3" to the alias "RTG1-sync". Since the new index "Index3"
|
||||
is not a superset of the index "Index1" yet, we do not change the RTG1 search
|
||||
alias "RTG1-query". It remains unchanged for now.
|
||||
|
||||
As the Glance re-sync occurs, the previous Glance documents "Glance ObjA" and
|
||||
"Glance ObjB" get indexed into "Index3". The new documents for RTG1 ("Glance
|
||||
ObjG", "Glance ObjH" and "Nova ObjI") are indexed against the alias "RTG1-sync".
|
||||
These documents end up in both "Index1" and "Index3".
|
||||
|
||||
The new documents for RTG2 ("Swift ObjJ" and "Swift ObjK") are indexed against
|
||||
the alias "RTG2-sync". These documents end up in "Index2".
|
||||
|
||||
Figure 2 shows the current state of Searchlight.
|
||||
|
||||
A Glance search will be made against "RTG1-query". Going to "Index1" it will
|
||||
return "Glance ObjA", "Glance ObjB", "Nova ObjC", "Nova ObjD", "Glance ObjG",
|
||||
"Glance ObjH" and "Nova ObjI". A Swift search will be made against "RTG2-query".
|
||||
Going to "index2" it will return "Swift ObjE", "Swift ObjF", "Swift ObjJ" and
|
||||
"Swift ObjK".
|
||||
|
||||
This diagram shows the subtle point that all resource types within a Resource
|
||||
Type Group need to re-synced together. If we did not re-sync Nova and updated
|
||||
the RTG1 search alias "RTG1-query" to be associated with the new index "Index3", the
|
||||
Searchlight state is incorrect. A Glance search will now be made against
|
||||
"Index3" and it will return "Glance ObjA", "Glance ObjB", "Glance ObjG",
|
||||
"Glance ObjH" and "Nova ObjI". This is incorrect as it does not include the
|
||||
earlier Nova documents: "Nova ObjC" and "Nova ObjD". This incomplete state is
|
||||
the reason that all resources in a Resource Type Group need to be re-synced
|
||||
before the Resource Type Group re-sync is to be considered completed.
|
||||
|
||||
Figure 3: Implicit Nova Re-Sync
|
||||
|
||||
.. image:: ../../images/ZeroFig3.png
|
||||
|
||||
All of the changes from Image 2 are highlighted in red.
|
||||
|
||||
Searchlight starts an implicit Nova re-sync, since Nova is a member of RTG1.
|
||||
All of the aliases are still set up correctly, so they do not need to change.
|
||||
After the re-sync, Glance creates one new document "Glance ObjL". Nova creates
|
||||
one new document "Nova ObjM". Swift creates one new documents "Swift ObjN".
|
||||
|
||||
As the Nova re-sync occurs, the previous Nova documents "Nova ObjC" and "Nova
|
||||
ObjD" get indexed into "Index3". The new documents for RTG1 ("Glance ObjL" and
|
||||
"Nova ObjM") are indexed against the alias "RTG1-sync". These documents end up in
|
||||
both "Index1" and "Index3".
|
||||
|
||||
The new document for RTG2 ("Swift ObjN") is indexed against the alias "RTG2-sync".
|
||||
This document ends up in "Index2".
|
||||
|
||||
Searchlight has not yet acknowledged the Nova re-sync as being completed.
|
||||
Therefore "RTG1-query" has not been updated yet.
|
||||
|
||||
Figure 3 shows the current state of Searchlight.
|
||||
|
||||
A Glance search will be made against "RTG1-query". Going to "Index1" it will
|
||||
return "Glance ObjA", "Glance ObjB", "Nova ObjC", "Nova ObjD", "Glance ObjG",
|
||||
"Glance ObjH", "Nova ObjI", "Glance ObjL" and "Nova ObjM". A Swift search will
|
||||
be made against "RTG2-query". Going to "index2" it will return "Swift ObjE",
|
||||
"Swift ObjF", "Swift ObjJ", "Swift ObjK" and "Swift ObjN".
|
||||
|
||||
Figure 4: RTG1 Re-Sync Complete
|
||||
|
||||
.. image:: ../../images/ZeroFig4.png
|
||||
|
||||
All of the changes from Image 3 are highlighted in red.
|
||||
|
||||
All resource types within RTG1 have finished re-syncing. Searchlight will now
|
||||
update the RTG1 search alias "RTG1-query". The alias "RTG1-query" will now be
|
||||
associated with index "Index3". After updated the RTG1 search alias,
|
||||
Searchlight will update the RTG1 plug-in listener alias "RTG1-sync". The alias
|
||||
"RTG1-sync" will now be associated with the index "Index3".
|
||||
|
||||
The alias updates need to happen in this order to handle the corner case of a
|
||||
new RTG1 document being indexed while the aliases are being modified. If we
|
||||
modified the RTG1 plug-in listener alias first a new document would be indexed
|
||||
to index "Index3" only. But a search will still go to index "Index1", thus
|
||||
missing the newly indexed document.
|
||||
|
||||
Figure 4 shows the current state of Searchlight.
|
||||
|
||||
A Glance search will be made against "RTG1-query". Going to "Index3" it will
|
||||
return "Glance ObjA", "Glance ObjB", "Nova ObjC", "Nova ObjD", "Glance ObjG",
|
||||
"Glance ObjH", "Nova ObjI", "Glance ObjL" and "Nova ObjM". A Swift search will
|
||||
be made against "RTG2-query". Going to "index2" it will return "Swift ObjE",
|
||||
"Swift ObjF", "Swift ObjJ", "Swift ObjK" and "Swift ObjN".
|
||||
|
||||
The internal Searchlight state is correct, coherent and ready to continue.
|
||||
Sometime in the future we will be able to delete Index1 completely.
|
||||
|
||||
Implementation Notes
|
||||
--------------------
|
||||
|
||||
Implementation Note #1: Multiple Indexes
|
||||
----------------------------------------
|
||||
|
||||
Upon careful review of the ES alias documentation [2], there is this warning
|
||||
lurking in the shadows: "It is an error to index to an alias which points to
|
||||
more than one index." Yikes. Now the simple solution of adding additional
|
||||
indexes to an alias and having the re-indexing just work, will not work.
|
||||
ElasticSearch will through an "ElasticsearchIllegalArgument" exception and
|
||||
return a 400 (Bad Request).
|
||||
|
||||
The plug-ins will need to be aware of this exception and react to it.
|
||||
Through experimentation, ElasticSearch will return this error: ::
|
||||
|
||||
{"error":"ElasticsearchIllegalArgumentException[Alias [test-alias] has more than one indices associated with it [[test-2, test-1]], can't execute a single index op]","status":400}
|
||||
|
||||
From this error message, we have the actual indexes. After extracting
|
||||
the names of the indexes, the plug-ins will be able to complete the
|
||||
task. The plug-in will now index iterating on each real index, instead
|
||||
of using the alias. This case applies only to the case where there are
|
||||
multiple indexes in an alias (i.e. the re-syncing case). When not
|
||||
re-syncing, the plug-in will not receive this exception.
|
||||
|
||||
We need to be careful when parsing the error message. This is a potential
|
||||
hazardous area if the error message ever changes. The catching of the
|
||||
exception and parsing of the message should be as flexible as possible.
|
||||
|
||||
Implementation Note #2: Incompatible Changes
|
||||
--------------------------------------------
|
||||
|
||||
A corner case in the rationale for triggering a re-index needs to be
|
||||
addressed. Sometimes an incompatible change between indexes has occurred.
|
||||
For example a new plug-in has been added or the documents from the
|
||||
service of changed in an incompatible way (different ElasticSearch mapping).
|
||||
In any of these cases we need to be able to handle the changes and
|
||||
roll them out seamlessly.
|
||||
|
||||
Some possible options to handle these cases would include:
|
||||
* Disable re-indexing into the old index.
|
||||
* Run two listeners, one understanding the old index and the other
|
||||
understanding the new index.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
Alternate #1
|
||||
------------
|
||||
|
||||
An alternate usage scenario would look like the following:
|
||||
|
||||
Queries to v1/search/plugins would change so that the index listed for each type would
|
||||
actually be the alias (the API user won't know this).
|
||||
|
||||
The searchlight-manage index sync CLI will change to support the following capabilities:
|
||||
|
||||
* Re-index the current index without migrating the alias (no change from 0.1.0).
|
||||
* Delete the specified or current index for a specific type.
|
||||
* Create a new index for specified resource types.
|
||||
* Specified name or autogenerated name using a post-fix numbering pattern.
|
||||
* Contact and stop all listeners from processing incoming notifications for specified types.
|
||||
* Switch alias automatically when complete (default - no ?).
|
||||
* Delete old index automatically when complete (default - no?).
|
||||
* Contact and start all listeners to process incoming notification for specified types.
|
||||
* Switch alias on demand to new index(es).
|
||||
|
||||
All of the above must account for 1 ... * indexes for a single alias.
|
||||
|
||||
All listener processes must now support a management API for them to stop
|
||||
notification processing for specified resource types. Without this ability,
|
||||
there will remain a race condition for populating a new index. For example,
|
||||
if it takes N seconds to populate all Nova server instances, there will be a
|
||||
delay in time from when the original request for data to Nova was sent and
|
||||
when any updates to the data happened. Therefore, notification should be
|
||||
disabled while a new index is being populated and then turned back on.
|
||||
|
||||
Alternate #2
|
||||
------------
|
||||
|
||||
This alternate explores a way to avoid the "multiple indexes in a single
|
||||
alias while indexing" exception as described in the "Implementation Notes"
|
||||
subsection.
|
||||
|
||||
The idea is that instead of having two indexes in the Sync alias and one index
|
||||
in the search alias, we invert the index usage in the aliases. Now we consider
|
||||
adding multiple indexes to the search alias while leaving a single index in
|
||||
the sync alias.
|
||||
|
||||
When we start a re-sync, we create a new index. We update the sync alias to point
|
||||
to this new index, replacing the old index. Since there is only a single index
|
||||
in the sync alias, we will not get the ElasticsearchIllegalArgument exception.
|
||||
We also add the new index to search alias.
|
||||
|
||||
At this point, the search alias contains just the new index while the search
|
||||
alias contains both the old and new index. When a search occurs it will
|
||||
find old documents as well as any new documents.
|
||||
|
||||
The main issue with this alternative is that the search will find a lot of
|
||||
duplicates while the re-sync is occurring. All of the documents in the old index
|
||||
will eventually be added to the new index. In order to be usable, we would
|
||||
need to figure out a way to filter out these duplicates. The initial
|
||||
investigation into filtering ideas led to solutions that were deemed to
|
||||
fragile and defect prone. Hence the inclusion of this idea at the
|
||||
bottom of the alternate proposals.
|
||||
|
||||
Future Enhancements
|
||||
-------------------
|
||||
|
||||
Optimizations:
|
||||
|
||||
* Use the ElasticSearch index sync functionality instead of having each Resource Type
|
||||
do a manual re-index. ElasticSearch does not have a native re-sync command, but it
|
||||
can be accomplished using "scan and scroll" with the ElasticSearch Bulk API. [5]
|
||||
This optimization needs to be carefully considered. It would only be performed
|
||||
when we are absolutely sure that the old ElasticSearch index is coherent and complete.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
[1] The concept of aliases is described in depth here:
|
||||
https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html
|
||||
|
||||
[2] How ES treats an alias is described here:
|
||||
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-aliases.html
|
||||
|
||||
[3] All searchlight index updates should ensure latest before updating any document
|
||||
https://bugs.launchpad.net/searchlight/+bug/1522271
|
||||
|
||||
[4] ES deletion journal blueprint:
|
||||
https://blueprints.launchpad.net/searchlight/+spec/es-deletion-journal
|
||||
|
||||
[5] ES scan and scroll is discussed here:
|
||||
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
|
||||
ES Bulk API is discussed here:
|
||||
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
|
Loading…
Reference in New Issue