18 Commits

Author SHA1 Message Date
Dave Borowitz
86d070f83e Eliminate potential NPEs with ChangeData.getChange()
Callers were supposed to be calling hasChange() to determine whether
getChange() would return null, but none were. Remove this error-prone
pair of methods and just use change(), which includes lazy loading.
There are a few places where we need to catch an additional
OrmException but these are easy enough to fix.

Change-Id: I23335e362715f59e2c093ffec88427739ff0cffc
2014-01-03 09:10:45 -08:00
Dave Borowitz
569f2516db Use Lucene's QueryBuilder for building full-text queries
FuzzyQuery was a mistake; it uses edit distance to find terms in the
index close to the provided search term. This produces bizarre results
for queries like "message:1234".

Instead, use Lucene's QueryBuilder with an analyzer to convert a
full-text search word/phrase into a phrase query.

Add some tests for full-text matching behavior on numbers, which
should hopefully not be too dependent on specific Lucene behavior.
Coincidentally, a copy-paste error in the byMessageExact test
prevented this poor behavior from showing up in tests sooner.

Change-Id: I384f74f1455d0433433a27f880204ac8ecbf93da
2013-12-30 08:50:25 -08:00
Dave Borowitz
7547233449 Stop passing ReviewDb, etc. to ChangeData methods
Many methods that are intended for lazy initialization take a
ReviewDb, and some take other helper objects such as
GitRepositoryManager. This achieves the intended lazy-loading behavior
but requires a lot of injection of dependencies into callers, and may
become untenable as lazy loading of ChangeData requires more helpers.

Instead, since after all this is Java, construct ChangeData with a
Factory. This is subtle because it is used from a lot of places that
need control over which ReviewDb handle to pass in, so the factory
methods still need to take a db. ChangeData objects are still intended
(i.e. safe) for use only by one thread, so having a single ReviewDb
instance stored for the lifetime of the ChangeData should be fine. We
just need to be careful about which we pass in, particularly in a
place like ChangeIndexer.

As a side effect, clean up some other injection, particularly in the
predicate class hierarchy.

Change-Id: I52e1eb2a76788c12dd95767e89095ab80df7e1cc
2013-12-20 15:54:18 -08:00
Dave Borowitz
7a2d8d4e61 Fix last line of license headers
This was due to a copy/paste error into my editor's snippet library,
and it managed to sneak into 20 classes without anybody noticing.

Change-Id: Ifeea9ab5737acbed1b93519b65709f1c0f48684b
2013-10-14 17:40:17 +00:00
Dave Borowitz
2ae0171598 Log and skip indexed fields that produce an error
Change-Id: I40b8598b3e857faec7269bc380ac5ecbb78e2df7
2013-10-07 17:35:33 -07:00
Dave Borowitz
fc1bba54fd Don't return futures from ChangeIndex methods
In I1b3c5ba0 ChangeIndex.replace was modified to return a future, but
ChangeIndexer.Task was not changed to call those futures before
returning.

In retrospect this was a mistake anyway. The only path by which these
methods are currently called is from ChangeIndexerImpl, which turns
around and wraps them in a Callable and submits them to the same
executor, tying up twice as many threads as necessary.

Similarly, to avoid additional fanout in ChangeIndexerImpl, we should
not write to multiple index versions in parallel, so remove the TODO
to try that. The current behavior slows down individual writes for the
relatively short period of an online version upgrade, but that is
acceptable given the improved thread utilization overall.

Change-Id: I5fff470214ecd69a261dec1c10b9f7bdd0a1907e
2013-10-02 12:00:42 -07:00
David Pursehouse
71c198d662 Remove unnecessary @SuppressWarnings("deprecation") in SolrChangeIndex
Change-Id: Ie14819c8df33bcf76ed48ff6c4ffc389b1e217e7
2013-09-26 11:27:51 +09:00
Dave Borowitz
2e0f89d814 Index full sortkey field in secondary index
Using the sort key of the last element for pagination only works as
long as every sort key is unique. This is true for the general
definition of this field in the Change object, which includes the
unique change ID at the end of the hex string. Previously, we were
incorrectly truncating the change ID off, resulting in many changes in
the same sort key bucket and thus broken pagination.

Having two different definitions of sort key in the same running
server makes it a bit ugly to handle SortKeyPredicates, since the
definition of min/max value is now schema dependent, but at least we
can keep the same field name.

Don't @Deprecate the new SORTKEY field. We were originally hoping to
remove this field and depend only on the UPDATED field (with the
change ID as tiebreaker), but as long as this field is used for
pagination, we have to keep it around. This is because we can't be
sure a secondary index will be able to express a query like "(field X,
field Y) > (N, M)" to allow us to restart a query in the middle of an
UPDATED bucket. We may still decide to scrap the current pagination
system but that will probably not happen until we kill the SQL index
code.

Change-Id: Icb760dbacd01939e5e4936ef87165b6dddcacdc0
2013-09-24 12:48:21 -07:00
Dave Borowitz
8830bb53a4 Return last N results by sortkey when using sortkey_after
In the SQL index implementation, sortkey_after queries are always
rewritten to use one of the ChangeAccess.*Prev() methods, which order
results by sort key ascending. Update the Lucene implementations to
match this sort order. Specify in the ChangeIndex javadoc that this
sort order is required in such cases, since other code in
QueryProcessor depends on this assumption.

Change-Id: Ie35d2c6bd07963db6d33145a83493d12c6a4289d
2013-09-24 12:48:21 -07:00
Dave Borowitz
76e7b42490 Push limit term into secondary index queries
Change-Id: Id4ec5fe0375229ff8b82ec46fbe300c78098dbbe
2013-08-09 01:14:53 +00:00
Shawn Pearce
eb7193b169 Fix incorrect reviewedb dependency in solr
This should depend on -server not -client.

Change-Id: I44880c5ee03bbb9e1739c6365ae8cefbcc3cee6d
2013-07-30 00:46:15 -07:00
Dave Borowitz
2361cefe17 Factor out batch indexing logic into a ChangeBatchIndexer
Change-Id: Ief166fadc61f76a5f4ffdb879322e9883040bdd6
2013-07-08 13:22:54 -07:00
Dave Borowitz
d103b2b61c Scan index directory to determine Lucene versions
Mark index versions as "ready" when they are fully indexed and the
server is not running (i.e. from Reindex). By default, search from the
most recent ready index version, and write to both the most recent
ready version and (if different) the most recent known version.

At server startup, mark all versions except those we are about to
start writing to as not ready.

Change-Id: Icf42a3eb27b0445899300d60941cd701a8072d41
2013-07-08 13:22:54 -07:00
Dave Borowitz
97a01dbb05 Replace Reindex --dry-run with --output
The line of what was a "dry run" was a bit arbitrary, doing all the
Gerrit processing (including potential side effects like writing to
persistent caches) but not all the index-specific processing, which
could still be useful for evaluation or debugging purposes. Instead,
provide an output base so administrators can test the indexing
behavior without operating on production index data.

For Lucene, this flag is interpreted as a path to the index
directory; for Solr, it is a prefix added to the collection names.

Change-Id: Ic4808d1e3467a780a818ac7d1c89de13610da630
2013-07-08 13:22:54 -07:00
Dave Borowitz
1d2a998298 Bind index versions dynamically
We want to support a single index version for reading and multiple
index versions for writing. Wrap these in an IndexCollection with
getSearchIndex and getWriteIndexes methods. Still bind a single
implementation at startup.

Change-Id: Ibc4dbbeba064cde68f2585126d7708db5a2b3410
2013-07-08 13:22:54 -07:00
Dave Borowitz
0bd69febcb Rewrite schema versioning in preparation for multiple versions
Store indexes in versioned directories, and don't consider open/closed
to have separate versions.

Extract a simple base Schema class with a version and a list of
fields, which is analogous to the DB's SchemaVersion. Construct
indexes using specific versions.

Change-Id: I11e8e3a78cd62b1a01aa122bfd747ed5295fe057
2013-07-08 13:22:54 -07:00
Shawn Pearce
9c4720405b Don't run SQL rewrites when secondary index is enabled
If the secondary index is enabled on a server the index handles all
queries.  There is no reason to run the SQL rewrites or to leave
predicates in the predicate tree without being wrapped by the
IndexedChangeQuery predicate.

Change-Id: I18defed2a5a6003b9ae6ba227e29864b5c912e7b
2013-06-28 11:14:44 -07:00
Ahaan Ugale
404c8246bc Add secondary index implementation using SolrCloud
SolrCloud can be used instead of Lucene by adding "type = SOLR"
under [index] and "url = <zookeeper-url>" under [index "solr"]
in gerrit.config.

Change-Id: I0ff8579c5e23c58b16f3605bc20eba4e80fb40fc
2013-06-27 13:52:39 -07:00