An index implementation may not have an inherent upper bound to the
limit, but administrators may nonetheless want to restrict the maximum
query size at the server level for performance reasons.
Change-Id: Ic1f506e9857413ec860fe97785bff9e76b817db0
This allows the method to be used without having a dependency on the
gerrit-lucene module, which in turn has a dependency on a specific
version of Lucene.
Change-Id: I57e92c71260a51327ed706bd65f8f9e384e15666
Also convert all remaining references that expect to resolve site_path
to a File, notably LocalDiskRepositoryManager.
Change-Id: I5ef6f6e3d7b3fe0418a470b079b4cc04bc9be4f6
Implementations may not support returning up to Integer.MAX_VALUE
results, so allow this to be specified per implementation. This is
distinct from the configured per-user limits, as it represents a
fundamental limitation of the underlying index backend rather than an
administrative decision.
Encapsulate this in its own class that can be injected into
QueryProcessor, and use this in the limit computation.
Change-Id: I7d1093a2aa3800ff5437c6b1c78b12d73b17db05
We had lots of hard-coded logic to do slightly different things
depending on which version of a particular field was present in a
schema. For every field that has now been completely removed, we can
remove references including these bits of extra logic.
The difference is especially striking in the sortkey query path. Note
that queries containing sortkeys have already been failing since
servers upgraded to schema version 8 (2.9), which no longer uses this
field.
Change-Id: Ib1fbc9f3aa45bf068b7869ca3aada40dbe4cb1bb
Since indexing is generally done after any modification operations,
using a deleted ChangeData when indexing a deleted change is just
asking for trouble; most ChangeData methods (reasonably) assume that
the change exists. Instead, just use the form of the method that
passes in the ID.
Change the method that takes an integer ID to take a Change.Id object.
In almost all cases we already have one of these handy, and this is
more consistent with public interfaces elsewhere in Gerrit.
A fortunate side effect is that async deletes should happen slightly
faster, since we no longer need to open a DB connection and produce a
ChangeData in the delete codepath.
Change-Id: I353522b52188de11353ca26a7ba90a2ca462e404
* stable-2.10:
Remove uneeded dependency in ChangesCollection
Delete a change from the index when it is not in the DB
Change-Id: I02a4773dd581c29704a7cc45c5225258d657f836
If for some reason the secondary index is out of date, i.e. the change
was deleted from the database but wasn't deleted from the secondary
index, it was impossible to re-index (remove) that change.
Add logic to automatically remove the change from the secondary index
if this change is requested through the Change REST API endpoint and
doesn't exist in the database.
If a user click on search result from a stale change, he will a get a
404 page and the change will be removed from the index.
To fix the problem without opening the change page, run a command like:
curl --user <user name> -X POST http://<host>:<port>/a/changes/<change id>/index
Issue: 2996
Change-Id: I1db5373e31585e99c5f45e05274d86d69b4f24e6
- Warn on empty statements, e.g. "for (;;);". These may be
typos and are easily replaced by "for (;;) {}" which is more
explicit.
- Warn on field hiding. This allows cleanup of many acceptance test
members, at the cost of a couple of renames and the occasional
suppression (when the field is in a public nested enum that shadows
a public constant).
- Warn on unnecessary casts.
- Warn on unused declared thrown exceptions. In addition to reducing
method signature length and number of imports, this also eliminated
some impossible catch blocks.
- Warn on missing @Override annotations.
- Warn on unused parameters. This is likely the most controversial,
as a few relatively common patterns require unused parameters in a
way that Eclipse can't ignore. However, it also resulted in cleanup
of a lot of unnecessary injections and method parameters, so I
think the cost was worth it.
Change-Id: I7224be8b1c798613a127c88507e8cce400679e5d
This reverts commit 5db3bba3e2d85662bacedc5d2f215fff7d8d2805.
I was too fast to submit 5db3bba. Besides forgetting to submit the
referenced changes in some submodules some people reported issues
with the interception of the javax.inject.Inject in some JEE containers.
Change-Id: I931ad329d2e7be6f6ce804b8395489a021c8240b
The Guice team is discouraging use of its specific annotation where
possible, to increase JSR330 compliance.
Leave optional injection alone for now, which needs to be manually
replaced with OptionalBinder.
Change-Id: I4f53a518ba6f36fd67af12f3540dc44cbad07ff8
The default changed to use the current version, with an optional call
to setVersion() if the version should be changed. In this case it
should not be.
Change-Id: Id87515185ccf7d640ad80dd2972e0d13617df8a3
* changes:
Solr index requires SolrCloud rather than plain Solr
Handle exceptions during reindexing
InitIndex: Allow user to configure the Solr index URL
Solr index URL should be configured under the main "index" section
Improve information message about rebuilding the index
We cannot guarantee that secondary index implementations (particularly
the one used by googlesource.com) can efficiently paginate based on
the sortkey, and in particular can both sort and reverse sort on the
same field. This particular performance issue notwithstanding,
searching is now generally fast enough that it is feasible just to
skip the first N results when doing pagination.
Add an option S= to QueryChanges to support starting at a nonzero
offset. Note that we still have to fetch n+S results from the index in
order to do visibility filtering, since if we skipped at the index
layer we wouldn't know how many of the skipped elements would have
matched later filtering.
Drop the sortkey token suffix from the legacy anchor parser; there is
no reliable way to convert it to an offset, and it's unlikely that
users have permalinks to specific sortkey values.
On the server side, remove the sortkey field from the current index
version, and use pagination by offset instead of sortkey in the new
version only.
Continue to support sortkey queries against old index versions, to
support online reindexing while clients have an older JS version.
Change-Id: I6a82965db02c4d534e2107ca6ec91217085124d6
Configuration of the Solr index URL must currently be done in a
subsection, as follows:
[index]
type = SOLR
[index "solr"]
url = 127.0.0.1
Subsections are also being used for configuration of specific indexes
like the "changes_open" and "changes_closed" subsections that are used
for Lucene indexes, and are documented as "index.name.xyz".
Having the Solr URL configration under a similar subsection may cause
confusion, and it would be more intuitive to configure the URL under
the main index section, like:
[index]
type = SOLR
url = 127.0.0.1
Subsections are being used for configuration of specific indexes like
the "changes_open" and "changes_closed" settings that can be specified
for Lucene indexes which are documented as "index.name.xyz"
Also add missing documentation of the URL parameter.
Change-Id: I0bd9eabbdbfa29241688c738aa5d213e726c90b0
Callers were supposed to be calling hasChange() to determine whether
getChange() would return null, but none were. Remove this error-prone
pair of methods and just use change(), which includes lazy loading.
There are a few places where we need to catch an additional
OrmException but these are easy enough to fix.
Change-Id: I23335e362715f59e2c093ffec88427739ff0cffc
FuzzyQuery was a mistake; it uses edit distance to find terms in the
index close to the provided search term. This produces bizarre results
for queries like "message:1234".
Instead, use Lucene's QueryBuilder with an analyzer to convert a
full-text search word/phrase into a phrase query.
Add some tests for full-text matching behavior on numbers, which
should hopefully not be too dependent on specific Lucene behavior.
Coincidentally, a copy-paste error in the byMessageExact test
prevented this poor behavior from showing up in tests sooner.
Change-Id: I384f74f1455d0433433a27f880204ac8ecbf93da
Many methods that are intended for lazy initialization take a
ReviewDb, and some take other helper objects such as
GitRepositoryManager. This achieves the intended lazy-loading behavior
but requires a lot of injection of dependencies into callers, and may
become untenable as lazy loading of ChangeData requires more helpers.
Instead, since after all this is Java, construct ChangeData with a
Factory. This is subtle because it is used from a lot of places that
need control over which ReviewDb handle to pass in, so the factory
methods still need to take a db. ChangeData objects are still intended
(i.e. safe) for use only by one thread, so having a single ReviewDb
instance stored for the lifetime of the ChangeData should be fine. We
just need to be careful about which we pass in, particularly in a
place like ChangeIndexer.
As a side effect, clean up some other injection, particularly in the
predicate class hierarchy.
Change-Id: I52e1eb2a76788c12dd95767e89095ab80df7e1cc
This was due to a copy/paste error into my editor's snippet library,
and it managed to sneak into 20 classes without anybody noticing.
Change-Id: Ifeea9ab5737acbed1b93519b65709f1c0f48684b
In I1b3c5ba0 ChangeIndex.replace was modified to return a future, but
ChangeIndexer.Task was not changed to call those futures before
returning.
In retrospect this was a mistake anyway. The only path by which these
methods are currently called is from ChangeIndexerImpl, which turns
around and wraps them in a Callable and submits them to the same
executor, tying up twice as many threads as necessary.
Similarly, to avoid additional fanout in ChangeIndexerImpl, we should
not write to multiple index versions in parallel, so remove the TODO
to try that. The current behavior slows down individual writes for the
relatively short period of an online version upgrade, but that is
acceptable given the improved thread utilization overall.
Change-Id: I5fff470214ecd69a261dec1c10b9f7bdd0a1907e
Using the sort key of the last element for pagination only works as
long as every sort key is unique. This is true for the general
definition of this field in the Change object, which includes the
unique change ID at the end of the hex string. Previously, we were
incorrectly truncating the change ID off, resulting in many changes in
the same sort key bucket and thus broken pagination.
Having two different definitions of sort key in the same running
server makes it a bit ugly to handle SortKeyPredicates, since the
definition of min/max value is now schema dependent, but at least we
can keep the same field name.
Don't @Deprecate the new SORTKEY field. We were originally hoping to
remove this field and depend only on the UPDATED field (with the
change ID as tiebreaker), but as long as this field is used for
pagination, we have to keep it around. This is because we can't be
sure a secondary index will be able to express a query like "(field X,
field Y) > (N, M)" to allow us to restart a query in the middle of an
UPDATED bucket. We may still decide to scrap the current pagination
system but that will probably not happen until we kill the SQL index
code.
Change-Id: Icb760dbacd01939e5e4936ef87165b6dddcacdc0
In the SQL index implementation, sortkey_after queries are always
rewritten to use one of the ChangeAccess.*Prev() methods, which order
results by sort key ascending. Update the Lucene implementations to
match this sort order. Specify in the ChangeIndex javadoc that this
sort order is required in such cases, since other code in
QueryProcessor depends on this assumption.
Change-Id: Ie35d2c6bd07963db6d33145a83493d12c6a4289d
Mark index versions as "ready" when they are fully indexed and the
server is not running (i.e. from Reindex). By default, search from the
most recent ready index version, and write to both the most recent
ready version and (if different) the most recent known version.
At server startup, mark all versions except those we are about to
start writing to as not ready.
Change-Id: Icf42a3eb27b0445899300d60941cd701a8072d41
The line of what was a "dry run" was a bit arbitrary, doing all the
Gerrit processing (including potential side effects like writing to
persistent caches) but not all the index-specific processing, which
could still be useful for evaluation or debugging purposes. Instead,
provide an output base so administrators can test the indexing
behavior without operating on production index data.
For Lucene, this flag is interpreted as a path to the index
directory; for Solr, it is a prefix added to the collection names.
Change-Id: Ic4808d1e3467a780a818ac7d1c89de13610da630
We want to support a single index version for reading and multiple
index versions for writing. Wrap these in an IndexCollection with
getSearchIndex and getWriteIndexes methods. Still bind a single
implementation at startup.
Change-Id: Ibc4dbbeba064cde68f2585126d7708db5a2b3410
Store indexes in versioned directories, and don't consider open/closed
to have separate versions.
Extract a simple base Schema class with a version and a list of
fields, which is analogous to the DB's SchemaVersion. Construct
indexes using specific versions.
Change-Id: I11e8e3a78cd62b1a01aa122bfd747ed5295fe057
If the secondary index is enabled on a server the index handles all
queries. There is no reason to run the SQL rewrites or to leave
predicates in the predicate tree without being wrapped by the
IndexedChangeQuery predicate.
Change-Id: I18defed2a5a6003b9ae6ba227e29864b5c912e7b
SolrCloud can be used instead of Lucene by adding "type = SOLR"
under [index] and "url = <zookeeper-url>" under [index "solr"]
in gerrit.config.
Change-Id: I0ff8579c5e23c58b16f3605bc20eba4e80fb40fc