Authoring Searchlight Plugins
At a bare minimum, a plugin must consist of an Elasticsearch mapping and a method by which it can provide data to be indexed. Many plugins also require a way to receive updates in order to keep the index up to date. For OpenStack resources, typically the service API is used for initial indexing and notifications are received via oslo.messaging.
This documentation uses the Neutron network plugin as a reasonably complete and complex example.
Getting some data
The very first thing you should do is figure out exactly what you're trying to index. When I've developed plugins I've found it helpful to generate test data both for initial indexing and for notifications.
Initial indexing
In the case of neutron networks, the initial data will come from neutronclient. Some browsing of the API documentation reveals that the call I want is list_networks:
import json
import os

from keystoneclient.auth.identity import v2
from keystoneclient import session
from neutronclient.v2_0 import client as nc_20


def get_session():
    username = os.environ['OS_USERNAME']
    password = os.environ['OS_PASSWORD']
    auth_url = os.environ['OS_AUTH_URL']
    tenant_name = os.environ['OS_TENANT_NAME']
    # The four local variables above are passed as keyword arguments
    auth = v2.Password(**locals())
    return session.Session(auth=auth)


nc = nc_20.Client(session=get_session())
networks = nc.list_networks()
print(json.dumps(networks, indent=4, sort_keys=True))
This outputs:
{
    "networks": [
        {
            "admin_state_up": true,
            "availability_zone_hints": [],
            "availability_zones": [
                "nova"
            ],
            "created_at": "2016-04-08T16:44:17",
            "description": "",
            "id": "4d73d257-35d5-4f4e-bc71-f7f629f21904",
            "ipv4_address_scope": null,
            "ipv6_address_scope": null,
            "is_default": true,
            "mtu": 1450,
            "name": "public",
            "port_security_enabled": true,
            "provider:network_type": "vxlan",
            "provider:physical_network": null,
            "provider:segmentation_id": 1053,
            "router:external": true,
            "shared": false,
            "status": "ACTIVE",
            "subnets": [
                "abcc5896-4844-4870-a5d8-6ae4b8edd42e",
                "ea47304e-bd54-4337-901a-1eb5196ea18e"
            ],
            "tags": [],
            "tenant_id": "fa1537e9bda9405891d004ef9c08d0d1",
            "updated_at": "2016-04-08T16:44:17"
        }
    ]
}
Since that's the output from neutron client, that's what should go in searchlight/tests/functional/data/load/networks.json, though you might also want more examples to test different things.
Notifications
OpenStack documents some of the notifications sent by some services. It's also possible to eavesdrop on notifications sent by running services. Taking neutron as an example (though all services are slightly different), we can make it output notifications by editing /etc/neutron/neutron.conf and adding under the [DEFAULT] section:
notification_driver = messaging
There are then two ways to configure the service to send notifications that Searchlight can receive. The recommended method is to use notification pools, touched on in the messaging documentation.
Notification pools
A notification messaging pool allows additional listeners to receive messages on an existing topic. By default, OpenStack services send notification messages to an oslo.messaging 'topic' named notifications. To view these notifications while still allowing searchlight-listener or Ceilometer's agent to continue to receive them, you may use the utility script in test-scripts/listener.py:
. ~/devstack/openrc admin admin
# If your rabbitmq user/pass are not the same as for devstack, you
# can set RABBIT_PASSWORD and/or RABBIT_USER
./test-scripts/listener.py neutron test-notifications
Adding a separate topic
In the same config file (/etc/neutron/neutron.conf) the following line (again, under the [DEFAULT] section) will cause neutron to output notifications to a topic named searchlight_indexer:
notification_topics = searchlight_indexer
Note
searchlight-listener also listens on the searchlight_indexer topic, so if you have searchlight-listener running, it will receive and process some or all of the notifications you're trying to look at. Thus, you should either stop the searchlight-listener or add another topic (comma-separated) for the specific notifications you want to see. For example:
notification_topics = searchlight_indexer,my_test_topic
After restarting the q-svc service, notifications will be output to the message bus (RabbitMQ by default). They can be viewed in any RMQ management tool; there is also a utility script in test-scripts/listener.py that will listen for notifications:
. ~/devstack/openrc admin admin
# If your rabbitmq user/pass are not the same as for devstack, you
# can set RABBIT_PASSWORD and/or RABBIT_USER
./test-scripts/listener.py neutron
Note
If you added a custom topic as described above, you'll need to edit listener.py to use your custom topic:
# Change this line
topic = 'searchlight_indexer'
# to
topic = 'my_test_topic'
Using the results
Issuing various commands (neutron net-create, neutron net-update, neutron net-delete) will cause listener.py to receive notifications. Usually the notifications with event_type ending .end are the ones of most interest (many fields omitted for brevity):
{
    "event_type": "network.update.end",
    "payload": {
        "network": {
            "status": "ACTIVE",
            "router:external": false,
            "subnets": [
                "9b6094de-18cb-46e1-8d51-e303ff844c86",
                "face0b47-40d3-45c0-9b62-5f05311710f5",
                "7b7bdf5f-8f22-44a3-bec3-1daa78df83c5"
            ],
            "updated_at": "2016-05-03T19:05:38",
            "tenant_id": "34518c16d95e40a19b1a95c1916d8335",
            "id": "abf3a939-4daf-4d05-8395-3ec735aa89fc",
            "name": "private"
        }
    },
    "publisher_id": "network.devstack",
    "ctxt": {
        "read_only": false,
        "domain": null,
        "project_name": "demo",
        "user_id": "c714917a458e428fa5dc9b1b8aa0d4d6"
    },
    "metadata": {
        "timestamp": "2016-05-03 19:05:38.258273",
        "message_id": "ec9ac6a1-aa17-4ee3-aa6e-ab48c1fb81a8"
    }
}
The entire message can go into searchlight/tests/functional/data/events/network.json. The payload (in addition to the API response) will inform the mapping that should be applied for a given plugin.
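When collecting captured notifications for test data, it can help to keep only the events ending in .end and pull out the resource body from each payload. The following is a minimal sketch of that filtering step; the function name extract_end_payloads and the sample messages are illustrative assumptions, not part of Searchlight or listener.py.

```python
import json


def extract_end_payloads(raw_messages, resource="network"):
    """Return (event_type, resource body) pairs for events ending in '.end'."""
    results = []
    for raw in raw_messages:
        message = json.loads(raw)
        event_type = message.get("event_type", "")
        if not event_type.endswith(".end"):
            continue  # skip '.start' and any other intermediate events
        body = message.get("payload", {}).get(resource)
        if body is not None:
            results.append((event_type, body))
    return results


# Hypothetical captured messages, trimmed to the fields used here
sample = [
    json.dumps({"event_type": "network.update.start",
                "payload": {"id": "abf3a939-4daf-4d05-8395-3ec735aa89fc"}}),
    json.dumps({"event_type": "network.update.end",
                "payload": {"network": {
                    "id": "abf3a939-4daf-4d05-8395-3ec735aa89fc",
                    "name": "private"}}}),
]
end_events = extract_end_payloads(sample)
print(end_events)
```

Payloads that do not carry the resource under the expected key are skipped rather than raising, since services differ in how they shape each event.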
File structure
Plugins live in searchlight/elasticsearch/plugins. We have tended to create a subpackage named after the service (neutron) and within it a module named after the resource type (networks.py). Notification handlers can be in a file specific to each resource type but can also be in a single file together (existing ones use notification_handlers.py).
networks.py contains a class named NetworkIndex that implements the base class IndexBase found in searchlight.elasticsearch.plugins.base.
Note
If there are plugins for multiple resources within the same OpenStack service (for example, Glance images and meta definitions), those plugins can exist in the same subpackage ('glance') in different modules, each implementing an IndexBase.
Enabling plugins
Searchlight plugins are loaded by Stevedore. In order for a plugin to be enabled for indexing and searching, it's necessary to add an entry to the entry_points list in Searchlight's configuration in setup.cfg. The name should be the plugin resource name (typically the name used to represent it in Heat):
[entry_points]
searchlight.index_backend =
    os_neutron_net = searchlight.elasticsearch.plugins.neutron.networks:NetworkIndex
Note
After modifying entrypoints, you'll need to reinstall the searchlight package to register them (you may need to activate your virtual environment; see Installation Instructions):
python setup.py develop
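Before reinstalling, it can be useful to confirm that the new entry is well-formed, in particular that the plugin line is indented so it is read as part of the searchlight.index_backend value. The sketch below uses only the standard library configparser on an inline copy of the section; it is an illustrative check, not how Stevedore itself loads plugins.

```python
import configparser

# Inline copy of the relevant setup.cfg section, for illustration
SETUP_CFG = """\
[entry_points]
searchlight.index_backend =
    os_neutron_net = searchlight.elasticsearch.plugins.neutron.networks:NetworkIndex
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# The indented lines form one multi-line value; split it into name/target
raw_value = parser["entry_points"]["searchlight.index_backend"]
plugins = {}
for line in raw_value.strip().splitlines():
    name, _, target = line.partition("=")
    plugins[name.strip()] = target.strip()

# Each target has the form 'module.path:ClassName'
module_path, _, class_name = plugins["os_neutron_net"].partition(":")
print(module_path, class_name)
```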
Writing some code
At this point you're probably about ready to start filling in the code. My usual approach is to create the unit test file first, and copy some of the more boilerplate functionality from one of the other plugins.
You can run an individual test file with:
tox -epy34 searchlight.tests.unit.<your test module>
This has the advantage of running just your tests and executing them very quickly. It can be easier to start from a full set of failing unit tests and build up the actual code from there. Functional tests I've tended to add later. Again, you can run an individual functional test file:
tox -epy34 searchlight.tests.functional.<your test module>
Required plugin functions
This section describes some of the functionality from IndexBase you will need to override.
Document type
As a convention, plugins define their document type (which will map to an Elasticsearch document type) as the resource name Heat uses to identify it:
@classmethod
def get_document_type(cls):
    return "OS::Neutron::Net"
Retrieving object for initial indexing
Plugins must implement get_objects, which in many cases will go to the API of the service it's indexing. It should return an iterable that will be passed to a function (also required) named serialize, which in turn must return a dictionary suitable for Elasticsearch to index. In the example for Neutron networks, this would be a call to list_networks on an instance of neutronclient:
def get_objects(self):
    """Generator that lists all networks owned by all tenants."""
    # Neutronclient handles pagination itself; by default list_networks
    # returns a dict holding every result under the 'networks' key
    neutron_client = openstack_clients.get_neutronclient()
    for network in neutron_client.list_networks()['networks']:
        yield network
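Each object yielded by get_objects is handed to serialize, which must return a dictionary for Elasticsearch to index. The following is a minimal sketch of such a function for a network; the name serialize_network and the choice to fall back to created_at when updated_at is missing are assumptions made for illustration, not Searchlight's actual serializer.

```python
import copy


def serialize_network(network):
    """Hypothetical serializer: copy the API result and fill expected fields."""
    document = copy.deepcopy(network)  # never mutate the caller's object
    # Assumption: reuse created_at when the API response omits updated_at
    if not document.get("updated_at"):
        document["updated_at"] = document.get("created_at")
    return document


# Trimmed API result with no updated_at value
network = {
    "id": "4d73d257-35d5-4f4e-bc71-f7f629f21904",
    "name": "public",
    "created_at": "2016-04-08T16:44:17",
    "tenant_id": "fa1537e9bda9405891d004ef9c08d0d1",
}
document = serialize_network(network)
print(document["updated_at"])
```

Working on a copy keeps the object returned by the client untouched, which matters if the same object is reused elsewhere during indexing.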
Mapping
get_mapping is also required. It must return a dictionary that tells Elasticsearch how to map documents for the plugin (see the documentation for mapping).
At a minimum a plugin should define an id field and an updated_at field because consumers will generally rely on those being present; a name field is highly advisable. If the resource doesn't contain these values your serialize function can map to them. In particular, if your resource does not have a native id value, you must override get_document_id_field so that the indexing code can retrieve the correct value when indexing.
It is worth understanding how Elasticsearch indexes various field types, particularly strings. String fields are typically broken down into tokens to allow searching:
"The quick brown fox" -> ["The", "quick", "brown", "fox"]
This works well for full-text type documents but less well, for example, for UUIDs:
"aaab-bbbb-55555555" -> ["aaab", "bbbb", "55555555"]
In the second example, a search for the full UUID will not match. As a result, we tend to mark these kinds of fields as not_analyzed as with the example to follow.
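The difference between an analyzed and a not_analyzed field can be illustrated with a small model. This is a rough sketch only: the tokenize function splits on non-alphanumeric characters as a stand-in for an analyzer and is not Elasticsearch's real implementation, and the two match helpers model an exact term lookup against what each kind of field would store.

```python
import re


def tokenize(text):
    """Split on runs of non-alphanumeric characters (rough analyzer stand-in)."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def analyzed_match(indexed_value, search_term):
    """An analyzed field matches only if the term equals one stored token."""
    return search_term in tokenize(indexed_value)


def not_analyzed_match(indexed_value, search_term):
    """A not_analyzed field stores the value whole, so only exact terms match."""
    return search_term == indexed_value


uuid = "aaab-bbbb-55555555"
print(tokenize("The quick brown fox"))  # ['The', 'quick', 'brown', 'fox']
print(tokenize(uuid))                   # ['aaab', 'bbbb', '55555555']
print(analyzed_match(uuid, uuid))       # False
print(not_analyzed_match(uuid, uuid))   # True
```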
Where field types are not specified, Elasticsearch will make a best guess from the first document that's indexed.
Some notes (expressed below as comments starting with #):
{
    # This allows indexing of fields not specified in the mapping doc
    "dynamic": true,
    "properties": {
        # not_analyzed is important for id fields; it prevents Elasticsearch
        # tokenizing the field, allowing for exact matches
        "id": {"type": "string", "index": "not_analyzed"},
        # This allows name to be tokenized for searching, but Searchlight will
        # attempt to use the 'raw' (untokenized) field for sorting which gives
        # more consistent results
        "name": {
            "type": "string",
            "fields": {
                "raw": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}
If you are mapping a field which is a reference id to another plugin type, you should add a _meta mapping for that field. This will enable Searchlight (SL) to provide more information to the CLI/UI. The reference id and the plugin resource type can be used by the CLI/UI to issue a GET request to fetch more information from SL. See below for an example from the nova server plugin mapping:
def get_mapping(self):
    return {
        'dynamic': True,
        'properties': {
            'id': {'type': 'string', 'index': 'not_analyzed'},
            'name': {
                'type': 'string',
                'fields': {
                    'raw': {'type': 'string', 'index': 'not_analyzed'}
                }
            },
            'image': {
                'type': 'nested',
                'properties': {
                    'id': {'type': 'string', 'index': 'not_analyzed'}
                }
            }
        },
        # Links image.id to the plugin type that indexes Glance images
        "_meta": {
            "image.id": {
                "resource_type": resource_types.GLANCE_IMAGE
            }
        },
    }
Note
The parent plugin id field (when available) is automatically linked to the parent resource type.
Doc values
For many field types Searchlight will alter the mapping to change the format in which field data is stored. Prior to Elasticsearch 2.x, field values were stored by default in 'fielddata' format, which could result in high memory usage under some sort and aggregation operations. An alternative format, called doc_values, trades slightly increased disk usage for better memory efficiency. In Elasticsearch 2.x doc_values is the default, and Searchlight uses this option as the default regardless of Elasticsearch version. For more information see the Elasticsearch documentation.
Generally this default will be fine. However, there are several ways in which the default can be overridden:
Globally in plugin configuration; in searchlight.conf:
    [resource_plugin]
    mapping_use_doc_values = false
For an individual plugin in searchlight.conf:
    [resource_plugin:os_neutron_net]
    mapping_use_doc_values = false
For a plugin's entire mapping; in code, override the mapping_use_doc_values property (thus ignoring any configuration property):
    @property
    def mapping_use_doc_values(self):
        return False
For individual fields in a mapping, by setting doc_values to False:
    {
        "properties": {
            "some_field": {"type": "date", "doc_values": False}
        }
    }
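To make the interaction between a plugin-wide default and per-field settings concrete, here is a simplified sketch of how a default could be applied to a mapping. The function apply_doc_values below is a hypothetical illustration rather than Searchlight's code: it assumes Elasticsearch 2.x style mappings, leaves analyzed string fields alone (doc values are not available for them in that version), and does not descend into multi-field 'fields' definitions.

```python
import copy


def apply_doc_values(mapping, use_doc_values=True):
    """Return a copy of mapping with doc_values set on fields lacking it."""
    result = copy.deepcopy(mapping)

    def visit(properties):
        for field in properties.values():
            if "properties" in field:
                visit(field["properties"])  # object/nested field: recurse
                continue
            is_string = field.get("type") == "string"
            if is_string and field.get("index") != "not_analyzed":
                continue  # analyzed strings cannot use doc_values
            # setdefault keeps any explicit per-field choice intact
            field.setdefault("doc_values", use_doc_values)

    visit(result.get("properties", {}))
    return result


mapping = {
    "properties": {
        "id": {"type": "string", "index": "not_analyzed"},
        "name": {"type": "string"},
        "created_at": {"type": "date"},
        "some_field": {"type": "date", "doc_values": False},
    }
}
updated = apply_doc_values(mapping)
print(updated["properties"])
```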
Access control
Plugins must define how they are access controlled. Typically this is a restriction matching the user's project/tenant:
def _get_rbac_field_filters(self, request_context):
    return [
        {'term': {'tenant_id': request_context.owner}}
    ]
Any filters listed will be applied to queries against the plugin's document type. Administrative users can specify all_projects in searches to bypass these filters. This default behavior can be overridden for a plugin by setting the allow_admin_ignore_rbac property to False on the plugin (currently only in code); all_projects will then be ignored for that plugin.
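The rules above can be summarised in a small sketch showing how RBAC filters might be combined with a user's query. The function build_search_query and its parameters are hypothetical names chosen for illustration, and the bool query with a filter clause assumes Elasticsearch 2.x query syntax; Searchlight's own query construction may differ.

```python
def build_search_query(user_query, rbac_filters, is_admin=False,
                       all_projects=False, allow_admin_ignore_rbac=True):
    """Wrap user_query in a bool query restricted by rbac_filters."""
    if is_admin and all_projects and allow_admin_ignore_rbac:
        return user_query  # administrator explicitly bypassed the filters
    return {"bool": {"must": user_query, "filter": rbac_filters}}


user_query = {"match": {"name": "public"}}
rbac_filters = [{"term": {"tenant_id": "fa1537e9bda9405891d004ef9c08d0d1"}}]

# Ordinary user: the tenant restriction is always applied
restricted = build_search_query(user_query, rbac_filters)

# Administrator asking for all projects: filters are skipped
unrestricted = build_search_query(user_query, rbac_filters,
                                  is_admin=True, all_projects=True)

# Plugin that sets allow_admin_ignore_rbac to False: all_projects is ignored
still_restricted = build_search_query(user_query, rbac_filters,
                                      is_admin=True, all_projects=True,
                                      allow_admin_ignore_rbac=False)
print(restricted)
```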
Faceting
Any fields defined in the mapping document are eligible to be identified as facets, which allows a UI to let users search on specific fields. Many plugins define facets_excluded, which excludes specified fields. Many also define facets_with_options, which should return fields with low cardinality where it makes sense to return valid options for those fields.
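As a rough illustration of how these two settings shape the facet list, the sketch below derives facets from a mapping's properties. The function facet_fields, its arguments and the list_options flag are assumptions made for this example; the actual structure of facets_excluded and facets_with_options in Searchlight may be different.

```python
def facet_fields(mapping, excluded=(), with_options=()):
    """List facet candidates from a mapping, honouring the two settings."""
    facets = []
    for name in sorted(mapping["properties"]):
        if name in excluded:
            continue  # never offered to the UI as a facet
        definition = mapping["properties"][name]
        facets.append({
            "name": name,
            "type": definition.get("type", "object"),
            # True where valid options should be listed for the field
            "list_options": name in with_options,
        })
    return facets


mapping = {
    "properties": {
        "id": {"type": "string", "index": "not_analyzed"},
        "status": {"type": "string", "index": "not_analyzed"},
        "tenant_id": {"type": "string", "index": "not_analyzed"},
    }
}
facets = facet_fields(mapping, excluded=("tenant_id",),
                      with_options=("status",))
print(facets)
```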
Protected fields
admin_only_fields determines fields which only administrators should be able to see or search. For instance, this will mark any fields beginning with provider: as well as any defined in the plugin configuration:
@property
def admin_only_fields(self):
    from_conf = super(NetworkIndex, self).admin_only_fields
    return ['provider:*'] + from_conf
These fields end up getting indexed in separate admin-only documents.
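To show what a wildcard such as provider:* selects, the sketch below splits a serialized document into a view without the protected fields and a full view. It uses the standard library fnmatch module for the wildcard matching; both the function split_admin_fields and the use of fnmatch are assumptions for illustration, and Searchlight's own matching and storage of admin-only documents may work differently.

```python
import fnmatch


def split_admin_fields(document, admin_only_patterns):
    """Return (user_view, admin_view) of a serialized document."""
    def is_admin_only(field):
        return any(fnmatch.fnmatchcase(field, pattern)
                   for pattern in admin_only_patterns)

    # Ordinary users see everything except the protected fields
    user_view = {key: value for key, value in document.items()
                 if not is_admin_only(key)}
    # Administrators see the complete document
    admin_view = dict(document)
    return user_view, admin_view


document = {
    "id": "4d73d257-35d5-4f4e-bc71-f7f629f21904",
    "name": "public",
    "provider:network_type": "vxlan",
    "provider:segmentation_id": 1053,
}
user_view, admin_view = split_admin_fields(document, ["provider:*"])
print(sorted(user_view))
```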
Parent/child relationships
In some cases there is a strong ownership implied between plugins. In these cases the child plugin can define parent_plugin_type and get_parent_id_field (which determines a field on the child that refers to its parent). See the Neutron Port plugin for an example.
Remember that Elasticsearch is not a relational database and it doesn't do joins, per se, but this linkage does allow running queries referencing children (or parents).
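A minimal sketch of the child side of such a relationship, together with a query that references children, might look like the following. The class PortIndexSketch is a hypothetical stand-in and not Searchlight's port plugin; the network_id field, the OS::Neutron::Port document type and the get_parent_id helper are assumptions for illustration. The has_child query is a standard Elasticsearch query type that returns parent documents whose children match.

```python
class PortIndexSketch(object):
    """Hypothetical stand-in for a child plugin."""

    @classmethod
    def parent_plugin_type(cls):
        return "OS::Neutron::Net"

    def get_parent_id_field(self):
        return "network_id"  # assumed field on a port naming its network

    def get_parent_id(self, serialized_port):
        """Illustrative helper: look up the parent id on a child document."""
        return serialized_port[self.get_parent_id_field()]


def networks_with_ports_matching(port_query):
    """Build a has_child query returning parents whose children match."""
    return {"has_child": {"type": "OS::Neutron::Port", "query": port_query}}


plugin = PortIndexSketch()
port = {"id": "port-1",
        "network_id": "4d73d257-35d5-4f4e-bc71-f7f629f21904"}
parent_id = plugin.get_parent_id(port)
query = networks_with_ports_matching({"term": {"status": "ACTIVE"}})
print(parent_id, query)
```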