searchlight/doc/source/authoring-plugins.rst

..
      Copyright 2016 Hewlett-Packard Enterprise Development Company, L.P.
      All Rights Reserved.

      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

.. _searchlight-plugin-authoring:

Authoring Searchlight Plugins
=============================

At a bare minimum, a plugin must consist of an elasticsearch mapping, and a
method by which it can provide data to be indexed. Many plugins also require a
way to receive updates in order to keep the index up to date. For Openstack
resources, typically the service API is used for initial indexing and
notifications are received via oslo.messaging.

This documentation will use as an example the Neutron network plugin as a
reasonably complete and complex example.

Getting some data
-----------------
The very first thing you should do is figure out exactly what you're trying to
index. When I've developed plugins I've found it helpful to generate test data
both for initial indexing and for notifications.

Initial indexing
^^^^^^^^^^^^^^^^
In the case of neutron networks, the initial data will come from
``neutronclient``. Some browsing of the API documentation reveals that the
call I want is ``list_networks``::

    import json
    import os

    from keystoneclient.auth.identity import v2
    from keystoneclient import session
    from neutronclient.v2_0 import client as nc_20

    def get_session():
        username = os.environ['OS_USERNAME']
        password = os.environ['OS_PASSWORD']
        auth_url = os.environ['OS_AUTH_URL']
        tenant_name = os.environ['OS_TENANT_NAME']
        auth = v2.Password(**locals())
        return session.Session(auth=auth)


    nc = nc_20.Client(session=get_session())
    networks = nc.list_networks()

    print(json.dumps(networks, indent=4, sort_keys=True))

This outputs::

    {
        "networks": [
            {
                "admin_state_up": true,
                "availability_zone_hints": [],
                "availability_zones": [
                    "nova"
                ],
                "created_at": "2016-04-08T16:44:17",
                "description": "",
                "id": "4d73d257-35d5-4f4e-bc71-f7f629f21904",
                "ipv4_address_scope": null,
                "ipv6_address_scope": null,
                "is_default": true,
                "mtu": 1450,
                "name": "public",
                "port_security_enabled": true,
                "provider:network_type": "vxlan",
                "provider:physical_network": null,
                "provider:segmentation_id": 1053,
                "router:external": true,
                "shared": false,
                "status": "ACTIVE",
                "subnets": [
                    "abcc5896-4844-4870-a5d8-6ae4b8edd42e",
                    "ea47304e-bd54-4337-901a-1eb5196ea18e"
                ],
                "tags": [],
                "tenant_id": "fa1537e9bda9405891d004ef9c08d0d1",
                "updated_at": "2016-04-08T16:44:17"
            }
        ]
    }

Since that's the output from neutron client, that's what should go in
``searchlight/tests/functional/data/load/networks.json``, though you might
also want more examples to test different things.

Notifications
^^^^^^^^^^^^^
Openstack documents some of the notifications_ sent by some services. It's
also possible to eavesdrop on notifications sent by running services. Taking
neutron as an example (though all services are slightly different), we can
make it output notifications by editing ``/etc/neutron/neutron.conf`` and
adding under the ``[DEFAULT]`` section::

    notification_driver = messaging

There are then two ways to configure the service to send notifications that
Searchlight can receive. The recommended method is to use notification pools,
touched on in the `messaging documentation`_.

.. _`messaging documentation`: http://docs.openstack.org/developer/oslo.messaging/notification_listener.html

Notification pools
##################

A notification messaging pool allows additional listeners to receive
messages on an existing topic. By default, Openstack services send notification
messages to an oslo.messaging 'topic' named `notifications`. To view these
notifications while still allowing ``searchlight-listener`` or Ceilometer's
agent to continue to recieve them, you may use the utility script in
``test-scripts/listener.py``::

    . ~/devstack/openrc admin admin
    # If your rabbitmq user/pass are not the same as for devstack, you
    # can set RABBIT_PASSWORD and/or RABBIT_USER
    ./test-scripts/listener.py neutron test-notifications

Adding a separate topic
#######################

In the same config file (``/etc/neutron/neutron.conf``) the following line
(again, under the ``[DEFAULT]`` section) will cause neutron to output
notifications to a topic named ``searchlight_indexer``::

    notification_topics = searchlight_indexer

.. note::

    ``searchlight-listener`` also listens on the ``searchlight_indexer``
    topic, so if you have ``searchlight-listener`` running, it will receive
    and process some or all of the notifications you're trying to look at.
    Thus, you should either stop the ``searchlight-listener`` or add another
    topic (comma-separated) for the specific notifications you want to see.
    For example::

        notification_topics = searchlight_indexer,my_test_topic

After restarting the ``q-svc`` service notifications will be output to the
message bus (rabbitmq by default). They can be viewed in any RMQ management
tool; there is also a utility script in ``test-scripts/listener.py`` that
will listen for notifications::

    . ~/devstack/openrc admin admin
    # If your rabbitmq user/pass are not the same as for devstack, you
    # can set RABBIT_PASSWORD and/or RABBIT_USER
    ./test-scripts/listener.py neutron

.. note::

    If you added a custom topic as described above, you'll need to edit
    ``listener.py`` to use your custom topic::

        # Change this line
        topic = 'searchlight_indexer'
        # to
        topic = 'my_test_topic'

Using the results
#################

Issuing various commands (``neutron net-create``, ``neutron net-update``,
``neutron net-delete``) will cause ``listener.py`` to receive notifications.
Usually the notifications with ``event_type`` ending ``.end`` are the ones of
most interest (many fields omitted for brevity)::

    {"event_type": "network.update.end",
     "payload": {
       "network": {
         "status": "ACTIVE",
         "router:external": false,
         "subnets": ["9b6094de-18cb-46e1-8d51-e303ff844c86",
                     "face0b47-40d3-45c0-9b62-5f05311710f5",
                     "7b7bdf5f-8f22-44a3-bec3-1daa78df83c5"],
         "updated_at": "2016-05-03T19:05:38",
         "tenant_id": "34518c16d95e40a19b1a95c1916d8335",
         "id": "abf3a939-4daf-4d05-8395-3ec735aa89fc", "name": "private"}
      },
      "publisher_id": "network.devstack",
      "ctxt": {
        "read_only": false,
        "domain": null,
        "project_name": "demo",
        "user_id": "c714917a458e428fa5dc9b1b8aa0d4d6"
      },
      "metadata": {
        "timestamp": "2016-05-03 19:05:38.258273",
        "message_id": "ec9ac6a1-aa17-4ee3-aa6e-ab48c1fb81a8"
      }
    }

The entire message can go into
``searchlight/tests/functional/data/events/network.json``. The ``payload``
(in addition to the API response) will inform the mapping that should be
applied for a given plugin.

.. _notifications: https://wiki.openstack.org/wiki/SystemUsageData

File structure
--------------
Plugins live in ``searchlight/elasticsearch/plugins``. We have tended to create
a subpackage named after the service (``neutron``) and within it a module named
after the resource type (``networks.py``). Notification handlers can be in a file
specific to each resource type but can also be in a single file together
(existing ones use ``notification_handlers.py``).

``networks.py`` contains a class named ``NetworkIndex`` that implements the base
class ``IndexBase`` found in ``searchlight.elasticsearch.plugins.base``.

.. note::

    If there are plugins for multiple resources within the same Openstack
    service (for example, Glance images and meta definitions) those plugins
    can exist in the same subpackage ('glance') in different modules, each
    implementing an IndexBase.

Enabling plugins
----------------
Searchlight plugins are loaded by Stevedore_. In order for a plugin to be
enabled for indexing and searching, it's necessary to add an entry to the
``entry_points`` list in Searchlight's configuration in ``setup.cfg``. The
name should be the plugin resource name (typically the name used to represent
it in Heat_)::

    [entry_points]
    searchlight.index_backend =
        os_neutron_net = searchlight.elasticsearch.plugins.neutron.networks:NetworkIndex

.. note::

    After modifying entrypoints, you'll need to reinstall the searchlight
    package to register them (you may need to activate your virtual environment;
    see :ref:`Installation Instructions`)::

        python setup.py develop

.. _Stevedore: http://docs.openstack.org/developer/stevedore/
.. _Heat: http://docs.openstack.org/developer/heat/template_guide/openstack.html

Writing some code
-----------------
At this point you're probably about ready to start filling in the code. My
usual approach is to create the unit test file first, and copy some of the
more boilerplate functionality from one of the other plugins.

You can run an individual test file with::

    tox -epy34 searchlight.tests.unit.<your test module>

This has the advantage of running just your tests and executing them very
quickly. It can be easier to start from a full set of failing unit tests
and build up the actual code from there. Functional tests I've tended to add
later. Again, you can run an individual functional test file:

    tox -epy34 searchlight.tests.functional.<your test module>

Required plugin functions
-------------------------
This section describes some of the functionality from ``IndexBase`` you will
need to override.

Document type
^^^^^^^^^^^^^
As a convention, plugins define their document type (which will map to an
ElasticSearch document type) as the `resource name`_ Heat uses to identify it::

    @classmethod
    def get_document_type(self):
        return "OS::Neutron::Net"

.. _`resource_name`: http://docs.openstack.org/developer/heat/template_guide/openstack.html

Retrieving object for initial indexing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Plugins must implement ``get_objects`` which in many cases will go to the
API of the service it's indexing. It should return an iterable that will be
passed to a function (also required) named ``serialize``, which in turn must
return a dictionary suitable for Elasticsearch to index. In the example for
Neutron networks, this would be a call to ``list_networks`` on an instance of
``neutronclient``::

    def get_objects(self):
        """Generator that lists all networks owned by all tenants."""
        # Neutronclient handles pagination itself; list_networks is a generator
        neutron_client = openstack_clients.get_neutronclient()
        for network in neutron_client.list_networks()['networks']:
            yield network

Mapping
^^^^^^^

``get_mapping`` is also required. It must return a dictionary that tells
Elasticsearch how to map documents for the plugin (see the documentation for
mapping_).

At a minimum a plugin should define an ``id`` field and an ``updated_at`` field
because consumers will generally rely on those being present; a ``name`` field
is highly advisable. If the resource doesn"t contain these values your
``serialize`` function can map to them. In particular, if your resource does
not have a native ``id`` value, you must override ``get_document_id_field``
so that the indexing code can retrieve the correct value when indexing.

It is worth understanding how Elasticsearch indexes various field types,
particularly strings. String fields are typically broken down into tokens to
allow searching::

  "The quick brown fox" -> ["The", "quick", "brown", "fox"]

This works well for full-text type documents but less well, for example,
for UUIDS::

  "aaab-bbbb-55555555" -> ["aaab", "bbbb", "55555555"]

In the second example, a search for the full UUID will not match. As a result,
we tend to mark these kinds of fields as ``not_analyzed`` as with the example
to follow.

Where field types are not specified, Elasticsearch will make a best guess from
the first document that's indexed.

Some notes (expressed below as comments starting with #)::

    {
      # This allows indexing of fields not specified in the mapping doc
      "dynamic": true,
      "properties": {

        # not_analyzed is important for id fields; it prevents Elasticsearch
        # tokenizing the field, allowing for exact matches
        "id": {"type": "string", "index": "not_analyzed"},

        # This allows name to be tokenized for searching, but Searchlight will
        # attempt to use the 'raw' (untokenized) field for sorting which gives
        # more consistent results
        "name": {
          "type": "string",
          "fields": {
            "raw": {"type": "string", "index": "not_analyzed"}
          }
        }
      }
    }


If you are mapping a field which is a reference id to other plugin type, you
should add a _meta mapping for that field. This will enable Searchlight(SL) to
provide more information to CLI/UI. The reference id and the plugin resource
type can be used by CLI/UI to issue a ``GET`` request to fetch more information
from SL. See below for an example on nova server plugin mapping::

  def get_mapping(self):
    return {
        'dynamic': True,
        'properties': {
            'id': {'type': 'string', 'index': 'not_analyzed'},
            'name': {
                'type': 'string',
                'fields': {
                    'raw': {'type': 'string', 'index': 'not_analyzed'}
                }
            }
            'image': {
                'type': 'nested',
                'properties': {
                    'id': {'type': 'string', 'index': 'not_analyzed'}
                }
            }
        },
        "_meta": {
            "image.id": {
                "resource_type": resource_types.GLANCE_IMAGE
            }
        },
    }

.. note:: Parent plugin id field(when available) is automatically linked to the
          parent resource type.

Doc values
**********
For many field types Searchlight will alter the mapping to change the format in
which field data is stored. Prior to Elasticsearch 2.x field values by default
were stored in 'fielddata' format, which could result in high memory usage under
some sort and aggregation operations. An alternative format, called ``doc_values``
trades slightly increased disk usage for better memory efficiency. In Elasticsearch
2.x ``doc_values`` is the default, and Searchlight uses this option as the default
regardless of Elasticsearch version. For more information see the Elasticsearch
documentation_.

.. _documentation: https://www.elastic.co/guide/en/elasticsearch/reference/2.1/doc-values.html

Generally this default will be fine. However, there are several ways in which
the default can be overriden:

* Globally in plugin configuration; in ``searchlight.conf``::

    [resource_plugin]
    mapping_use_doc_values = false

* For an individual plugin in ``searchlight.conf``::

    [resource_plugin:os_neutron_net]
    mapping_use_doc_values = false

* For a plugin's entire mapping; in code, override the ``mapping_use_doc_values``
  property (and thus ignoring any configuration property)::

    @property
    def mapping_use_doc_values(self):
        return False

* For individual fields in a mapping, by setting ``doc_values`` to False::

    {
      "properties": {
        "some_field": {"type": "date", "doc_values": False}
      }
    }

Access control
^^^^^^^^^^^^^^
Plugins must define how they are access controlled. Typically this is a
restriction matching the user's project/tenant::

    def _get_rbac_field_filters(self, request_context):
        return [
            {'term': {'tenant_id': request_context.owner}}
        ]

Any filters listed will be applied to queries against the plugin's document
type. Administrative users can specify ``all_projects`` in searches to bypass
these filters. This default behavior can be overridden for a plugin by setting
the ``allow_admin_ignore_rbac`` property to ``False`` on the plugin (currently
only in code). ``all_projects`` will be ignore for that plugin.

Faceting
^^^^^^^^
Any fields defined in the mapping document are eligible to be identified as
facets, which allows a UI to let users search on specific fields. Many plugins
define ``facets_excluded`` which exclude specified fields. Many also define
``facets_with_options`` which should return fields with low cardinality where
it makes sense to return valid options for those fields.

Protected fields
^^^^^^^^^^^^^^^^
``admin_only_fields`` determines fields which only administrators should be
able to see or search. For instance, this will mark any fields beginning with
``provider:`` as well as any defined in the plugin configuration::

    @property
    def admin_only_fields(self):
        from_conf = super(NetworkIndex, self).admin_only_fields
        return ['provider:*'] + from_conf

These fields end up getting indexed in separate admin-only documents.

Parent/child relationships
--------------------------
In some cases there is a strong ownership implied between plugins. In these
cases the child plugin can define ``parent_plugin_type`` and
``get_parent_id_field`` (which determines a field on the child that refers
to its parent). See the Neutron ``Port`` plugin for an example.

Remember that Elasticsearch is not a relational database and it doesn't do
joins, per se, but this linkage does allow running queries referencing children
(or parents).

.. _mapping: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html