Add charm author guide around minimising handler activity

This author guide is about how to minimise activity in the update-status and other hooks for handlers that ought to only run once. Change-Id: I26b1317e2d58a52a0cf52173bbf23e373e32e376
2017-08-15 17:08:36 +01:00
parent 0d2625bb5c
commit c7d4774716
3 changed files with 351 additions and 0 deletions
--- a/doc/source/author-guides/index.rst
+++ b/doc/source/author-guides/index.rst
@@ -0,0 +1,24 @@
 .. _author-notes-index:
 ============================
 OpenStack Charm Author Notes
 ============================
 These notes are charm development topics that charm authors may encounter
 whilst writing or modifying charms. These are based around using
 ``charms.reactive`` and ``charms.openstack`` to write *reactive* charms.  These
 have some potential pitfalls that these notes address.
 .. toctree::
   :maxdepth: 1
   :includehidden:
   reactive-handlers-optimization
 ====================
 Indices and tables
 ====================
 * :ref:`genindex`
 * :ref:`search`
--- a/doc/source/author-guides/reactive-handlers-optimization.rst
+++ b/doc/source/author-guides/reactive-handlers-optimization.rst
@@ -0,0 +1,326 @@
 .. _reactive-handlers-optimization:
 =======================================
 Optimizing Reactive Handlers in a Charm
 =======================================
 One of the issues often encountered when writing reactive charms is the
 disconnect between the ``hook`` that is invoked, and the handlers that actually
 run.  This article explores that relationship, or lack of relationship, and how
 the side-effects can be mitigated.
 Introduction
 ~~~~~~~~~~~~
 The charms.reactive library supports ``@hook(...)`` handlers which respond to
 *actual* Juju hook invocations and ``@when(...)`` state-type handlers that are
 controlled by combinations of ``states`` (which are essentially boolean flags).
 *Note: that 'states' are being renamed to 'flags' to better reflect their usage
 as true/false condition checks for the @when(...) type handlers.  In this
 article the term 'state/flag' means the string from set_state(...) or
 remove_state(...).*
 Hook handlers run before any state handlers.  Hooks *can't* be combined with
 state/flag handlers.  The state handlers then run until there are no more state
 changes.
 The can cause unexpected behavior as it means that state handlers are run
 whenever their condition state/flags evaluate to 'true' for *any* hook that
 runs.
 Example
 ~~~~~~~
 Let's say that a charm needs a database interface (say ``shared-db``) and when
 it connects the charm sends the username/database the it wants to connect to,
 and then when it's available, then ensures (via some sync mechanism) that the
 database tables are set up.  So the sequence of events is:
 1. When ``shared-db.connected``: send the credentials.
 2. When ``shared-db.available``: sync the database.
 The *interface* is usually written in such a way that:
 a) When the relation hook ``some-relation-joined`` is handled, the
   ``<name>.connected`` state/flag is set.
 b) When the relation hook ``some-relation-changed`` is handled, the data that
   is on that relation from the remote party is checked/validated in some way, and
   if it is *usable*, as defined by the interface, then the ``<name>.available``
   state/flag is set.
 c) When either ``some-relation-departed`` or ``some-relation-broken`` is
   handled, then both ``<name>.available`` and ``<name>.connected`` are removed.
 Then, typically the handler code in the charm which uses that interface looks
 something like this:
 .. code-block:: python
        @when_not('installed')
        def install():
            # do installation
            set_state('installed')
        @when('<name>.connected')
        def do_connection(shared_db):
            shared_db.send('username', 'database')
        @when('<name>.available')
        def do_sync(shared_db):
            # do some database sync.
 The Issue:
 ~~~~~~~~~~
 The implementation for the *interface* is typical.  If you look at the
 `interfaces.juju.solutions`_ and pick almost any interface (e.g.
 `interface-etcd`_) you'll see that the various ``...connected`` and
 ``..available`` state/flags are *continuous* from the moment the conditions for
 them are met.
 .. _`interfaces.juju.solutions`: http://interfaces.juju.solutions/
 .. _`interface-etcd`: https://github.com/juju-solutions/interface-etcd
 By now you will have probably seen the issue; once the ``<name>.connected``
 state/flag is set, the ``do_connection(shared_db)`` function will be called
 for *ANY* subsequent hook, and that includes the ``update-status`` hook which
 runs, by default, every 5 minutes.
 This would mean that the ``shared_db.send(...)`` function will be called every
 5 minutes, *possibly* sending data to the connected database charm and causing
 it to do work, due to the relevant *changed* hook being called.
 The same is true for the ``<name>.available`` state/flag; it will cause the
 ``do_sync(...)`` handler to be called for every subsequent hook.
 I term this: **Doing too much work in update-status**.
 Solutions:
 ~~~~~~~~~~
 There are a couple of ways of solving this.  One works extremely well for the
 simpler example shown here, and the other can be used for the more complicated
 case when data on the interface may change that *needs to be handled*.
 Option 1: Gating handlers with a *done* state
 ---------------------------------------------
 If we re-write our initial code block as:
 .. code-block:: python
        @when_not('installed')
        def install():
            # do installation
            set_state('installed')
            remove_state('shared-db.details.sent')
            remove_state('shared-db.synced')
        @when('<name>.connected')
        @when_not('shared-db.details.sent')
        def do_connection(shared_db):
            shared_db.send('username', 'database')
            set_state('shared-db.details.sent')
        @when('<name>.available')
        @when_not('shared-db.synced')
        def do_sync(shared_db):
            # do some database sync.
            set_state('shared-db.synced')
 Now we have *run once* handlers that *can* be run again if the author of
 the charm wishes to, unlike the ``@only_once`` decorator which will *only ever*
 run that handler once, which sometimes may be useful.
 So, for example, if the charm were to be upgraded, the ``upgrade-charm`` hook
 could be used to clear the ``installed`` state, thus allowing the charm to
 upgrade the installed software and then run the ``do_connection(...)`` and
 ``do_sync(...)`` handlers another time.
 Option 2: Checking for data changes
 ===================================
 The other method is to check the interfaces for data changes.  This can be done
 in two ways:
 1) In the interface, when the ``<name>-relation-changed`` hook is handled, see
   if the data changed, and set a ``<name>.changed`` state, that is then cleared
   after the all the handlers have run for that hook - this is achieved using an
   ``atexit(...)`` function.
 2) In the charm layer code, use a data change detection function to decide if
   the handler should be run.
 Note that re-writing an interface may not be an option as other charms may
 still be dependent on that interface's functionality.  Thus, often, only the
 2nd method can be employed.
 Option 2.1: Re-writing the interface
 ------------------------------------
 A typical interface may take the following form (this is the ``requires.py`` side):
 .. code-block:: python
        class SomeClient(RelationBase):
            scope = scope.GLOBAL
            @hook('{requires:name}-relation-{joined,changed}')
            def changed(self):
                self.set_state('{relation_name}.connected')
                # if we have some piece of data
                if self.get_data():
                    self.set_state('{relation_name}.available')
                else:
                    self.remove_state('{relation_name}.available')
            @hook('{requires:name}-relation-{broken,departed}')
            def gone_away(self):
                self.remove_state('{relation_name}.connected')
                self.remove_state('{relation_name}.available')
            def get_data(self):
                 return self.get_remote('some-data-item')
 And in the charm layer, using this interface:
 .. code-block:: python
        @when('<name>.available')
        def do_something_with_data(the_if_object):
            do_something_with(the_if_object.get_data())
 As mentioned above, a very typical style for an interface.  In order to
 implement the *data-changed* idea, we can use the
 ``charms.reactive.helpers.data_changed()`` function like this:
 .. code-block:: python
        import charms.reactive.helpers as helpers
        import charmhelpers.core.hookenv as hookenv
        class SomeClient(RelationBase):
            scope = scope.GLOBAL
            @hook('{requires:name}-relation-{joined,changed}')
            def changed(self):
                self.set_state('{relation_name}.connected')
                # if we have some piece of data
                data = self.get_data()
                if data:
                    self.set_state('{relation_name}.available')
                    if helpers.data_changed('interface-name.get_data', data):
                        self.set_state('{relation_name}.changed')
                        hookenv.atexit(
                            lambda: self.remove_state('{relation_name}.changed'))
                else:
                    self.remove_state('{relation_name}.available')
            @hook('{requires:name}-relation-{broken,departed}')
            def gone_away(self):
                self.remove_state('{relation_name}.connected')
                self.remove_state('{relation_name}.available')
            def get_data(self):
                 return self.get_remote('some-data-item')
 Using the ``<name>.changed`` state can either be simple or a bit more
 complicated depending on whether multiple handlers need to see the state.  The
 issue here is that the ``<name>.changed`` state is *transitory*, whereas we
 would want the charm to recover from errors as much as possible, and thus want
 to physically clear the state in the charm.  The way to do that is to use a
 secondary state as the trigger to do the work needed. e.g.:
 .. code-block:: python
        @when('<name>.changed')
        def name_has_changed(*args):
            set_state('name_has_changed')
        @when('name_has_changed')
        @when('<name>.available'):
        def do_something_with_data(the_if_object):
            do_something_with(the_if_object.get_data())
            remove_state('name_has_changed')
 The slight increase in complexity allows the ``<name>.changed`` state/flag to
 be used for several handlers, with each handler having its own guard
 state/flag.  It also means that if the charm code were to fail during an
 invocation that the ``name_has_changed`` state would *still* indicate that the
 data had changed and thus on the *next* invocation of the charm the handler
 would still be called.
 Note that modifying an existing interface in this way doesn't affect the
 functionality of existing charm layers which don't 'know' about a
 ``<name>.changed`` state/flag.  They would continue to function as previously.
 The debug logs will show that the ``name_has_changed`` handler will run,
 followed by the ``do_something_with_data`` at some later stage *in the same
 hook invocation*.  If no data has changed, then neither of these handlers will
 be called, leading to a cleaner debug-log which reflects what has actually been
 used/run in the charm.
 There is a small window of failure possible where the charm may crash before
 the ``name_has_changed`` handler has had a chance to run.  If this concerns you
 then Option 2.2 below may be more suited.
 Option 2.2: Use change detection in the charm
 ---------------------------------------------
 An alternative approach is to *only* do data changed detection in the charm
 layer.  I recommend *NOT* using the ``data_changed()`` function from the
 ``charms.reactive`` library as it can only be called *once* for each time data
 is changed.  i.e. if ``data_changed(...)`` is called for the same data key and
 data more than once, it will only return ``True`` for the first call, and then
 ``False`` thereafter.
 This is bad because if the charm code fails/crashes *after* calling
 ``data_changed(...)`` then on the *next* invocation of the code the data
 *won't* appear to be changed and the original intent of the handler won't be
 honored.  i.e. the charm will fail to use the changed data.
 So the  ``charms.openstack`` library supports a slightly different version
 called ``is_data_changed(..)`` which works as a context manager, and doesn't
 *change* the stored data until the context scope is exited without an
 Exception.  It can be used as follows:
 .. code-block:: python
        import charms_openstack.charm.utils as utils
        @when('<name>.available'):
        def do_something_with_data(the_if_object):
            data = the_if_object.get_data()
            with utils.is_data_changed('some-meaningful-key', data) as f:
                if f:
                    do_something_with(data)
 The extra ``f`` is slightly awkward, but it's how the code can discover that
 the data has changed.  Also, note that this version is *less efficient* than
 the 'changed-interface' version as every handler needs to use
 ``is_data_changed(...)`` which has an overhead.  Also, the debug logs from Juju
 will show that every handler is still being called, so you lose some context
 information about which handlers are actually doing work.
 Summary
 ~~~~~~~
 This article has shown some of the pitfalls around ``charms.reactive`` and
 handlers and how handlers can inadvertently cause too much work to be done in
 either connected charms, or the software payload being managed.
 The two options described provide some tools to the charm author to reduce
 workloads during hooks and provide a cleaner, more understandable debug-log.
 They also mitigate against unexpected side-effects in charms where handlers the
 author *thinks* might only run once, in fact run every time the charm code is
 invoked via a hook and thus may cause unnecessary, redundant or, worst, bugs
 in either the payload, or other connected charms.
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -14,6 +14,7 @@ with lose coupling between OpenStack Services.
   openstack-charms
   creating-charms
   how-to-contribute
   author-guides/index
   rotas
   charm-release
   find-us