Mistral Notifications
https://blueprints.launchpad.net/zaqar/+spec/mistral-notifications
Allow a message to a Zaqar queue to trigger a Mistral workflow via the Zaqar notification mechanism.
Problem description
Developers of cloud applications expect to be able to build autonomous applications in the cloud. That is to say, applications that manage themselves by accessing the APIs of the cloud to manipulate their own infrastructure. (Examples of this include autoscaling and autorecovery.) This is one of the primary differences between a cloud platform and a simple virtualisation platform (the other being multi-tenancy). There are two parts to this that require integration, which is the purpose of this blueprint.
The first is that the application must be able to receive information from the cloud. An example of this would be an Aodh alarm indicating that a server is overutilised. These notifications must be asynchronous, since the cloud is multitenant and cannot block waiting for any one user application to acknowledge them. They must also exhibit queueing semantics with at-least-once delivery and high durability, since the application may become unreliable if it misses notifications from the cloud. Not coincidentally, Zaqar offers exactly these semantics in a public, Keystone-authenticated API that is accessible to applications, and is therefore a natural choice. For this reason, a number of OpenStack projects have already started dispatching user notifications to Zaqar and more are expected in the near future. Aodh alarms already support Zaqar as a target, and Heat can push stack and resource events, as well as notifications about user hooks being triggered, to Zaqar.
The second is that the application must be able to perform arbitrary, and arbitrarily complex, actions. This is because in practice the Right Thing to do in cases like autoscaling and autorecovery is application-specific. There is also an entire universe of application-specific actions that a user might want to create. Of course an application can run these actions on a server provisioned with Nova, but this generally makes things more complex (and usually more expensive) than they need to be. For example, it is very hard to host autorecovery code on the very servers that are being autorecovered and still be reliable. Finally, OpenStack makes it difficult to provide appropriate Keystone credentials to servers provisioned with Nova. Mistral solves these problems by providing a lightweight, multi-tenant way of reliably running potentially long-running processes, with access to the OpenStack APIs as well as a number of other actions (some of which, like sending email and webhooks, are similar to Zaqar's notifications).
The missing link to build fully autonomous applications is for messages (potentially, but not necessarily, originating from the OpenStack cloud itself) on Zaqar queues to be able to trigger Mistral workflows (potentially, but not necessarily, calling other OpenStack APIs). This would give developers of cloud applications an extremely flexible way of plugging together event-driven, application-specific, autonomous actions.
Proposed change
Create a Zaqar notification sink plugin for Mistral. The effect of a notification to this sink would be to create a Mistral workflow Execution (i.e. to trigger a pre-existing Mistral workflow).
The subscriber URI should be the URL of the Mistral executions endpoint, with the URI scheme trust+http or trust+https. For example, trust+https://mistral.example.net/v2/executions. This scheme indicates that Zaqar should create a Keystone trust that allows it to act on behalf of the user in making API calls to Mistral in the future. The trust ID will be inserted into the URL before it is stored, in the form trust+http://trust_id@host/path. This form is modelled after the one used by Aodh.
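The stored form described above can be illustrated with a minimal sketch using only the Python standard library. The helper names embed_trust_id and extract_trust_id are hypothetical and are not part of Zaqar; the trust ID value is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit

TRUST_SCHEMES = ("trust+http", "trust+https")


def embed_trust_id(subscriber, trust_id):
    """Return the subscriber URI with the trust ID in the userinfo position."""
    parts = urlsplit(subscriber)
    if parts.scheme not in TRUST_SCHEMES:
        raise ValueError("not a trust+http(s) subscriber: %r" % subscriber)
    if "@" in parts.netloc:
        raise ValueError("subscriber already carries user information")
    netloc = "%s@%s" % (trust_id, parts.netloc)
    return urlunsplit(parts._replace(netloc=netloc))


def extract_trust_id(stored):
    """Split a stored URI into (trust_id, plain http(s) URL to call)."""
    parts = urlsplit(stored)
    trust_id, _, host = parts.netloc.rpartition("@")
    scheme = parts.scheme[len("trust+"):]  # "trust+https" -> "https"
    return trust_id, urlunsplit(parts._replace(scheme=scheme, netloc=host))


stored = embed_trust_id(
    "trust+https://mistral.example.net/v2/executions", "abc123")
print(stored)
print(extract_trust_id(stored))
```

Keeping the trust ID in the userinfo position means the subscription record needs no extra column, at the cost of having to strip it again before the request is made.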
The trust lifetime should be slightly longer than the TTL of the subscription, or unlimited if there is no TTL for the subscription. Zaqar must delete the trust when deleting the subscription.
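As a rough sketch of the lifetime rule, the expiry could be derived from the subscription TTL as follows. The 60-second grace period is an assumption chosen for illustration (the spec only says "slightly longer"), and trust_expiry is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: how much longer than the subscription the trust should live.
TRUST_GRACE = timedelta(seconds=60)


def trust_expiry(ttl_seconds, now=None):
    """Return the trust expiry time, or None for a trust that never expires."""
    if ttl_seconds is None:
        # No TTL on the subscription: the trust lifetime is unlimited.
        return None
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=ttl_seconds) + TRUST_GRACE


print(trust_expiry(3600))
print(trust_expiry(None))
```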
When sending a notification, Zaqar will retrieve a trust token from Keystone using its own service user token and the trust ID stored in the URL. The trust token thus obtained should contain the correct tenant information to then make the request on behalf of the original user.
Since in future Zaqar may want to make trust+http requests to other API endpoints, it should distinguish on more than just the URI scheme. When the subscription is created, Zaqar should compare the URI with the Mistral executions endpoint URL obtained with the help of the Keystone catalog in order to distinguish between Mistral workflow triggers and ordinary webhooks. Fortunately, the URL is fixed for a given cloud, so the catalog would probably only need to be read once and it would be a straight string comparison from there.
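The comparison could look something like the sketch below. It assumes the Mistral endpoint URL has already been read from the Keystone catalog and that the executions collection lives at "/executions" beneath it; both the function names and the light normalisation (dropping the trust ID and a trailing slash) are illustrative assumptions rather than specified behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Drop the trust+ prefix, any userinfo and a trailing slash."""
    parts = urlsplit(url)
    scheme = parts.scheme
    if scheme.startswith("trust+"):
        scheme = scheme[len("trust+"):]
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit((scheme, host, parts.path.rstrip("/"), "", ""))


def is_mistral_trigger(subscriber, mistral_endpoint):
    """True if subscriber is a trust+http(s) URI for the executions endpoint."""
    if not urlsplit(subscriber).scheme.startswith("trust+"):
        return False
    executions_url = normalise(mistral_endpoint) + "/executions"
    return normalise(subscriber) == executions_url


# The endpoint would be looked up once from the catalog and then cached.
endpoint = "https://mistral.example.net/v2"
print(is_mistral_trigger(
    "trust+https://mistral.example.net/v2/executions", endpoint))
print(is_mistral_trigger("trust+https://hooks.example.org/notify", endpoint))
```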
The options dict should contain the following keys:
- workflow_id - The ID of the workflow to trigger
- params - a dict of parameters that varies depending on the workflow type. e.g. a "reverse workflow" takes a task_name parameter to define the target task.
- input - an arbitrary dict of keys and values to be passed as input to every workflow execution triggered by this notification.
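For illustration, a subscription request body carrying these options might be built as below. The workflow ID, task name, input values and TTL are all placeholders invented for the example.

```python
import json

subscription = {
    "subscriber": "trust+https://mistral.example.net/v2/executions",
    "ttl": 3600,  # placeholder subscription lifetime in seconds
    "options": {
        # Placeholder ID of a pre-existing Mistral workflow.
        "workflow_id": "00000000-0000-0000-0000-000000000000",
        # Hypothetical reverse workflow with a target task.
        "params": {"task_name": "recover_server"},
        # Hypothetical static input passed to every execution.
        "input": {"server_group": "web"},
    },
}

print(json.dumps(subscription, indent=2, sort_keys=True))
```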
When creating the Mistral execution, the contents of the message and (later) the message ID will be passed in the environment (the env key in the params). This allows the workflow to access the message data, but does not require it to declare a particular input for it (so the notification can be used to trigger any workflow). The message contents, interpreted as JSON, will be passed in a Mistral environment variable named notification. When Zaqar supports passing the message ID in a notification, it will be sent as the Mistral environment variable notification_id. If these names conflict with the env passed by the user in params, the user-provided data will be overwritten with that received in the message. Any other keys in the user's env will be preserved. If the user does not specify an env, one will be created. The input dict, workflow_id and all other params will be passed through unmodified.
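The merging rules in the preceding paragraph can be sketched as a small pure function. The name build_execution_request is hypothetical, and the sketch only assembles the data; how the result is serialised and sent to Mistral is left out.

```python
import copy


def build_execution_request(options, message_body, message_id=None):
    """Combine subscription options with a message into execution arguments."""
    # Copy so the stored subscription options are never mutated.
    params = copy.deepcopy(options.get("params") or {})
    # Create an env if the user did not supply one.
    env = dict(params.get("env") or {})
    # Message data wins over any user-provided keys of the same name.
    env["notification"] = message_body
    if message_id is not None:
        env["notification_id"] = message_id
    params["env"] = env
    return {
        "workflow_id": options["workflow_id"],
        "input": copy.deepcopy(options.get("input") or {}),
        "params": params,
    }


options = {
    "workflow_id": "wf-1",  # placeholder
    "params": {"task_name": "recover", "env": {"region": "r1"}},
    "input": {"server_group": "web"},
}
print(build_execution_request(options, {"alarm": "cpu_high"}))
```

Because the message is delivered through env rather than input, the same subscription shape works for any workflow, including ones that declare no inputs at all.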
While all the data is available to do a raw HTTP request, it is preferable if these calls are made through the python-mistralclient library.
Alternatives
Instead of a push model, where Zaqar takes messages and notifies Mistral, it would also be possible to use a pull model where Mistral polls Zaqar topics for messages. However, while the Zaqar notification implementation already exists, there is no such existing component in Mistral that would be suitable for polling for triggers. It would need to poll large numbers of topics in different tenants. A similar design was considered and rejected for the notification feature of Zaqar; the same arguments apply here.
An alternative authentication method might be to use pre-signed URLs, which are on the Mistral roadmap. This might be quicker to implement, but in the longer term, Keystone trusts are probably preferable.
Instead of whitelisting the Mistral executions URL, the trust+http scheme could be used to make requests to any OpenStack endpoint. However, in general the correct method of combining static information from the options dict with the contents of the message to obtain the call parameters will be different for every API. Since Mistral can already call most OpenStack APIs and supports a language (YAQL) for calculating the arguments using data from the notification and other input, the simplest way to achieve this is for the user to encapsulate any other OpenStack API call they wish to make in a Mistral workflow (which also allows them to define custom error handling).
It would be nice if there were a way to identify an OpenStack resource with a URI without necessarily requiring a URL (containing redundant information about the location of the endpoint). AWS uses an unofficial URN-like identifier with an arn: (instead of urn:) scheme for this purpose. Something similar might be useful in other contexts in OpenStack too (for example, in Heat we would like to be able to distinguish between files in Swift containers or Glare links and ordinary HTTP URLs for the purposes of uploading user data, although there is some precedent for using swift+http as the scheme in the Swift case). However, this would require, at a minimum, wide cross-project agreement (and arguably IANA registration). There are no existing examples of anything like this in OpenStack.
Implementation
Assignee(s)
This is one of those blueprints where I'm throwing it out there to see who picks it up.
Milestones
- Target Milestone for completion: Newton-3
Work Items
- Implement the Mistral notification plugin
- Create a Keystone trust and store its ID in the URI when setting up a trust+http(s) notification. Delete the trust again when the notification is deleted.
- Add the ability to distinguish between Mistral URLs and other trust+http(s) URLs in the notification URI
Dependencies
We won't be able to pass the message ID until https://review.opendev.org/#/c/276968/ or something equivalent merges. However, since it can be added to the Mistral environment later without rewriting any existing workflows (to declare a new input), this is in no way a blocker.
Note
This work is licensed under a Creative Commons Attribution 3.0 Unported License. http://creativecommons.org/licenses/by/3.0/legalcode