
Mistral Notifications

https://blueprints.launchpad.net/zaqar/+spec/mistral-notifications

Allow a message posted to a Zaqar queue to trigger a Mistral workflow via Zaqar's notification mechanism.

Problem description

Developers of cloud applications expect to be able to build autonomous applications in the cloud, that is, applications that manage themselves by calling the cloud's APIs to manipulate their own infrastructure. (Examples include autoscaling and autorecovery.) This capability is one of the primary differences between a cloud platform and a simple virtualisation platform (the other being multi-tenancy). Building such applications requires two pieces to be integrated, and providing that integration is the purpose of this blueprint.

The first is that the application must be able to receive information from the cloud. An example of this would be an Aodh alarm indicating that a server is overutilised. These notifications must be asynchronous, since the cloud is multi-tenant and cannot block waiting for any one user application to acknowledge them. They must also exhibit queueing semantics with at-least-once delivery and high durability, since the application may become unreliable if it misses notifications from the cloud. Not coincidentally, Zaqar offers exactly these semantics in a public, Keystone-authenticated API that is accessible to applications, and is therefore a natural choice. For this reason, a number of OpenStack projects have already started dispatching user notifications to Zaqar, and more are expected to follow in the near future. Aodh alarms already support Zaqar as a target, and Heat can push stack and resource events, as well as notifications that user hooks have been triggered, to Zaqar.

The second is that the application must be able to perform arbitrary and arbitrarily complex actions. This is because, in practice, the Right Thing to do in cases like autoscaling and autorecovery is application-specific. There is also an entire universe of application-specific actions that a user might want to create. Of course, an application could run these actions on a server provisioned with Nova, but this generally makes things more complex (and usually more expensive) than they need to be. For example, it is very hard to host autorecovery code reliably on the very servers that are being autorecovered. Finally, OpenStack makes it difficult to provide appropriate Keystone credentials to servers provisioned with Nova. Mistral solves these problems by providing a lightweight, multi-tenant way of reliably running potentially long-running processes, with access to the OpenStack APIs as well as a number of other actions (some of which, like sending email and webhooks, are similar to Zaqar's notifications).

The missing link for building fully autonomous applications is the ability for messages on Zaqar queues (potentially, but not necessarily, originating from the OpenStack cloud itself) to trigger Mistral workflows (potentially, but not necessarily, calling other OpenStack APIs). This would give developers of cloud applications an extremely flexible way of plugging together event-driven, application-specific, autonomous actions.

Proposed change

Create a Zaqar notification sink plugin for Mistral. The effect of a notification to this sink would be to create a Mistral workflow Execution (i.e. to trigger a pre-existing Mistral workflow).

The subscriber URI should be the URL of the Mistral executions endpoint, with the URI scheme trust+http or trust+https, for example trust+https://mistral.example.net/v2/executions. This scheme tells Zaqar to create a Keystone trust that allows it to make future API calls to Mistral on behalf of the user. Before the URI is stored, the trust ID will be inserted into it, giving the form trust+http://trust_id@host/path. This form is modelled after the one used by Aodh.
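A minimal sketch of the URI rewrite described above, using only the Python standard library. The helper name embed_trust_id is hypothetical, and discarding any userinfo the user may have supplied is an assumption of this sketch rather than something the spec requires.

```python
from urllib.parse import urlsplit, urlunsplit

TRUST_SCHEMES = ("trust+http", "trust+https")


def embed_trust_id(subscriber_uri: str, trust_id: str) -> str:
    """Rewrite trust+http(s)://host/path as trust+http(s)://trust_id@host/path.

    Hypothetical helper: the name and the choice to discard any
    user-supplied userinfo are illustrative assumptions.
    """
    parts = urlsplit(subscriber_uri)
    if parts.scheme not in TRUST_SCHEMES:
        raise ValueError("not a trust+http(s) URI: %r" % subscriber_uri)
    # Keep only host[:port]; anything before '@' is replaced by the trust ID.
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(
        (parts.scheme, "%s@%s" % (trust_id, host),
         parts.path, parts.query, parts.fragment))


print(embed_trust_id("trust+https://mistral.example.net/v2/executions",
                     "0123abcd"))
# trust+https://0123abcd@mistral.example.net/v2/executions
```

Keeping the trust ID in the userinfo position means the stored URI still carries the endpoint's host and path unchanged, so the original target can be recovered by stripping the prefix and userinfo again.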

The trust lifetime should be slightly longer than the TTL of the subscription, or unlimited if there is no TTL for the subscription. Zaqar must delete the trust when deleting the subscription.
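The lifetime rule above can be sketched as a small function. This assumes the subscription TTL is expressed in seconds and uses a hypothetical ten-minute margin, since the spec only asks for "slightly longer". A return value of None stands for a trust with no expiry. Deleting the trust together with the subscription is not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical margin: the spec only asks for "slightly longer" than the TTL.
TRUST_MARGIN = timedelta(minutes=10)


def trust_expires_at(created_at: datetime,
                     subscription_ttl: Optional[int]) -> Optional[datetime]:
    """Return the expiry for the trust backing a subscription.

    subscription_ttl is assumed to be in seconds; None means the
    subscription has no TTL, so the trust gets no expiry (returns None).
    """
    if subscription_ttl is None:
        return None
    return created_at + timedelta(seconds=subscription_ttl) + TRUST_MARGIN


created = datetime(2016, 7, 1, 12, 0, tzinfo=timezone.utc)
print(trust_expires_at(created, 3600).isoformat())
# 2016-07-01T13:10:00+00:00
```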

When sending a notification, Zaqar will retrieve a trust token from Keystone using its own service user token and the trust ID stored in the URL. The trust token obtained this way should carry the correct tenant information, allowing Zaqar to make the request on behalf of the original user.
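A sketch of the two steps above, under stated assumptions: split_trust_uri and trust_token_request are hypothetical helper names, and the request body follows the shape of a Keystone v3 trust-scoped token request (token auth method plus an OS-TRUST:trust scope). Actually sending the POST to /v3/auth/tokens and reading the returned token from the X-Subject-Token header are left out.

```python
import json
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit


def split_trust_uri(stored_uri: str) -> Tuple[str, str]:
    """Split trust+http(s)://trust_id@host/path into (trust_id, plain URL)."""
    parts = urlsplit(stored_uri)
    if not parts.scheme.startswith("trust+") or not parts.username:
        raise ValueError("no trust ID in %r" % stored_uri)
    scheme = parts.scheme[len("trust+"):]          # trust+https -> https
    host = parts.netloc.rpartition("@")[2]         # drop the trust ID
    target = urlunsplit((scheme, host, parts.path, parts.query, parts.fragment))
    return parts.username, target


def trust_token_request(service_token: str, trust_id: str) -> dict:
    """Body for POST /v3/auth/tokens asking for a trust-scoped token.

    Zaqar authenticates with its own service token and scopes to the trust.
    """
    return {
        "auth": {
            "identity": {"methods": ["token"], "token": {"id": service_token}},
            "scope": {"OS-TRUST:trust": {"id": trust_id}},
        }
    }


trust_id, target = split_trust_uri(
    "trust+https://0123abcd@mistral.example.net/v2/executions")
print(target)  # https://mistral.example.net/v2/executions
print(json.dumps(trust_token_request("SERVICE_TOKEN", trust_id)))
```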

Since Zaqar may in future want to make trust+http requests to other API endpoints, it should not rely on the URI scheme alone. When the subscription is created, Zaqar should compare the URI with the Mistral executions endpoint URL, obtained from the Keystone catalog, in order to distinguish Mistral workflow triggers from ordinary webhooks. Fortunately, this URL is fixed for a given cloud, so the catalog would probably only need to be read once, after which the check is a straight string comparison.
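A minimal sketch of that check. The function name is hypothetical, and mistral_endpoint is assumed to be the Mistral URL taken from the Keystone catalog (for example https://mistral.example.net/v2), looked up once and cached by the caller; the catalog lookup itself is not shown.

```python
from urllib.parse import urlsplit, urlunsplit


def is_mistral_trigger(subscriber_uri: str, mistral_endpoint: str) -> bool:
    """True if subscriber_uri targets the Mistral executions endpoint.

    mistral_endpoint is assumed to come from the Keystone catalog and to
    be cached by the caller, so this is a plain string comparison.
    """
    parts = urlsplit(subscriber_uri)
    if not parts.scheme.startswith("trust+"):
        return False  # ordinary webhooks are handled by another sink
    # Strip the "trust+" prefix and any embedded trust ID before comparing.
    scheme = parts.scheme[len("trust+"):]
    host = parts.netloc.rpartition("@")[2]
    target = urlunsplit((scheme, host, parts.path.rstrip("/"), "", ""))
    return target == mistral_endpoint.rstrip("/") + "/executions"
```

Ignoring a trailing slash and any query string is a choice made in this sketch; a stricter exact comparison of the whole URI would also satisfy the spec.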

The options dict should contain the following keys:

  • workflow_id - The ID of the workflow to trigger.
  • params - A dict of parameters that varies depending on the workflow type; e.g. a "reverse workflow" takes a task_name parameter to define the target task.
  • input - An arbitrary dict of keys and values to be passed as input to every workflow execution triggered by this subscription.
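For illustration, an options dict of this shape and a minimal validator. The workflow ID, task name and input values are hypothetical, and treating params and input as optional (but dicts when present) is an assumption, since the list above does not say whether they are required.

```python
from typing import Any, Dict

# Hypothetical example values; only the key names come from the spec.
EXAMPLE_OPTIONS: Dict[str, Any] = {
    "workflow_id": "hypothetical-workflow-id",
    "params": {"task_name": "recover_server"},  # e.g. for a reverse workflow
    "input": {"server_name": "web-1"},
}


def validate_mistral_options(options: Dict[str, Any]) -> None:
    """Raise ValueError if options lacks the keys described above."""
    workflow_id = options.get("workflow_id")
    if not isinstance(workflow_id, str) or not workflow_id:
        raise ValueError("options must include a non-empty workflow_id")
    # Assumption: params and input are optional but must be dicts if present.
    for key in ("params", "input"):
        if not isinstance(options.get(key, {}), dict):
            raise ValueError("options[%r] must be a dict" % key)


validate_mistral_options(EXAMPLE_OPTIONS)  # passes silently
```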

When creating the Mistral execution, Zaqar will pass the contents of the message and (later) the message ID in the workflow environment (the env key in params). This allows the workflow to access the message data without requiring it to declare a particular input for it, so the notification can be used to trigger any workflow. The message contents, interpreted as JSON, will be passed in a Mistral environment variable named notification. Once Zaqar supports passing the message ID in a notification, it will be sent as the Mistral environment variable notification_id. If either name conflicts with a key in the env passed by the user in params, the user-provided value will be overwritten with the one derived from the message. Any other keys in the user's env will be preserved. If the user does not specify an env, one will be created. The input dict, workflow_id and all other params will be passed through unmodified.
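The merging rules above can be sketched as a pure function that turns a subscription's options plus one message into the pieces of an execution request. The name build_execution is hypothetical, the message body is assumed to arrive as raw JSON text, and how the result is handed to python-mistralclient is not shown.

```python
import copy
import json
from typing import Any, Dict, Optional


def build_execution(options: Dict[str, Any], message_body: str,
                    message_id: Optional[str] = None) -> Dict[str, Any]:
    """Combine subscription options and one message into execution arguments."""
    # Deep copies leave the stored subscription options untouched.
    params = copy.deepcopy(options.get("params", {}))
    # Start from the user's env (or a new empty one) so other keys survive.
    env = dict(params.get("env") or {})
    # Message-derived values overwrite user keys with the same name.
    env["notification"] = json.loads(message_body)
    if message_id is not None:
        env["notification_id"] = message_id
    params["env"] = env
    return {
        "workflow_id": options["workflow_id"],
        "input": copy.deepcopy(options.get("input", {})),
        "params": params,
    }


opts = {"workflow_id": "wf-1",
        "params": {"task_name": "recover",
                   "env": {"notification": "stale", "region": "r1"}},
        "input": {"server": "web-1"}}
execution = build_execution(opts, '{"alarm": "cpu_high"}', message_id="msg-42")
print(execution["params"]["env"])
# {'notification': {'alarm': 'cpu_high'}, 'region': 'r1', 'notification_id': 'msg-42'}
```

Passing message_id as an optional argument mirrors the plan to add notification_id later without changing anything else about the execution.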

While all the data needed for a raw HTTP request is available, it is preferable to make these calls through the python-mistralclient library.

Alternatives

Instead of a push model, where Zaqar takes messages and notifies Mistral, it would also be possible to use a pull model, where Mistral polls Zaqar queues for messages. However, while the Zaqar notification implementation already exists, there is no existing component in Mistral that would be suitable for polling for triggers, and such a component would need to poll large numbers of queues in different tenants. A similar design was considered and rejected for Zaqar's notification feature; the same arguments apply here.

An alternative authentication method might be to use pre-signed URLs, which are on the Mistral roadmap. This might be quicker to implement, but in the longer term, Keystone trusts are probably preferable.

Instead of whitelisting the Mistral executions URL, the trust+http scheme could be used to make requests to any OpenStack endpoint. However, in general the correct method of combining static information from the options dict with the contents of the message to obtain the call parameters will be different for every API. Since Mistral can already call most OpenStack APIs and supports a language (YAQL) for calculating the arguments using data from the notification and other input, the simplest way to achieve this is for the user to encapsulate any other OpenStack API call they wish to make in a Mistral workflow (which also allows them to define custom error handling).

It would be nice if there were a way to identify an OpenStack resource with a URI without necessarily requiring a URL (containing redundant information about the location of the endpoint). AWS uses an unofficial URN-like identifier with an arn: (instead of urn:) scheme for this purpose. Something similar might be useful in other contexts in OpenStack too (for example, in Heat we would like to be able to distinguish between files in Swift containers or Glare links and ordinary HTTP URLs for the purposes of uploading user data, although there is some precedent for using swift+http as the scheme in the Swift case). However, this would require, at a minimum, wide cross-project agreement (and arguably IANA registration). There are no existing examples of anything like this in OpenStack.

Implementation

Assignee(s)

This is one of those blueprints where I'm throwing it out there to see who picks it up.

Milestones

Target Milestone for completion:

Newton-3

Work Items

  • Implement the Mistral notification plugin
  • Create a Keystone trust and store its ID in the subscriber URI when setting up a trust+http(s) subscription. Delete the trust again when the subscription is deleted.
  • Add the ability to distinguish between Mistral URLs and other trust+http(s) URLs in the subscriber URI

Dependencies

We won't be able to pass the message ID until https://review.opendev.org/#/c/276968/ or something equivalent merges. However, since it can be added to the Mistral environment later without rewriting any existing workflows (to declare a new input), this is in no way a blocker.

Note

This work is licensed under a Creative Commons Attribution 3.0 Unported License. http://creativecommons.org/licenses/by/3.0/legalcode