Self-Service via Runbooks

Spec for step runbooks

Partial-bug: https://bugs.launchpad.net/ironic/+bug/2027690
Change-Id: Iee77a73e7acc0053632646de7bc96ada090f7641
This commit is contained in:
Jay Faulkner 2023-07-31 14:55:43 -07:00
parent 888f8581d6
commit 9afdc5cc34
2 changed files with 293 additions and 0 deletions

292
specs/approved/runbooks.rst Normal file
View File

@ -0,0 +1,292 @@
=========================
Self-Service via Runbooks
=========================
https://bugs.launchpad.net/ironic/+bug/2027690
With the addition of service steps, combined with owner/lessee, we now have an
opportunity to allow project members to self-serve many maintenance items by
permitting them access to curated runbooks of steps.
This feature will primarily involve extending creating a new runbook concept
concept, allowing lists of steps to be created, associated with a node via
traits. These runbooks will then be able to used in lieu of a list of steps
when performing manual cleaning or node servicing.
Problem description
===================
Currently, project members have limited ability to self-serve maintenance
items. Ironic operators are given the difficult choice of giving users broad
access to nodes, allowing them to run arbitrary manual cleaning
or service steps with the only alternative being permitting no access to
self-serve these maintenance items.
Use cases include:
- As a project member, I can execute runbooks via Node Servicing without
granting the ability to execute arbitrary steps on a node.
- As an system manager, I want to store a list of steps to perform an action
in an identical manner across many similar nodes.
Proposed change
===============
The proposed change is to create a new API concept, runbooks, which can
be used with any API flow which currently takes explicit lists of steps.
Those runbooks can then be used instead of a list of clean_steps [0]_ when
setting node provision state.
Additionally, we will ensure that the full CRUD lifecycle of runbooks is
made role-aware in the code, so that a project can limit who can create,
delete, edit, or mark runbooks as public all as separate policy toggles.
We will also ensure deployers can separately toggle the ability to run
step-based flows via runbooks versus step-based flows with arbitrary
step lists.
A runbook will only run on a node who has a trait equal to the runbook name,
to ensure the runbook has been approved for use on a given piece of hardware,
as an extra precaution against hardware breakage.
Alternatives
------------
We considered, originally, repurposing the existing deploy templates into a
generic concept of templates. This was abandoned due to deploy templates
containing implicit steps, making it difficult to reason about them. This is
why we instead chose to call them runbooks, which are entirely specified as
opposed to templates, which are partially specified and have implicit steps
integrated.
Data model impact
-----------------
- Create new tables ``runbooks`` and ``runbook_steps``, described below.
The new tables will have the following fields:
``runbooks`` (same as ``deploy_templates`` except addition of ``owner`` and ``public``)
- id (int, pkey)
- uuid
- name (string 255)
- public (bool) - When true, template is available for use by any project.
- owner (nullable string, usually a keystone project ID)
- extra json/string
- steps list of ids pointing to ``runbook_steps``
``runbook_steps``
- Identical to ``deploy_template_steps`` with names substituted as appropriate.
Note: Ensure all queries to ``runbooks`` only pull in ``runbook_steps`` if needed.
State Machine Impact
--------------------
While no states or state transitions are being proposed, the APIs to invoke
some of those state transitions will need to change to become runbook-aware.
REST API impact
---------------
A new top level REST API endpoint, ``/v1/runbooks/`` will be added, with basic
CRUD support.
The existing ``/v1/nodes/<node>/states/provision`` API will be changed to
accept a ``runbook`` (name or uuid) in lieu of ``clean_steps`` when being used
for servicing or manual cleaning.
Client (CLI) impact
-------------------
The CLI will be updated to add support for the new API endpoints.
Some examples of CLI commands that will be added, and how they interact with
RBAC::
- baremetal runbook create X [opts] # as system-scoped manager
- owner: null
- public: false
- baremetal runbook create X [opts] # as project-scoped manager
- owner: projectX
- public: false
- baremetal runbook set X --public # as system-scoped manager
- owner: null
- public: true
- Note: Owner field is nulled even if it previously set.
- baremetal runbook set X --public # as project-scoped manager
- Forbidden! Requires system-scoped access.
- baremetal runbook unset X --public # as system-scoped manager
- owner: null
- public: false
- baremetal runbook set X --owner projectX # as system-scoped manager
- owner: projectX
- public: false
- Note: Will return an error if ``runbook.public`` is true.
- baremetal node service N --runbook X
- baremetal node clean N --runbook X
- baremetal node service N --runbook X --clean-steps {} # NOT PERMITTED
- baremetal node clean N --runbook X --clean-steps {} # NOT PERMITTED
RPC API impact
--------------
RPC API will be modified to support runbooks in lieu of steps where necessary.
They will be properly versioned to ensure a smooth upgrade.
Driver API impact
-----------------
None
Nova driver impact
------------------
None
Ramdisk impact
--------------
None
Security impact
---------------
Operators are warned that even with use of this feature, users may be able to
leverage steps or access which are innocuous on their own, but malicious when
combined.
For example, a project member with both access to run a service step runbook
containing in-band steps, and also with access to update a node's
deploy_ramdisk could combine those two pieces of access to run arbitrary code.
Ensure you have reviewed all possible threat models when granting additional
access to less-trusted individuals.
Things to avoid to ensure secure implementation:
- Do not permit a project-scoped API user to change ``runbook.public``.
- Do not permit a project-scoped API user change ``runbook.owner``.
- Anything that would *implicitly* mark a runbook as non-public.
- Ensure we check if nodes are able to run a given runbook using node traits,
in a similar method to how we do so with deploy templates.
Other end user impact
---------------------
None
Scalability impact
------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
RBAC Impact
-----------
There are two primary ways this feature interacts with RBAC, beyond the
obvious CRUD for runbooks.
First, the ``runbooks.owner`` and ``runbooks.public`` fields are relevant
for determining if a runbook is scoped to a project or to a system. If
``owner`` is non-null and ``public`` is false, the runbook is scoped to the
project set in that field and is only usable on nodes owned or leased by that
project. If ``owner`` is null and ``public`` is false, the runbook is only
able to be used or access by system-scoped users. If ``owner`` is null and
``public`` is true, a system-scoped manager can modify the runbook and a
project-scoped member could use it on a compatible node. Additionally,
the ``owner`` field will only be settable when ``public`` is false or being
set to false, and setting ``public`` to true will null the owner field.
Second, the node change provision state [0]_ API will have a ``runbook`` field
added, and policy will be different for cases where ``runbook`` is specified
instead of ``clean_steps``. Default policy will be to permit manual cleaning
and servicing for a node owner or lessee-scoped member when using a runbook,
but to disallow it when specifying ``clean_steps``. Combining ``clean_steps``
and ``runbook`` will not be permitted.
Expected access after this implementation is complete:
Scope:
System
- Admin
- Manager
- Member
--> Can CRUD system-scoped templates (template.owner=null)
--> Can CRUD project-scoped templates (template.owner=PROJECT)
--> Can unset template.owner, changing a template to system-scope
--> Can mark system-scoped templates as public (template.public=True)
- Reader
--> Can list all templates
Project
- Admin
- Manager
--> Can CRUD project-scoped templates (template.owner=PROJECT)
--> Cannot set a template to public (template.public=True).
- Member
--> Can execute public templates or templates owned by their project.
- Reader
--> Can list public templates and templates owned by their project.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
JayF <jay@jvf.cc>
Other contributors:
TheJulia <juliaashleykreger@gmail.com>
Work Items
----------
- Create Runbooks
- Object layer
- DB layer
- API layer
- Add policy checking tests for /v1/runbooks
- Ensure tempest API tests exist for new API endpoints
- Update API-Ref
- Update Manual Cleaning and Node Servicing documentation
Dependencies
============
All dependencies have been resolved.
Testing
=======
Unit tests will be added to test the new functionality. Integration tests will
be added to test the new API endpoints and CLI commands.
Upgrades and Backwards Compatibility
====================================
The changes are backwards compatible. Existing API endpoints will continue to
function as before, and we will gate all API changes behind microversion
checks.
Documentation Impact
====================
The new functionality will need to be documented. This includes documentation
for the new API endpoints and CLI commands, as well as documenting security
caveats detailed above.
References
==========
.. [0] *Change Node Provision State*: https://docs.openstack.org/api-ref/baremetal/#change-node-provision-state

View File

@ -0,0 +1 @@
../approved/runbooks.rst